Assigning to @craig-landry
I'll follow up with Gareth and Hareet. Regarding timing, everyone is on CHT 4.0 / Archv3 right now. If this is going to need a bunch of effort it'll come at the expense of that effort. If we can keep it small and simple though I don't think it'll be too tough.
@derickl - can you provide more information about exactly what data points the MoH wants to know?
Is this truly log analysis, or is there a specific question about user activity that the MoH wants answered, regardless of where the data comes from?
Finally, how will the data be consumed? Will the output of a `grep` call be satisfactory, or does this need to be viewed in a Klipfolio or Superset dashboard?
Something to note here is that user activity auditing (in the traditional sense of the word) is tricky in an offline-first application such as ours. I think we'll probably have to educate them on this, because most of the activity happens on the client side without requests being sent to the back end.
I think there are two parts to this ticket.
The first falls into the support dashboard. It could be achieved via `grep` within the log folder if they know what they are looking for.
The second is more of an engineering task that would require listing candidates, evaluating them, and choosing a winner based on that evaluation.
Excellent points, thanks @henokgetachew !
Once we hear back from @derickl on what's needed, we can see what logical next steps are.
To kick things off, assuming the output of `grep` is a valid solution to this ticket, I explored how to capture some `POST` calls which represent logins. While the rest of the app usage can be hidden from the server because of the offline-first architecture, logins must happen online:
First, find the name of your HAProxy container:

```
$ docker ps --format="{{.Names}}" | grep -i haprox
helper_test_haproxy_1
```
You can find successful logins with this `docker logs` call, using `grep` to look for `200`s:

```
docker logs helper_test_haproxy_1 | grep "200,POST,/_session,-"
Mar 29 22:09:06 ce16e6d1f508 haproxy[25]: 172.18.0.3,200,POST,/_session,-,medic,'{"name":"medic","password":"***"}',403,1,46,'-'
Mar 29 22:09:46 ce16e6d1f508 haproxy[25]: 172.18.0.3,200,POST,/_session,-,medic,'{"name":"medic","password":"***"}',403,1,46,'-'
Mar 29 22:09:49 ce16e6d1f508 haproxy[25]: 172.18.0.3,200,POST,/_session,-,medic,'{"name":"medic","password":"***"}',403,1,46,'-'
Mar 29 22:10:38 ce16e6d1f508 haproxy[25]: 172.18.0.3,200,POST,/_session,-,foobar,'{"name":"foobar","password":"***"}',402,1,44,'-'
```
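If a per-user tally is more useful than the raw lines, the same output can be piped through `awk`. This is just a sketch against the log format shown above (the username appears to be the sixth comma-separated field); adjust the field index if your HAProxy log format differs:

```
# Count successful logins per user from the HAProxy log.
# Field 6 of the comma-separated record is the username in the sample lines above.
docker logs helper_test_haproxy_1 \
  | grep "200,POST,/_session,-" \
  | awk -F, '{print $6}' \
  | sort | uniq -c | sort -rn
```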
To find failed login attempts, you would look for a `401` instead of a `200`:

```
docker logs helper_test_haproxy_1 | grep "401,POST,/_session,-"
Mar 30 14:28:09 ce16e6d1f508 haproxy[25]: 172.18.0.3,401,POST,/_session,-,medic,'{"name":"jane","password":"***"}',390,1,67,'-'
Mar 30 14:28:21 ce16e6d1f508 haproxy[25]: 172.18.0.3,401,POST,/_session,-,medicd,'{"name":"jane","password":"***"}',390,1,67,'-'
Mar 30 14:28:23 ce16e6d1f508 haproxy[25]: 172.18.0.3,401,POST,/_session,-,medicd,'{"name":"jane","password":"***"}',390,1,67,'-'
Mar 30 14:28:35 ce16e6d1f508 haproxy[25]: 172.18.0.3,401,POST,/_session,-,medicd,'{"name":"jane","password":"***"}',390,1,67,'-'
```
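Along the same lines, here's a rough sketch for tallying failed attempts per source IP. In the format above, the client IP is the last space-separated token of the first comma-separated field:

```
# Count failed login attempts (401s) per source IP.
docker logs helper_test_haproxy_1 \
  | grep "401,POST,/_session,-" \
  | awk -F, '{n = split($1, a, " "); print a[n]}' \
  | sort | uniq -c | sort -rn
```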
If you want to query the CHT about a specific user, say `jane` who couldn't log in, you can use `curl` plus `jq` to get their username, rev, phone number and role via the API:

```
curl -s https://medic:password@192-168-68-17.my.local-ip.co:8443/api/v1/users | jq '.[] | select(.username=="jane") | .username, .rev, .contact.phone, .contact.role'
"jane"
"1-695a58c8fd902f4eae06aae63edbe0b8"
"+254712345678"
"chw"
```
NB - if you want to check all `POST`s, as the `docker logs` call may not return all of them, you can use an `exec` call to grep the log file directly:

```
docker exec -it helper_test_haproxy_1 grep POST /srv/storage/audit/haproxy.log | grep "200,POST,/_session,-"
```
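And if you'd rather watch logins arrive in real time than query after the fact, you can tail that same audit log. This is a sketch assuming the container name and log path above; the `grep` in the pipe runs on the host:

```
# Follow the audit log live and print session POSTs as they arrive.
docker exec helper_test_haproxy_1 tail -f /srv/storage/audit/haproxy.log \
  | grep --line-buffered "POST,/_session"
```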
@mrjones-plip regarding https://github.com/medic/cht-infrastructure/issues/18#issuecomment-1082375785 (really apologise for the delay in getting back to you)
MOH wants to feel confident that they can tell what actions users took in the system. From the last call, they would need usernames, IP addresses and the actions that said users took.
Looking at https://github.com/medic/cht-infrastructure/issues/18#issuecomment-1083320237, this is too high level and doesn't quite capture what happened.
@henokgetachew from your comment here: the CHT is being used to run a health system, and the government needs to be confident they can track changes to the data in the system. We need a slightly better solution than 'grep within the log folder if they know what they are looking for' - they are not CHT experts.
Thanks for the feedback @derickl! Can you clarify what you mean by "actions that said users took"? Login, logout, every form submitted, edited or deleted? Or more abstracted, like "how many household visits"? Let's assume it is any time a document is synced to couch and then we'd use the name of the document as the "activity" (but let me know if I'm wrong!)
Because we're offline first, a trio of "ip/username/action" may not be possible. If a CHW logs in once on WiFi, goes offline, then does 30 household visits and creates 60 docs, and then syncs via cellular data, which IP should we use for those actions? I think it would be helpful to educate the MoH on how the CHT works offline to set expectations.
I suggest a dashboard showing a high level table of users and their aggregated activity. This data is a mashup of HAProxy logs as well as couch2pg data in postgres:
User | Logins | Actions | Last Seen | Last IP |
---|---|---|---|---|
Lisa | 22 | 343 | 2 Feb 2022 | 192.168.1.1 |
Ann | 1 | 642 | 3 Feb 2022 | 10.0.1.1 |
Christina | 2 | 343 | 1 Feb 2022 | 2345:0425:2CA1::0567:5673:23b5 |
Clicking a row would give you a chronological list of activities based on document names as seen in postgres:
Item | Date | Detail |
---|---|---|
Login | 2 Feb 2022 | 192.168.1.1 |
Document | 3 Feb 2022 | Register Pregnancy |
Document | 3 Feb 2022 | Death Report |
Document | 4 Feb 2022 | U5 Checkup |
Login | 5 Feb 2022 | 110.0.1.1 |
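For the "Actions" count and the per-document rows above, the Postgres side of the mashup could start with something like the sketch below. It assumes the default couch2pg layout where raw CouchDB documents land in a table named `couchdb` with a single `jsonb` column `doc`, plus a hypothetical `$POSTGRES_URL` connection string; table, column, and field names will vary by deployment:

```
# Rough sketch: count synced data_record documents per form, as a stand-in for
# the "Actions" column. Tying each document to the submitting user needs a
# deployment-specific field path, so that join is left out here.
psql "$POSTGRES_URL" -c "
  SELECT doc->>'form' AS form, count(*) AS docs
  FROM couchdb
  WHERE doc->>'type' = 'data_record'
  GROUP BY 1
  ORDER BY docs DESC;"
```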
Thanks for the follow up @mrjones-plip
> Login, logout, every form submitted, edited or deleted?
This is closer to what would be needed. Assuming you wanted to review what we log and be able to tell what happened, what would you be looking for? We need to approach this with some empathy and try to understand where they are coming from. At this point, having the ability to audit is what they have as a need. We haven't gotten deeper on that ask, but as I mentioned in the previous thread, they wanted to tie actions to users (and IPs if possible). Web servers / proxies do log this. Right?
> If a CHW logs in once on WiFi, goes offline, then does 30 household visits and creates 60 docs and then syncs via cellular data, which IP should we use for those actions?
Interesting question. What do we currently log?
> I think it would be helpful to educate the MoH on how the CHT works offline to set expectations.
Are you able to summarise this in a way that can be shared to MoH and also highlight our gaps in auditing and how it ties back to offline first? It would be great to highlight what we can and can't do and also call it out in our docs.
> I suggest a dashboard showing a high level table of users and their aggregated activity. This data is a mashup of HAProxy logs as well as couch2pg data in postgres.
Would you be open to helping build out a proof of concept for this?
NB - this ticket is in a public repo, so all of this ticket is public
> Web servers / proxies do log this. Right?
Yup! They're very literal though: when you connect, they log a `GET` or a `POST` and the IP. They don't know who you are; that's the job of the service behind the proxy (the CHT). We'd have to do some more work to join the actions together with usernames.
> Interesting question. What do we currently log?
What it would log is the IP from when you did the bulk upload, encompassing many in-the-field forms created over time and space. Here's a log entry from my offline user logging in, going offline for ~20 minutes, and then bulk uploading some docs that were created while offline:
```
Apr 6 22:02:05 3c9e67d5b8be haproxy[26]: 172.18.0.3,200,POST,/_session,-,mrjones,'{"name":"mrjones","password":"***"}',405,2,45,'-'
Apr 6 22:24:04 3c9e67d5b8be haproxy[26]: 172.18.0.3,201,POST,/medic/_bulk_docs,-,mrjones,'{"docs":[{"form":"pregnancy_facility_visit_reminder","type":"data_record","content_type":"xml", [DATA-TRUNCATED]
```
Having looked at the HAProxy log, it is meant to record EVERYTHING, down to the raw data of each form submitted. It is, however, accordingly verbose and non-trivial to parse.
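As a starting point for joining user, IP and action from that verbose log, here's a rough parsing sketch. It leans on the field positions in the sample lines above (username in field 6, client IP at the end of field 1), so treat it as illustrative rather than a supported tool:

```
# Print timestamp, source IP and username for each bulk upload in the audit log.
docker exec helper_test_haproxy_1 grep "POST,/medic/_bulk_docs" /srv/storage/audit/haproxy.log \
  | awk -F, '{
      n = split($1, a, " ");        # e.g. "Apr  6 22:24:04 host haproxy[26]: IP"
      printf "%s %s %s  ip=%s  user=%s\n", a[1], a[2], a[3], a[n], $6
    }'
```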
> Are you able to summarise this in a way that can be shared to MoH and also highlight our gaps in auditing and how it ties back to offline first? It would be great to highlight what we can and can't do and also call it out in our docs. Would you be open to helping build out a proof of concept for this?
Yes, I'd be open to helping and to summarizing offline functionality as needed. Sounds like some fun discussion on this! I think it'd be helpful to go over this in real time - I'll try to schedule some time with you!
After chatting with @garethbowen about this, we think the best approach for an MVP is to add 3 key pieces of information to the docs site in response to this ticket.
After this is completed, we can work with any interested deployments and MoHs to see if this is sufficient to meet their audit needs in a self hosted scenario.
I'm out of the office until Apr 18th at which point I'll resume working on this!
Awesome - thanks @derickl - I'll respond to that post you cited today with a similarly high level response: use HAProxy, don't use mutable CouchDB, look to use a web-friendly tool like Kibana, and be very careful with PII/PHI. Then for this ticket and the forum post, I'll come back with some best practices after testing in our Kibana deployment to give some actionable steps.
There still may yet be some additional docs to publish around this!
Closing this with a link to the forum post outlining a solution for this. Feel free to re-open as needed!
We have a likely upcoming request from MoH UG, on summarising access to the system. They are keen to audit user activity. Given we haven't hooked up logtrail to it, what options or recommendations would you have towards this end?