medic / care-teams

For Product Management
0 stars 0 forks source link

Run SQL verification scripts for MoH Nepal (NSSD) apdex testing #65

Closed michaelkohn closed 2 months ago

michaelkohn commented 3 months ago

Once Apdex testing scripts executed on NSSD configuration #37 is complete, do the following 👇

latin-panda commented 3 months ago

Blocked: DBs aren't syncing (Slack)

latin-panda commented 3 months ago

Apdex from 27th to 30th

SQL Query ```sql WITH apdex_telemetry_data AS ( SELECT substring(metric from '^(.*):apdex:') AS event_category, CASE WHEN metric LIKE '%:satisfied' THEN 'satisfied' WHEN metric LIKE '%:tolerable' THEN 'tolerable' WHEN metric LIKE '%:frustrated' THEN 'frustrated' END AS event_type, SUM(count) AS event_count FROM useview_telemetry_metrics WHERE metric LIKE '%:apdex:%'AND period_start BETWEEN '2024-03-27' AND '2024-03-30' GROUP BY event_category, event_type ), apdex_scores AS ( SELECT event_category, SUM(CASE WHEN event_type = 'satisfied' THEN event_count ELSE 0 END) AS satisfied_count, SUM(CASE WHEN event_type = 'tolerable' THEN event_count ELSE 0 END) AS tolerable_count, SUM(CASE WHEN event_type = 'frustrated' THEN event_count ELSE 0 END) AS frustrated_count, SUM(event_count) AS total_event_count FROM apdex_telemetry_data GROUP BY event_category ) SELECT event_category, satisfied_count, tolerable_count, frustrated_count, ROUND(((satisfied_count + (tolerable_count / 2.0)) / total_event_count)::numeric, 2) AS apdex_score FROM apdex_scores ORDER BY apdex_score asc; ```

Image

latin-panda commented 3 months ago

This script was adapted based on my understanding of the test cases planned here + saving a task form.

Dates: 27th March to 30th March (taken from this comment) Total days: 4

The query has all the keys of the telemetry we expect from the automation tests.

SQL for Verification ```sql WITH constants (days, start_date, end_date) AS (VALUES (4, '2024-03-27', '2024-03-30')), apdex_metrics AS( SELECT substring(metric FROM '^(.*):apdex:') AS metric, SUM(COUNT) AS COUNT FROM useview_telemetry_metrics, constants WHERE metric LIKE '%:apdex:%' AND period_start BETWEEN constants.start_date::DATE AND constants.end_date::DATE GROUP BY metric ), apdex_result as ( SELECT metric, CASE WHEN apdex_metrics.metric = 'contact_list:load' THEN 30 * constants.days WHEN apdex_metrics.metric = 'contact_list:query' THEN 40 * constants.days WHEN apdex_metrics.metric = 'contact_detail:c70_fchv_area:load' THEN 10 * constants.days WHEN apdex_metrics.metric = 'contact_detail:c80_household:load' THEN 20 * constants.days WHEN apdex_metrics.metric = 'contact_detail:c82_person:load' THEN 20 * constants.days WHEN apdex_metrics.metric = 'enketo:contacts:form:contact:c82_person:create:add:render' THEN 10 * constants.days WHEN apdex_metrics.metric = 'enketo:contacts:form:contact:c82_person:create:add:save' THEN 10 * constants.days WHEN apdex_metrics.metric = 'enketo:contacts:pregnancy:add:render' THEN 10 * constants.days WHEN apdex_metrics.metric = 'enketo:contacts:pregnancy:add:save' THEN 10 * constants.days WHEN apdex_metrics.metric = 'report_list:load' THEN 10 * constants.days WHEN apdex_metrics.metric = 'report_list:query' THEN 10 * constants.days WHEN apdex_metrics.metric = 'report_detail:pregnancy:load' THEN 10 * constants.days WHEN apdex_metrics.metric = 'tasks:load' THEN 20 * constants.days WHEN apdex_metrics.metric = 'tasks:refresh' THEN 10 * constants.days WHEN apdex_metrics.metric = 'enketo:tasks:pregnancy_home_visit:add:render' THEN 10 * constants.days WHEN apdex_metrics.metric = 'enketo:tasks:mental_health_referral_follow_up:add:render' THEN 10 * constants.days WHEN apdex_metrics.metric = 'enketo:tasks:mental_health_referral_follow_up:add:save' THEN 10 * constants.days WHEN apdex_metrics.metric = 'message_list:load' THEN 10 * constants.days WHEN apdex_metrics.metric = 'messages_detail:load' THEN 10 * constants.days -- WHEN apdex_metrics.metric = 'analytics:targets:load' THEN 10 * constants.days Nepal doesn't have this tab WHEN apdex_metrics.metric = 'boot_time' THEN 10 * constants.days END AS expected_count, SUM(apdex_metrics.count) AS actual_count FROM apdex_metrics, constants GROUP BY apdex_metrics.metric, constants.days ) SELECT apdex_result.metric, apdex_result.expected_count, apdex_result.actual_count, CASE WHEN apdex_result.actual_count >= apdex_result.expected_count THEN 'TRUE' ELSE 'FALSE' END AS meet_expectation FROM apdex_result ORDER BY meet_expectation DESC, apdex_result.metric ASC ```
Expand to see expected telemetry records Screenshot 2024-04-04 at 1 40 41 PM Screenshot 2024-04-04 at 1 40 50 PM
Expand to see expected occurrences of those telemetry when automation runs once Screenshot 2024-04-04 at 1 43 30 PM


Outcome:

Screenshot 2024-04-04 at 1 00 52 PM
latin-panda commented 2 months ago

@michaelkohn @ralfudx please look at the previous comments for the SQLs results.

I couldn't verify the forms' names because I didn't find them in the automated tests, but I assume these are okay-ish.

@michaelkohn If you observe, we don't have an enketo:reports:<__app_form>...... That is because the only app form I could find was opened from the Contact detail page, and it's recorded as: enketo:contacts:<__app_form>.... I assume we want to measure the Apdex of Enketo's app forms, so this should be enough since both Reports and Contact pages work similarly when rendering forms.

Let me know your thoughts.

Should I move this ticket to next week and continue the conversation when @ralfudx is available?

latin-panda commented 2 months ago

@michaelkohn @ralfudx @Benmuiruri Another question: when you look at this volume of data and then at the Apdex report, do you feel it's representative enough?

Just to give you a referent, this is when we had a test party of 17 people using the Kenyan instance:

Image

Benmuiruri commented 2 months ago

Hi @latin-panda I think what's important is to agree on a baseline. Something like: we will run the tests for 10 days for 3 CHWs (30 days worth of data becomes our baseline). With that baseline then we can do the data analysis in a consistent manner.

latin-panda commented 2 months ago

Thanks @Benmuiruri! IIRC @michaelkohn suggested running the suite 10 times each day. Your suggestion is good. I have added it in the iterations section of this comment for our Q2 work.

latin-panda commented 2 months ago

@michaelkohn, we could close this one since I did the analysis and open a new one to follow up on the questions above when @ralfudx is available, or move it to next week. Either way is fine with me.

michaelkohn commented 2 months ago

We don't have telemetry about saving tasks, something like this

That should have been addressed by https://github.com/medic/care-teams/issues/57, though I'm not sure if this was added before the tests were run.

Same with the task:load and task:reload - it seems some data is missing.

Yeah I'd agree, seems like something is missing. I would think scenario 5 (open tasks page) would register task:load 2x each time it runs, and we are simulating 10 runs per day so this should be 20 per day.

I think tasks:refresh would only register when you create a contact or submit a report (or task cache expiry), and we are creating 1 contact and submitting 1 report (well 2 total, with #57 implemented), so I would expect tasks:refresh to have 2 or 3 tasks:refresh per test run, and we're simulating this happening 10x per day so 20 or 30 tasks:refresh per day simulated.

there is a big difference in contact_list:load

Does it perhaps behave differently when you (tap the back arrow in upper left corner) vs. clicking on the tab itself?

I assume we want to measure the Apdex of Enketo's app forms, so this should be enough since both Reports and Contact pages work similarly when rendering forms.

Agreed, I think it's fine. We'll also get some Enketo stuff from #57

Should I move this ticket to next week and continue the conversation when @ralfudx is available?

I'm actually fine closing it. The task was to run the scripts to verify the data... which you have successfully done, and have found some discrepancies. Let's close this issue and open another one for reviewing and/or fixing these issues next week. Part of the "fixing" will be to verify the data again.