opensrp / opensrp-scripts

Customised OpenMRS ETL for Report Requirements in OpenSRP Projects
Apache License 2.0
1 stars 2 forks source link

Metric 5 - Penta Dropout Rate #18

Closed csc5k closed 6 years ago

csc5k commented 7 years ago

Creating GitHub ticket to track ZEIR metric 5 - penta dropout rate. The complete list and detail can be found here: https://drive.google.com/open?id=1dOy5H_fy8bYaJFL-Oa53p_rNgR9gZOQi6qpFUE8UHCA

We'll use this ticket to track the status of and issues with the metric as we build it out in Superset.

Current dependency is ETL development.

csc5k commented 7 years ago

This metric is meant to show the percentage of patients who have received penta1 but not penta3 as of each date.

What the Time Series - Line Chart in Superset shows is that metric for the particular date along the x-axis. What we need it to show is the aggregate value of that rate as of that date on the x-axis.

E.g.: Day 1: P1 = 2, P3 = 2 Day 2: P1 = 2, P3 = 1

What we should see is: Day 1: dropout rate = 0% (2 - 2 / 2) Day 2: dropout rate = 25% (4 - 3 / 4)

What we do see is: Day 1: dropout rate = 0% (2 - 2 / 2) Day 2: dropout rate = 50% (2 - 1 / 2)

My SQL query is this: SELECT encounter_date AS __timestamp, CAST((SUM(encounter.pentavalent_dose_1) - SUM(encounter.pentavalent_dose_3)) as float) / CAST(SUM(encounter.pentavalent_dose_1) as float) AS "Penta 1 to Penta 3 Dropout", CAST((SUM(encounter.pentavalent_dose_1) - SUM(encounter.measles_mr_dose_1)) as float) / CAST(SUM(encounter.pentavalent_dose_1) as float) AS "Penta 1 to Measles Dropout" FROM encounter WHERE encounter_date >= '2016-11-06 00:00:00.000000' AND encounter_date <= '2017-11-06 17:00:29.000000' GROUP BY encounter_date LIMIT 50000 OFFSET 0

Currently investigating using a nested subquery to achieve this.

Also working with the BlueCode team to upgrade our SQL backend to PostgresSQL from SQLite.

Note this also affects #17 (the logic is the same, just using different columns)

csc5k commented 7 years ago

Able to get the dropout rate calculation going with this query:

select date(encounter_date) encounter_date, p1, p3, running_p1, running_p3, (running_p1 - running_p3) / running_p1 as dropout_rate from ( select date(encounter_date) encounter_date, p1, p3, (@p1 := @p1 + p1) as running_p1, (@p3 := @p3 + p3) as running_p3 from (select date(encounter_date) encounter_date, sum(pentavalent_dose_1) as p1, sum(pentavalent_dose_3) as p3 from encounter e where date(encounter_date) >= '2016-11-06 00:00:00.000000' and date(encounter_date) <= '2017-11-06 17:00:29.000000' group by date(encounter_date) order by encounter_date ) e cross join (select @p1 := 0, @p3 := 0) params) as s;

Need to convert it to Postgres and build out the view in Superset, but the calculation is accurate.

Big thanks to @ndegwamartin and @moshthepitt for giving me their time to help me comb through this!

csc5k commented 6 years ago

This metric is built in Superset