onaio / reveal-frontend

WebUI for the Reveal epidemiological surveillance platform
8 stars 4 forks source link

ZM Production - Discrepancies with CSV export vs dashboards #593

Closed craigappl closed 4 years ago

craigappl commented 4 years ago

Akros has requested a data export so they can perform the final reports for VectorLink. The team is comparing the CSV export against the Zambia dashboards and the numbers are not matching. Here's what we are experiencing (in order of importance)

CSV Export as of 3:23PM PST 13/11/19 (Total Sprayed = 23372) https://drive.google.com/open?id=1OrSkfqHFboBTFWWwY8rxxZCs6off-qDe

Links

gstuder-ona commented 4 years ago

@craigappl - one strange property of the csv export -

"WHERE zambia_irs_structures.business_status NOT IN ('Not Visited');" - this doesn't really line up with any of the other reporting logic the web views use. It seems more in line with the intended use to just do a full structure dump, whatever the business status, so that all the metrics could be compared and re-computed if desired. Right now, for example, it's not possible to know if the "found" structure count is accurate because of this, but generally I just think it adds complexity to filter this way?

craigappl commented 4 years ago

@gstuder-ona , The idea is to get an exact list of events in this export that match what's presented in the table counts in the Web UI dashboards. I don't know the underlying logic, nor have access to the queries. Ultimately, if the dashboard says 23,415 structures were sprayed for that province, the CSV export should have 23,415 rows of events (unless events are missing)

gstuder-ona commented 4 years ago

@craigappl oh, the export is structures, but each structure has a "latest" event attached, basically, which is how all the stats get configured. It's not so much a match/not match issue as the current export is a seemingly arbitrary subset. Doesn't seem necessary, as the subset isn't all that interesting afaik.

But this is more a side question, I actually don't think the count discrepancy comes from this.

gstuder-ona commented 4 years ago

So finally got some view updates - the core issue I believe is that views are updating on an arbitrary schedule, which can lead to (temporary) inconsistency in counts between the export and the web view. Eventually the two will converge, but while data is incoming there can be mismatches - the issue isn't an inconsistent calculation. Note that, for example, the current Luapula sprayed count is 23830 at 2PM EST 2019-11-14.

It is important to do the aggregation exactly the same way though - these two queries give the same results:

SELECT jurisdiction_name,
       plan_id,
       totstruct,
       sprayedstruct
FROM zambia_irs_jurisdictions
WHERE (plan_id = '9f1e0cfa-5313-49ff-af2c-f7dbf4fbdb9d')
    AND jurisdiction_name = 'Luapula';

-- from the export
SELECT COUNT(*)
FROM zambia_irs_export
LEFT JOIN jurisdictions_materialized_view AS jurisdictions_ex ON jurisdictions_ex.jurisdiction_id = zambia_irs_export.structure_jurisdiction_id
WHERE (plan_id = '9f1e0cfa-5313-49ff-af2c-f7dbf4fbdb9d')
    AND jurisdiction_parent2 = 'Luapula'
    AND business_status IN ('Partially Sprayed',
                            'Complete')
    AND jurisdictions_ex.jurisdiction_depth >= 4;
gstuder-ona commented 4 years ago

@craigappl the best fix here I believe should be to better-order and time our view refreshes - it's unavoidable to have transient inconsistency in the way that we're doing this computation. I'll mark the ticket for review when I change the NiFi flows.

gstuder-ona commented 4 years ago

whoops didn't mean to close.

gstuder-ona commented 4 years ago

The fix for this is in-place in NiFi (cascading view updates vs round-robin), but it is currently blocked by onaio/canopy#591

gstuder-ona commented 4 years ago

Fix is in place, under review by Akros

moshthepitt commented 4 years ago

@gstuder-ona @craigappl can we close this now?

gstuder-ona commented 4 years ago

This is superseded by onaio/canopy#611.