onaio / reveal-frontend

WebUI for the Reveal epidemiological surveillance platform
8 stars 4 forks source link

Investigate why Zambia IRS reports performance is bad #1289

Open moshthepitt opened 4 years ago

moshthepitt commented 4 years ago

Noticed that the Zambia reports have problems:

^^ this is on Reveal stage.

Need to look into this and try to fix.

pld commented 4 years ago

Is this the query for zambia_irs_structures?

And this for zambia_irs_export?

pld commented 4 years ago

Uh nevermind, verify is obvi not the right place... let me find the correct ones

pld commented 4 years ago

structures: https://github.com/OpenSRP/opensrp-reveal-datawarehouse/blob/master/3-reveal/migrations/5-IRS/2-Zambia-2019/deploy/zambia_irs_structures.psql

export: https://github.com/OpenSRP/opensrp-reveal-datawarehouse/blob/master/3-reveal/migrations/5-IRS/2-Zambia-2019/deploy/zambia_irs_export.psql

moshthepitt commented 4 years ago

its in the deploy directory there

and thats one part, all the queries here are interconnected

pld commented 4 years ago

k, is just the structures query taking hours? or is it and all it's dependent queries taking hours?

Posting the query plan would help.

Some specific thoughts/questions below, and they're gonna be naive b/c I haven't looked at these yet so... bear w/me

on structures:

pld commented 4 years ago

for export:

^ last sub-items apply to any query w/joins

moshthepitt commented 4 years ago

k, is just the structures query taking hours? or is it and all it's dependent queries taking hours?

The structure query zambia_irs_structures.psql would be the best one to attack as it is the foundation for the rest and also quite slow

why do we need a UUID v5?

We need to get a deterministic id based on structure id and task id. Maybe we can do it differently? Though I am not sure if this will move the needle very much


Will look into the rest

pld commented 4 years ago

Cool, that sounds good, these are brainstorming ideas and please approach as such, we'll have to approach analytically tweaking, re-running

pld commented 4 years ago

More on zambia_irs_structures

The other angle of attack is how do we restrict these queries so the are only operating on potential new data? Or at least only active plans?

Do plans even have an active status? But we're not doing a join on plans here it looks like, could be restrictive on tasks.plan_identifiers to avoid a join, but would need to have a pre-existing list of plans, which we'd need to compute

Would want to look at the explain (or maybe this is an obvi sql thing I've forgotten) but is this where restricting before all the joins?

moshthepitt commented 4 years ago

The other angle of attack is how do we restrict these queries so the are only operating on potential new data? Or at least only active plans?

I interpreted that two ways:

A

So along these lines that I have been thinking of is to make it something of a hybrid table/view/something i.e. not a true materialized view.

We only need to update for plans that are receiving data, and it should be somehow possible to compute the information for the old plans once and merge that with information for plans that are receiving data.

I think that something like that approach may be the most sustainable long term. e.g. we have something like 4x the data that we had last year, and the season hasn't even started.

B

Plans, and tasks, do have status field that indicates if active, retired etc. And this query doesn't check for that at the moment. It would help, and not hurt to leave out retired plans, and canceled tasks.

pld commented 4 years ago

Cool, let's do B for now, that's what I was thinking of.

I agree A is a good idea, we need to discuss approach here more though and it'll take longer. A heuristic way to do A through B is to create a have a last_used_at (or something, there's a better way than this I'm sure) flag that we set to now whenever something related to the plan is touched, then we modify B's approach to only consider plans w/last_used_at greater than the most recent time the view was updated -- needs more thought for sure, anyhow for now, B seems like quckish win?