opencrvs / opencrvs-core

A global solution to civil registration
https://www.opencrvs.org
Other
88 stars 71 forks source link

Research performance improvements for the Performance page #4371

Closed euanmillar closed 11 months ago

euanmillar commented 1 year ago
rikukissa commented 1 year ago

How does this slowness manifest itself? Is it the main performance view or one of the subpages? Do we know which GraphQL queries are taking the longest?

My recommendation is we nail down the biggest bottlenecks in our code and see if there's anything we can do to improve. Adding caching or new hardware works, but should be used as the final option as it increases TCO and complexity.

My hunch is we have many low-hanging fruits as InfluxDB itself should be blazing fast. These are most likely in between the InfluxDB query and the browser receiving the response.

I took some measurements in Nigeria demo server's InfluxDB just out of curiosity

Total number of registrations 13888

Took 0.169596s

select COUNT(*) from birth_registration
name: birth_registration
time count_ageInDays count_compositionId
---- --------------- -------------------
0    13786           13888

"main" query for the index page of Performance

Took 0.25815s

name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=female, practitionerRole=LOCAL_REGISTRAR, timeLabel=within5Years
time                total
----                -----
1638425178702000001 127

name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=female, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinLate
time                total
----                -----
1638425178702000001 465

name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=female, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinTarget
time                total
----                -----
1638425178702000001 620

name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=within5Years
time                total
----                -----
1638425178702000001 145

name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinLate
time                total
----                -----
1638425178702000001 574

name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinTarget
time                total
----                -----
1638425178702000001 705

name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=undefined, practitionerRole=LOCAL_REGISTRAR, timeLabel=within5Years
time                total
----                -----
1638425178702000001 9

name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=undefined, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinLate
time                total
----                -----
1638425178702000001 32

name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=undefined, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinTarget
time                total
----                -----
1638425178702000001 38

name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=female, practitionerRole=FIELD_AGENT, timeLabel=within5Years
time                total
----                -----
1638425178702000001 1

name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=female, practitionerRole=LOCAL_REGISTRAR, timeLabel=within5Years
time                total
----                -----
1638425178702000001 555

name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=female, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinLate
time                total
----                -----
1638425178702000001 2210

name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=female, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinTarget
time                total
----                -----
1638425178702000001 2638

name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=male, practitionerRole=FIELD_AGENT, timeLabel=withinTarget
time                total
----                -----
1638425178702000001 1

name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=within5Years
time                total
----                -----
1638425178702000001 548

name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinLate
time                total
----                -----
1638425178702000001 2133

name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinTarget
time                total
----                -----
1638425178702000001 2713

name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=undefined, practitionerRole=LOCAL_REGISTRAR, timeLabel=within5Years
time                total
----                -----
1638425178702000001 25

name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=undefined, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinLate
time                total
----                -----
1638425178702000001 104

name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=undefined, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinTarget
time                total
----                -----
1638425178702000001 134

name: birth_registration
tags: eventLocationType=PRIVATE_HOME, gender=female, practitionerRole=FIELD_AGENT, timeLabel=after5Years
time                total
----                -----
1638425178702000001 1

name: birth_registration
tags: eventLocationType=PRIVATE_HOME, gender=female, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinLate
time                total
----                -----
1638425178702000001 2

name: birth_registration
tags: eventLocationType=PRIVATE_HOME, gender=male, practitionerRole=FIELD_AGENT, timeLabel=withinLate
time                total
----                -----
1638425178702000001 1

name: birth_registration
tags: eventLocationType=PRIVATE_HOME, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=after5Years
time                total
----                -----
1638425178702000001 1

name: birth_registration
tags: eventLocationType=PRIVATE_HOME, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=within5Years
time                total
----                -----
1638425178702000001 2

name: birth_registration
tags: eventLocationType=PRIVATE_HOME, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinTarget
time                total
----                -----
1638425178702000001 2

Payments

Took 0.016235s

SELECT SUM(total) AS total FROM payment WHERE eventType = 'BIRTH' AND time > '2021-12-02T06:06:18.702Z' AND time <= '2022-12-02T06:06:18.703Z' GROUP BY paymentType

name: payment
tags: paymentType=certification
time                total
----                -----
1638425178702000001 0

name: payment
tags: paymentType=correction
time                total
----                -----
1638425178702000001 0

Total number of certifications

Took 0.217396s

SELECT COUNT(DISTINCT(compositionId)) AS total FROM certification WHERE time > '2021-12-02T06:06:18.702Z' AND time <= '2022-12-02T06:06:18.703Z' GROUP BY eventType
name: certification
tags: eventType=BIRTH
time                total
----                -----
1638425178702000001 6862

Completeness rates

Subpage /performance/operations/completenessRates/birth?time=withinTarget&timeEnd=2022-12-02T06%3A44%3A52.804Z&timeStart=2021-12-02T06%3A44%3A52.804Z

Took 0.559765s

SELECT COUNT(ageInDays) AS total FROM birth_registration WHERE time > '2021-12-02T06:44:52.804Z' AND time <= '2022-12-02T06:44:52.804Z' GROUP BY dateLabel, timeLabel
euanmillar commented 1 year ago

I wonder if this would help us understand the problem more: https://github.com/opencrvs/opencrvs-core/issues/4384

Loading time of performance was something that Ed mentioned needs to be looked at. He said it was very slow in Nigeria

euanmillar commented 1 year ago

Perhaps the issues are related not to Influx but to data that is in Hearth that is also required?

rikukissa commented 1 year ago

I'm pretty sure of that, yeah. Our GraphQL gateway does massive amounts of unnecessary processing (mainly requests to Hearth) in some other data fetching tasks as well.

An example I came across debugging another performance issue:

fetchBirthRegistration request makes 59 requests (as per OpenHIM) internally every time it's called. Of these, 37 are duplicates.

Requests Method Host Path Label
22x GET localhost 5001 /fhir/Location/4457f385-45d7-4b68-846b-a117df8ff052 Workflow passthrough
14x GET localhost 5001 /fhir/Location/54783dcd-4615-425a-882f-c80a911946eb Workflow passthrough
5x GET localhost 5001 /fhir/Composition/f2979b8a-709e-4aba-9edf-19e491ec5491 Workflow passthrough
3x GET localhost 5001 /fhir/Task focus=Composition/f2979b8a-709e-4aba-9edf-19e491ec5491 Workflow passthrough
3x GET localhost 5001 /fhir/RelatedPerson/83098a91-705a-4313-98b9-66333766d4c1 Workflow passthrough
2x GET localhost 5001 /fhir/QuestionnaireResponse subject=Encounter/6a0c12ef-a9ed-41d4-bb96-2a041e1f4bd8 Workflow passthrough
2x GET localhost 5001 /fhir/Patient/6576ef0f-b0e2-42d8-8b0f-aad854852a15 Workflow passthrough
2x GET localhost 5001 /fhir/Location/a89b5efa-e3d2-4172-8c74-00324a4361c7 Workflow passthrough
2x GET localhost 5001 /fhir/Encounter/6a0c12ef-a9ed-41d4-bb96-2a041e1f4bd8 Workflow passthrough
1x POST workflow 5050 /events/assigned Event Assignment
1x POST localhost 5001 /fhir Workflow passthrough
1x GET localhost 5001 /fhir/Task/ac5b8c4d-d685-45cb-8002-49a3a13c0329/_history _count=100 Workflow passthrough
1x GET localhost 5001 /fhir/Task/ac5b8c4d-d685-45cb-8002-49a3a13c0329/_history Workflow passthrough
1x GET localhost 5001 /fhir/Patient/70cf0b0a-c955-429c-b0ec-6d611793a9d6 Workflow passthrough
1x GET localhost 5001 /fhir/Patient/3c34eef1-cad9-4ac9-aa20-cb8cba466814 Workflow passthrough
1x GET localhost 5001 /fhir/Observation encounter=Encounter/6a0c12ef-a9ed-41d4-bb96-2a041e1f4bd8& code=73764-3 Workflow passthrough
1x GET localhost 5001 /fhir/Observation encounter=Encounter/6a0c12ef-a9ed-41d4-bb96-2a041e1f4bd8& code=57722-1 Workflow passthrough
1x GET localhost 5001 /fhir/Observation encounter=Encounter/6a0c12ef-a9ed-41d4-bb96-2a041e1f4bd8& code=3141-9 Workflow passthrough
1x GET localhost 5001 /fhir/Observation encounter=Encounter/6a0c12ef-a9ed-41d4-bb96-2a041e1f4bd8 Workflow passthrough
1x GET localhost 5001 /fhir/Location/a74c199e-c7f8-4494-aeab-53e775ca70e2 Workflow passthrough
1x GET localhost 5001 /fhir/Location/341c4ff4-d2a2-4e77-aeb7-887abdfb53a9 Workflow passthrough
1x GET localhost 5001 /fhir/Practitioner/e6590dd4-357f-4f06-93c6-8bcc79c3fa2c Workflow passthrough
euanmillar commented 1 year ago

Pasting in @rikukissa 's comment here:

Just putting some thoughts here if someone will work on this in the coming weeks: I haven’t taken any measurements, but my guess is the perceived slowness is a combination of slow / unreliable connection and time it takes to process the data coming from our database. As discussed earlier, it’s not that InfluxDB queries would be slow, but there’s quite a bit of unnecessary processing done when the raw data is enriched with location data etc. A few ideas, from easiest to more difficult, for the performance view: Implement similar pre-loading logic for Performance as we did with Dashboard + Statistics This makes loading the default view snappy, given they you’ve browsed other views first long enough for it to load Changing any filters or opening drill-down views still take as long as now Reduce repeated work done in GraphQL gateway. Some sort of caching can probably be achieved using the query contexts. Worth asking Tameem to take a look. This is something I believe is a bottleneck for almost all functionality we have in the app and we should come up with a strategy for improving different parts of the code overtime. Related ticket here. Refactor the performance view to use the analytics database we created for metabase. You probably would need my help in doing this and it’s quite a lot of work, so I don’t recommend doing this now. For Dashboard and Statistics we’ve implemented a pre-loading logic (same as point 1.), but optimising beyond that might not be possible as we do not have control over the code inside the embed If the charts themselves load slowly, I would look into adding a migration that creates MongoDB indexes for fields that are searched from when drawing the visualisations