How does this slowness manifest itself? Is it the main performance view or one of the subpages? Do we know which GraphQL queries are taking the longest?
My recommendation is we nail down the biggest bottlenecks in our code and see if there's anything we can do to improve. Adding caching or new hardware works, but should be used as the final option as it increases TCO and complexity.
My hunch is we have a lot of low-hanging fruit, as InfluxDB itself should be blazing fast. The bottlenecks are most likely somewhere between the InfluxDB query and the browser receiving the response.
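One low-effort way to nail those bottlenecks down would be to log the duration of every GraphQL operation at the gateway. A minimal sketch, assuming the gateway runs Apollo Server 3; the plugin name and the 500 ms threshold are made up for illustration:

```ts
import { ApolloServerPlugin } from 'apollo-server-plugin-base'

// Hypothetical plugin: logs any GraphQL operation that takes longer than
// 500 ms, so the slowest queries behind the performance views stand out.
export const operationTimingPlugin: ApolloServerPlugin = {
  async requestDidStart() {
    const startedAt = Date.now()
    return {
      async willSendResponse({ request }) {
        const durationMs = Date.now() - startedAt
        if (durationMs > 500) {
          console.warn(
            `Slow GraphQL operation "${request.operationName || 'anonymous'}": ${durationMs}ms`
          )
        }
      }
    }
  }
}
```

Correlating logs like these with the OpenHIM transaction log should show whether the time is spent in InfluxDB, in Hearth, or in the gateway's own processing.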
I took some measurements on the Nigeria demo server's InfluxDB, just out of curiosity:
Took 0.169596s

```
select COUNT(*) from birth_registration

name: birth_registration
time count_ageInDays count_compositionId
---- --------------- -------------------
0    13786           13888
```
Took 0.25815s

```
name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=female, practitionerRole=LOCAL_REGISTRAR, timeLabel=within5Years
time total
---- -----
1638425178702000001 127
name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=female, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinLate
time total
---- -----
1638425178702000001 465
name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=female, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinTarget
time total
---- -----
1638425178702000001 620
name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=within5Years
time total
---- -----
1638425178702000001 145
name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinLate
time total
---- -----
1638425178702000001 574
name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinTarget
time total
---- -----
1638425178702000001 705
name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=undefined, practitionerRole=LOCAL_REGISTRAR, timeLabel=within5Years
time total
---- -----
1638425178702000001 9
name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=undefined, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinLate
time total
---- -----
1638425178702000001 32
name: birth_registration
tags: eventLocationType=CAR_TAXI, gender=undefined, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinTarget
time total
---- -----
1638425178702000001 38
name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=female, practitionerRole=FIELD_AGENT, timeLabel=within5Years
time total
---- -----
1638425178702000001 1
name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=female, practitionerRole=LOCAL_REGISTRAR, timeLabel=within5Years
time total
---- -----
1638425178702000001 555
name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=female, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinLate
time total
---- -----
1638425178702000001 2210
name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=female, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinTarget
time total
---- -----
1638425178702000001 2638
name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=male, practitionerRole=FIELD_AGENT, timeLabel=withinTarget
time total
---- -----
1638425178702000001 1
name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=within5Years
time total
---- -----
1638425178702000001 548
name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinLate
time total
---- -----
1638425178702000001 2133
name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinTarget
time total
---- -----
1638425178702000001 2713
name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=undefined, practitionerRole=LOCAL_REGISTRAR, timeLabel=within5Years
time total
---- -----
1638425178702000001 25
name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=undefined, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinLate
time total
---- -----
1638425178702000001 104
name: birth_registration
tags: eventLocationType=HEALTH_FACILITY, gender=undefined, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinTarget
time total
---- -----
1638425178702000001 134
name: birth_registration
tags: eventLocationType=PRIVATE_HOME, gender=female, practitionerRole=FIELD_AGENT, timeLabel=after5Years
time total
---- -----
1638425178702000001 1
name: birth_registration
tags: eventLocationType=PRIVATE_HOME, gender=female, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinLate
time total
---- -----
1638425178702000001 2
name: birth_registration
tags: eventLocationType=PRIVATE_HOME, gender=male, practitionerRole=FIELD_AGENT, timeLabel=withinLate
time total
---- -----
1638425178702000001 1
name: birth_registration
tags: eventLocationType=PRIVATE_HOME, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=after5Years
time total
---- -----
1638425178702000001 1
name: birth_registration
tags: eventLocationType=PRIVATE_HOME, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=within5Years
time total
---- -----
1638425178702000001 2
name: birth_registration
tags: eventLocationType=PRIVATE_HOME, gender=male, practitionerRole=LOCAL_REGISTRAR, timeLabel=withinTarget
time total
---- -----
1638425178702000001 2
```
Took 0.016235s

```
SELECT SUM(total) AS total FROM payment WHERE eventType = 'BIRTH' AND time > '2021-12-02T06:06:18.702Z' AND time <= '2022-12-02T06:06:18.703Z' GROUP BY paymentType

name: payment
tags: paymentType=certification
time total
---- -----
1638425178702000001 0
name: payment
tags: paymentType=correction
time total
---- -----
1638425178702000001 0
```
Took 0.217396s

```
SELECT COUNT(DISTINCT(compositionId)) AS total FROM certification WHERE time > '2021-12-02T06:06:18.702Z' AND time <= '2022-12-02T06:06:18.703Z' GROUP BY eventType

name: certification
tags: eventType=BIRTH
time total
---- -----
1638425178702000001 6862
```
Subpage `/performance/operations/completenessRates/birth?time=withinTarget&timeEnd=2022-12-02T06%3A44%3A52.804Z&timeStart=2021-12-02T06%3A44%3A52.804Z`

Took 0.559765s

```
SELECT COUNT(ageInDays) AS total FROM birth_registration WHERE time > '2021-12-02T06:44:52.804Z' AND time <= '2022-12-02T06:44:52.804Z' GROUP BY dateLabel, timeLabel
```
I wonder if this would help us understand the problem better: https://github.com/opencrvs/opencrvs-core/issues/4384
The loading time of the performance view was something Ed mentioned needs to be looked at. He said it was very slow in Nigeria.
Perhaps the issues are related not to Influx but to the data in Hearth that is also required?
I'm pretty sure of that, yeah. Our GraphQL gateway does massive amounts of unnecessary processing (mainly requests to Hearth) in some other data fetching tasks as well.
An example I came across while debugging another performance issue: a single `fetchBirthRegistration` request makes 59 internal requests (as per OpenHIM) every time it's called. Of these, 37 are duplicates (see the sketch after the table):
Requests | Method | Host | Path | Label
---|---|---|---|---
22x | GET | localhost:5001 | /fhir/Location/4457f385-45d7-4b68-846b-a117df8ff052 | Workflow passthrough
14x | GET | localhost:5001 | /fhir/Location/54783dcd-4615-425a-882f-c80a911946eb | Workflow passthrough
5x | GET | localhost:5001 | /fhir/Composition/f2979b8a-709e-4aba-9edf-19e491ec5491 | Workflow passthrough
3x | GET | localhost:5001 | /fhir/Task?focus=Composition/f2979b8a-709e-4aba-9edf-19e491ec5491 | Workflow passthrough
3x | GET | localhost:5001 | /fhir/RelatedPerson/83098a91-705a-4313-98b9-66333766d4c1 | Workflow passthrough
2x | GET | localhost:5001 | /fhir/QuestionnaireResponse?subject=Encounter/6a0c12ef-a9ed-41d4-bb96-2a041e1f4bd8 | Workflow passthrough
2x | GET | localhost:5001 | /fhir/Patient/6576ef0f-b0e2-42d8-8b0f-aad854852a15 | Workflow passthrough
2x | GET | localhost:5001 | /fhir/Location/a89b5efa-e3d2-4172-8c74-00324a4361c7 | Workflow passthrough
2x | GET | localhost:5001 | /fhir/Encounter/6a0c12ef-a9ed-41d4-bb96-2a041e1f4bd8 | Workflow passthrough
1x | POST | workflow:5050 | /events/assigned | Event Assignment
1x | POST | localhost:5001 | /fhir | Workflow passthrough
1x | GET | localhost:5001 | /fhir/Task/ac5b8c4d-d685-45cb-8002-49a3a13c0329/_history?_count=100 | Workflow passthrough
1x | GET | localhost:5001 | /fhir/Task/ac5b8c4d-d685-45cb-8002-49a3a13c0329/_history | Workflow passthrough
1x | GET | localhost:5001 | /fhir/Patient/70cf0b0a-c955-429c-b0ec-6d611793a9d6 | Workflow passthrough
1x | GET | localhost:5001 | /fhir/Patient/3c34eef1-cad9-4ac9-aa20-cb8cba466814 | Workflow passthrough
1x | GET | localhost:5001 | /fhir/Observation?encounter=Encounter/6a0c12ef-a9ed-41d4-bb96-2a041e1f4bd8&code=73764-3 | Workflow passthrough
1x | GET | localhost:5001 | /fhir/Observation?encounter=Encounter/6a0c12ef-a9ed-41d4-bb96-2a041e1f4bd8&code=57722-1 | Workflow passthrough
1x | GET | localhost:5001 | /fhir/Observation?encounter=Encounter/6a0c12ef-a9ed-41d4-bb96-2a041e1f4bd8&code=3141-9 | Workflow passthrough
1x | GET | localhost:5001 | /fhir/Observation?encounter=Encounter/6a0c12ef-a9ed-41d4-bb96-2a041e1f4bd8 | Workflow passthrough
1x | GET | localhost:5001 | /fhir/Location/a74c199e-c7f8-4494-aeab-53e775ca70e2 | Workflow passthrough
1x | GET | localhost:5001 | /fhir/Location/341c4ff4-d2a2-4e77-aeb7-887abdfb53a9 | Workflow passthrough
1x | GET | localhost:5001 | /fhir/Practitioner/e6590dd4-357f-4f06-93c6-8bcc79c3fa2c | Workflow passthrough
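Most of those duplicates are repeated fetches of the same Location resources within a single operation. One way to collapse them without introducing shared caching infrastructure would be a per-request DataLoader in the gateway. A minimal sketch; `createLocationLoader`, the context wiring and the node-fetch call are illustrative assumptions, not existing gateway code:

```ts
import DataLoader from 'dataloader'
import fetch from 'node-fetch'

// Hypothetical per-request loader: identical /fhir/Location/<id> lookups made
// while resolving one GraphQL operation are coalesced into a single request each.
export function createLocationLoader(fhirUrl: string, token: string) {
  return new DataLoader<string, unknown>((ids) =>
    Promise.all(
      ids.map(async (id) => {
        const res = await fetch(`${fhirUrl}/Location/${id}`, {
          headers: { Authorization: `Bearer ${token}` }
        })
        return res.json()
      })
    )
  )
}

// Created once per request, e.g. in the GraphQL context factory:
// context: ({ request }) => ({
//   locationLoader: createLocationLoader(FHIR_URL, getToken(request))
// })
```

Because the loader lives only for the lifetime of one request, it memoises duplicate lookups without risking stale or cross-user data.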
Pasting in @rikukissa's comment here:
Just putting some thoughts here if someone will work on this in the coming weeks: I haven't taken any measurements, but my guess is the perceived slowness is a combination of a slow / unreliable connection and the time it takes to process the data coming from our database. As discussed earlier, it's not that the InfluxDB queries would be slow, but there's quite a bit of unnecessary processing done when the raw data is enriched with location data etc.

A few ideas for the performance view, from easiest to more difficult:

1. Implement similar pre-loading logic for Performance as we did with Dashboard + Statistics.
   - This makes loading the default view snappy, given that you've browsed other views first long enough for it to load.
   - Changing any filters or opening drill-down views still takes as long as now.
2. Reduce repeated work done in the GraphQL gateway. Some sort of caching can probably be achieved using the query contexts. Worth asking Tameem to take a look. This is something I believe is a bottleneck for almost all functionality we have in the app, and we should come up with a strategy for improving different parts of the code over time. Related ticket here.
3. Refactor the performance view to use the analytics database we created for Metabase. You would probably need my help in doing this and it's quite a lot of work, so I don't recommend doing it now.

For Dashboard and Statistics we've implemented pre-loading logic (same as point 1), but optimising beyond that might not be possible as we do not have control over the code inside the embed. If the charts themselves load slowly, I would look into adding a migration that creates MongoDB indexes for the fields that are searched when drawing the visualisations.
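On the last point about MongoDB indexes, a minimal sketch of what such a migration could look like, assuming a migrate-mongo style up/down pair; the collection and field names below are placeholders rather than the actual schema:

```ts
import { Db } from 'mongodb'

// Hypothetical migration: adds indexes on the fields the dashboard
// visualisations filter on, so chart queries don't scan whole collections.
export const up = async (db: Db) => {
  await db
    .collection('registrations')
    .createIndex({ event: 1, registrationTime: 1 })
  await db.collection('registrations').createIndex({ officeLocationId: 1 })
}

export const down = async (db: Db) => {
  await db.collection('registrations').dropIndex('event_1_registrationTime_1')
  await db.collection('registrations').dropIndex('officeLocationId_1')
}
```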
`@persist` directive in our queries to persist the returned data