opencrvs / opencrvs-core

A global solution to civil registration
https://www.opencrvs.org

Optimise database reads & create indices for all Hearth collections #4627

Closed · rikukissa closed this issue 1 year ago

rikukissa commented 1 year ago

None of our MongoDB collections currently have indices in place, which slows down our database. An index works as a "shortcut" to the documents in a collection, usually based on an "id" field or another frequently queried field. Without one, the database has to scan every document in the collection to find the one with a specific id.

Dev tasks

  1. [x] Research which FHIR resources get used when “download & assign” functionality is called. The best way to do this would be to listen for the query logs emitted by MongoDB to see which collections are accessed (and with what kind of queries) as part of the operation (see the profiler sketch after this list). - 2

    1. How long does it currently take to download a record?
    2. Where is most of the time spent?
    3. Do we need to index with any other fields besides “id”?
  2. [x] Create a migration that adds a unique index on “id” to every Hearth collection. The indices for the _history collections need to be created with unique: false, as those collections can contain multiple versions of the same record (see the migration sketch after this list) - 2

    1. Identify other fields we regularly query by and should therefore also index
  3. [x] Go through the processing steps for "download & assign". Simplify the code so that the GraphQL handler fetches the full payload with as few HTTP requests to Hearth as possible and returns it in a way that bypasses the type resolvers altogether (see the fetch sketch after this list). https://www.hl7.org/implement/standards/FHIR/search.html#revinclude - 5

  4. [ ] Take new measurements of record download - 1

  5. [x] Investigate an ESLint rule in Gateway to force developers to use the data source when fetching locations or practitioner roles. There is an example in the client .eslintrc where we use the rule no-restricted-imports.
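The "listen for logs emitted by MongoDB" part of task 1 could be done with the MongoDB profiler. A minimal sketch, assuming a plain Node MongoDB driver and a placeholder connection string rather than the actual OpenCRVS setup:

```ts
import { MongoClient } from 'mongodb'

// Placeholder connection string, for illustration only
const client = new MongoClient('mongodb://localhost/hearth-dev')

async function profileDownloadAndAssign() {
  await client.connect()
  const db = client.db()

  // Turn on the profiler so every operation is written to system.profile
  await db.command({ profile: 2 })

  // ... trigger "download & assign" from the client here ...

  // Read back which collections were hit and with what queries, slowest first
  const operations = await db
    .collection('system.profile')
    .find({})
    .sort({ millis: -1 })
    .toArray()

  await db.command({ profile: 0 }) // switch profiling back off
  await client.close()
  return operations
}
```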
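For task 2, a minimal sketch of what the migration could look like, assuming a migrate-mongo style up/down module; the collection list below is illustrative, not the full set of Hearth collections:

```ts
import { Db } from 'mongodb'

// Illustrative subset only: the real migration would cover every Hearth collection
const HEARTH_COLLECTIONS = ['Composition', 'Task', 'Patient', 'Encounter', 'Observation']

export const up = async (db: Db) => {
  for (const name of HEARTH_COLLECTIONS) {
    // Latest resource versions: "id" is expected to be unique
    await db.collection(name).createIndex({ id: 1 }, { unique: true })
    // History collections keep every version of a resource, so the same "id"
    // can appear several times and the index must not be unique
    await db.collection(`${name}_history`).createIndex({ id: 1 }, { unique: false })
  }
}

export const down = async (db: Db) => {
  for (const name of HEARTH_COLLECTIONS) {
    await db.collection(name).dropIndex('id_1')
    await db.collection(`${name}_history`).dropIndex('id_1')
  }
}
```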
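For task 3, assuming Hearth supports the _revinclude parameter from the linked FHIR search spec, the fetch could look roughly like the sketch below; the base URL and the Task:focus search parameter are assumptions for illustration, not the actual Gateway code:

```ts
// Hypothetical sketch: one search request returns a Bundle containing the
// Composition itself plus, via _revinclude, every Task whose "focus"
// reference points at it, instead of one HTTP request per reference.
async function fetchRecordBundle(compositionId: string) {
  const HEARTH_URL = 'http://localhost:3447/fhir' // placeholder base URL

  const response = await fetch(
    `${HEARTH_URL}/Composition?_id=${compositionId}&_revinclude=Task:focus`
  )
  // The GraphQL handler can then build the full payload from bundle.entry
  // without going through the type resolvers
  return response.json()
}
```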

euanmillar commented 1 year ago

Needs tasks for creating indices for new records. A migration on its own is not enough

rikukissa commented 1 year ago

An index on a collection is maintained automatically by MongoDB every time data in the collection changes, so if I understand your point correctly, we don't have to manually update the index in code.

Zangetsu101 commented 1 year ago

The items in the _history collections should have unique ids as well, so I don't think we need to use unique: false.

rikukissa commented 1 year ago

Isn't that collection for snapshots of different versions of individual tasks, though? So if you have Task {"id": "123-123-123"}, there might be multiple of those depending on how many times the task has changed. _id might be unique, but we never search by it.

Zangetsu101 commented 1 year ago

If I remember correctly, the id field is unique even for the _history collections: whenever a new resource is pushed into _history, it gets a new id. I could be wrong here though, so @tahmidrahman-dsi could you check whether the ids are indeed unique, please?

tahmidrahman-dsi commented 1 year ago

> If I remember correctly, the id field is unique even for the _history collections: whenever a new resource is pushed into _history, it gets a new id. I could be wrong here though, so @tahmidrahman-dsi could you check whether the ids are indeed unique, please?

@Zangetsu101 the id is not unique; however, _id is unique (screenshot attached)
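For reference, one way to check this is an aggregation that groups a _history collection by id and returns any values occurring more than once; the connection string and collection name below are placeholders, not the actual deployment details:

```ts
import { MongoClient } from 'mongodb'

// Placeholder connection string and collection name, for illustration only
const client = new MongoClient('mongodb://localhost/hearth-dev')

async function findDuplicateHistoryIds() {
  await client.connect()
  const duplicates = await client
    .db()
    .collection('Task_history')
    .aggregate([
      // Count how many documents share each "id" value
      { $group: { _id: '$id', count: { $sum: 1 } } },
      // Keep only ids that appear more than once
      { $match: { count: { $gt: 1 } } }
    ])
    .toArray()
  await client.close()
  return duplicates // a non-empty result means "id" is not unique here
}
```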