Investigate the use of indexes on FHIR Core/SDK

opensrp / fhircore

FHIR Core / OpenSRP 2 is a Kotlin application for delivering offline-capable, mobile-first healthcare project implementations from local community to national and international scale using FHIR and WHO Smart Guidelines on Android.

https://smartregister.org

Apache License 2.0

50 stars 39 forks source link

Investigate the use of indexes on FHIR Core/SDK #3180

Closed ndegwamartin closed 2 months ago

ndegwamartin commented 3 months ago

Overview We need to investigate the use of indexes on the platform with respect to FHIR Core. This will help determine whether we need to index extra/missing fields to improve performance at the db level.

Some steps to guide the process are

Get the generated query
Pass it to a query explainer to see if it’s using indexes
Check if there’s any missing fields that are not indexed
Try if indexing those missing fields will make the query faster
Try to index based on resourceType and test for performance improvements

FikriMilano commented 3 months ago

Gotcha, we'll proceed as discussed 👍

FikriMilano commented 3 months ago

Based on this EXPLAIN QUERY PLAN result: SEARCH TABLE ResourceEntity USING INDEX index_ResourceEntity_resourceType_resourceId (resourceType=?)

The DB is already indexing based on resourceType and fields. For indexing based on fields, the indexed fields are only fields that can be used as search parameter, fields that is not a search parameter won’t be indexed, which makes sense since indexing would only be effective if the fields are used as a sort or filtering query. For these reason I don’t think we can improve this anymore by indexing.
After some R&D, I think it will be a good idea to partition based on resourceType instead, since we are mostly search based on resourceType and there is a fixed amount of resourceType on FHIR, that way, the db will do much less scanning on the rows, assuming common resourceTypes that has a lot of rows (QR, Task, Patient, Encounter) are separated into it’s own partition. That way, we can have a faster query.

FikriMilano commented 3 months ago

@ndegwamartin @ellykits @dubdabasoduba

FikriMilano commented 3 months ago

@pld

ellykits commented 3 months ago

By partitioning, do you mean having separate tables per resource type? @FikriMilano

ageryck commented 3 months ago

Based on this EXPLAIN QUERY PLAN result: SEARCH TABLE ResourceEntity USING INDEX index_ResourceEntity_resourceType_resourceId (resourceType=?)

The DB is already indexing based on resourceType and fields. For indexing based on fields, the indexed fields are only fields that can be used as search parameter, fields that is not a search parameter won’t be indexed, which makes sense since indexing would only be effective if the fields are used as a sort or filtering query. For these reason I don’t think we can improve this anymore by indexing.

After some R&D, I think it will be a good idea to partition based on resourceType instead, since we are mostly search based on resourceType and there is a fixed amount of resourceType on FHIR, that way, the db will do much less scanning on the rows, assuming common resourceTypes that has a lot of rows (QR, Task, Patient, Encounter) are separated into it’s own partition. That way, we can have a faster query.

@FikriMilano do you suggest vertical or horizontal partitioning? Would this result in any refactors on how we currently query?

FikriMilano commented 3 months ago

By partitioning, do you mean having separate tables per resource type? @FikriMilano

@ellykits yes

FikriMilano commented 3 months ago

@FikriMilano do you suggest vertical or horizontal partitioning? Would this result in any refactors on how we currently query?

@ageryck horizontal partitioning, the idea is to have less rows to scan.

No, there won't be refactor on our queries, because we are using the FHIR Engine API, not raw queries.

FikriMilano commented 3 months ago

The refactor will only happen in the FHIR Engine library

pld commented 3 months ago

Is ReosurceType indexed? Is it a key in queries?On Apr 2, 2024, at 08:19, FikriMilano @.***> wrote: The refactor will only happen in the FHIR Engine library

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

ndegwamartin commented 3 months ago

@FikriMilano, could you also confirm whether it is a composite index or not?

FikriMilano commented 3 months ago

@pld @ndegwamartin

yes ResourceType is already indexed as unique composite index of resourceType + id.
resourceType is not a key in queries, it's simply a column

ndegwamartin commented 3 months ago

@FikriMilano I wonder if for cases where we have filters by ResourceType we might need a non-composite index to speed things up, esp. if we have any queries that filter by column this directly. The column may have a low cardinality but we might still get a significant performance gain. Might be a good idea to test this with our use case.

FikriMilano commented 3 months ago

@ndegwamartin I'm thinking the same, let's see

FikriMilano commented 2 months ago

@ndegwamartin after the experiment, there are no performance difference in having resourceType as single column index, whether it's together with or without the composite index.

Also, after reading this article, the experiment result matches the theory mentioned in this article, mainly about "having a single column index of [column X], will perform the same with composite index if [column X] is the first order" https://user3141592.medium.com/single-vs-composite-indexes-in-relational-databases-58d0eb045cbe

FikriMilano commented 2 months ago

@ndegwamartin should we give partitioning a try? in a different ticket maybe?

ndegwamartin commented 2 months ago

@FikriMilano thanks for this, yeah we can close this ticket and discuss the other alternatives + track on a different ticket. cc @ellykits