google / android-fhir

The Android FHIR SDK is a set of Kotlin libraries for building offline-capable, mobile-first healthcare applications using the HL7® FHIR® standard on Android.
https://google.github.io/android-fhir/
Apache License 2.0
494 stars 295 forks source link

Update columns' order in TokenIndexEntity multi-column index #2729

Open LZRS opened 2 days ago

LZRS commented 2 days ago

IMPORTANT: All PRs must be linked to an issue (except for extremely trivial and straightforward changes).

Description Updates the order of columns in the multi-column index used in the TokenIndexEntity entity table to have index_value as the leftmost column. Also removes the index_system column from the index since it would never get used because the queries generated use the IFNULL statement that forces evaluation from the actual rows.

With an example query

SELECT a.*
FROM ResourceEntity a
WHERE a.resourceType = 'Task'
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Task'
                           AND index_name = 'status'
                           AND (
                             (index_value = 'ready' AND IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status') OR
                             (index_value = 'in-progress' AND
                              IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status')));

Previous query plan, was

QUERY PLAN
|--SEARCH a USING INDEX index_ResourceEntity_resourceType_resourceId (resourceType=?)
`--LIST SUBQUERY 1
   `--SEARCH TokenIndexEntity USING COVERING INDEX index_TokenIndexEntity_resourceType_index_name_index_system_index_value_resourceUuid (resourceType=? AND index_name=?)

The updated query plan would be

QUERY PLAN
|--SEARCH a USING INDEX index_ResourceEntity_resourceType_resourceId (resourceType=?)
`--LIST SUBQUERY 1
   `--MULTI-INDEX OR
      |--INDEX 1
      |  `--SEARCH TokenIndexEntity USING INDEX index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid (index_value=? AND resourceType=? AND index_name=?)
      `--INDEX 2
         `--SEARCH TokenIndexEntity USING INDEX index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid (index_value=? AND resourceType=? AND index_name=?) 

Type Feature

Checklist

LZRS commented 2 days ago

Running query

SELECT a.*
FROM ResourceEntity a
WHERE a.resourceType = 'Task'
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Task'
                           AND index_name = 'status'
                           AND (
                             (index_value = 'ready' AND IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status') OR
                             (index_value = 'in-progress' AND
                              IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status')));

in a database with

took 0.749 seconds vs 2.354 seconds from previous index

jingtang10 commented 1 day ago

hey what if you change this line: https://github.com/google/android-fhir/blob/b59acf288b38731be1472c0523d7cd1e4828a042/engine/src/main/java/com/google/android/fhir/search/filter/TokenParamFilterCriterion.kt#L93

to:

"${ if (it.uri.isNullOrBlank()) "" else "IFNULL(index_system,'') = ? AND " }index_value = ?",

wouldn't that save you from changing the index?

also - i'm not sure if we should allow empty system at all? shouldn't we always look for a code system? so that even if it's empty we should be checking taht it's an empty code system. that way, even if your query doesn't include a code system we'll still be hitting the index.

LZRS commented 1 day ago

hey what if you change this line:

https://github.com/google/android-fhir/blob/b59acf288b38731be1472c0523d7cd1e4828a042/engine/src/main/java/com/google/android/fhir/search/filter/TokenParamFilterCriterion.kt#L93

to:

"${ if (it.uri.isNullOrBlank()) "" else "IFNULL(index_system,'') = ? AND " }index_value = ?",

wouldn't that save you from changing the index?

also - i'm not sure if we should allow empty system at all? shouldn't we always look for a code system? so that even if it's empty we should be checking taht it's an empty code system. that way, even if your query doesn't include a code system we'll still be hitting the index.

Changing that line the query, would probably look like this

SELECT a.*
FROM ResourceEntity a
WHERE a.resourceType = 'Task'
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Task'
                           AND index_name = 'status'
                           AND (
                             (IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status' AND index_value = 'ready') OR
                             (IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status' AND index_value = 'in-progress')));

whereby the query plan still doesn't change

QUERY PLAN
|--SEARCH a USING INDEX index_ResourceEntity_resourceType_resourceId (resourceType=?)
`--LIST SUBQUERY 1
   `--SEARCH TokenIndexEntity USING COVERING INDEX index_TokenIndexEntity_resourceType_index_name_index_system_index_value_resourceUuid (resourceType=? AND index_name=?)

The order of columns in the query doesn't need to match the way they've been defined in the index. I think it checks that leftmost column of an index exists within the WHERE constraints, and if there is an conflict it checks for the next column in the index...that is until, it either encounters a inequality comparison, a missing column constraint, or a column with function evaluation unless the indexed column is a function evaluation. To still have the index_system as part of the index, we can probably define it in the index as IFNULL(index_system, '')

Index(value = ["index_value", "resourceType", "index_name", "IFNULL(index_system, '')" "resourceUuid"]),

For queries that wouldn't have index_system, it would be a little in-efficient since it won't return resourceUuid directly from the covering index

I think it should still be okay to search without the system in the query. Assuming for the above query, it should still be okay to search

SELECT a.*
FROM ResourceEntity a
WHERE a.resourceType = 'Task'
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Task'
                           AND index_name = 'status'
                           AND (index_value = 'ready'  OR index_value = 'in-progress'));