google / android-fhir

The Android FHIR SDK is a set of Kotlin libraries for building offline-capable, mobile-first healthcare applications using the HL7® FHIR® standard on Android.
https://google.github.io/android-fhir/
Apache License 2.0
495 stars 296 forks source link

Use INTERSECT and UNION for multiple filters in generated query #2725

Open LZRS opened 1 week ago

LZRS commented 1 week ago

A suggestion to use Intersect or Union of the resulting resourceUuid of the index table subqueries depending on the Search operation. This might help reduce redundancy if same resourceUuid appears across the tables or in the case of operation AND, resourceUuid doesn't appear.

Example Query generated for multiple filters currently uses multiple ANDs across the different index tables

An example to currently generated query

SELECT a.resourceUuid, a.serializedResource FROM ResourceEntity a
WHERE a.resourceType = 'Location'
AND 
a.resourceUuid IN (
    SELECT resourceUuid FROM TokenIndexEntity WHERE resourceType = 'Location' AND index_name = 'status' AND index_value = 'active')
AND 
a.resourceUuid IN (
    SELECT resourceUuid FROM TokenIndexEntity WHERE resourceType = 'Location' AND index_name = 'type' AND (index_value = 'bu' AND IFNULL(index_system,'') = 'http://terminology.hl7.org/CodeSystem/location-physical-type'))

May be replaced with

SELECT a.resourceUuid, a.serializedResource
FROM ResourceEntity a
WHERE a.resourceType = 'Location'
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Location' AND index_name = 'status' AND index_value = 'active'

INTERSECT

SELECT resourceUuid
FROM TokenIndexEntity
WHERE resourceType = 'Location'
  AND index_name = 'type'
  AND (index_value = 'bu' AND IFNULL(index_system, '') = 'http://terminology.hl7.org/CodeSystem/location-physical-type'));
jingtang10 commented 1 week ago

do you have any performance benchmark numbers for this one or is this more speculative?

LZRS commented 1 week ago

Yeah, it's kinda of speculative but I can get some data

LZRS commented 1 week ago

For the above queries, with db

SELECT COUNT(*) FROM TokenIndexEntity;
74942 rows

SELECT COUNT(*) FROM TokenIndexEntity WHERE resourceType = 'Location';
11784 rows

SELECT COUNT(*) FROM ResourceEntity;
8881 rows

SELECT COUNT(*) FROM ResourceEntity WHERE resourceType = 'Location'; 1966 rows

The first query took around17ms and the second one 7ms

Additional context

SELECT COUNT(*)
FROM (SELECT resourceUuid
      FROM TokenIndexEntity
      WHERE resourceType = 'Location'
        AND index_name = 'status'
        AND index_value = 'active'

      INTERSECT

      SELECT resourceUuid
      FROM TokenIndexEntity
      WHERE resourceType = 'Location'
        AND index_name = 'type'
        AND (index_value = 'bu' AND
             IFNULL(index_system, '') = 'http://terminology.hl7.org/CodeSystem/location-physical-type'));

57 rows

SELECT COUNT(*)
FROM (SELECT resourceUuid
      FROM TokenIndexEntity
      WHERE resourceType = 'Location'
        AND index_name = 'status'
        AND index_value = 'active'

      UNION ALL

      SELECT resourceUuid
      FROM TokenIndexEntity
      WHERE resourceType = 'Location'
        AND index_name = 'type'
        AND (index_value = 'bu' AND
             IFNULL(index_system, '') = 'http://terminology.hl7.org/CodeSystem/location-physical-type'));

2023 rows