hapifhir / hapi-fhir

🔥 HAPI FHIR - Java API for HL7 FHIR Clients and Servers
http://hapifhir.io
Apache License 2.0
2.04k stars 1.33k forks source link

fix reindex optimizeStorage=ALL_VERSIONS #6421

Closed mrdnctrk closed 3 weeks ago

mrdnctrk commented 3 weeks ago

In this PR, I am fixing two issues (#6419 and #6420) related to $reindex operation with optimizeStorage set to ALL_VERSIONS.

We actually have a test for this functionality but the test uses h2 and the issue doesn't happen for h2. https://github.com/hapifhir/hapi-fhir/blob/7ee941585ba82ad67f55f6f7cc8fcbd2a89a493b/hapi-fhir-jpaserver-test-r4/src/test/java/ca/uhn/fhir/jpa/reindex/ReindexTaskTest.java#L132-L132

UPDATE: Because I changed the prefetch function prefetch even when there is a single resource, I updated some tests for query counts for transactions by reducing the expected number of select queries. In case there is single resource to Update an existing resource in a transaction (there cab be other resources to create, delete etc in the bundle), previously we weren't prefetching but with the change we are prefetching. You can see one example below, where the number of select statements has reduced from 3 to 2 because of prefetch.

before - 3 queries (to get the resource itself, to get the current version ,and to get the token index):

[0] SqlQuery at 2024-10-30T14:25:51.356-04:00 took 0ms on Thread: main
SQL:
select
        rt1_0.RES_ID,
        rt1_0.RES_DELETED_AT,
        rt1_0.FHIR_ID,
        rt1_0.RES_VERSION,
        rt1_0.SP_HAS_LINKS,
        rt1_0.HAS_TAGS,
        rt1_0.HASH_SHA256,
        rt1_0.SP_INDEX_STATUS,
        rt1_0.RES_LANGUAGE,
        rt1_0.SP_CMPSTR_UNIQ_PRESENT,
        rt1_0.SP_CMPTOKS_PRESENT,
        rt1_0.SP_COORDS_PRESENT,
        rt1_0.SP_DATE_PRESENT,
        rt1_0.SP_NUMBER_PRESENT,
        rt1_0.SP_QUANTITY_NRML_PRESENT,
        rt1_0.SP_QUANTITY_PRESENT,
        rt1_0.SP_STRING_PRESENT,
        rt1_0.SP_TOKEN_PRESENT,
        rt1_0.SP_URI_PRESENT,
        rt1_0.PARTITION_DATE,
        rt1_0.PARTITION_ID,
        rt1_0.RES_PUBLISHED,
        rt1_0.RES_TYPE,
        rt1_0.SEARCH_URL_PRESENT,
        rt1_0.RES_UPDATED,
        rt1_0.RES_VER 
    from
        HFJ_RESOURCE rt1_0 
    where
        rt1_0.RES_ID='2'

[1] SqlQuery at 2024-10-30T14:25:51.363-04:00 took 0ms on Thread: main
SQL:
select
        rht1_0.PID,
        rht1_0.RES_DELETED_AT,
        rht1_0.RES_ENCODING,
        rht1_0.RES_VERSION,
        rht1_0.HAS_TAGS,
        rht1_0.PARTITION_DATE,
        rht1_0.PARTITION_ID,
        mp1_0.RES_VER_PID,
        mp1_0.PARTITION_DATE,
        mp1_0.PARTITION_ID,
        mp1_0.REQUEST_ID,
        mp1_0.RES_PID,
        mp1_0.SOURCE_URI,
        rht1_0.RES_PUBLISHED,
        rht1_0.REQUEST_ID,
        rht1_0.RES_TEXT,
        rht1_0.RES_ID,
        rht1_0.RES_TEXT_VC,
        rht1_0.RES_TYPE,
        rht1_0.RES_VER,
        rht1_0.SOURCE_URI,
        rht1_0.RES_UPDATED 
    from
        HFJ_RES_VER rht1_0 
    left join
        HFJ_RES_VER_PROV mp1_0 
            on rht1_0.PID=mp1_0.RES_VER_PID 
    where
        rht1_0.RES_ID='2' 
        and rht1_0.RES_VER='1'

[2] SqlQuery at 2024-10-30T14:25:51.372-04:00 took 0ms on Thread: main
SQL:
select
        mpt1_0.RES_ID,
        mpt1_0.SP_ID,
        mpt1_0.HASH_IDENTITY,
        mpt1_0.HASH_SYS,
        mpt1_0.HASH_SYS_AND_VALUE,
        mpt1_0.HASH_VALUE,
        mpt1_0.SP_MISSING,
        mpt1_0.SP_NAME,
        mpt1_0.PARTITION_DATE,
        mpt1_0.PARTITION_ID,
        mpt1_0.RES_TYPE,
        mpt1_0.SP_SYSTEM,
        mpt1_0.SP_UPDATED,
        mpt1_0.SP_VALUE 
    from
        HFJ_SPIDX_TOKEN mpt1_0 
    where
        mpt1_0.RES_ID='2'

now - 2 queries: ( first query gets the resource together with its current version, the second query gets the token index)

[0] SqlQuery at 2024-10-30T14:22:37.956-04:00 took 0ms on Thread: main
SQL:
select
        rt1_0.RES_ID,
        rt1_0.RES_DELETED_AT,
        rt1_0.FHIR_ID,
        rt1_0.RES_VERSION,
        rt1_0.SP_HAS_LINKS,
        rt1_0.HAS_TAGS,
        rt1_0.HASH_SHA256,
        rt1_0.SP_INDEX_STATUS,
        rt1_0.RES_LANGUAGE,
        rt1_0.SP_CMPSTR_UNIQ_PRESENT,
        rt1_0.SP_CMPTOKS_PRESENT,
        rt1_0.SP_COORDS_PRESENT,
        rt1_0.SP_DATE_PRESENT,
        rt1_0.SP_NUMBER_PRESENT,
        rt1_0.SP_QUANTITY_NRML_PRESENT,
        rt1_0.SP_QUANTITY_PRESENT,
        rt1_0.SP_STRING_PRESENT,
        rt1_0.SP_TOKEN_PRESENT,
        rt1_0.SP_URI_PRESENT,
        rt1_0.PARTITION_DATE,
        rt1_0.PARTITION_ID,
        rt1_0.RES_PUBLISHED,
        rt1_0.RES_TYPE,
        rt1_0.SEARCH_URL_PRESENT,
        rt1_0.RES_UPDATED,
        rt1_0.RES_VER,
        rht1_0.PID,
        rht1_0.RES_DELETED_AT,
        rht1_0.RES_ENCODING,
        rht1_0.RES_VERSION,
        rht1_0.HAS_TAGS,
        rht1_0.PARTITION_DATE,
        rht1_0.PARTITION_ID,
        mp1_0.RES_VER_PID,
        mp1_0.PARTITION_DATE,
        mp1_0.PARTITION_ID,
        mp1_0.REQUEST_ID,
        mp1_0.RES_PID,
        mp1_0.SOURCE_URI,
        rht1_0.RES_PUBLISHED,
        rht1_0.REQUEST_ID,
        rht1_0.RES_TEXT,
        rht1_0.RES_ID,
        rht1_0.RES_TEXT_VC,
        rht1_0.RES_TYPE,
        rht1_0.RES_VER,
        rht1_0.SOURCE_URI,
        rht1_0.RES_UPDATED 
    from
        HFJ_RESOURCE rt1_0 
    left join
        HFJ_RES_VER rht1_0 
            on rt1_0.RES_VER=rht1_0.RES_VER 
            and rt1_0.RES_ID=rht1_0.RES_ID 
    left join
        HFJ_RES_VER_PROV mp1_0 
            on rht1_0.PID=mp1_0.RES_VER_PID 
    where
        rt1_0.RES_ID in ('2')

[1] SqlQuery at 2024-10-30T14:22:37.969-04:00 took 0ms on Thread: main
SQL:
select
        rt1_0.RES_ID,
        rt1_0.RES_DELETED_AT,
        rt1_0.FHIR_ID,
        rt1_0.RES_VERSION,
        rt1_0.SP_HAS_LINKS,
        rt1_0.HAS_TAGS,
        rt1_0.HASH_SHA256,
        rt1_0.SP_INDEX_STATUS,
        rt1_0.RES_LANGUAGE,
        rt1_0.SP_CMPSTR_UNIQ_PRESENT,
        rt1_0.SP_CMPTOKS_PRESENT,
        rt1_0.SP_COORDS_PRESENT,
        rt1_0.SP_DATE_PRESENT,
        rt1_0.SP_NUMBER_PRESENT,
        rt1_0.SP_QUANTITY_NRML_PRESENT,
        rt1_0.SP_QUANTITY_PRESENT,
        rt1_0.SP_STRING_PRESENT,
        mpt1_0.RES_ID,
        mpt1_0.SP_ID,
        mpt1_0.HASH_IDENTITY,
        mpt1_0.HASH_SYS,
        mpt1_0.HASH_SYS_AND_VALUE,
        mpt1_0.HASH_VALUE,
        mpt1_0.SP_MISSING,
        mpt1_0.SP_NAME,
        mpt1_0.PARTITION_DATE,
        mpt1_0.PARTITION_ID,
        mpt1_0.RES_TYPE,
        mpt1_0.SP_SYSTEM,
        mpt1_0.SP_UPDATED,
        mpt1_0.SP_VALUE,
        rt1_0.SP_TOKEN_PRESENT,
        rt1_0.SP_URI_PRESENT,
        rt1_0.PARTITION_DATE,
        rt1_0.PARTITION_ID,
        rt1_0.RES_PUBLISHED,
        rt1_0.RES_TYPE,
        rt1_0.SEARCH_URL_PRESENT,
        rt1_0.RES_UPDATED,
        rt1_0.RES_VER 
    from
        HFJ_RESOURCE rt1_0 
    left join
        HFJ_SPIDX_TOKEN mpt1_0 
            on rt1_0.RES_ID=mpt1_0.RES_ID 
    where
        rt1_0.RES_ID in ('2')
github-actions[bot] commented 3 weeks ago

Formatting check succeeded!

codecov[bot] commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Please upload report for BASE (rel_7_6@5fdce4e). Learn more about missing BASE report.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## rel_7_6 #6421 +/- ## ========================================== Coverage ? 83.53% Complexity ? 27770 ========================================== Files ? 1738 Lines ? 107468 Branches ? 13493 ========================================== Hits ? 89775 Misses ? 11905 Partials ? 5788 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.