hapifhir / hapi-fhir

🔥 HAPI FHIR - Java API for HL7 FHIR Clients and Servers
http://hapifhir.io
Apache License 2.0
2.05k stars 1.34k forks source link

Spike removal of query chunker by converting to Any(Int[]::array) queries. #6102

Open tadgh opened 4 months ago

tadgh commented 4 months ago

Current query chunker breaking where clauses into chunks of 800 (see https://github.com/hapifhir/hapi-fhir/blob/bab282344b627935fd94c245d8f594d6e13c8650/hapi-fhir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/util/QueryChunker.java#L43-L43) Causes some problems when interacting with Hibernate search.

One specific example is that if you are doing a ?PractitionerRole?_text=abc&sort=practitioner.family, this will not work, as the pre-fetch of PIDS comes from elastic, which may be >800. Multiple iterations of the sort against the database will sort those individual 800-pid chunks, but then you are left with glueing the chunks back together, and they are no longer sorted, e.g.

consider these 2 chunks (imagine our maximum chunk size is 4 instead of 800) Elasticsearch Returns: [a, f, z, o, i, b, r, k, p, b, v, s]

Broken into 3 chunks: [a, f, z, o] [i, b, r, k] [p, b, v, s]

Then the database sorts them [a, f, o, z] [b, i, k, r] [b, p, s v]

^-- all are in order, but then it combines them back: [a, f, o, z, b, i, k, r, b, p, s v]

which is not sorted.

If we didnt have this WHERE ID IN (800 pids) clause, we would eliminate this class of problem.

michaelabuckley commented 4 months ago

Modern sql has added the ANY keyword, that is like IN, but more general. It modifies a comparison, and distributes it over either an array or a select. So

select r.id from hfj_resource r
where r.fhir_id in ('force1', 'force2')

is equivalent to

select r.id from hfj_resource r
where r.fhir_id = any ({'force1', 'force2'}) 
-- note {} is the postgres syntax for literal arrays

In JDBC, these have different bind parameters. The IN cause requires one bind for each value, but for the ANY clause, since the value is a single array, it is a single param.

select r.id from hfj_resource r
where r.fhir_id in (?,?)

requires java like

var stmt = con.prepareStatement("select r.id from hfj_resource r where r.fhir_id in (?,?,?)");
stmt.setString(1,"force1");
stmt.setString(2,"force2");

but the ANY is more like

var sqlArray = con.createArrayOf('varchar2', new String[]{'force1', 'force2'});
var stmt = con.prepareStatement("select r.id from hfj_resource r where r.fhir_id = ANY(?)");
stmt.setArray(1,sqlArray);

See https://www.postgresql.org/docs/8.2/functions-subquery.html#AEN13969

Supported on H2, Oracle, Postgres, Sql Server, and MySql.