Open tadgh opened 4 months ago
Modern sql has added the ANY keyword, that is like IN, but more general. It modifies a comparison, and distributes it over either an array or a select. So
select r.id from hfj_resource r
where r.fhir_id in ('force1', 'force2')
is equivalent to
select r.id from hfj_resource r
where r.fhir_id = any ({'force1', 'force2'})
-- note {} is the postgres syntax for literal arrays
In JDBC, these have different bind parameters. The IN cause requires one bind for each value, but for the ANY clause, since the value is a single array, it is a single param.
select r.id from hfj_resource r
where r.fhir_id in (?,?)
requires java like
var stmt = con.prepareStatement("select r.id from hfj_resource r where r.fhir_id in (?,?,?)");
stmt.setString(1,"force1");
stmt.setString(2,"force2");
but the ANY is more like
var sqlArray = con.createArrayOf('varchar2', new String[]{'force1', 'force2'});
var stmt = con.prepareStatement("select r.id from hfj_resource r where r.fhir_id = ANY(?)");
stmt.setArray(1,sqlArray);
See https://www.postgresql.org/docs/8.2/functions-subquery.html#AEN13969
Supported on H2, Oracle, Postgres, Sql Server, and MySql.
Current query chunker breaking where clauses into chunks of 800 (see https://github.com/hapifhir/hapi-fhir/blob/bab282344b627935fd94c245d8f594d6e13c8650/hapi-fhir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/util/QueryChunker.java#L43-L43) Causes some problems when interacting with Hibernate search.
One specific example is that if you are doing a
?PractitionerRole?_text=abc&sort=practitioner.family
, this will not work, as the pre-fetch of PIDS comes from elastic, which may be >800. Multiple iterations of the sort against the database will sort those individual 800-pid chunks, but then you are left with glueing the chunks back together, and they are no longer sorted, e.g.consider these 2 chunks (imagine our maximum chunk size is 4 instead of 800) Elasticsearch Returns:
[a, f, z, o, i, b, r, k, p, b, v, s]
Broken into 3 chunks:
[a, f, z, o]
[i, b, r, k]
[p, b, v, s]
Then the database sorts them
[a, f, o, z]
[b, i, k, r]
[b, p, s v]
^-- all are in order, but then it combines them back:
[a, f, o, z, b, i, k, r, b, p, s v]
which is not sorted.
If we didnt have this
WHERE ID IN (800 pids)
clause, we would eliminate this class of problem.