hail-is / hail

Cloud-native genomic dataframes and batch computing
https://hail.is
MIT License
975 stars 243 forks source link

[batch] Optimize SQL query generated for listing jobs / batches / job groups #14410

Open jigold opened 6 months ago

jigold commented 6 months ago

What happened?

The current query code uses EXISTS and IN to check whether a batch / job / job group has a particular attribute. This is extremely slow especially for queries that have a lot of matches. The fix for this is to use a LEFT JOIN LATERAL. However, by adding a new optional table join, I believe the code has to be more complicated than it currently is because there's a second WHERE clause. Note that the fix needs to be applied to both v1 and v2 queries.

See https://github.com/hail-is/hail/pull/14282/files#r1490076943 for more discussion.

Version

0.2.128

Relevant log output

No response

jigold commented 6 months ago

These are examples of current usages. This list is not exhaustive.

https://github.com/hail-is/hail/blob/main/batch/batch/front_end/query/query_v1.py#L30-L44

https://github.com/hail-is/hail/blob/main/batch/batch/front_end/query/query.py#L117-L138 https://github.com/hail-is/hail/blob/main/batch/batch/front_end/query/query.py#L161-L182 https://github.com/hail-is/hail/blob/main/batch/batch/front_end/query/query.py#L185-L209

daniel-goldstein commented 2 months ago

@ehigham this might be relevant to your UI search inquiry