Improve inefficient SPARQL query template

surchs commented 5 months ago

Our current SPARQL query template has a couple of problems that make it slow:

Unnecessary nesting of OPTIONAL clauses in https://github.com/neurobagel/api/blob/1e9ef34d54b89b824c42f1e6293a0ab9d5931c65/app/api/utility.py#L232-L237

and, even worse, in

https://github.com/neurobagel/api/blob/1e9ef34d54b89b824c42f1e6293a0ab9d5931c65/app/api/utility.py#L258-L264

The problems, in order of importance:

1. Because the desired session type is not stated explicitly in the triple pattern, we have to traverse the entire graph to see if anything fits the pattern with nb:hasAcquisition/nb:hasContrastType -> very slow!
2. The OPTIONAL statement is not necessary. If a subject does not have a phenotypic or imaging session, we don't need to look any further anyway.
3. We have access to ?subject from the outer scope, so there is no need to restate it here.
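
A minimal sketch of the first point, to make the difference concrete. This is not copied from `utility.py`: the helper `imaging_subquery`, the predicate `nb:hasSession`, and the class `nb:ImagingSession` are assumed names for illustration. Only the `nb:hasAcquisition/nb:hasContrastType` path comes from the description above.

```python
from textwrap import dedent


def imaging_subquery(explicit_session_type: bool = True) -> str:
    """Return a SPARQL sub-select that counts sessions with a contrast type.

    Hypothetical sketch: nb:hasSession and nb:ImagingSession are assumed names.
    """
    # Stating the session type explicitly can let the engine restrict the
    # search to imaging sessions before it evaluates the property path.
    type_pattern = "?session a nb:ImagingSession ." if explicit_session_type else ""
    return dedent(
        f"""\
        SELECT ?subject (COUNT(DISTINCT ?session) AS ?num_sessions)
        WHERE {{
            ?subject nb:hasSession ?session .
            {type_pattern}
            ?session nb:hasAcquisition/nb:hasContrastType ?contrast .
        }}
        GROUP BY ?subject
        """
    )


print(imaging_subquery())
```

Calling `imaging_subquery(False)` gives the untyped variant, which is handy for timing the two forms against the same graph.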

Also, the OPTIONAL block at https://github.com/neurobagel/api/blob/1e9ef34d54b89b824c42f1e6293a0ab9d5931c65/app/api/utility.py#L207-L211

is most likely not necessary: it would only help to capture those subjects who do not have any file system path associated and thus would not be datalad gettable.
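
To illustrate what that block buys us, here is a toy emulation in plain Python (not SPARQL, and not the actual template) of matching a file-path pattern with and without OPTIONAL. The function `bind_paths` and the sample subjects are made up for the example.

```python
def bind_paths(subjects, file_paths, optional):
    """Emulate matching a file-path triple pattern per subject.

    With optional=True, subjects without a path are kept with the path unbound
    (None), as OPTIONAL would do; with optional=False they are dropped.
    """
    rows = []
    for subject in subjects:
        path = file_paths.get(subject)
        if path is not None:
            rows.append((subject, path))
        elif optional:
            # OPTIONAL keeps the row and leaves ?path unbound.
            rows.append((subject, None))
    return rows


subjects = ["sub-01", "sub-02"]
file_paths = {"sub-01": "/data/sub-01"}  # sub-02 has no path (hypothetical data)

print(bind_paths(subjects, file_paths, optional=True))
print(bind_paths(subjects, file_paths, optional=False))
```

The only rows that differ between the two calls are subjects with no path, which are exactly the ones that could not be fetched anyway.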

TODOs:

[x] explicitly state session types for pheno and imaging session in the triple patterns of the sub-queries
[ ] ~~remove superfluous OPTIONAL statements (they are expensive)~~
[ ] ~~remove repeated triple patterns in sub-queries~~

From initial testing, this will let us cut query execution time by an order of magnitude.

surchs commented 5 months ago

Turns out the main thing was really just stating explicitly what session type we want for the imaging session. That cut the query execution time by ~90%.

surchs commented 5 months ago

:rocket: Issue was released in v0.2.1 :rocket:

neurobagel / api

Improve inefficient SPARQL query template #307