Our current SPARQL query template has a couple of problems that make it slow:

Unnecessary nesting of `OPTIONAL` clauses:
https://github.com/neurobagel/api/blob/1e9ef34d54b89b824c42f1e6293a0ab9d5931c65/app/api/utility.py#L232-L237
and
https://github.com/neurobagel/api/blob/1e9ef34d54b89b824c42f1e6293a0ab9d5931c65/app/api/utility.py#L258-L264 (even worse)

In order of importance:

- By not explicitly putting the desired session type in the triple pattern, we have to traverse the entire graph to see if there is anything that fits the pattern with `nb:hasAcquisition/nb:hasContrastType` -> very slow!
- The `OPTIONAL` statement is not necessary. If a subject does not have a phenotypic or imaging session, we don't need to look any further anyway.
- We have access to `?subject` from the outer scope, so there is no need to restate it here.
- Also: https://github.com/neurobagel/api/blob/1e9ef34d54b89b824c42f1e6293a0ab9d5931c65/app/api/utility.py#L207-L211 is most likely not necessary. It would only help to capture those subjects who do not have any file system path associated and thus would not be datalad gettable.

TODOs:

- [x] explicitly state session types for pheno and imaging session in the triple patterns of the sub-queries
- [ ] remove superfluous `OPTIONAL` statements (they are expensive)
- [ ] remove repeated triple patterns in sub-queries

From initial testing, this will let us cut query execution time by an order of magnitude.

See also: https://github.com/neurobagel/planning/issues/142

Turns out the main thing was really just stating explicitly what session type we want for the imaging session. That cut the query execution time by ~90%.
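For illustration, here is a minimal Python sketch of the session-type change. It is not the actual code in `utility.py`. The helper name `imaging_session_pattern` is hypothetical, and `nb:hasSession` and `nb:ImagingSession` are assumed vocabulary terms. Only the `nb:hasAcquisition/nb:hasContrastType` path comes from the issue text. The fragment is meant to sit inside an outer pattern that already binds `?subject`.

```python
def imaging_session_pattern(explicit_type: bool = True, optional: bool = False) -> str:
    """Build a SPARQL graph pattern that binds ?image_modal for a subject.

    nb:hasSession and nb:ImagingSession are assumed names for this sketch;
    nb:hasAcquisition/nb:hasContrastType is the path quoted in the issue.
    """
    lines = ["?subject nb:hasSession ?session ."]
    if explicit_type:
        # State the session type up front, so the property path below is
        # only matched against imaging sessions rather than the whole graph.
        lines.append("?session a nb:ImagingSession .")
    lines.append("?session nb:hasAcquisition/nb:hasContrastType ?image_modal .")
    if optional:
        # The variant the issue wants to move away from: the same pattern
        # wrapped in an OPTIONAL block.
        indented = "\n".join("    " + line for line in lines)
        return "OPTIONAL {\n" + indented + "\n}"
    return "\n".join(lines)


# Pattern shape before and after the proposed change.
slow = imaging_session_pattern(explicit_type=False, optional=True)
fast = imaging_session_pattern()
print(fast)
```

The `fast` variant differs from `slow` in only two ways: it adds one typed triple pattern and drops the `OPTIONAL` wrapper. Those correspond to the first two TODO items.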