If a specific session matches my query and has both imaging and phenotypic data, I expect to see in the unaggregated API results two entries for that subject-session: one ImagingSession and one PhenotypicSession instance.
Current Behavior
When the query runs against a graph that also contains the Neurobagel vocabulary, a matching session (unexpectedly) has three instead of two entries in the results: one ImagingSession, one PhenotypicSession, and one Session instance.
Note that the number of matching subjects, as well as the num_matching_{phenotypic,imaging}_sessions for a particular subject, are still calculated correctly. The main problem is that the extra Session instance (which also appears as an extra row in a participant-level results TSV from the query tool) gives the impression of three different session types, which is not how the data is modeled in the graph.
This happens because of the new class relationships we have in the graph, specifically:
nb:ImagingSession a rdfs:Class;
rdfs:subClassOf nb:Session.
nb:PhenotypicSession a rdfs:Class;
rdfs:subClassOf nb:Session.
...
nb:Session a rdfs:Class.
When the Neurobagel query first selects all sessions, it returns ImagingSession, PhenotypicSession, and Session for session_type, because an instance of either of the first two classes is inferred through RDFS inference to also be an instance of Session. In addition, under RDFS semantics each class is a subclass of itself.
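To make the inference step concrete, here is a minimal pure-Python sketch of the subclass rule (if x has type C and C is a subclass of D, then x also has type D). It is not how the graph store implements inference. The `ex:` node names and the `ses-01` label are made up for illustration, and it assumes the imaging and phenotypic data of one session are stored as two nodes sharing a label.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Vocabulary triples mirroring the class relationships quoted above.
vocab = {
    ("nb:ImagingSession", SUBCLASS, "nb:Session"),
    ("nb:PhenotypicSession", SUBCLASS, "nb:Session"),
}

# Hypothetical data: one subject-session stored as two nodes, one per data type.
data = {
    ("ex:imaging-node", TYPE, "nb:ImagingSession"),
    ("ex:pheno-node", TYPE, "nb:PhenotypicSession"),
}
labels = {"ex:imaging-node": "ses-01", "ex:pheno-node": "ses-01"}


def infer_types(triples):
    """Repeat the subclass rule until no new rdf:type triples appear."""
    inferred = set(triples)
    while True:
        new = {
            (s, TYPE, parent)
            for (s, p, cls) in inferred if p == TYPE
            for (child, q, parent) in inferred if q == SUBCLASS and child == cls
        } - inferred
        if not new:
            return inferred
        inferred |= new


def session_types(triples):
    """Distinct (session label, session type) pairs, as a query would list them."""
    return sorted({(labels[s], o) for (s, p, o) in triples if p == TYPE and s in labels})


without_vocab = session_types(data)
with_vocab = session_types(infer_types(data | vocab))
print(len(without_vocab), len(with_vocab))  # prints "2 3"
```

Without the vocabulary triples the session has two distinct types. Once they are in the same graph, both nodes also count as nb:Session, which adds the third entry.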
Is there an existing issue for this?
Expected Behavior
If a specific session matches my query and has both imaging and phenotypic data, I expect to see in the unaggregated API results two entries for that subject-session: one ImagingSession and one PhenotypicSession instance.
Current Behavior
When the query is going to a graph that also has the Neurobagel vocabulary in it, a matching session (unexpectedly) has three instead of two entries in the results: one ImagingSession, one PhenotypicSession, and one Session instance.
Note that the number of matching subjects, as well as the num_matching_{phenotypic,imaging}_sessions for a particular subject, are still calculated correctly. The main problem is that the extra Session instance (which also appears as an extra row in a participant-level results TSV from the query tool) gives the impression of three different session types, which is not how the data is modeled in the graph.
Error message
No response
Environment
How to reproduce
No response
Anything else?
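This happens due to the new class relationships we have in the graph, specifically those from https://github.com/neurobagel/recipes/blob/main/vocab/nb_vocab.ttl, where nb:ImagingSession and nb:PhenotypicSession are each declared rdfs:subClassOf nb:Session.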
When we first select for all sessions in the Neurobagel query:
https://github.com/neurobagel/api/blob/d2e090b412c365f169f31c028351fc5457a9cfd5/docs/default_neurobagel_query.rq#L19-L20
This will return ImagingSession, PhenotypicSession, and Session for session_type, since an instance of either of the first two is inferred through RDFS inference to also be an instance of Session. By default in RDFS, each class is also a subclass of itself.
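One possible way to hide the extra entry on the consumer side, sketched here only as an illustration and not as the project's fix, is to drop result rows whose session_type is the generic parent class. The row keys `sub_id` and `session_id`, the prefixed `nb:` form of the type values, and the helper name `drop_generic_sessions` are all assumptions. Real API responses may use full IRIs and different field names.

```python
GENERIC_SESSION = "nb:Session"  # assumed prefixed form of the parent class


def drop_generic_sessions(rows):
    """Keep only rows whose session_type is a concrete session subclass."""
    return [row for row in rows if row["session_type"] != GENERIC_SESSION]


# Hypothetical unaggregated result rows for one subject-session.
rows = [
    {"sub_id": "sub-01", "session_id": "ses-01", "session_type": "nb:ImagingSession"},
    {"sub_id": "sub-01", "session_id": "ses-01", "session_type": "nb:PhenotypicSession"},
    {"sub_id": "sub-01", "session_id": "ses-01", "session_type": "nb:Session"},
]

filtered = drop_generic_sessions(rows)
for row in filtered:
    print(row["session_id"], row["session_type"])
```

Filtering after the fact only hides the symptom. Restricting session_type inside the query itself would avoid returning the inferred rows in the first place.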