neurobagel / api

https://api.neurobagel.org/
MIT License

SPARQL query matches same session too many times due to subclass inferencing #374

Closed alyssadai closed 1 day ago

alyssadai commented 3 days ago

Expected Behavior

If a specific session matches my query and has both imaging and phenotypic data, I expect the unaggregated API results to contain two entries for that subject-session: one ImagingSession instance and one PhenotypicSession instance.

Current Behavior

When the query is sent to a graph that also contains the Neurobagel vocabulary, a matching session unexpectedly has three entries in the results instead of two: one ImagingSession, one PhenotypicSession, and one Session instance.

Note that the number of matching subjects and the num_matching_{phenotypic,imaging}_sessions values for a particular subject are still calculated correctly. The main problem is that the extra Session entry (which also appears as an extra row in a participant-level results TSV from the query tool) gives the impression that there are three different session types, which is not how the data is modeled in the graph.

e.g.,

    ...
    "subject_data": [
      {
        "sub_id": "sub-01",
        "session_id": "ses-01",
        "num_matching_phenotypic_sessions": 2,
        "num_matching_imaging_sessions": 2,
        "session_type": "http://neurobagel.org/vocab/ImagingSession",
        "age": null,
        "sex": null,
        "diagnosis": [
          null
        ],
        "subject_group": null,
        "assessment": [
          null
        ],
        "image_modal": [
          "http://purl.org/nidash/nidm#T1Weighted",
          "http://purl.org/nidash/nidm#FlowWeighted"
        ],
        "session_file_path": "/data/neurobagel/bagel-cli/bids-examples/synthetic/sub-01/ses-01",
        "completed_pipelines": {
          "https://github.com/nipoppy/pipeline-catalog/tree/main/processing/fmriprep": [
            "23.1.3"
          ],
          "https://github.com/nipoppy/pipeline-catalog/tree/main/processing/freesurfer": [
            "7.3.2"
          ]
        }
      },
      {
        "sub_id": "sub-01",
        "session_id": "ses-01",
        "num_matching_phenotypic_sessions": 2,
        "num_matching_imaging_sessions": 2,
        "session_type": "http://neurobagel.org/vocab/PhenotypicSession",
        "age": 34.1,
        "sex": "http://purl.bioontology.org/ontology/SNOMEDCT/248152002",
        "diagnosis": [
          null
        ],
        "subject_group": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C94342",
        "assessment": [
          "https://www.cognitiveatlas.org/task/id/trm_57964b8a66aed",
          "https://www.cognitiveatlas.org/task/id/tsk_4a57abb949ece"
        ],
        "image_modal": [
          null
        ],
        "session_file_path": null,
        "completed_pipelines": {}
      },
      {
        "sub_id": "sub-01",
        "session_id": "ses-01",
        "num_matching_phenotypic_sessions": 2,
        "num_matching_imaging_sessions": 2,
        "session_type": "http://neurobagel.org/vocab/Session",
        "age": 34.1,
        "sex": "http://purl.bioontology.org/ontology/SNOMEDCT/248152002",
        "diagnosis": [
          null
        ],
        "subject_group": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C94342",
        "assessment": [
          "https://www.cognitiveatlas.org/task/id/trm_57964b8a66aed",
          "https://www.cognitiveatlas.org/task/id/tsk_4a57abb949ece",
          null
        ],
        "image_modal": [
          null,
          "http://purl.org/nidash/nidm#T1Weighted",
          "http://purl.org/nidash/nidm#FlowWeighted"
        ],
        "session_file_path": "/data/neurobagel/bagel-cli/bids-examples/synthetic/sub-01/ses-01",
        "completed_pipelines": {
          "https://github.com/nipoppy/pipeline-catalog/tree/main/processing/fmriprep": [
            "23.1.3"
          ],
          "https://github.com/nipoppy/pipeline-catalog/tree/main/processing/freesurfer": [
            "7.3.2"
          ]
        }
      },
      ...
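
Until the query itself is corrected, a client could drop the extra rows after the fact. The following is a minimal sketch of that idea, not part of the Neurobagel API or query tool: the helper name `drop_base_session_rows` is hypothetical, and the rows are trimmed-down copies of the three entries shown above.

```python
# Hypothetical client-side workaround; not an existing Neurobagel function.
NB_SESSION = "http://neurobagel.org/vocab/Session"


def drop_base_session_rows(subject_data):
    """Keep only rows whose session_type is not the generic nb:Session class."""
    return [row for row in subject_data if row.get("session_type") != NB_SESSION]


# Trimmed-down versions of the three rows from the example response.
subject_data = [
    {
        "sub_id": "sub-01",
        "session_id": "ses-01",
        "session_type": "http://neurobagel.org/vocab/ImagingSession",
    },
    {
        "sub_id": "sub-01",
        "session_id": "ses-01",
        "session_type": "http://neurobagel.org/vocab/PhenotypicSession",
    },
    {
        "sub_id": "sub-01",
        "session_id": "ses-01",
        "session_type": "http://neurobagel.org/vocab/Session",
    },
]

filtered = drop_base_session_rows(subject_data)
session_types = [row["session_type"] for row in filtered]
print(session_types)
```

This only hides the symptom in the response; the extra match still comes from the graph query.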

Error message

No response

Environment

How to reproduce

No response

Anything else?

This happens because of the new class relationships we have in the graph, specifically:

nb:ImagingSession a rdfs:Class;
    rdfs:subClassOf nb:Session.

nb:PhenotypicSession a rdfs:Class;
    rdfs:subClassOf nb:Session.

...

nb:Session a rdfs:Class.

from https://github.com/neurobagel/recipes/blob/main/vocab/nb_vocab.ttl.

When we first select for all sessions in the Neurobagel query:

https://github.com/neurobagel/api/blob/d2e090b412c365f169f31c028351fc5457a9cfd5/docs/default_neurobagel_query.rq#L19-L20

This returns ImagingSession, PhenotypicSession, and Session as values of session_type, because RDFS inference treats every instance of the first two classes as an instance of Session as well. Under RDFS semantics, each class is also a subclass of itself.
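
The effect can be reproduced without a graph store. The sketch below is a plain-Python emulation under stated assumptions: it implements only three RDFS entailment rules (rdfs9, rdfs10, rdfs11), the two session nodes `ex:img-ses-01` and `ex:pheno-ses-01` are made up, and the pattern in `session_types` is an assumed simplification, not the contents of default_neurobagel_query.rq.

```python
NB = "http://neurobagel.org/vocab/"
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
RDFS_CLASS = "rdfs:Class"

# Asserted triples: the vocabulary quoted above plus two hypothetical session nodes.
asserted = {
    (NB + "ImagingSession", RDF_TYPE, RDFS_CLASS),
    (NB + "ImagingSession", SUBCLASS_OF, NB + "Session"),
    (NB + "PhenotypicSession", RDF_TYPE, RDFS_CLASS),
    (NB + "PhenotypicSession", SUBCLASS_OF, NB + "Session"),
    (NB + "Session", RDF_TYPE, RDFS_CLASS),
    ("ex:img-ses-01", RDF_TYPE, NB + "ImagingSession"),
    ("ex:pheno-ses-01", RDF_TYPE, NB + "PhenotypicSession"),
}


def rdfs_closure(triples):
    """Apply rdfs9, rdfs10 and rdfs11 repeatedly until no new triple appears."""
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            if p == RDF_TYPE and o == RDFS_CLASS:
                new.add((s, SUBCLASS_OF, s))  # rdfs10: a class is a subclass of itself
            if p == SUBCLASS_OF:
                for s2, p2, o2 in closed:
                    if p2 == RDF_TYPE and o2 == s:
                        new.add((s2, RDF_TYPE, o))  # rdfs9: instances inherit the superclass
                    if p2 == SUBCLASS_OF and s2 == o:
                        new.add((s, SUBCLASS_OF, o2))  # rdfs11: subClassOf is transitive
        if new <= closed:
            return closed
        closed |= new


def session_types(triples, session, exclude_base=False):
    """Emulate `?session a ?type . ?type rdfs:subClassOf nb:Session` (assumed pattern)."""
    return sorted(
        o
        for s, p, o in triples
        if s == session
        and p == RDF_TYPE
        and (o, SUBCLASS_OF, NB + "Session") in triples
        and not (exclude_base and o == NB + "Session")
    )


inferred = rdfs_closure(asserted)
without_inference = session_types(asserted, "ex:img-ses-01")
with_inference = session_types(inferred, "ex:img-ses-01")
with_filter = session_types(inferred, "ex:img-ses-01", exclude_base=True)
print(without_inference, with_inference, with_filter)
```

With inference, each session node carries its specific type plus Session, so grouping by session_type adds a third Session row that combines the imaging and phenotypic data for sub-01/ses-01. Excluding the base class from the match, as `exclude_base` does here, is one possible way to get back to two rows; it is not necessarily the fix that was released.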

neurobagel-bot[bot] commented 1 day ago

:rocket: Issue was released in v0.4.2 :rocket: