neurobagel / api

https://api.neurobagel.org/
MIT License

Graph response formatting fails when no pipeline data is in the database #367

Open alyssadai opened 3 hours ago

alyssadai commented 3 hours ago

Is there an existing issue for this?

Expected Behavior

No response

Current Behavior

When submitting any cohort query to an n-API v0.4.0

the query results in an Internal server error.

The same issue does not occur if the n-API is run in aggregated mode, or for a graph database containing at least one subject with pipeline metadata.

Error message

Error in the n-API container logs:

TypeError: DataFrame.reset_index() got an unexpected keyword argument 'name'

This error is misleading, as it gives the impression that the relevant section of code:

https://github.com/neurobagel/api/blob/99137363af22e2b81e212890d9cec291502cfd08/app/api/crud.py#L222-L232

is using a non-existent or deprecated argument / has some inherent syntax error.

However, what's actually happening is that the code assumes reset_index() is being called on a pd.Series (which DOES accept the name argument). Something is going wrong in the logic for session_completed_pipeline_data such that it produces a pd.DataFrame instead, and DataFrame.reset_index() DOESN'T accept the name argument.
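To illustrate the difference between the two reset_index() signatures, here is a minimal sketch on toy data. The index name "key" and column name "value" are made up for illustration and are not taken from crud.py.

```python
import pandas as pd

# Hypothetical toy data; "key" and "value" are illustrative names only.
index = pd.Index(["a", "b"], name="key")
series = pd.Series([1, 2], index=index)
frame = pd.DataFrame({"value": [1, 2]}, index=index)

# Series.reset_index() accepts `name` to label the values column.
from_series = series.reset_index(name="value")

# DataFrame.reset_index() has no `name` parameter, so this raises TypeError.
try:
    frame.reset_index(name="value")
    error = None
except TypeError as exc:
    error = exc

print(list(from_series.columns), type(error).__name__)
```

So the TypeError in the logs points at the type of the object being reset, not at a deprecated argument.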

Source of problem

Earlier in the code, when pipeline_grouped_data is constructed: https://github.com/neurobagel/api/blob/99137363af22e2b81e212890d9cec291502cfd08/app/api/crud.py#L202-L220

we are dropping NaNs during the groupby, meaning that when there are no pipeline names in the data, we get an empty dataframe like:

Empty DataFrame
Columns: [sub_id, session_id, session_type, pipeline_name, pipeline_version]
Index: []

as a result, when we then run groupby again on this object to construct session_completed_pipeline_data, there are no groups to operate on, so the result is still a pd.DataFrame rather than a pd.Series, causing the unexpected keyword error when we then call reset_index() with the name argument on it.
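The behaviour described above can be reproduced outside the API with a small sketch. It assumes the second step is a groupby followed by an apply that returns one dict per group; that is a guess at the shape of the logic, not a copy of crud.py. The helper name completed_pipelines and the example row values are hypothetical.

```python
import pandas as pd

COLUMNS = ["sub_id", "session_id", "session_type", "pipeline_name", "pipeline_version"]
SESSION_KEYS = ["sub_id", "session_id", "session_type"]


def completed_pipelines(pipeline_grouped_data: pd.DataFrame):
    """Hypothetical stand-in for the step that builds session_completed_pipeline_data."""
    return pipeline_grouped_data.groupby(SESSION_KEYS)[
        ["pipeline_name", "pipeline_version"]
    ].apply(lambda g: dict(zip(g["pipeline_name"], g["pipeline_version"])))


# Made-up row representing a session that has pipeline metadata.
populated = pd.DataFrame(
    {
        "sub_id": ["sub-01"],
        "session_id": ["ses-01"],
        "session_type": ["ImagingSession"],
        "pipeline_name": ["fmriprep"],
        "pipeline_version": ["23.1.3"],
    }
)
# Same columns, no rows: what remains once the NaN pipeline names are dropped.
empty = pd.DataFrame(columns=COLUMNS)

with_data = completed_pipelines(populated)
without_data = completed_pipelines(empty)

# With at least one group, apply() collects the dicts into a Series.
# With zero groups, pandas hands back an empty DataFrame instead.
print(type(with_data).__name__, type(without_data).__name__)

try:
    without_data.reset_index(name="completed_pipelines")
    failure = None
except TypeError as exc:
    failure = exc
```

In this sketch only the empty input triggers the TypeError, which is consistent with the error appearing only for graphs without any pipeline metadata.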

If we instead set dropna=False in the groupby when constructing pipeline_grouped_data, there is no longer an error, but the resulting completed_pipelines field for a single subject-session looks like this in the response:

        "completed_pipelines": {
          "null": []
        }

Environment

How to reproduce

No response

Anything else?

Some considerations

Why this wasn't caught by our tests

To avoid similar issues

alyssadai commented 2 hours ago

Also possibly related to https://github.com/neurobagel/api/issues/303