neurobagel / old-query-tool

User interface for searching across the Neurobagel graph
https://query.neurobagel.org/
MIT License

Investigate having human-readable labels instead of term URIs in the output .tsvs #121

Closed alyssadai closed 1 year ago

alyssadai commented 1 year ago

Why

As a user, when I download the metadata of my results, I would like the discrete values in the table to be human-readable (i.e. "Parkinson's disease" rather than "snomed:49049000"), so that I can directly use the data in a script and don't have to look up what the terms mean the way a machine would.
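The issue mentions that a hard-coded mapping of term IRIs to labels already exists in the query tool; a minimal sketch of how such a lookup might work (the object and function names here are illustrative, not the tool's actual code) could be:

```typescript
// Hypothetical hard-coded mapping of prefixed term URIs to human-readable
// labels, mirroring the mapping the issue says already exists in the query tool.
const TERM_LABELS: Record<string, string> = {
  "snomed:49049000": "Parkinson's disease",
};

// Fall back to the raw URI when no label is known, so the output TSV
// never silently loses information.
function toHumanReadable(termURI: string): string {
  return TERM_LABELS[termURI] ?? termURI;
}
```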

What

We should figure out what the best way is to make this happen. Here are some unsorted ideas:

Context

Currently, the subject-level tsv generated by the query tool still contains URIs instead of human-readable labels, unlike the reference output examples provided in the documentation: https://github.com/neurobagel/documentation/wiki/Query-Tool#example-data

I definitely remember us discussing previously that we wanted to implement human-readable labels in the outputs, eventually storing them in the graph itself (https://github.com/neurobagel/project/issues/47), but for now using the "hard-coded mapping of controlled term IRI to human-readable label that already exists in the query tool" (see https://github.com/neurobagel/query-tool/issues/76).

I do think these human-readable labels will be much more user-friendly/compelling as well as useful for verifying the downloaded results of a query, and may be worth prioritizing for our upcoming demos depending on the amount of work needed.

See related:

Conclusion / Outcome

The query tool should continue handling / being aware of the unique termIRIs internally, but should return human-readable labels to the user (optionally, in addition to the unique termIRIs). For this to be possible, a couple of things need to happen:

Related but not the same problem:

alyssadai commented 1 year ago

@rmanaem, any thoughts on how difficult this would be to implement? I'm guessing you'll need to fetch the list of diagnoses/assessments/etc. in the new OpenNeuro graph manually...

rmanaem commented 1 year ago

Shouldn't be too difficult since we have the labels hardcoded. What's interesting to me is that this slipped by and we forgot about it.

rmanaem commented 1 year ago

> Shouldn't be too difficult since we have the labels hardcoded.

Scratch that. It turns out we need some form of context to map URIs to their corresponding human-readable labels, since we hardcoded the categorical options using prefixes while the response from the API contains the full URIs.
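The mismatch described here (prefixed terms on the query-tool side, full URIs from the API) could be bridged with a small context object that shortens full URIs back to their prefixed form before the label lookup. A sketch, where the base namespace URI is an assumption for illustration only, not the actual SNOMED namespace used by Neurobagel:

```typescript
// Hypothetical prefix -> base-URI context; the base URI below is an
// illustrative assumption, not the real namespace used in the graph.
const CONTEXT: Record<string, string> = {
  snomed: "http://purl.bioontology.org/ontology/SNOMEDCT/",
};

// Shorten a full URI from the API response back to the prefixed form
// ("snomed:49049000") that the hard-coded label mapping is keyed on.
function shortenURI(fullURI: string): string {
  for (const [prefix, base] of Object.entries(CONTEXT)) {
    if (fullURI.startsWith(base)) {
      return `${prefix}:${fullURI.slice(base.length)}`;
    }
  }
  return fullURI; // no known prefix: leave the URI untouched
}
```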

rmanaem commented 1 year ago

Blocked, as it requires discussion to figure out implementation prerequisites.

alyssadai commented 1 year ago

Related to https://github.com/neurobagel/api/issues/37

surchs commented 1 year ago

@alyssadai mind taking a look at the current spec and see if anything is missing here?

alyssadai commented 1 year ago

Hey @surchs, the description generally makes sense to me.

> Combine this work with changing what the API returns. E.g. currently the first query already obtains a massive (uncompressed) JSON blob with all the metadata. Maybe we only need this when an actual download is triggered? Then the querying of the terms could happen (by the API?)

I'm not entirely sure I understand what you mean here. Could you elaborate?

surchs commented 1 year ago

> I'm not entirely sure I understand what you mean here. Could you elaborate?

We're going to change query_tool <> API interaction in these ways already:

so I think we might as well add one more aspect and that is:

The reason for this last part is that currently the roundtrip from query -> API -> graph -> API -> query is very slow. And the last step (API -> query) is about half of that, maybe even more, because for every query the API returns all the results that match at the participant level with all the available metadata. So it's a huge JSON blob. But on the query tool side we only really look at the dataset-level summaries until the user actually decides to download any metadata.

So it would be reasonable to say:

That would make the whole process a good bit faster as the final JSON blob being sent back by the API would likely also be much smaller.
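The two-step interaction proposed above can be sketched as follows. The endpoint paths, types, and function names are hypothetical placeholders, not the actual Neurobagel API; the point is only that the expensive participant-level request happens on download, not on every query:

```typescript
// Generic fetcher type so the sketch can be exercised without a network.
type Fetcher = (url: string) => Promise<unknown>;

// Lightweight dataset-level summary returned by the first (cheap) query.
interface DatasetSummary {
  datasetUUID: string;
  numMatchingSubjects: number;
}

// Step 1: a query only asks for dataset-level summaries, which is all the
// query tool needs to display results.
async function queryDatasets(fetchJSON: Fetcher, queryString: string): Promise<DatasetSummary[]> {
  return (await fetchJSON(`/query?${queryString}`)) as DatasetSummary[];
}

// Step 2: the large participant-level JSON blob is requested only when the
// user actually triggers a metadata download.
async function downloadSubjects(fetchJSON: Fetcher, datasetUUIDs: string[]): Promise<unknown> {
  return fetchJSON(`/subjects?datasets=${datasetUUIDs.join(",")}`);
}
```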

surchs commented 1 year ago

idea -> start with cogatlas because they have an API

surchs commented 1 year ago

Blocked by Seb's lack of availability

edit: unblocked again by better understanding of scope 🤷

surchs commented 1 year ago

@alyssadai check "Conclusion" in the issue spec for a list of tasks that relate to new API term endpoints. Please edit and / or close the issue if you agree

alyssadai commented 1 year ago

@surchs since the implementation for this issue depends on larger architectural decisions for the ecosystem (e.g., as multiple tools/steps need human-readable labels), the conclusion points from the description have been absorbed into this larger issue https://github.com/neurobagel/project/issues/47. Will close this one and continue the conversation/create new issues from there.