@rmanaem, any thoughts on how difficult this would be to implement? I'm guessing you'll need to fetch the list of diagnoses/assessments/etc. in the new OpenNeuro graph manually...
Shouldn't be too difficult since we have the labels hardcoded. What's interesting to me is that this slipped by and we forgot about it.
Scratch that. It turns out we need some form of context to map URIs to their corresponding human-readable labels, since we hardcoded the categorical options using prefixes while the response from the API contains full URIs.
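For illustration, a minimal sketch of such a context, assuming a prefix map plus the hardcoded label lookup (the namespaces and values below are made up for the example):

```python
# Hypothetical prefix map: full namespace -> prefix used in the hardcoded options.
PREFIXES = {
    "http://purl.bioontology.org/ontology/SNOMEDCT/": "snomed:",
}

# Illustrative hardcoded categorical options (prefixed term -> label).
LABELS = {
    "snomed:49049000": "Parkinson's disease",
}

def uri_to_label(uri: str) -> str:
    """Shorten a full URI from the API response to its prefixed form,
    then look up the human-readable label, falling back to the raw URI."""
    for namespace, prefix in PREFIXES.items():
        if uri.startswith(namespace):
            short = prefix + uri[len(namespace):]
            return LABELS.get(short, short)
    return uri
```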
Blocked, as it requires discussion to figure out implementation requisites.
Related to https://github.com/neurobagel/api/issues/37
@alyssadai mind taking a look at the current spec and seeing if anything is missing here?
Hey @surchs, the description generally makes sense to me.
Combine this work with changing what the API returns. E.g. currently the first query already fetches a massive (uncompressed) JSON blob with all the metadata. Maybe we only need that when an actual download is triggered? Then the querying of the terms could happen separately (by the API?).
I'm not entirely sure I understand what you mean here. Could you elaborate?
We're going to change the query_tool <> API interaction in these ways already:
so I think we might as well add one more aspect, and that is:
The reason for this last part is that the roundtrip from query -> API -> graph -> API -> query is currently very slow, and the last step (API -> query) accounts for about half of that, maybe even more, because for every query the API returns all the participant-level matches with all the available metadata. So it's a huge JSON blob. But on the query tool side we only really look at the dataset-level summaries until the user actually decides to download any metadata.
So it would be reasonable to say:
That would make the whole process a good bit faster, as the final JSON blob sent back by the API would likely also be much smaller.
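To make the idea concrete, here is a minimal sketch of such a two-step flow; the base URL, endpoint paths, and parameters are all hypothetical, not the current API:

```python
import requests

API = "https://api.example.org"  # hypothetical base URL

# Step 1: the query itself returns only lightweight dataset-level
# summaries, which is all the query tool needs to render results.
summaries = requests.get(
    f"{API}/query", params={"diagnosis": "snomed:49049000"}
).json()

# Step 2: the full participant-level metadata is fetched only when
# the user actually triggers a download.
def download_metadata(dataset_ids: list[str]) -> dict:
    response = requests.post(f"{API}/metadata", json={"datasets": dataset_ids})
    return response.json()
```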
Idea -> start with cogatlas, because they have an API
Blocked by Seb's lack of availability
edit: unblocked again by better understanding of scope 🤷
@alyssadai check "Conclusion" in the issue spec for a list of tasks that relate to new API term endpoints. Please edit and/or close the issue if you agree
@surchs since the implementation for this issue depends on larger architectural decisions for the ecosystem (e.g., as multiple tools/steps need human-readable labels), the conclusion points from the description have been absorbed into this larger issue https://github.com/neurobagel/project/issues/47. Will close this one and continue the conversation/create new issues from there.
Why
As a user, when I download the metadata of my results, I would like the discrete values in the table to be human-readable (e.g. "Parkinson's disease" rather than "snomed:49049000"), so that I can use the data directly in a script and don't have to go and look up what these values mean like a machine would.
What
We should figure out the best way to make this happen. Here are some unsorted ideas:
- Use rdfs:label or something similar to add a human-readable label to each term in the graph, then return only the human-readable labels from the API
- A context that maps URIs to their human-readable labels
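As a rough sketch of the rdfs:label idea, the API could build a URI -> label lookup with a query along these lines (the endpoint URL and store setup here are assumptions):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical graph endpoint; the actual store and URL will differ.
sparql = SPARQLWrapper("http://localhost:7200/repositories/neurobagel")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?term ?label
    WHERE { ?term rdfs:label ?label . }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# URI -> human-readable label, for use when shaping API responses.
labels = {
    row["term"]["value"]: row["label"]["value"]
    for row in results["results"]["bindings"]
}
```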
Currently, the subject-level TSV generated by the query tool still contains URIs instead of human-readable labels, unlike the reference output examples provided in the documentation: https://github.com/neurobagel/documentation/wiki/Query-Tool#example-data
I definitely remember us discussing previously that we wanted to implement human-readable labels in the outputs, eventually having them in the graph in the first place (https://github.com/neurobagel/project/issues/47) but for now using the "hard-coded mapping of controlled term IRI to human-readable label that already exists in the query tool" (see https://github.com/neurobagel/query-tool/issues/76).
I do think these human-readable labels will be much more user-friendly/compelling as well as useful for verifying the downloaded results of a query, and may be worth prioritizing for our upcoming demos depending on the amount of work needed.
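As an interim version of that, the relevant columns of the subject-level TSV could be translated at download time using the existing hardcoded mapping; a sketch, with illustrative column names and mapping entries:

```python
import pandas as pd

# Illustrative subset of the query tool's hardcoded IRI -> label mapping.
IRI_TO_LABEL = {
    "http://purl.bioontology.org/ontology/SNOMEDCT/49049000": "Parkinson's disease",
}

def humanize_tsv(path: str, columns: list[str]) -> pd.DataFrame:
    """Replace term IRIs with human-readable labels in the given columns,
    leaving any IRI without a known label untouched."""
    df = pd.read_csv(path, sep="\t")
    for col in columns:
        df[col] = df[col].map(lambda value: IRI_TO_LABEL.get(value, value))
    return df

# e.g. humanize_tsv("results.tsv", columns=["diagnosis", "assessment"])
```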
See related:
#152, because the examples should update automatically
Conclusion / Outcome
The query tool should continue handling / being aware of the unique term IRIs internally, but should return human-readable labels to the user (optionally in addition to the unique term IRIs). For this to be possible, a couple of things need to happen:
Related but not the same problem: