Closed falquaddoomi closed 2 years ago
@falquaddoomi I just realized there's probably nowhere this change is deployed where I can test it. What I was going to do is take this list of example taxon labels...
"Sus scrofa",
"Drosophila melanogaster",
"Homo sapiens",
"Mus musculus",
"Bos taurus",
"Saccharomyces cerevisiae S288C",
"Xenopus tropicalis",
"Danio rerio",
"Gallus gallus",
"Anolis carolinensis",
"Canis lupus familiaris",
"Felis catus",
"Macaca mulatta",
"Monodelphis domestica",
"Ornithorhynchus anatinus",
"Pan troglodytes",
"Rattus norvegicus",
"Takifugu rubripes",
"Equus caball",
... and make sure they can map to ids, and then back to labels again, without any "loss" (id mapping to multiple labels or vice versa, or failing to find a match). Would you be able to test this locally?
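For reference, the round-trip check described above could be sketched roughly like this. The two lookup callables are placeholders for whatever actually calls the label-to-ID and ID-to-label endpoints, and the tables below are stand-in data so the sketch runs offline; none of these names come from biolink-api itself.

```python
from typing import Callable, Dict, Iterable, List, Optional


def round_trip_failures(
    labels: Iterable[str],
    label_to_ids: Callable[[str], List[str]],
    id_to_label: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Return {label: reason} for every label that does not round-trip cleanly."""
    failures: Dict[str, str] = {}
    for label in labels:
        ids = label_to_ids(label)
        if not ids:
            failures[label] = "no id found"
        elif len(ids) > 1:
            failures[label] = f"ambiguous: {len(ids)} ids"
        else:
            back = id_to_label(ids[0])
            if back != label:
                failures[label] = f"id {ids[0]} maps back to {back!r}"
    return failures


# Stand-in lookups (assumed data); a real test would query the API instead.
FAKE_IDS = {"Homo sapiens": ["NCBITaxon:9606"], "Felis catus": ["NCBITaxon:9685"]}
FAKE_LABELS = {"NCBITaxon:9606": "Homo sapiens", "NCBITaxon:9685": "cat"}

failures = round_trip_failures(
    ["Homo sapiens", "Felis catus", "Drosophila melanogaste"],
    lambda label: FAKE_IDS.get(label, []),
    FAKE_LABELS.get,
)
print(failures)
```

Passing the lookups in as callables keeps the check itself independent of how the endpoints are reached, so the same function could back a unit test with canned data or a live test against a preview deployment.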
Hey @vincerubinetti, sure, I can test that locally. I'll also make a test case out of it.
Also, for future situations like this, I'm currently writing a script to deploy a temporary "preview" VM with the upcoming biolink-api version running on it. I'll see if I can integrate it into the PR process so it can be used for testing. (The VM will be marked preemptible, so it'll be low-cost and will be terminated after at most 24 hours.)
Just tested it for the first 5 examples. It worked for all of them except for "Drosophila melanogaste", where it returned nothing. But its ID, "NCBITaxon:7227", does return the label from the labeler endpoint. The hard-coded mapping in UI 2.0 and 3.0 also happens to contain it.
I'm guessing this is some deeper (data quality?) issue, rather than something in this PR? If so, not sure what to do here.
Not sure if it is as simple as this, but "Drosophila melanogaste" is missing the "r" at the end: "Drosophila melanogaster"
So, I coded up the test you proposed and there were a few issues:
The /ontol/identifier/ endpoint maps "Felis catus" to NCBITaxon:9685, but /ontol/labeler/ (the old ID-to-label endpoint) maps that ID to "cat". Specifically, there are three examples in the fixed label list that don't map to the labeler:
Fortunately, querying /ontol/identifier/ for 'cat' does produce NCBITaxon:9685, but it's not the first element in the list. I should probably amend /ontol/labeler/ to return all the possible labels and not just the first one, since the ordering is apparently arbitrary.
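To illustrate the first-element problem, here is a minimal sketch that indexes a handful of (ID, label) pairs in both directions and keeps every match. The pair list is made up for illustration (the EXAMPLE:0001 entry in particular is hypothetical), and this is not how ontobio stores or queries the data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_indexes(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Index (id, label) pairs in both directions, keeping every match."""
    labels_by_id: Dict[str, List[str]] = defaultdict(list)
    ids_by_label: Dict[str, List[str]] = defaultdict(list)
    for ident, label in pairs:
        labels_by_id[ident].append(label)
        ids_by_label[label].append(ident)
    return dict(labels_by_id), dict(ids_by_label)


# Illustrative pairs only; EXAMPLE:0001 is a hypothetical second ID for "cat".
PAIRS = [
    ("EXAMPLE:0001", "cat"),
    ("NCBITaxon:9685", "cat"),
    ("NCBITaxon:9685", "Felis catus"),
]
labels_by_id, ids_by_label = build_indexes(PAIRS)

first_only = labels_by_id["NCBITaxon:9685"][0]  # what a first-element labeler reports
all_labels = labels_by_id["NCBITaxon:9685"]     # what an "all labels" labeler would report
print(first_only, all_labels, ids_by_label["cat"])
```

With only the first element, "Felis catus" can't be recovered from the ID, whereas the full list lets the caller pick the label it started with.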
@seandavi That was the problem, I didn't copy it properly.
Apologies yes, "caballus" was a typo. FWIW this list is the taxon facets returned from searching for "SSH":
I'm not sure what to do about the other problems though :/ @putmantime ?
Yeah, I don't know...well, the good news is that all of these results are produced from the same set of (ID, label) pairs, so if you get a label or an ID back from one endpoint you're guaranteed to get a result when querying the other endpoint. It might not be the result you expect (especially because /ontol/labeler/ just returns the first element, so they're definitionally not reversible functions), but you're guaranteed to get something.
Perhaps the solution is simply to return a "taxon id" facet instead of a "taxon label" facet. That seems less ambiguous. With the ids I can then just make use of the labeler endpoint to get nice human-readable labels to display to the user. For this fix, we'd want to make sure we do this in ALL of the cases where biolink is returning taxon label facets. So far the only places I've seen this are the search endpoint and all of the association endpoints, but I bet there are more.
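As a rough sketch of what the frontend side of that could look like, assuming an id facet arrives as a mapping of taxon id to count (that shape, the counts, and the labeler callable are all assumptions here, not the actual biolink response format):

```python
from typing import Callable, Dict, List, Optional


def display_facet(
    id_counts: Dict[str, int],
    labeler: Callable[[str], Optional[str]],
) -> List[dict]:
    """Attach a display label to each taxon id, falling back to the id itself."""
    rows = []
    for taxon_id, count in id_counts.items():
        label = labeler(taxon_id) or taxon_id
        rows.append({"id": taxon_id, "label": label, "count": count})
    # Most frequent taxa first, as facets are usually shown.
    return sorted(rows, key=lambda row: row["count"], reverse=True)


# Stand-in labeler data and made-up counts, for illustration only.
LABELS = {"NCBITaxon:9606": "Homo sapiens", "NCBITaxon:7227": "Drosophila melanogaster"}
rows = display_facet(
    {"NCBITaxon:7227": 3, "NCBITaxon:9606": 12, "EXAMPLE:1": 1},
    LABELS.get,
)
print(rows)
```

Keeping the id alongside the label means the id can be sent straight back as a filter, with the label used only for display.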
We need input from @putmantime or someone from the tislab before we can continue.
A small addendum to this:
I'm in the process of incorporating this into the frontend, and I noticed that the search endpoints that take a taxon filter only seem to work with NCBITaxon ids, and not with OMIM ids, etc.
As such, you may want to make the NCBITaxon ids always show up first in the list of matches? I've made the frontend prefer them, but maybe it'd be good to put that in the backend too for anyone using the endpoint directly.
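The preference could be as small as a stable sort on the prefix; a minimal sketch, where the function name and the non-NCBITaxon ids are made up for illustration:

```python
from typing import List


def prefer_prefix(ids: List[str], prefix: str = "NCBITaxon:") -> List[str]:
    """Move ids with the given prefix to the front, keeping relative order otherwise."""
    # sorted() is stable and False sorts before True, so matching ids come first.
    return sorted(ids, key=lambda ident: not ident.startswith(prefix))


# The OMIM and EXAMPLE ids below are hypothetical placeholders.
matches = ["OMIM:000000", "NCBITaxon:9685", "EXAMPLE:2", "NCBITaxon:9606"]
ordered = prefer_prefix(matches)
print(ordered)
```

Because the sort is stable, whatever order the backend already produces is preserved within each group.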
Also, perhaps it's time to delete the vestigial _taxon_map facet? At the moment I'm just explicitly deleting it from the facets.
This PR adds a new endpoint, /ontol/identifier/, that accepts a list of labels and produces matching IDs. Specifically, it queries for the label side of the <label> rdfs:label <label> relation, much like the existing /ontol/labeler/ except over labels and not IDs. All matching results are returned for a given label, unlike /ontol/labeler/, which returns just the first result. The results are returned in the following format:
A list of labels can be supplied as the label parameter, via either GET (as a querystring param, e.g. ?label=<first>&label=<second>...) or POST (as a querystring param or in a JSON-encoded body like {'label': [<first>, <second>, ...]}).
This PR requires ontobio to be up to commit https://github.com/monarch-initiative/ontobio/commit/91222c8b442196d6eeeafeb6073946494e8a3a10. (I'll issue individual PRs to the main ontobio repo once the short-term UI needs are settled.)
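For anyone calling the endpoint directly, the two ways of supplying labels might be built like this with the standard library. The base URL is a hypothetical placeholder, and no request is actually sent here; this only shows the repeated label querystring param and the JSON body shape described above.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://example.org/api"  # hypothetical host, not a real deployment
labels = ["Homo sapiens", "Mus musculus"]

# GET: repeat the "label" param once per label (doseq=True expands the list).
query = urlencode({"label": labels}, doseq=True)
get_url = f"{BASE_URL}/ontol/identifier/?{query}"

# POST: the same labels as a JSON-encoded body.
post_body = json.dumps({"label": labels})

print(get_url)
print(post_body)
```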
Closes issue #391.