Status: Open. vincerubinetti opened this issue 2 years ago.
It's not exactly what you're asking for, but would a facet structure like this work?
"facet_counts": {
  "category": {
    "disease": 27,
    "publication": 9,
    "anatomical entity": 5,
    "cell": 5,
    "gene": 2,
    "sequence feature": 2,
    "phenotype": 1,
    "quality": 1
  },
  "taxon": {
    "NCBITaxon:9031": 1,
    "NCBITaxon:9606": 1
  },
  "taxon_label": {
    "Gallus gallus": 1,
    "Homo sapiens": 1
  },
  "_taxon_map": {
    "NCBITaxon:9031": {
      "Gallus gallus": 1
    },
    "NCBITaxon:9606": {
      "Homo sapiens": 1
    }
  }
}
Two things are different here: 1) there's a new taxon facet that groups results by taxon ID, and 2) there's a _taxon_map entry in facet_counts that groups first by taxon ID, then by taxon label, with each value being the count of results that have both that ID and that label. AFAIK there should be a one-to-one mapping between ID and label, so each ID node will only ever have one child; but in case there isn't, this structure will still work.
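For the frontend use case, the nested map can be inverted to go from a displayed label back to the ID(s) needed for filtering. A minimal sketch, assuming the _taxon_map shape shown above; label_to_ids is a hypothetical helper name, and it keeps a list of IDs per label in case the mapping turns out not to be one-to-one:

```python
from collections import defaultdict


def label_to_ids(taxon_map: dict[str, dict[str, int]]) -> dict[str, list[str]]:
    """Invert {taxon_id: {taxon_label: count}} into {taxon_label: [taxon_id, ...]}.

    Hypothetical helper. A list of IDs is kept per label so that a label
    shared by several IDs is not silently overwritten.
    """
    inverted: dict[str, list[str]] = defaultdict(list)
    for taxon_id, labels in taxon_map.items():
        for label in labels:  # iterating the inner dict yields its label keys
            inverted[label].append(taxon_id)
    return dict(inverted)


facet_counts = {
    "_taxon_map": {
        "NCBITaxon:9031": {"Gallus gallus": 1},
        "NCBITaxon:9606": {"Homo sapiens": 1},
    }
}
lookup = label_to_ids(facet_counts["_taxon_map"])
print(lookup["Homo sapiens"])  # ['NCBITaxon:9606']
```

If any list in the result has more than one entry, the label alone is not enough to pick a filter value.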
If so, I have this implemented in my fork of the ontobio library. Here's where the _taxon_map key is injected into the facet counts: https://github.com/falquaddoomi/ontobio/blob/92231d447a/ontobio/golr/golr_query.py#L603. I assume we'll have to figure out who downstream might be affected by this. Maybe the best way is to submit a PR?
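For illustration, here is a rough sketch of how such a map could be built from a Solr pivot facet. It assumes the query requests facet.pivot=taxon,taxon_label, so that Solr returns entries with "field", "value", "count" and a nested "pivot" list under facet_counts.facet_pivot["taxon,taxon_label"]. build_taxon_map is a hypothetical name, and this is not the code in the fork linked above:

```python
def build_taxon_map(pivot_entries: list[dict]) -> dict[str, dict[str, int]]:
    """Turn Solr pivot entries (taxon -> taxon_label) into {taxon_id: {label: count}}.

    Hypothetical helper; assumes Solr's facet.pivot JSON shape.
    """
    taxon_map: dict[str, dict[str, int]] = {}
    for entry in pivot_entries:
        # Each outer entry is one taxon ID; its "pivot" list holds the labels
        # that co-occur with that ID, each with its own count.
        taxon_map[entry["value"]] = {
            child["value"]: child["count"] for child in entry.get("pivot", [])
        }
    return taxon_map


# Example input in Solr's facet_pivot shape (values taken from the example above).
solr_facet_counts = {
    "facet_pivot": {
        "taxon,taxon_label": [
            {"field": "taxon", "value": "NCBITaxon:9031", "count": 1,
             "pivot": [{"field": "taxon_label", "value": "Gallus gallus", "count": 1}]},
            {"field": "taxon", "value": "NCBITaxon:9606", "count": 1,
             "pivot": [{"field": "taxon_label", "value": "Homo sapiens", "count": 1}]},
        ]
    }
}

facet_counts: dict = {}
facet_counts["_taxon_map"] = build_taxon_map(
    solr_facet_counts["facet_pivot"]["taxon,taxon_label"]
)
print(facet_counts["_taxon_map"])
```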
That's fine with me. If this is easier to implement or more consistent with how other things and data structures in biolink are implemented, I'd say go for it.
Faisal, is the main reason you chose that structure that it supports one-to-many ID-to-label mappings?
I don't believe that will be the case, since we have chosen the NCBI ID/label pair for each taxon.
If that's true, I think the most explicit and readable structure would be a list of objects, each with clear attributes:
"_taxon_map": [{ "label": "Gallus gallus", "id": "NCBITaxon:9031", "count": 1 } ]
@vincerubinetti, would a list of objects cause even more issues in this case?
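Converting the nested map into the list-of-objects form proposed above is a small transformation. A sketch, assuming the nested _taxon_map shape from the first example; to_taxon_list is a hypothetical helper:

```python
def to_taxon_list(taxon_map: dict[str, dict[str, int]]) -> list[dict]:
    """Flatten {taxon_id: {label: count}} into [{"label", "id", "count"}, ...].

    Hypothetical helper; output order follows the input dict's insertion order.
    """
    return [
        {"label": label, "id": taxon_id, "count": count}
        for taxon_id, labels in taxon_map.items()
        for label, count in labels.items()
    ]


taxon_map = {
    "NCBITaxon:9031": {"Gallus gallus": 1},
    "NCBITaxon:9606": {"Homo sapiens": 1},
}
print(to_taxon_list(taxon_map))
```

If an ID ever had two labels, the flat form would simply contain two objects with the same id, so it does not lose information compared with the nested form.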
I formatted it that way partly because I wasn't sure whether more than one label might match a given taxon ID, and partly because that structure more closely matches how facet pivots are returned from Solr. If IDs and labels are in fact one-to-one, I agree that the structure you proposed is more readable, and it's a trivial change on my end.
Let me do some research and see if I can confirm the one-to-one mapping. I wasn't sure what the typical return type from Solr was, and standardizing on it might be worth more than the clarity of my proposed structure.
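One quick sanity check on a given response is to test whether every ID in _taxon_map has exactly one label and no label is shared by two IDs. This is only a sketch (is_one_to_one is a hypothetical helper); passing it on a sample of responses suggests, but does not prove, that the mapping is one-to-one across the whole index:

```python
def is_one_to_one(taxon_map: dict[str, dict[str, int]]) -> bool:
    """Return True if each taxon ID has exactly one label and no label repeats.

    Hypothetical helper operating on the nested {taxon_id: {label: count}} map.
    """
    seen_labels: set[str] = set()
    for labels in taxon_map.values():
        if len(labels) != 1:
            return False  # an ID with no label, or with several labels
        (label,) = labels  # unpack the single key of the inner dict
        if label in seen_labels:
            return False  # the same label appears under two different IDs
        seen_labels.add(label)
    return True


example = {
    "NCBITaxon:9031": {"Gallus gallus": 1},
    "NCBITaxon:9606": {"Homo sapiens": 1},
}
print(is_one_to_one(example))  # True
```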
I'm developing version 3.0 of the Monarch UI/website, and I've run into a limitation. @putmantime
Here is an example response from the /search/entity/{term} endpoint, searching "ssh":
Notice that taxon_label is being returned for facets, instead of taxon (ID). This is nice for displaying a list of taxon facets, but not for actually filtering by them, because the endpoint only supports filtering by taxon (ID), not taxon_label.
This forces the frontend to keep a hard-coded label-to-ID mapping for taxa. That duplicates information we already have in biolink, is brittle, and is likely to get out of sync.
And yes, I can look up taxon from docs by finding the corresponding taxon_label field. However, I would then need to make sure all results are in docs so that I have all the mappings, and that might exceed the max rows [per page] param.
Possible solutions:
1. Support a taxon_label filter parameter (in addition to the taxon parameter) in the search endpoint. I guess this would be most useful as an exact match rather than a fuzzy match. If multiple taxon IDs map to the same exact taxon label, then this option wouldn't be viable.
2. Return an additional taxon field in facet_counts with all the information I need: id, label, and count. This would leave the taxon_label facet untouched, so current applications using biolink don't suddenly break.
3. Have some kind of taxon_map field at the top level of the response so I can go from label to ID easily. Though I think this is pretty ugly; I don't want to add a top-level field as a special exception for just one type of facet.
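To illustrate option 2, suppose (hypothetically) the new taxon facet were returned as a list of objects with id, label, and count. The frontend could then display labels and filter by ID without a hard-coded mapping. A sketch, assuming the search endpoint accepts one taxon query parameter per selected ID; taxon_filter_query is a hypothetical helper, and the facet shape is an assumption, not the current biolink response:

```python
from urllib.parse import urlencode

# Hypothetical shape for the proposed facet: a list of {id, label, count}.
taxon_facets = [
    {"id": "NCBITaxon:9606", "label": "Homo sapiens", "count": 1},
    {"id": "NCBITaxon:9031", "label": "Gallus gallus", "count": 1},
]


def taxon_filter_query(facets: list[dict], selected_labels: list[str]) -> str:
    """Build a query string filtering by the taxon IDs behind the chosen labels.

    Assumes the endpoint accepts a repeated `taxon` parameter (an assumption).
    Raises KeyError if a selected label is not present in the facets.
    """
    id_for_label = {facet["label"]: facet["id"] for facet in facets}
    return urlencode([("taxon", id_for_label[label]) for label in selected_labels])


# Labels for the UI come straight from the facet entries.
options = [f'{f["label"]} ({f["count"]})' for f in taxon_facets]
print(options)  # ['Homo sapiens (1)', 'Gallus gallus (1)']
print(taxon_filter_query(taxon_facets, ["Homo sapiens"]))  # taxon=NCBITaxon%3A9606
```

Because the ID travels with each facet entry, the label is only ever used for display, which also sidesteps the ambiguity that would make option 1 unviable if two IDs shared a label.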