jmcmurry opened 9 years ago
Right now we take the first listed common name if one is available in SciGraph; if not, we use the scientific name.
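A minimal sketch of that fallback rule, assuming the common names arrive as an ordered list. `display_label` is a hypothetical name for illustration, not the actual monarch-app function.

```python
from typing import Sequence


def display_label(common_names: Sequence[str], scientific_name: str) -> str:
    """Return the first listed common name, falling back to the scientific name.

    Hypothetical helper mirroring the rule described above; not monarch-app code.
    """
    return common_names[0] if common_names else scientific_name


print(display_label(["chicken"], "Gallus gallus"))
print(display_label([], "Bos taurus"))
```

Note that "first listed" makes the result depend entirely on the order in which names come back from the query.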
Per Kent, this SciGraph query pulls from ncbigene.ttl and monarch.owl. I see that the latter depends on the BBOP taxonomy slim, which is probably the source that would need to be updated to get a complete set of common names. However, I don't know who the right person to take this on would be, or whether taxslim.owl has a specific purview that would make additions undesirable. There are over 1800 common names in there, but I'm not sure what the inclusion criteria were. Missing items include, but are not limited to:
@cmungall?
I just found this issue; I think it might include #1224, which I recently submitted. All the scientific names look quite odd in their current capitalization. The BBOP taxonomy slim doesn't have these capitalizations, so is the Monarch app transforming them? For example, 'homo sapiens' should be 'Homo sapiens' and 'Fugu Rubripes' should be 'Fugu rubripes' (but why is human lowercased?).
And for some reason, chimp is displaying a related-synonym misspelling as its label: 'Chimpansee Troglodytes' should be 'Pan troglodytes'. 'Bos bovis' is also a synonym, not the valid name ('Bos taurus').
To distinguish scientific names, it would be nice if genus and species were italicized.
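A minimal sketch of the italics suggestion, assuming the UI renders labels as HTML and already knows whether a given label is a scientific name. Both function names are hypothetical.

```python
import html


def render_scientific_name(name: str) -> str:
    """Wrap a scientific name in <i> tags so it stands out from common names."""
    # Escape first so stray characters in a label cannot inject markup.
    return f"<i>{html.escape(name)}</i>"


def render_taxon(label: str, is_scientific: bool) -> str:
    """Italicize scientific names; leave common names in plain text."""
    return render_scientific_name(label) if is_scientific else html.escape(label)
```

For example, `render_taxon("Homo sapiens", True)` yields `<i>Homo sapiens</i>`, while `render_taxon("human", False)` stays plain.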
Minor terminological point: there is no such thing as a BBOP taxonomy slim. The PURL is http://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl (it just happens to be served from the central OBO filestore, which in turn happens to be hosted at Berkeley).
taxslim is just a subset of NCBITaxon; there should be no transformations. Both taxslim and NCBITaxon should use the correct capitalization for binomials (Homo sapiens), and lowercase is generally used for synonyms, e.g. 'human'.
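Since the source labels should already be correctly capitalized, a rough check can flag displayed labels that are not, which hints that something other than the label (such as a synonym) is being shown. This is a heuristic sketch, not a transformation: the hypothetical `looks_like_binomial` only accepts "Genus epithet" and will reject valid trinomials and hyphenated epithets. It also cannot catch a correctly capitalized synonym such as 'Bos bovis'.

```python
import re

# Genus capitalized, specific epithet lowercase, e.g. "Homo sapiens".
BINOMIAL = re.compile(r"[A-Z][a-z]+ [a-z]+")


def looks_like_binomial(label: str) -> bool:
    """Return True if the label is capitalized like a two-part binomial."""
    return BINOMIAL.fullmatch(label) is not None


for label in ["Homo sapiens", "homo sapiens", "Fugu Rubripes", "Chimpansee Troglodytes"]:
    print(label, looks_like_binomial(label))
```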
@cmungall thanks, I got that from a previous comment and didn't really think about what that file was.
It is odd that the displayed labels seem biased toward synonyms rather than the actual labels. In addition to Bos and Pan, I think 'Canis domesticus' should always be 'Canis familiaris'. I wonder if these are being picked up in the search for a common name.
I guess this is the problem: https://github.com/monarch-initiative/monarch-app/blob/master/lib/monarch/api.js#L6116
If there are any synonyms, it takes the first one in place of the label. But there doesn't seem to be a way to distinguish common names from taxonomic synonyms or misspellings. These are marked by annotations on annotations (axiom annotations) in the OWL. Do these not have a place in SciGraph? It has always seemed a little convoluted to me anyway; maybe we should make subproperties.
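If each synonym carried its type, the selection could promote only true common names and never misspellings or taxonomic synonyms. A minimal sketch, assuming each synonym arrives paired with a type string; the type names used here (`genbank_common_name`, `common_name`) are assumed to follow NCBITaxon's synonym types, and `Synonym` and `pick_label` are hypothetical names.

```python
from typing import NamedTuple, Sequence


class Synonym(NamedTuple):
    text: str
    synonym_type: str  # assumed type string, e.g. "genbank_common_name"


# Assumed preference order; only these types count as common names.
COMMON_NAME_TYPES = ("genbank_common_name", "common_name")


def pick_label(label: str, synonyms: Sequence[Synonym]) -> str:
    """Prefer a typed common name; otherwise keep the taxon's own label."""
    for wanted in COMMON_NAME_TYPES:
        for syn in synonyms:
            if syn.synonym_type == wanted:
                return syn.text
    # Misspellings, taxonomic synonyms and untyped synonyms are never promoted.
    return label
```

The key difference from "first synonym wins" is the fallback: when no common name is typed, the valid scientific label is kept.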
We don't have axiom annotations for literals in SciGraph.
The convolutedness is a result of the mapping to OBO: http://owlcollab.github.io/oboformat/doc/obo-syntax.html#5.6
We can always define a mapping back to different properties when going into neo4j.
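A minimal sketch of such a mapping, assuming the loader can see each synonym's type before writing the node. It groups `(text, synonym_type)` pairs into one list-valued property per type. The `synonym_<type>` key scheme and the function name are hypothetical, not SciGraph's actual loader.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def synonyms_to_properties(synonyms: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (text, synonym_type) pairs into one list-valued property per type."""
    props: Dict[str, List[str]] = defaultdict(list)
    for text, syn_type in synonyms:
        # e.g. "genbank_common_name" becomes the key "synonym_genbank_common_name"
        props[f"synonym_{syn_type}"].append(text)
    return dict(props)
```

The resulting lists could then be stored directly as node properties, since Neo4j allows arrays of strings as property values, and a query could read only the common-name property.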
I just realized that in site search (http://beta.monarchinitiative.org/search/pax2) we don't seem to be consistent about the use of common vs. scientific names, and I'm curious why.
Gallus gallus is shown as chicken, but Bos taurus is not shown as cattle. Not a big deal, but I'm flagging it for consideration: Bos taurus is common enough, but the more obscure the species we bring in, the less likely it is that the scientific name will be immediately recognizable.
@kshefchek do you know?