monarch-initiative / monarch-legacy

Monarch web application and API
BSD 3-Clause "New" or "Revised" License

Scientific vs Common names for species in site search page #988

Open jmcmurry opened 9 years ago

jmcmurry commented 9 years ago

Just realized that in site search (http://beta.monarchinitiative.org/search/pax2) we don't seem to be consistent about the use of common vs. scientific names, and I'm curious as to why.

Gallus gallus is represented as chicken, but Bos taurus is not represented as cattle. Not a big deal, but I'm flagging it for consideration: while Bos taurus is common enough, the more obscure the species we get in, the less likely it is that the scientific name will be immediately recognizable.

[screenshot, 2015-10-08]

@kshefchek do you know?

kshefchek commented 9 years ago

Right now we take the first listed common name if one is available in SciGraph; if it's not available, we use the scientific name.

jmcmurry commented 9 years ago

Per Kent, this SciGraph query pulls from ncbigene.ttl and monarch.owl. I see that the latter depends on bbop taxonomy slims, which is probably the source that would need to be updated to get a complete set of common names. However, I don't know who the right person would be to take this on, or indeed whether taxslim.owl has a specific purview that would make additions undesirable. There are over 1800 common names in there, but I'm not sure what the inclusion criteria were. Missing items include but are not limited to:

@cmungall?

balhoff commented 8 years ago

I just found this issue; I think it might include #1224, which I recently submitted. I think all the scientific names look quite weird in their current capitalization. The BBOP taxonomy slim doesn't have these capitalizations, so is the Monarch app transforming them? For example, 'homo sapiens' should be 'Homo sapiens' and 'Fugu Rubripes' should be 'Fugu rubripes' (but why is human lowercased?)

And chimp is displaying a related synonym misspelling as its label for some reason. 'Chimpansee Troglodytes' should be 'Pan troglodytes'. 'Bos bovis' is also a synonym, not the valid name ('Bos taurus').

To distinguish scientific names it would be nice if genus and species were in italics.
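As a rough sketch of the display side only: the helper names below (`normalizeBinomial`, `italicizeBinomial`) are hypothetical and not part of the Monarch app. They assume the input is a plain genus/species string with no markup. This would not fix a synonym being shown in place of the valid name; it only covers capitalization and italics.

```javascript
// Hypothetical helpers, not taken from monarch-app.
// Capitalize the genus and lowercase the remaining epithets.
function normalizeBinomial(name) {
  const parts = name.trim().split(/\s+/);
  return parts
    .map((part, i) =>
      i === 0
        ? part.charAt(0).toUpperCase() + part.slice(1).toLowerCase()
        : part.toLowerCase()
    )
    .join(' ');
}

// Wrap the normalized name in <i> for HTML display.
// Assumes the name contains no characters that need HTML escaping.
function italicizeBinomial(name) {
  return '<i>' + normalizeBinomial(name) + '</i>';
}

console.log(normalizeBinomial('homo sapiens'));  // Homo sapiens
console.log(italicizeBinomial('Fugu Rubripes')); // <i>Fugu rubripes</i>
```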

cmungall commented 8 years ago

minor terminological point - there is no such thing as a bbop taxonomy slim, the purl is http://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl (it just so happens that this is being served off the central OBO filestore, which just happens to be hosted at berkeley).

taxslim is just a subset of NCBITaxon; there should be no transformations. Both taxslim and NCBITaxon should be using the correct capitalization for binomials (Homo sapiens), and generally lowercase is used for synonyms, e.g. 'human'.

balhoff commented 8 years ago

@cmungall thanks, I got that from a previous comment and didn't really think about what that file was.

It is odd that the labels being displayed seem biased toward synonyms rather than labels. In addition to Bos and Pan, 'Canis domesticus' I think should always be 'Canis familiaris'. I wonder if these are being grabbed in the search for a common name.

balhoff commented 8 years ago

I guess this is the problem: https://github.com/monarch-initiative/monarch-app/blob/master/lib/monarch/api.js#L6116

If there are any synonyms it takes the first synonym in place of the label. But it doesn't look like there is a way to distinguish common names from taxonomic synonyms/misspellings. These are marked by annotations on annotations in the OWL. Do these not have a place in Scigraph? It has always seemed a little convoluted to me anyway; maybe we should make subproperties.
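To make the failure mode concrete, here is a minimal sketch of "first synonym wins". It is not the actual code in api.js; the node shape and the synonym order are assumptions chosen to reproduce the chimp case.

```javascript
// Hypothetical node shape; the real SciGraph response may differ.
// If any synonym exists, it replaces the label, whatever kind of synonym it is.
function labelFromFirstSynonym(node) {
  const synonyms = node.synonyms || [];
  return synonyms.length > 0 ? synonyms[0] : node.label;
}

const chimp = {
  label: 'Pan troglodytes',
  // Untyped list: nothing says which entry is a common name or a misspelling.
  synonyms: ['Chimpansee troglodytes', 'chimpanzee'],
};

console.log(labelFromFirstSynonym(chimp)); // Chimpansee troglodytes
```

With an untyped list, the result depends entirely on which synonym happens to come first.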

cmungall commented 8 years ago

We don't have axiom annotations for literals in SciGraph

The convolutedness is a result of the mapping to OBO: http://owlcollab.github.io/oboformat/doc/obo-syntax.html#5.6

We can always define a mapping back to different properties when going into neo4j.
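A sketch of what such a mapping could look like, under loud assumptions: the synonym-type keys (`common_name`, `genbank_common_name`, `misspelling`, `synonym`) and the target property names are illustrative, and the loader is imagined to hand us each synonym together with its type. None of this reflects how SciGraph currently loads data.

```javascript
// Hypothetical mapping from synonym type to a distinct node property.
const PROPERTY_BY_TYPE = new Map([
  ['common_name', 'commonName'],
  ['genbank_common_name', 'commonName'],
  ['misspelling', 'misspelling'],
  ['synonym', 'taxonomicSynonym'],
]);

// Build node properties, filing each synonym under the property for its type.
// Unknown types fall back to a generic 'synonym' property.
function toNodeProperties(label, typedSynonyms) {
  const props = { label: label };
  for (const { value, type } of typedSynonyms) {
    const key = PROPERTY_BY_TYPE.get(type) || 'synonym';
    if (!props[key]) {
      props[key] = [];
    }
    props[key].push(value);
  }
  return props;
}

// Prefer a real common name; otherwise keep the valid scientific label.
function displayLabel(props) {
  const common = props.commonName || [];
  return common.length > 0 ? common[0] : props.label;
}

const chimpProps = toNodeProperties('Pan troglodytes', [
  { value: 'Chimpansee troglodytes', type: 'misspelling' },
  { value: 'chimpanzee', type: 'common_name' },
]);
console.log(displayLabel(chimpProps)); // chimpanzee
```

Once common names live in their own property, the app no longer has to guess, and taxa with only taxonomic synonyms (e.g. 'Bos bovis') would keep their valid label.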