Open bradfordcondon opened 6 years ago
@mestato see https://github.com/statonlab/tripal_curator/blob/master/docs/Edit_by_CV.md
There are only 11 other terms, so I can go ahead and change them if you want. Although maybe you should try one first and give me feedback on the tool?
I converted biomaterials using tripal:temperature to PATO's temperature term:
http://www.ontobee.org/ontology/PATO?iri=http://purl.obolibrary.org/obo/PATO_0000146
NB: I edited on the DEV SITE, so: https://hardwoods.ag.utk.edu/admin/tripal/extension/tripal_curator
Terms still using the "biomaterial property" CV: https://hardwoods.ag.utk.edu/admin/tripal/extension/tripal_curator/CV_usage/42
Need to pick PATO terms for ...
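For reference, here is a minimal sketch of what such a conversion amounts to at the database level, assuming Chado-style `db`, `dbxref`, `cv`, `cvterm`, and `biomaterialprop` tables. It is not how tripal_curator is implemented: the tables are cut down to a few columns and loaded into an in-memory SQLite database with hypothetical sample rows so the example runs on its own. The idea is to look up the old term by CV and name, look up the PATO term by accession, and repoint every property row from one to the other.

```python
import sqlite3

# Heavily simplified stand-ins for the Chado tables involved; real Chado
# tables have more columns and constraints and live in PostgreSQL.
SCHEMA = """
CREATE TABLE db (db_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dbxref (dbxref_id INTEGER PRIMARY KEY, db_id INTEGER, accession TEXT);
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER,
                     name TEXT, dbxref_id INTEGER);
CREATE TABLE biomaterialprop (biomaterialprop_id INTEGER PRIMARY KEY,
                              biomaterial_id INTEGER, type_id INTEGER, value TEXT);
"""

# Hypothetical sample rows: two properties typed with the old local term,
# and one typed with an unrelated term that must not change.
SAMPLE = """
INSERT INTO db VALUES (1, 'tripal'), (2, 'PATO');
INSERT INTO dbxref VALUES (1, 1, 'temperature'), (2, 2, '0000146'), (3, 1, 'tissue');
INSERT INTO cv VALUES (1, 'biomaterial property'), (2, 'PATO');
INSERT INTO cvterm VALUES (1, 1, 'temperature', 1), (2, 2, 'temperature', 2),
                          (3, 1, 'tissue', 3);
INSERT INTO biomaterialprop VALUES (1, 10, 1, '25 C'), (2, 11, 1, '4 C'),
                                   (3, 11, 3, 'leaf');
"""


def build_demo_db():
    """Create the toy database used by this sketch."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA + SAMPLE)
    return conn


def term_id_by_cv(conn, cv_name, term_name):
    """Look up a cvterm_id by CV name and term name."""
    row = conn.execute(
        "SELECT t.cvterm_id FROM cvterm t JOIN cv ON cv.cv_id = t.cv_id "
        "WHERE cv.name = ? AND t.name = ?", (cv_name, term_name)).fetchone()
    if row is None:
        raise LookupError(f"{cv_name}: {term_name} not found")
    return row[0]


def term_id_by_accession(conn, db_name, accession):
    """Look up a cvterm_id by DB prefix and accession, e.g. PATO + 0000146."""
    row = conn.execute(
        "SELECT t.cvterm_id FROM cvterm t "
        "JOIN dbxref x ON x.dbxref_id = t.dbxref_id "
        "JOIN db ON db.db_id = x.db_id "
        "WHERE db.name = ? AND x.accession = ?", (db_name, accession)).fetchone()
    if row is None:
        raise LookupError(f"{db_name}:{accession} not found")
    return row[0]


def remap_property_type(conn, old_id, new_id):
    """Repoint every biomaterialprop typed old_id to new_id; return rows changed."""
    with conn:  # commits on success, rolls back if the UPDATE raises
        cur = conn.execute(
            "UPDATE biomaterialprop SET type_id = ? WHERE type_id = ?",
            (new_id, old_id))
    return cur.rowcount


if __name__ == "__main__":
    conn = build_demo_db()
    old = term_id_by_cv(conn, "biomaterial property", "temperature")
    new = term_id_by_accession(conn, "PATO", "0000146")
    print(remap_property_type(conn, old, new), "properties remapped")
```

Against a real site the same UPDATE would run inside a transaction on the PostgreSQL database, and checking the row count before committing is a cheap sanity check.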
/remind me in 24 hours
Oops! I used the wrong ontology
https://hardwoodgenomics.org/cv/lookup/PO THIS is the plant trait ontology
@bradfordcondon set a reminder for Aug 15th 2018
TO's temperature = PATO's temperature
A physical quality of the thermal energy of a system. [ PATOC:GVG ]
plant experimental condition PECO:0007359
http://www.ontobee.org/ontology/PATO?iri=http://purl.obolibrary.org/obo/PATO_0000011
No term for cultivar or variety; need to browse.
I know Meg found one, but I can't find it...
maturity?
A quality of a single physical entity which is held by a bearer when the latter exhibits complete growth, differentiation, or development. [ Merriam-Webster:Merriam-Webster ]
A spatial quality inhering in a bearer by virtue of the bearer's spatial location relative to other objects in the vicinity. [ PATOC:GVG ]
The TO uses this PATO term.
I like the EDAM geographic location more.
:wave: @bradfordcondon,
This is on hold until I load the plant trait ontology satisfactorily. We may decide not to use terms from this ontology if they won't look nice in a browser.
Hmm, I THINK the loaders are all fixed now, so we could revisit this.
That said, we were looking at this as part of https://github.com/NAL-i5K/tripal_eutils, and we should rethink how we map the properties. As we've discussed on that project, it would be ideal if NCBI were the ones mapping their properties to ontology terms. Will they?
A few resources specifically for biosample records:
NCBI kind of has its own controlled vocabulary:
Updates to BioSamples database at European Bioinformatics Institute
No idea if this is helpful, but at least we aren't the only people who've noticed this problem: Metadata in the BioSample Online Repository are Impaired by Numerous Anomalies https://arxiv.org/abs/1708.01286
Thanks for the thoughtful reply.
I actually do use that: the eutils module imports it into an "ncbi biosample" CV.
However, there is no versioning and there are no relationships (this is a CV, not an ontology, right?).
Ideally, NCBI would provide an OBO file that has all these terms and perhaps says that each term is_a some term in the EFO or some other ontology. Perhaps that's what EBI's mapping does; I'll have a look.
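To make that concrete, a hypothetical OBO file for the NCBI terms could declare each attribute as a term with an `is_a` link into an external ontology. The sketch below only emits OBO 1.2 text; the `ncbi_biosample` ontology name and the `NCBIBS:0000001` ID are made up, and the temperature to PATO:0000146 link is the one used earlier in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    id: str    # hypothetical local ID, e.g. "NCBIBS:0000001"
    name: str  # the NCBI attribute name
    # (parent_id, parent_name) pairs; the name is only written as an OBO "!" comment
    is_a: list[tuple[str, str]] = field(default_factory=list)


def term_stanza(term):
    """Render one [Term] stanza."""
    lines = ["[Term]", f"id: {term.id}", f"name: {term.name}"]
    lines += [f"is_a: {pid} ! {pname}" for pid, pname in term.is_a]
    return "\n".join(lines)


def to_obo(ontology, terms):
    """Render a minimal OBO document: header, blank line, stanzas separated by blank lines."""
    header = f"format-version: 1.2\nontology: {ontology}\n"
    return header + "\n" + "\n\n".join(term_stanza(t) for t in terms) + "\n"


if __name__ == "__main__":
    temperature = Term("NCBIBS:0000001", "temperature",
                       [("PATO:0000146", "temperature")])
    print(to_obo("ncbi_biosample", [temperature]))
```

Relationships like this are exactly what a flat CV cannot carry, which is why an OBO (or OWL) release from NCBI would be more useful than the current term list.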
from https://arxiv.org/abs/1708.01286
The BioSample metadata field names and their values are not
standardized or controlled—15% of the metadata fields use field names not specified in the BioSample data dictionary. Only 9 out of 452 BioSample-specified
fields ordinarily require ontology terms as values, and the quality of these controlled fields is better than that of uncontrolled ones, as even simple binary or
numeric fields are often populated with inadequate values of different data types
(e.g., only 27% of Boolean values are valid)
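As a rough illustration of that kind of check, this sketch measures how many values of a Boolean-typed field are actually valid. The set of accepted strings is an assumption (the paper may define validity differently), and the sample values are invented.

```python
from collections import Counter

# Assumption: only "true"/"false" (case-insensitive, surrounding whitespace
# ignored) count as valid Boolean values. The cited paper's criteria may differ.
VALID_BOOLEANS = {"true", "false"}


def boolean_validity(values):
    """Return (fraction of valid values, Counter of the invalid raw values)."""
    invalid = Counter()
    valid = 0
    for raw in values:
        if raw.strip().lower() in VALID_BOOLEANS:
            valid += 1
        else:
            invalid[raw] += 1
    fraction = valid / len(values) if values else 0.0
    return fraction, invalid


if __name__ == "__main__":
    # Invented sample values for a Boolean-typed attribute.
    frac, bad = boolean_validity(["true", "False", "yes", "1", "missing", "missing"])
    print(f"{frac:.0%} valid; most common invalid values: {bad.most_common(2)}")
```

Listing the invalid values, not just the rate, is what makes this useful for curation: recurring strings like "yes" or "missing" can then be mapped or flagged in bulk.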
yes! exactly.
OK, so EBI BioSamples attributes look like this:
https://www.ebi.ac.uk/biosamples/samples/SAMN02953603
<Property class="sex" characteristic="false" comment="false" type="STRING">
  <QualifiedValue>
    <Value>female</Value>
    <TermSourceREF><Name/>
      <TermSourceID>http://purl.obolibrary.org/obo/PATO_0000383</TermSourceID>
    </TermSourceREF>
  </QualifiedValue>
</Property>
<Property class="sub species" characteristic="false" comment="false" type="STRING">
  <QualifiedValue>
    <Value>familiaris</Value>
  </QualifiedValue>
</Property>
Here it is in GenBank:
<Attributes>
  <Attribute attribute_name="sex" harmonized_name="sex" display_name="sex">female</Attribute>
  <Attribute attribute_name="sub-species" harmonized_name="sub_species" display_name="sub species">familiaris
  </Attribute>
  <Attribute attribute_name="breed" harmonized_name="breed" display_name="breed">boxer</Attribute>
</Attributes>
So you get this ontologyTerms key, if applicable. Pretty cool, honestly. Will NCBI ever support something like this? I don't know. If they did adopt it, it would break our importers because it changes the XML structure (yayyyy).
Edit: also, the term appears to be for the VALUE, not the TYPE. http://www.ontobee.org/ontology/PATO?iri=http://purl.obolibrary.org/obo/PATO_0000383 is linked to FEMALE, not sex. You can also link a term to the type, so that's pretty awesome.
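A small sketch of how the two formats differ for an importer, parsing the two snippets above with Python's standard `xml.etree.ElementTree`. The `<Properties>` wrapper around the EBI snippet is added only to give it a single root element, and `normalize` is a hypothetical helper, not part of either service. Note that the ontology IRI comes back attached to the value (`female`), not to the property name, and that the names differ between sources (`sub species` vs. `sub_species`).

```python
import xml.etree.ElementTree as ET

# EBI snippet from above, wrapped in a hypothetical <Properties> root.
EBI_XML = """<Properties>
<Property class="sex" characteristic="false" comment="false" type="STRING">
  <QualifiedValue>
    <Value>female</Value>
    <TermSourceREF><Name/>
      <TermSourceID>http://purl.obolibrary.org/obo/PATO_0000383</TermSourceID>
    </TermSourceREF>
  </QualifiedValue>
</Property>
<Property class="sub species" characteristic="false" comment="false" type="STRING">
  <QualifiedValue>
    <Value>familiaris</Value>
  </QualifiedValue>
</Property>
</Properties>"""

# NCBI snippet from above, unchanged.
NCBI_XML = """<Attributes>
<Attribute attribute_name="sex" harmonized_name="sex" display_name="sex">female</Attribute>
<Attribute attribute_name="sub-species" harmonized_name="sub_species" display_name="sub species">familiaris
</Attribute>
<Attribute attribute_name="breed" harmonized_name="breed" display_name="breed">boxer</Attribute>
</Attributes>"""


def parse_ebi(xml_text):
    """Return {property class: (value, IRI of the term for the VALUE, or None)}."""
    out = {}
    for prop in ET.fromstring(xml_text).iter("Property"):
        qv = prop.find("QualifiedValue")
        if qv is None:
            continue
        value = qv.findtext("Value", default="").strip()
        term = qv.findtext("TermSourceREF/TermSourceID")
        out[prop.get("class")] = (value, term.strip() if term else None)
    return out


def parse_ncbi(xml_text):
    """Return {harmonized_name: value}; no ontology term is available here."""
    return {a.get("harmonized_name"): (a.text or "").strip()
            for a in ET.fromstring(xml_text).iter("Attribute")}


def normalize(name):
    """Hypothetical key normalization so 'sub species' and 'sub_species' line up."""
    return name.replace(" ", "_").replace("-", "_").lower()


if __name__ == "__main__":
    print(parse_ebi(EBI_XML))
    print(parse_ncbi(NCBI_XML))
```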
I'm left not knowing a) who decides how these terms are mapped, and b) whether NCBI cares.
Honestly, the ontology tagging looks totally unstructured; check out the user guide: https://www.ebi.ac.uk/biosamples/docs/cookbook/curate_sample.html
Wow, that's surprisingly unhelpful. I mean, it's helpful if someone has already selected a term for the value, but how often is that the case? It's not required, right?
Type terms seem like the logical place to start, especially for unifying information across databases. The NCBI non-hierarchical vocab for biosample has 615 terms by my count. I wonder how long it would take to assign terms if we divided them up among everyone in the lab, and how reasonable the results would be. And then whether anyone other than us would actually use them.
The NCBI non-hierarchical vocab for biosample has 615 terms by my count.
I think you could start by picking a single "package". I think you can get away with the base package, and it's only ~18 terms. That's what (I think) the majority of end users end up using; it's why the example data I built eutils with only uses the same set of 20 terms.
I think offering a suggested mapping for the base terms would be a nice contribution to the Chado mapping paper too.
Here is my table for NCBI's vanilla plant submission terms:
https://docs.google.com/spreadsheets/d/1uO2Pu4Kh_pcyHfbeAGr72zZp9JWtcD667lw9LP3B8Is/edit#gid=0
I am trying to limit myself to, in order of preference, the SIO/EFO and PO/TO ontologies. If there's no direct match, it's not easy work.
I think I already did some work on this in another issue, so I'm going to pause before I do much more. Basically, doing this is a total PIA. Some of the terms are too ambiguous. Perhaps looking through synonyms might help...
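One quick way to check which terms are worth mapping first is to tally how often each harmonized attribute name appears across the records actually being imported. A minimal sketch: the sample records are invented (the attribute names are real NCBI harmonized names, the values are placeholders), and the 50% cutoff is an arbitrary assumption.

```python
from collections import Counter

# Invented sample records: {harmonized_name: value} per BioSample.
SAMPLE_RECORDS = [
    {"cultivar": "v", "tissue": "v", "geo_loc_name": "v", "collection_date": "v"},
    {"cultivar": "v", "tissue": "v", "geo_loc_name": "v"},
    {"tissue": "v", "dev_stage": "v", "geo_loc_name": "v"},
]


def attribute_usage(records):
    """Count, per harmonized attribute name, how many records use it."""
    counts = Counter()
    for rec in records:
        counts.update(set(rec))  # each name counted at most once per record
    return counts


def core_terms(records, min_fraction=0.5):
    """Names used by at least min_fraction of records, sorted alphabetically."""
    counts = attribute_usage(records)
    n = len(records)
    return sorted(name for name, c in counts.items() if c / n >= min_fraction)


if __name__ == "__main__":
    print(attribute_usage(SAMPLE_RECORDS).most_common())
    print(core_terms(SAMPLE_RECORDS))
```

Run against real imported records, the resulting list gives a data-driven shortlist to compare with the ~18 base-package terms.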
Pilot project by @bradfordcondon
on dev...
[x] load PTO
[x] install tripal curator
[x] Convert a property to a "good" ontology
Once this is done, we can discuss converting all properties on live.