bradfordcondon closed this 5 years ago
https://github.com/tripal/tripal/issues/120
let's try following Sofia's suggestion
Note the "imports" statements at the top. I'd bet those get ignored. Those are also .OWL files. So unless we want to load the whole ontologies into our db (CHEBI, PO, RO), we would want to load the subset .OWL files.
Core is working on an .OWL loader. Maybe this is a "wait and see" situation until we move farther along in the CV project.
New plan:
One could argue we want the full ontologies on our site. Maybe. But I don't want them all on my dev site if I can expect each one to take ~2 days to load...
Subontologies are available here: https://github.com/bradfordcondon/plant_trait_ontology_import_obos
Issue: the typedefs don't have names.
[Typedef]
id: decreased_in_magnitude_relative_to
domain: PATO:0000001 ! quality
range: PATO:0000001 ! quality
is_transitive: true
is_a: different_in_magnitude_relative_to
I think they should have names, and the names can be equal to the id.
I would want to automate this fix for some of the ontologies...
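A minimal sketch of automating that fix, assuming stanzas are separated by blank lines as in the excerpt above. `add_missing_typedef_names` is a hypothetical helper: it copies the `id:` value into a new `name:` line for any `[Typedef]` stanza that lacks one and leaves everything else alone.

```python
def add_missing_typedef_names(obo_text: str) -> str:
    """Give every nameless [Typedef] stanza a 'name:' line equal to its id.

    Assumes stanzas are separated by blank lines; [Term] stanzas and the
    header block are passed through unchanged.
    """
    out = []
    for stanza in obo_text.split("\n\n"):
        lines = stanza.split("\n")
        # First non-blank line is the stanza header, e.g. "[Typedef]".
        header = next((l.strip() for l in lines if l.strip()), "")
        has_name = any(l.startswith("name:") for l in lines)
        if header == "[Typedef]" and not has_name:
            for i, line in enumerate(lines):
                if line.startswith("id:"):
                    # Drop any trailing "! comment" from the id value.
                    term_id = line[len("id:"):].split("!", 1)[0].strip()
                    lines.insert(i + 1, f"name: {term_id}")
                    break
        out.append("\n".join(lines))
    return "\n\n".join(out)
```

Running it over each subontology file before loading would be one way to apply the fix in bulk; the name lands directly after the `id:` line.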
OK - I've created a "solitaire" Plant trait ontology.
It has all the cross-references to other ontologies (CHEBI, PATO, GO, RO, PECO) removed. It also has the nameless terms (which I believe are also cross-references added "after the fact") removed.
The important thing is, it loads. And, it loads in 1 minute 50 seconds.
This is a HWG decision. Do you want to load the full ontologies for CHEBI, PATO, GO, RO, and PECO? If so, you can load them then load the full PTO. If not, load this solitaire OBO. For a developer site, obviously loading solitaire is the choice.
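One way the "solitaire" filtering could be scripted, as a sketch: drop `[Term]` stanzas whose ID prefix belongs to an excluded ontology or that have no `name:`, and drop `is_a:`/`relationship:` lines pointing into those ontologies. Which tags count as cross-references is an assumption (`xref:`, `intersection_of:` and others are left alone here), and `make_solitaire` is a hypothetical helper, not the script that produced the file above.

```python
# Ontologies whose terms are stripped out (taken from the list above).
EXCLUDED_PREFIXES = {"CHEBI", "PATO", "GO", "RO", "PECO"}


def _tag_value(line):
    """Value of an OBO 'tag: value ! comment' line, without the comment."""
    return line.split(":", 1)[1].split("!", 1)[0].strip()


def _prefix(curie):
    return curie.split(":", 1)[0] if ":" in curie else ""


def _target(line):
    """Last token of the value, e.g. the object ID of a relationship line."""
    parts = _tag_value(line).split()
    return parts[-1] if parts else ""


def make_solitaire(obo_text, excluded=EXCLUDED_PREFIXES):
    kept = []
    for stanza in obo_text.split("\n\n"):
        lines = [l for l in stanza.split("\n") if l.strip()]
        if not lines:
            continue
        if lines[0].strip() == "[Term]":
            term_id = next((_tag_value(l) for l in lines if l.startswith("id:")), "")
            has_name = any(l.startswith("name:") for l in lines)
            # Foreign terms and nameless terms are dropped entirely.
            if _prefix(term_id) in excluded or not has_name:
                continue
            # Links into excluded ontologies are dropped line by line.
            lines = [
                l for l in lines
                if not (l.startswith(("is_a:", "relationship:"))
                        and _prefix(_target(l)) in excluded)
            ]
        kept.append("\n".join(lines))
    return "\n\n".join(kept) + "\n"
```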
here's the core discussion. https://github.com/tripal/tripal/issues/120
We will load in my miniature prerequisites.
Loaded dev.
Note: the URL needs to be the raw file URL: https://raw.githubusercontent.com/bradfordcondon/plant_trait_ontology_import_obos/master/pto_simple.obo
Can't submit jobs to the OBO loader: it says there is already a job in the queue.
Oddly, the Tripal jobs queue had to be truncated first (`TRUNCATE tripal_jobs;`).
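A small helper for that conversion, sketched under the assumption that the file is linked as a normal GitHub file page (`github.com/<owner>/<repo>/blob/<ref>/<path>`), which serves HTML rather than the OBO text. `github_raw_url` is a hypothetical name; branch names containing `/` are not handled.

```python
from urllib.parse import urlparse


def github_raw_url(url: str) -> str:
    """Turn a github.com '.../blob/<ref>/<path>' URL into its raw.githubusercontent.com form."""
    parsed = urlparse(url)
    if parsed.netloc == "raw.githubusercontent.com":
        return url  # already raw
    if parsed.netloc != "github.com":
        raise ValueError(f"not a GitHub URL: {url}")
    parts = parsed.path.strip("/").split("/")
    # Expected layout: owner / repo / "blob" / ref / path...
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub file (blob) URL: {url}")
    owner, repo, _, ref, *path = parts
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{'/'.join(path)}"
```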
loaded live.
Downloading URL http://purl.obolibrary.org/obo/to.obo, saving to /tmp/obo_apRpvb
Step 1: Preloading File /tmp/obo_apRpvb...
Step 2: Loading type defs... Memory: 48,377,344 bytes.
Step 3: Loading terms... Memory: 48,372,168 bytes.
A term that belongs to another ontology is used within this vocabulary. Therefore a lookup was performed with the EBI Ontology Lookup Service to retrieve the information for this term. Please note, that vocabularies with many non-local terms require remote lookups and these lookups can dramatically decrease loading time.
I'm rerunning the importer pointed at the true ontology now that core has added EBI-OLS support for terms not in the ontology. It appears to be running OK so far on dev. Need to re-run live if it works.
https://hardwoods.ag.utk.edu/cv/lookup/TO
Looks good!
Let's reload it live as well: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/397674
Looks great! Two terms were loaded strangely because of a colon in the ID: see https://github.com/tripal/tripal/issues/525
I think I can just manually reassign/rename them, though.
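The colon problem can at least be caught before loading. A sketch, assuming a well-formed ID prefix is letters, digits and underscores only, so a prefix with spaces (like the `fatty acid 18` db row below) is rejected. `split_curie` and `suspicious_ids` are hypothetical helpers, splitting on the first colon is an assumption, and the ID `fatty acid 18:3` in the test is only inferred from that db row.

```python
import re

# Assumed shape of a valid prefix: a letter, then letters/digits/underscores.
PREFIX_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def split_curie(term_id):
    """Split 'DB:ACCESSION' on the first colon, or return None if it doesn't fit.

    None lets the caller flag the ID for manual review instead of creating
    a db record from a malformed prefix.
    """
    if ":" not in term_id:
        return None
    db, accession = term_id.split(":", 1)
    if not PREFIX_RE.fullmatch(db) or not accession:
        return None
    return db, accession


def suspicious_ids(term_ids):
    """IDs that would not split cleanly into a db name and an accession."""
    return [t for t in term_ids if split_curie(t) is None]
```

Relationship IDs without any prefix (such as the typedef IDs earlier in this thread) also come back as `None`, so this is best run on `[Term]` IDs only.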
select * from chado.db where name = 'fatty acid anion 18';
 db_id | name | description | urlprefix | url
(0 rows)
Hmm, I'm actually not sure how to do that because of odd NOT NULL constraints:
select * from chado.dbxref dbx INNER JOIN chado.db db ON db.db_id = dbx.db_id WHERE db.name = 'fatty acid 18';
 dbxref_id | db_id | accession | version | description | db_id | name          | description | urlprefix | url
 245078    | 190   | 3         |         |             | 190   | fatty acid 18 |             |           |
(1 row)
hardwoods_06112018=> delete from chado.dbxref dbx INNER JOIN chado.db db ON db.db_id = dbx.db_id WHERE db.name = 'fatty acid 18';
ERROR: syntax error at or near "INNER"
LINE 1: delete from chado.dbxref dbx INNER JOIN chado.db db ON db.db...
^
hardwoods_06112018=> delete from chado.dbxref WHERE dbxref_id = 245078;
ERROR: null value in column "dbxref_id" violates not-null constraint
DETAIL: Failing row contains (134184, 90, CHEBI:132502, , null, 0, 0).
CONTEXT: SQL statement "UPDATE ONLY "chado"."cvterm" SET "dbxref_id" = NULL WHERE $1 OPERATOR(pg_catalog.=) "dbxref_id""
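The two errors above have separate causes: `DELETE` has no `INNER JOIN` clause, and the `UPDATE ... SET "dbxref_id" = NULL` in the CONTEXT line is the foreign key's ON DELETE SET NULL action hitting the NOT NULL `cvterm.dbxref_id`. A sketch of one way around both, run against a simplified SQLite stand-in for the Chado tables (an assumption: only the columns needed here; the `PATO` db row with id 2 is made up). The join becomes a subquery (PostgreSQL also accepts `DELETE ... USING`), and the referencing cvterms are deleted first. `delete_dbxrefs_for_db` is a hypothetical helper.

```python
import sqlite3

# Simplified stand-ins for the Chado tables involved (assumption).
SCHEMA = """
CREATE TABLE db (db_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    db_id     INTEGER NOT NULL REFERENCES db (db_id),
    accession TEXT NOT NULL
);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    dbxref_id INTEGER NOT NULL REFERENCES dbxref (dbxref_id) ON DELETE SET NULL
);
"""


def build_demo_db():
    """In-memory DB seeded with the rows from the psql output, plus one made-up row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    with conn:
        conn.execute("INSERT INTO db VALUES (190, 'fatty acid 18'), (2, 'PATO')")
        conn.execute("INSERT INTO dbxref VALUES (245078, 190, '3'), (246426, 2, '0001018')")
        conn.execute(
            "INSERT INTO cvterm VALUES (134184, 'CHEBI:132502', 245078), "
            "(138725, 'physical quality', 246426)"
        )
    return conn


def delete_dbxrefs_for_db(conn, db_name):
    """Delete cvterms pointing at the db's dbxrefs, then the dbxrefs themselves."""
    xrefs = ("SELECT dbxref_id FROM dbxref "
             "WHERE db_id IN (SELECT db_id FROM db WHERE name = ?)")
    with conn:  # one transaction for both statements
        conn.execute(f"DELETE FROM cvterm WHERE dbxref_id IN ({xrefs})", (db_name,))
        conn.execute(f"DELETE FROM dbxref WHERE dbxref_id IN ({xrefs})", (db_name,))
```

In real Chado many other tables reference `cvterm`, so check what else points at a term before deleting it; repointing it at a corrected dbxref may be the safer fix.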
Reopening this because PATO is NOT THE PLANT TRAIT ONTOLOGY! Oops!
I deleted the already loaded plant trait ontology (no relationships) and am reloading now.
Loading this instead: http://www.obofoundry.org/ontology/to.html (job: https://www.hardwoodgenomics.org/admin/tripal/tripal_jobs/view/413191)
After loading, it looks pretty horrible: https://www.hardwoodgenomics.org/cv/lookup/TO
Let's delete and reload just to be sure.
https://www.hardwoodgenomics.org/cv/lookup/TO
Deleted and reloaded, and it's still hideous.
Let's compare these root terms against the OBO file and see if we can figure out the problem.
for example: PATO:0000085 [TO:sensitivity toward] (31)
here's its record:
[Term]
id: PATO:0000085 ! sensitivity toward
is_a: PATO:0001018 ! physical quality
So it should be under PATO:0001018, which should be under PATO:0001241 ! physical object quality, which is under PATO:0000001 ! quality. That term itself is not defined in the OBO; it's presumably loaded separately.
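The chain above can be traced mechanically from the OBO file. A sketch that reads only `[Term]` ids and `is_a:` lines (ignoring every other tag, which suits this check but is not a full OBO parser) and follows the first parent upward. The walk stops at the first ID not defined in the file, i.e. a term like PATO:0000001 that must come from another loaded ontology. Both functions are hypothetical helpers; the test data is a toy excerpt following the chain described above.

```python
def parse_is_a(obo_text):
    """Map each [Term] id to its is_a parent ids, with '! comment' text stripped."""
    parents = {}
    in_term, current = False, None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are tracked.
            in_term, current = line == "[Term]", None
        elif in_term and line.startswith("id:"):
            current = line[3:].split("!", 1)[0].strip()
            parents.setdefault(current, [])
        elif current is not None and line.startswith("is_a:"):
            parents[current].append(line[5:].split("!", 1)[0].strip())
    return parents


def ancestor_chain(term_id, parents):
    """Follow the first is_a parent upward; stop at a term with no parent in the file."""
    chain, seen = [term_id], {term_id}
    while parents.get(chain[-1]):
        nxt = parents[chain[-1]][0]
        if nxt in seen:  # guard against is_a cycles
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain
```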
select * from chado.cvterm cvt INNER JOIN chado.cv cv ON cv.cv_id = cvt.cv_id where cvt.name ='quality';
 cvterm_id | cv_id | name    | definition | dbxref_id | is_obsolete | is_relationshiptype | cv_id | name | definition
 133777    | 89    | quality |            | 244680    | 0           | 0                   | 89    | bfo  | The upper level ontology upon which OBO Foundry ontologies are built.
(1 row)
hardwoods_06112018=> select * from chado.cvterm cvt INNER JOIN chado.cv cv ON cv.cv_id = cvt.cv_id where cvt.name ='physical quality';
 cvterm_id | cv_id | name             | definition | dbxref_id | is_obsolete | is_relationshiptype | cv_id | name | definition
 138725    | 97    | physical quality |            | 246426    | 0           | 0                   | 97    | pato | An ontology of phenotypic qualities (properties, attributes or characteristics)
(1 row)
So we do have these terms in the db. "quality" is in bfo instead of pato...
OK, it looks like the PATO terms are loaded incorrectly: name = PATO:0001241 instead of physical object quality.
select cvt.name from chado.cvterm_relationship cr INNER JOIN chado.cvterm cvt ON cvt.cvterm_id = cr.object_id INNER JOIN chado.cvterm cvtsubj ON cr.subject_id = cvtsubj.cvterm_id where cvtsubj.name = 'physical quality';
name
PATO:0001241
(1 row)
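To list every affected term rather than spotting them page by page, a rough sketch: pull `cvterm_id, name` rows (for example with `select cvterm_id, name from chado.cvterm;`) and flag names that are nothing but PREFIX:digits. The pattern is an assumption and will miss malformed names of other shapes; `id_like_names` is a hypothetical helper, and the cvterm_id `1` in the test is made up.

```python
import re

# Assumption: a real label never looks like PREFIX:DIGITS, e.g. "PATO:0001241".
ID_LIKE = re.compile(r"[A-Za-z][A-Za-z0-9_]*:\d+")


def id_like_names(rows):
    """Return (cvterm_id, name) pairs whose name is an ontology ID, not a label."""
    return [(cvterm_id, name) for cvterm_id, name in rows
            if ID_LIKE.fullmatch(name.strip())]
```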
This is pretty clear just from looking at the term pages: https://www.hardwoodgenomics.org/cv/lookup/TO/quality
I'm going to load the ontologies on a fresh site and see if they load incorrectly: if so, we know it's still the OBO loader.
Confirmed on a fresh site. The OBO must be formatted in a different way?
Multiple plant relevant ontologies listed here: http://browser.planteome.org/amigo
Does the PO from obo foundry match the official one from github? (https://github.com/Planteome/plant-ontology) The Planteome expandable/collapsible trees look great for all these ontologies, but I'm not sure how their code may parse/store the obo file differently (or if they even use chado table structure).
No, it looks different: it seems to make less extensive use of cross-ref'd terms. Let's try this instead, thanks.
@mestato this is the plant ontology, not the plant trait ontology. Sorry, I see you noted they list several... Which is the one you'd like the biomaterials to use?
i.e. http://www.obofoundry.org/ontology/to.html
TO vs PO. Which is the correct one?
You are also correct that we are limited in our ontology usage by Chado. Cross-ref'd terms are troublesome, especially if they are included "as is" instead of as synonyms for terms.
from http://planteome.org/node/1
For the cvterms defining the fields (Tripal/Chado structure), I don't much care where the terms come from; the real importance there is alignment with other Tripal databases, as computer-level (web services) interoperability is the idea. But for the values of the fields (actual data!), the target audience is plant users (both computationally savvy and not). Since the biomaterials have a lot of fields with values that could be ontologized (plant structure, development stage, experimental treatment), we will have to draw from different ontologies: plant structure => Plant Anatomical Entity, plant development stage => plant structure development stage. For experimental treatments, we need to dig into TO vs PATO; I don't know how they overlap or interrelate. I would prefer TO to align with Planteome, but we likely will find more suitable generic terms in PATO. We don't use the environmental ontology right now.
For the cvterms defining the fields (Tripal/Chado structure), I don't much care where the terms come from; the real importance there is alignment with other Tripal databases, as computer-level (web services) interoperability is the idea
Well, the property terms will determine the CV browser mappings as well.
But I'm hearing you say that we can't expect all properties to map to a single ontology, which does make sense. TO makes heavy use of PATO: I find the PATO terms by searching within TO. I think this is OK and we should use these terms; if we have to tweak the browser to display them correctly, we can do that.
Mapping the values to cvterms will require quite a bit of thought and work, because many of the property values are in fact dense blocks of text that need to be split into multiple property-value pairs. We'll deal with that after we map the properties themselves, as a start.
OK, I suspect the problem isn't the ontology but the loader. All those PATO terms showing at the root have their db/accessions swapped. See https://github.com/tripal/tripal/issues/558
giving this another try on dev.
Because we've got so many messed-up terms (i.e. with the db and accession reversed), I'm going to delete first. All terms are in the plant_trait_ontology cv.
Performing EBI OLS Lookup for: PO:0001108
Cannot find the term via an EBI OLS lookup: PO:0001108. EBI Reported: Resource not found.Consider finding the OBO file for this ontology and manually loading it first.
[site http://default] [TRIPAL ERROR] [TRIPAL_JOB] Cannot find the term via an EBI OLS lookup: PO:0001108. EBI Reported: Resource not found.Consider finding the OBO file for this ontology and manually loading it first.
This error is as reported in the issue.
Now we're waiting on: https://github.com/tripal/tripal/issues/665
I'm trying the plan of loading the ontologies separately:
[x] loading plant ontology (PO)
[ ] loading chebi
[ ] retry loading TO (plant trait ontology)
Oops: CHEBI runs out of memory, and loading the PO first does not prevent the missing PO term error:
Cannot find the term via an EBI OLS lookup: PO:0001108. EBI Reported: Resource not found.Consider finding the OBO file for this ontology and manually loading it first.
I dumped $full_url and $results:
string(110) "http://www.ebi.ac.uk/ols/api/ontologies/po/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPO_0001108"
array(6) {
["timestamp"]=>
int(1537831783800)
["status"]=>
int(404)
["error"]=>
string(9) "Not Found"
["exception"]=>
string(62) "org.springframework.data.rest.webmvc.ResourceNotFoundException"
["message"]=>
string(18) "Resource not found"
["path"]=>
string(90) "/ols/api/ontologies/po/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPO_0001108"
The term is clearly in EBI: https://www.ebi.ac.uk/ols/ontologies/to/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FPO_0001108
Yep! It shows up if we look it up under the TO instead:
http://www.ebi.ac.uk/ols/api/ontologies/to/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPO_0001108
(note the change from `/po/terms/` to `/to/terms/`)
So, before we throw an error, we can try looking it up under the namespace of the currently loaded ontology instead...
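The fallback could look roughly like this. Tripal's loader is PHP; this Python sketch only mirrors the idea. The double percent-encoding of the IRI reproduces the `$full_url` dumped above, the IRI pattern `http://purl.obolibrary.org/obo/PREFIX_LOCALID` is assumed for every term, and `fetch` is a hypothetical stand-in for the HTTP call (parsed JSON on success, `None` on a 404).

```python
from urllib.parse import quote

OLS_API = "http://www.ebi.ac.uk/ols/api/ontologies"  # base taken from the dumped URL


def ols_term_url(ontology, term_id):
    """OLS term URL for a CURIE like 'PO:0001108', with the IRI encoded twice."""
    prefix, local = term_id.split(":", 1)
    iri = f"http://purl.obolibrary.org/obo/{prefix}_{local}"
    return f"{OLS_API}/{ontology.lower()}/terms/{quote(quote(iri, safe=''), safe='')}"


def candidate_urls(term_id, loading_ontology):
    """Term's own ontology first, then the ontology currently being loaded."""
    own = term_id.split(":", 1)[0]
    urls = [ols_term_url(own, term_id)]
    if loading_ontology.lower() != own.lower():
        urls.append(ols_term_url(loading_ontology, term_id))
    return urls


def lookup_with_fallback(term_id, loading_ontology, fetch):
    """First non-None fetch(url) result over the candidates, else None."""
    for url in candidate_urls(term_id, loading_ontology):
        result = fetch(url)
        if result is not None:
            return result
    return None
```

Only when every candidate misses would the loader need to raise (or, as discussed next, warn).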
Update on this: this term was fixed in the Trait Ontology after I brought it up. The next error is a relationship term, which appears to be OK in the OBO. I am working on modifying the importer to just warn instead of erroring if it can't look up the term: once that feature is in, we'll be ready to try loading again.
Yayyyy, here's the plant trait ontology tree (TO, not PTO, for the issue title...)
beautiful!
We can load once my PR is merged, or we can use the `680_warn_instead_of_error_no_term` branch.
The code change was to warn instead of erroring if it can't find a term. In one case, the plant trait ontology itself had an error, which has been fixed. The other errors are problems with the loader's API call, so we might choose to wait until that gets fixed? Very minor: 2 RO terms are missing: RO:0002310 and RO:0002577.
Hi all, this is finally unblocked.
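In outline, warn-instead-of-error amounts to something like this sketch (Python, not the PHP in the branch; `lookup` is a hypothetical function returning a term record or `None`, and the triples in the test are toy data). Skipped relationships are returned so they can be reported after the load instead of aborting the whole job.

```python
import logging

log = logging.getLogger("obo_loader_sketch")  # hypothetical logger name


def resolve_relationships(relationships, lookup):
    """Keep (subject, relationship, object) triples whose three IDs all resolve.

    A triple with any unresolvable ID is logged as a warning and set aside
    rather than raising. Returns (resolved, skipped).
    """
    resolved, skipped = [], []
    for triple in relationships:
        missing = [term_id for term_id in triple if lookup(term_id) is None]
        if missing:
            log.warning("Skipping %s %s %s: cannot find %s", *triple, ", ".join(missing))
            skipped.append(triple)
        else:
            resolved.append(triple)
    return resolved, skipped
```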
I think this is done? I don't see any pending PRs on core.
It's happily loaded here on live: https://hardwoodgenomics.org/cv/lookup/TO
The term for purple is a root term on our site. On EBI, it's deep in the term tree.
This ontology is not loaded correctly.
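To find every term in the same state as purple, a sketch: compare the root terms shown on the site with the parentless terms in the OBO file. It assumes `parents` already maps each term ID defined in the file to its `is_a` parent IDs and that the site's root list was collected by hand or from the database; `compare_roots` is a hypothetical helper and the IDs in the test are made up.

```python
def compare_roots(site_roots, parents):
    """Compare roots shown on the site with roots implied by the OBO file.

    parents: dict mapping each term ID defined in the file to its is_a parent IDs.
    Returns (misplaced, missing):
      misplaced: shown as roots on the site but having is_a parents in the file
      missing:   parentless in the file but not shown as roots on the site
    """
    obo_roots = {t for t, ps in parents.items() if not ps}
    site = set(site_roots)
    misplaced = sorted(t for t in site if parents.get(t))
    missing = sorted(obo_roots - site)
    return misplaced, missing
```

A non-empty `misplaced` list after a reload would suggest the load went wrong again.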
Reloading seems to fix. This is it on dev after reloading. I'll do the same for live.
done
OBO file available here