Closed caufieldjh closed 2 years ago
I appear to have fixed this somewhere along the way:
$ python run.py --input ../Bioportal/4store-export-2022-07-20/data/ --include_only 263b039a23c15fb41090bc493d97,391d5eda40a75812f9eed0f6e4d8 --write_curies
Looking for records in ../Bioportal/4store-export-2022-07-20/data/
Will only include the specified 2 file(s).
2 files found.
Setting up ROBOT...
ROBOT path: /home/harry/BioPortal-to-KGX/robot
ROBOT evironment variables: -Xmx12g -XX:+UseG1GC
Loading prefix maps from prefixes/
Loaded prefixes for 547 ontologies.
Loaded preferred prefixes for 3 ontologies.
Transforming all...
Starting on ../Bioportal/4store-export-2022-07-20/data/45/9a/263b039a23c15fb41090bc493d97
ROBOT: relax ATO_2
Relaxing /tmp/tmpvs1tprqt to transformed/ontologies/ATO/ATO_2_relaxed.json...
Complete.
KGX transform ATO_2
Will write new CURIEs for nodes in ATO_2.
Validating graph files with pandas...
Graph file transformed/ontologies/ATO/ATO_2_edges.tsv parses OK.
Graph file transformed/ontologies/ATO/ATO_2_nodes.tsv parses OK.
Starting on ../Bioportal/4store-export-2022-07-20/data/a4/06/391d5eda40a75812f9eed0f6e4d8
ROBOT: relax ATOL_2
Relaxing /tmp/tmpdk2f8ssl to transformed/ontologies/ATOL/ATOL_2_relaxed.json...
Complete.
KGX transform ATOL_2
Will write new CURIEs for nodes in ATOL_2.
Validating graph files with pandas...
Graph file transformed/ontologies/ATOL/ATOL_2_nodes.tsv parses OK.
Graph file transformed/ontologies/ATOL/ATOL_2_edges.tsv parses OK.
Successful transforms: ATO_2, ATOL_2
$ head transformed/ontologies/ATOL/ATOL_2_nodes.tsv && head transformed/ontologies/ATO/ATO_2_nodes.tsv
id category name description provided_by
ATOL:0000261 biolink:OntologyClass milk fatty acid iso C18:0 concentration ATOL_2_relaxed.json
ATOL:0001593 biolink:OntologyClass efficiency of phenylalanine utilization ATOL_2_relaxed.json
ATOL:0001592 biolink:OntologyClass efficiency of methionine utilization ATOL_2_relaxed.json
ATOL:0000262 biolink:OntologyClass milk fatty acid cis-14-c18:1 concentration ATOL_2_relaxed.json
ATOL:0001595 biolink:OntologyClass efficiency of tryptophan utilization ATOL_2_relaxed.json
ATOL:0000263 biolink:OntologyClass milk fatty acid trans-16-C18:1 concentration ATOL_2_relaxed.json
ATOL:0001111 biolink:OntologyClass colon length ATOL_2_relaxed.json
ATOL:0001594 biolink:OntologyClass efficiency of threonine utilization ATOL_2_relaxed.json
ATOL:0000264 biolink:OntologyClass milk carotenoid concentration ATOL_2_relaxed.json
id category name description provided_by
ATO:0004150 biolink:OntologyClass Brachytarsophrys platyparietus (Rao and Yang 1997) ATO_2_relaxed.json
ATO:0005482 biolink:OntologyClass Tomopterna damarensis (Dawood and Channing 2002) ATO_2_relaxed.json
ATO:0005483 biolink:OntologyClass Tomopterna delalandii (Tschudi 1838) ATO_2_relaxed.json
ATO:0005480 biolink:OntologyClass Strongylopus wageri (Wager 1961) ATO_2_relaxed.json
ATO:0004152 biolink:OntologyClass Leptobrachella brevicrus (Dring 1984) ATO_2_relaxed.json
OBO:TEMP biolink:OntologyClass ATO_2_relaxed.json
ATO:0004151 biolink:OntologyClass Leptobrachella baluensis (Smith 1931) ATO_2_relaxed.json
ATO:0005481 biolink:OntologyClass Tomopterna cryptotis (Boulenger 1907) ATO_2_relaxed.json
ATO:0004158 biolink:OntologyClass Leptobrachium abbotti (Cochran 1926) ATO_2_relaxed.json
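A quick way to confirm that each nodes file carries the expected CURIE prefix is to tally the prefixes in its `id` column. The following is only a minimal sketch: it assumes the nodes files are tab-separated with an `id` header, as the `head` output above suggests, and `count_curie_prefixes` is a hypothetical helper, not part of the repository. The sample rows are copied from the ATO output above, with the tab layout assumed.

```python
import csv
import io
from collections import Counter


def count_curie_prefixes(handle):
    """Tally the CURIE prefix (text before the first colon) of each node id.

    Hypothetical helper: assumes a tab-separated file with an 'id' column.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        node_id = row["id"]
        # Ids without a colon are counted under the empty prefix.
        prefix = node_id.split(":", 1)[0] if ":" in node_id else ""
        counts[prefix] += 1
    return counts


# Sample rows taken from the ATO head output; tab separation is assumed.
sample = (
    "id\tcategory\tname\tdescription\tprovided_by\n"
    "ATO:0004150\tbiolink:OntologyClass\t"
    "Brachytarsophrys platyparietus (Rao and Yang 1997)\t\tATO_2_relaxed.json\n"
    "OBO:TEMP\tbiolink:OntologyClass\t\t\tATO_2_relaxed.json\n"
    "ATO:0004151\tbiolink:OntologyClass\t"
    "Leptobrachella baluensis (Smith 1931)\t\tATO_2_relaxed.json\n"
)

counts = count_curie_prefixes(io.StringIO(sample))
print(counts.most_common())
```

For a real file, the same function could be called on `open("transformed/ontologies/ATOL/ATOL_2_nodes.tsv", newline="")`; any prefix other than ATOL showing up in large numbers would point to a mapping problem.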
I noticed that the most recent merged graph merges entries from ATO and ATOL.
This is unexpected, as ATOL entries should have the CURIE prefix ATOL, but sure enough, they were assigned ATO instead.
I suspect this is overzealous prefix mapping. There may be similar cases among other similarly named BioPortal ontologies, such as MA and MAT.
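To illustrate how this kind of collision could arise, here is a minimal sketch, not the actual BioPortal-to-KGX code. It assumes prefixes are assigned by checking whether an IRI starts with a known base IRI. The base IRIs, the sample IRIs, and both function names are hypothetical. With first-match lookup, a shorter base such as the one for ATO also matches ATOL IRIs; preferring the longest matching base avoids that.

```python
# Hypothetical prefix map: base IRIs are invented for illustration only.
PREFIX_MAP = {
    "ATO": "http://example.org/onto/ATO",
    "ATOL": "http://example.org/onto/ATOL",
    "MA": "http://example.org/onto/MA",
    "MAT": "http://example.org/onto/MAT",
}


def first_match_prefix(iri, prefix_map):
    """Return the first prefix whose base IRI the given IRI starts with."""
    for prefix, base in prefix_map.items():
        if iri.startswith(base):
            return prefix
    return None


def longest_match_prefix(iri, prefix_map):
    """Return the prefix with the longest base IRI that the IRI starts with."""
    matches = [
        (len(base), prefix)
        for prefix, base in prefix_map.items()
        if iri.startswith(base)
    ]
    if not matches:
        return None
    # The longest base is the most specific match.
    return max(matches, key=lambda match: match[0])[1]


atol_iri = "http://example.org/onto/ATOL_0000261"
mat_iri = "http://example.org/onto/MAT_0000001"

print(first_match_prefix(atol_iri, PREFIX_MAP))    # ATO (the collision)
print(longest_match_prefix(atol_iri, PREFIX_MAP))  # ATOL
print(first_match_prefix(mat_iri, PREFIX_MAP))     # MA (the collision)
print(longest_match_prefix(mat_iri, PREFIX_MAP))   # MAT
```

Whether the real mapping fails in this way would need to be checked against the prefix maps in prefixes/; the sketch only shows why ATO/ATOL and MA/MAT are plausible candidates for the same problem.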