ncbo / BioPortal-to-KGX

Assemble a BioPortal Knowledge Graph
BSD 3-Clause "New" or "Revised" License

Collision between ATO and ATOL, potentially others of similar names #72

Closed: caufieldjh closed this issue 2 years ago

caufieldjh commented 2 years ago

I noticed that the most recent merged graph combines entries from ATO and ATOL into a single node:

ATO:0000368     biolink:NamedThing      startle response|Hydromantes (Gistel 1848)      None    ATO_2_nodes.tsv|ATOL_2_nodes.tsv

This is unexpected, since ATOL nodes should have the CURIE prefix ATOL, but sure enough, the ATOL node file shows they were assigned ATO instead:

$ head ATOL_2_nodes.tsv
id      category        name    description     provided_by
ATO:0000261     biolink:NamedThing      milk fatty acid iso C18:0 concentration         Animal Trait Ontology for Livestock
ATO:0001593     biolink:NamedThing      efficiency of phenylalanine utilization         Animal Trait Ontology for Livestock
ATO:0001592     biolink:NamedThing      efficiency of methionine utilization            Animal Trait Ontology for Livestock

I suspect this is overzealous prefix mapping. There may be similar collisions between other BioPortal ontologies whose names share a leading substring, such as MA and MAT.
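
To illustrate the kind of collision suspected here, below is a minimal sketch of how a first-match prefix lookup could assign ATO to an ATOL term, and how preferring the longest matching base avoids it. This is not the project's actual mapping code: the IRIs, the `PREFIX_MAP` contents, and the function names are all hypothetical.

```python
# Hypothetical prefix map (IRI base -> CURIE prefix); these IRIs are illustrative only.
PREFIX_MAP = {
    "http://example.org/obo/ATO": "ATO",
    "http://example.org/obo/ATOL": "ATOL",
}


def local_id(iri):
    """Return the part of the IRI after the last underscore."""
    return iri.rsplit("_", 1)[-1]


def naive_contract(iri, prefix_map):
    """Use the first base the IRI starts with (collision-prone)."""
    for base, prefix in prefix_map.items():
        if iri.startswith(base):
            return f"{prefix}:{local_id(iri)}"
    return iri


def longest_match_contract(iri, prefix_map):
    """Use the longest base the IRI starts with."""
    matches = [base for base in prefix_map if iri.startswith(base)]
    if not matches:
        return iri
    best = max(matches, key=len)
    return f"{prefix_map[best]}:{local_id(iri)}"


atol_iri = "http://example.org/obo/ATOL_0000261"
print(naive_contract(atol_iri, PREFIX_MAP))          # ATO:0000261
print(longest_match_contract(atol_iri, PREFIX_MAP))  # ATOL:0000261
```

The same reasoning would apply to any pair where one prefix is a leading substring of another, such as MA and MAT.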

caufieldjh commented 2 years ago

I appear to have fixed this somewhere along the way. Rerunning the transform on only ATO and ATOL now yields distinct prefixes:

$ python run.py --input ../Bioportal/4store-export-2022-07-20/data/ --include_only 263b039a23c15fb41090bc493d97,391d5eda40a75812f9eed0f6e4d8 --write_curies
Looking for records in ../Bioportal/4store-export-2022-07-20/data/
Will only include the specified 2 file(s).
2 files found.
Setting up ROBOT...
ROBOT path: /home/harry/BioPortal-to-KGX/robot
ROBOT evironment variables: -Xmx12g -XX:+UseG1GC
Loading prefix maps from prefixes/
Loaded prefixes for 547 ontologies.
Loaded preferred prefixes for 3 ontologies.
Transforming all...
Starting on ../Bioportal/4store-export-2022-07-20/data/45/9a/263b039a23c15fb41090bc493d97
ROBOT: relax ATO_2
Relaxing /tmp/tmpvs1tprqt to transformed/ontologies/ATO/ATO_2_relaxed.json...
Complete.
KGX transform ATO_2
Will write new CURIEs for nodes in ATO_2.
Validating graph files with pandas...
Graph file transformed/ontologies/ATO/ATO_2_edges.tsv parses OK.
Graph file transformed/ontologies/ATO/ATO_2_nodes.tsv parses OK.
Starting on ../Bioportal/4store-export-2022-07-20/data/a4/06/391d5eda40a75812f9eed0f6e4d8
ROBOT: relax ATOL_2
Relaxing /tmp/tmpdk2f8ssl to transformed/ontologies/ATOL/ATOL_2_relaxed.json...
Complete.
KGX transform ATOL_2
Will write new CURIEs for nodes in ATOL_2.
Validating graph files with pandas...
Graph file transformed/ontologies/ATOL/ATOL_2_nodes.tsv parses OK.
Graph file transformed/ontologies/ATOL/ATOL_2_edges.tsv parses OK.
Successful transforms: ATO_2, ATOL_2

$ head transformed/ontologies/ATOL/ATOL_2_nodes.tsv && head transformed/ontologies/ATO/ATO_2_nodes.tsv 
id      category        name    description     provided_by
ATOL:0000261    biolink:OntologyClass   milk fatty acid iso C18:0 concentration         ATOL_2_relaxed.json
ATOL:0001593    biolink:OntologyClass   efficiency of phenylalanine utilization         ATOL_2_relaxed.json
ATOL:0001592    biolink:OntologyClass   efficiency of methionine utilization            ATOL_2_relaxed.json
ATOL:0000262    biolink:OntologyClass   milk fatty acid cis-14-c18:1 concentration              ATOL_2_relaxed.json
ATOL:0001595    biolink:OntologyClass   efficiency of tryptophan utilization            ATOL_2_relaxed.json
ATOL:0000263    biolink:OntologyClass   milk fatty acid trans-16-C18:1 concentration            ATOL_2_relaxed.json
ATOL:0001111    biolink:OntologyClass   colon length            ATOL_2_relaxed.json
ATOL:0001594    biolink:OntologyClass   efficiency of threonine utilization             ATOL_2_relaxed.json
ATOL:0000264    biolink:OntologyClass   milk carotenoid concentration           ATOL_2_relaxed.json
id      category        name    description     provided_by
ATO:0004150     biolink:OntologyClass   Brachytarsophrys platyparietus (Rao and Yang 1997)              ATO_2_relaxed.json
ATO:0005482     biolink:OntologyClass   Tomopterna damarensis (Dawood and Channing 2002)                ATO_2_relaxed.json
ATO:0005483     biolink:OntologyClass   Tomopterna delalandii (Tschudi 1838)            ATO_2_relaxed.json
ATO:0005480     biolink:OntologyClass   Strongylopus wageri (Wager 1961)                ATO_2_relaxed.json
ATO:0004152     biolink:OntologyClass   Leptobrachella brevicrus (Dring 1984)           ATO_2_relaxed.json
OBO:TEMP        biolink:OntologyClass                   ATO_2_relaxed.json
ATO:0004151     biolink:OntologyClass   Leptobrachella baluensis (Smith 1931)           ATO_2_relaxed.json
ATO:0005481     biolink:OntologyClass   Tomopterna cryptotis (Boulenger 1907)           ATO_2_relaxed.json
ATO:0004158     biolink:OntologyClass   Leptobrachium abbotti (Cochran 1926)            ATO_2_relaxed.json
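
A quick way to check other node files for the same problem is to count IDs whose prefix differs from the one expected for that ontology. The following is a rough sketch using only the standard library; the function name `unexpected_prefixes` is hypothetical, and the sample rows are copied from the ATO output above.

```python
import csv
import io
from collections import Counter


def unexpected_prefixes(nodes_tsv, expected_prefix):
    """Count node ID prefixes in a KGX nodes TSV that differ from expected_prefix."""
    reader = csv.DictReader(nodes_tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        prefix = row["id"].split(":", 1)[0]
        if prefix != expected_prefix:
            counts[prefix] += 1
    return counts


# Sample rows taken from the ATO_2_nodes.tsv output shown above.
rows = [
    ["id", "category", "name", "description", "provided_by"],
    ["ATO:0004152", "biolink:OntologyClass",
     "Leptobrachella brevicrus (Dring 1984)", "", "ATO_2_relaxed.json"],
    ["OBO:TEMP", "biolink:OntologyClass", "", "", "ATO_2_relaxed.json"],
    ["ATO:0004151", "biolink:OntologyClass",
     "Leptobrachella baluensis (Smith 1931)", "", "ATO_2_relaxed.json"],
]
sample = "\n".join("\t".join(r) for r in rows) + "\n"

print(unexpected_prefixes(io.StringIO(sample), "ATO"))  # Counter({'OBO': 1})
```

For a real file, an open file handle can be passed in place of the `io.StringIO` object. An ATOL file that reported a large count for ATO would indicate that the collision is still present.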