Open caufieldjh opened 2 years ago
An example with ICD10PCS.
Previously:
id category name provided_by aggregator_knowledge_source iri object predicate primary_knowledge_source relation same_as subject
ICD10PCS:0WJ34Z biolink:Procedure|biolink:OntologyClass BioPortal http://purl.bioontology.org/ontology/ICD10PCS/0WJ34Z
ICD10PCS:079430Z biolink:Procedure|biolink:OntologyClass BioPortal http://purl.bioontology.org/ontology/ICD10PCS/079430Z
ICD10PCS:0FPD4KZ biolink:Procedure|biolink:OntologyClass BioPortal http://purl.bioontology.org/ontology/ICD10PCS/0FPD4KZ
ICD10PCS:2W3HX3Z biolink:Procedure|biolink:OntologyClass BioPortal http://purl.bioontology.org/ontology/ICD10PCS/2W3HX3Z
ICD10PCS:2W56X1Z biolink:Procedure|biolink:OntologyClass BioPortal http://purl.bioontology.org/ontology/ICD10PCS/2W56X1Z
ICD10PCS:01QC3ZZ biolink:Procedure|biolink:OntologyClass BioPortal http://purl.bioontology.org/ontology/ICD10PCS/01QC3ZZ
ICD10PCS:2W0MX7Z biolink:Procedure|biolink:OntologyClass BioPortal http://purl.bioontology.org/ontology/ICD10PCS/2W0MX7Z
ICD10PCS:0SJL3Z biolink:Procedure|biolink:OntologyClass BioPortal http://purl.bioontology.org/ontology/ICD10PCS/0SJL3Z
Currently:
$ head transformed/ontologies/ICD10PCS/ICD10PCS_21_nodes.tsv
id category name description provided_by
ICD10PCS:0WJ34Z biolink:Procedure BioPortal
ICD10PCS:079430Z biolink:Procedure BioPortal
ICD10PCS:0FPD4KZ biolink:Procedure BioPortal
ICD10PCS:2W3HX3Z biolink:Procedure BioPortal
ICD10PCS:2W56X1Z biolink:Procedure BioPortal
ICD10PCS:01QC3ZZ biolink:Procedure BioPortal
ICD10PCS:2W0MX7Z biolink:Procedure BioPortal
ICD10PCS:0SJL3Z biolink:Procedure BioPortal
ICD10PCS:2W6CX0Z biolink:Procedure BioPortal
This may also be a good juncture to see if the values added to edgefiles in primary_knowledge_source can be used in the nodelists too.
Another example, with BFO.
Previous transform:
id category name description provided_by aggregator_knowledge_source iri object predicate primary_knowledge_source relation same_as subject
BFO:0000019 biolink:OntologyClass quality BioPortal http://purl.obolibrary.org/obo/BFO_0000019
BFO:0000015 biolink:OntologyClass process p is a process = Def. p is an occurrent that has temporal proper parts and for some time t, p s-depends_on some material entity at t. (axiom label in BFO2 Reference: [083-003]) BioPortal http://purl.obolibrary.org/obo/BFO_0000015
BFO:0000016 biolink:OntologyClass disposition BioPortal http://purl.obolibrary.org/obo/BFO_0000016
BFO:0000017 biolink:OntologyClass realizable entity BioPortal http://purl.obolibrary.org/obo/BFO_0000017
BFO:0000018 biolink:OntologyClass zero-dimensional spatial region BioPortal http://purl.obolibrary.org/obo/BFO_0000018
BFO:0000011 biolink:OntologyClass spatiotemporal region BioPortal http://purl.obolibrary.org/obo/BFO_0000011
IAO:0000116 biolink:OntologyClass editor note BioPortal http://purl.obolibrary.org/obo/IAO_0000116
IAO:0000117 biolink:OntologyClass term editor BioPortal http://purl.obolibrary.org/obo/IAO_0000117
BFO:0000134 biolink:OntologyClass BioPortal http://purl.obolibrary.org/obo/BFO_0000134
BFO:0000179 biolink:OntologyClass BFO OWL specification label Relates an entity in the ontology to the name of the variable that is used to represent it in the code that generates the BFO OWL file from the lispy specification. BioPortal http://purl.obolibrary.org/obo/BFO_0000179
IAO:0000115 biolink:OntologyClass definition BioPortal http://purl.obolibrary.org/obo/IAO_0000115
IAO:0000112 biolink:OntologyClass example of usage BioPortal http://purl.obolibrary.org/obo/IAO_0000112
IAO:0000111 biolink:OntologyClass editor preferred term BioPortal http://purl.obolibrary.org/obo/IAO_0000111
IAO:0000232 biolink:OntologyClass curator note BioPortal http://purl.obolibrary.org/obo/IAO_0000232
BFO:0000008 biolink:OntologyClass temporal region BioPortal http://purl.obolibrary.org/obo/BFO_0000008
Current transform:
id category name description provided_by
BFO:0000019 biolink:OntologyClass quality Basic Formal Ontology
BFO:0000015 biolink:OntologyClass process p is a process = Def. p is an occurrent that has temporal proper parts and for some time t, p s-depends_on some material entity at t. (axiom label in BFO2 Reference: [083-003]) Basic Formal Ontology
BFO:0000016 biolink:OntologyClass disposition Basic Formal Ontology
BFO:0000017 biolink:OntologyClass realizable entity Basic Formal Ontology
BFO:0000018 biolink:OntologyClass zero-dimensional spatial region Basic Formal Ontology
BFO:0000011 biolink:OntologyClass spatiotemporal region Basic Formal Ontology
IAO:0000116 biolink:OntologyClass editor note Basic Formal Ontology
IAO:0000117 biolink:OntologyClass term editor Basic Formal Ontology
BFO:0000134 biolink:OntologyClass Basic Formal Ontology
BFO:0000179 biolink:OntologyClass BFO OWL specification label Relates an entity in the ontology to the name of the variable that is used to represent it in the code that generates the BFO OWL file from the lispy specification. Basic Formal Ontology
IAO:0000115 biolink:OntologyClass definition Basic Formal Ontology
IAO:0000112 biolink:OntologyClass example of usage Basic Formal Ontology
IAO:0000111 biolink:OntologyClass editor preferred term Basic Formal Ontology
IAO:0000232 biolink:OntologyClass curator note Basic Formal Ontology
BFO:0000008 biolink:OntologyClass temporal region Basic Formal Ontology
The name field is still populated, so that's great, but provided_by is now the name of the ontology instead of the aggregator knowledge source (probably also fine, but it should include the version, too), the extra headings are different (an improvement, and perhaps something KGX is doing?), and iri isn't there at all. I would really prefer to have IRIs present so nodes may be mapped back to their source BioPortal ontologies.
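A quick way to see exactly which columns went away between the two transforms is to diff the header rows. This is only a minimal sketch: the helper names `header_columns` and `column_diff` are made up for illustration, the headers are copied from the BFO example above, and the tab separators are assumed since they are not visible in the pasted output.

```python
import csv
import io

def header_columns(tsv_text):
    """Return the column names from the first row of tab-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return next(reader, [])

def column_diff(previous_header, current_header):
    """Return (dropped, added): columns only in the previous or only in the current header."""
    previous, current = set(previous_header), set(current_header)
    return sorted(previous - current), sorted(current - previous)

# Headers copied from the BFO example; tab separators are an assumption.
previous_header = header_columns(
    "id\tcategory\tname\tdescription\tprovided_by\taggregator_knowledge_source\t"
    "iri\tobject\tpredicate\tprimary_knowledge_source\trelation\tsame_as\tsubject\n"
)
current_header = header_columns("id\tcategory\tname\tdescription\tprovided_by\n")

dropped, added = column_diff(previous_header, current_header)
print("dropped:", dropped)
print("added:", added)
```

In practice the header text would be read from the first line of each nodes file rather than pasted in as a string.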
This may be due to a difference in bmt or in Biolink Model itself.
Here's one confirmed difference: if I run a transform like the following:
    kgx.cli.transform(inputs=[repaired_outpath],
                      input_format='obojson',
                      output=outpath,
                      output_format='tsv',
                      stream=True,
                      knowledge_sources=[("aggregator_knowledge_source", "BioPortal"),
                                         ("primary_knowledge_source", primary_knowledge_source)])
then aggregator_knowledge_source is not added to the node or edge file, and primary_knowledge_source is added to the edgefile, but the corresponding values are included under provided_by.
This isn't really a blocker - the transforms should merge perfectly well without IRIs present - so if it's related to kgx or bmt then perhaps it can be solved as part of the kg-bioportal merge.
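To pin down where a knowledge source value actually lands in an output file, something like the following could help. It is a rough sketch, not part of kgx: `columns_containing` is a hypothetical helper, and the edge row below is invented purely to illustrate the behavior described above (an empty primary_knowledge_source column with the value sitting under provided_by).

```python
import csv
import io

def columns_containing(tsv_text, value):
    """Return the sorted names of columns in which at least one row equals `value`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    found = set()
    for row in reader:
        for column, cell in row.items():
            # DictReader files overflow cells under the key None; skip those.
            if column is not None and cell == value:
                found.add(column)
    return sorted(found)

# Hypothetical edge file content, made up for illustration only.
edges_tsv = (
    "id\tsubject\tpredicate\tobject\tprimary_knowledge_source\tprovided_by\n"
    "e1\tBFO:0000015\tbiolink:subclass_of\tBFO:0000003\t\tBasic Formal Ontology\n"
)

print(columns_containing(edges_tsv, "Basic Formal Ontology"))
print(columns_containing(edges_tsv, "BioPortal"))
```

Running the same check against both the node and edge files for each value passed in knowledge_sources would show whether the value was dropped or just written to a different column.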
Metadata is missing in new transforms; provided_by is back to providing only the source file name.
Example from ODNAE:
id category name description provided_by
CHEBI:25698 biolink:ChemicalSubstance ether A compound ROR (where R is not H). ODNAE_3_relaxed.json
GO:0010646 biolink:BiologicalProcess regulation of cell communication Any process that modulates the frequency, rate or extent of cell communication. Cell communication is the process that mediates interactions between a cell and its surroundings. Encompasses interactions such as signaling or attachment between one cell and another cell, between a cell and an extracellular matrix, or between a cell and any other aspect of its environment. ODNAE_3_relaxed.json
GO:0010647 biolink:BiologicalProcess positive regulation of cell communication Any process that increases the frequency, rate or extent of cell communication. Cell communication is the process that mediates interactions between a cell and its surroundings. Encompasses interactions such as signaling or attachment between one cell and another cell, between a cell and an extracellular matrix, or between a cell and any other aspect of its environment. ODNAE_3_relaxed.json
ODNAE:0000100 biolink:NamedThing zidovudine (Retrovir)-associated neuropathy AE ODNAE_3_relaxed.json
DRON:00021698 biolink:Drug Disulfiram Oral Tablet ODNAE_3_relaxed.json
Will make this its own issue because I think I have a solution.
Many transforms appear to be missing descriptions, IRIs, and possibly other fields populated in the previous set of transforms. Will need to verify the JSON -> TSV step is populating fields as expected, particularly name and description.
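One way to do that verification is to count, per nodes file, how many rows have a blank or missing value for each field of interest. The sketch below makes a few assumptions: `empty_field_counts` is a made-up helper name, the sample rows are taken from the ODNAE example above with tab positions guessed, and a column that is absent from the header is counted as empty for every row.

```python
import csv
import io

def empty_field_counts(tsv_text, fields):
    """Return (row_count, counts) where counts[field] is the number of rows
    whose value for that field is missing or blank."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = {field: 0 for field in fields}
    total = 0
    for row in reader:
        total += 1
        for field in fields:
            # row.get() is None when the column is absent or the row is short.
            if not (row.get(field) or "").strip():
                counts[field] += 1
    return total, counts

# Rows adapted from the ODNAE example; tab positions are an assumption.
nodes_tsv = (
    "id\tcategory\tname\tdescription\tprovided_by\n"
    "CHEBI:25698\tbiolink:ChemicalSubstance\tether\t"
    "A compound ROR (where R is not H).\tODNAE_3_relaxed.json\n"
    "ODNAE:0000100\tbiolink:NamedThing\t"
    "zidovudine (Retrovir)-associated neuropathy AE\t\tODNAE_3_relaxed.json\n"
    "DRON:00021698\tbiolink:Drug\tDisulfiram Oral Tablet\t\tODNAE_3_relaxed.json\n"
)

total, counts = empty_field_counts(nodes_tsv, ["name", "description", "iri"])
print(total, counts)
```

Comparing these counts between the previous and current transform of the same ontology should show whether fields are being lost in the JSON -> TSV step or were already empty upstream.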