monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

Migration table: add provenance to definitions! #626

Open · matentzn opened 2 months ago

matentzn commented 2 months ago

Right now the migration (slurp) tables don't add provenance to definitions; they should!

joeflack4 commented 2 months ago

@matentzn How would you like to see this done?

Here's an example of a term w/ a definition now (DOID:0081461).

slurp/doid.tsv:

    mondo_id	mondo_label	xref	xref_source	original_label	definition	parents
    ID	LABEL	A oboInOwl:hasDbXref	>A oboInOwl:source SPLIT=|		A IAO:0000115	SC %
    MONDO:0971037	thyroid gland spindle epithelial tumor with thymus-like elements	DOID:0081461	MONDO:equivalentTo	thyroid gland spindle epithelial tumor with thymus-like elements	A thyroid gland carcinoma that is characterized by a lobulated architectural pattern and the presence of a biphasic cellular population composed of spindle epithelial cells and glandular cells.	MONDO:0015075

components/doid.owl:

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/DOID_0081461">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/DOID_3963"/>
        <obo:IAO_0000115>A thyroid gland carcinoma that is characterized by a lobulated architectural pattern and the presence of a biphasic cellular population composed of spindle epithelial cells and glandular cells.</obo:IAO_0000115>
        <oboInOwl:hasDbXref>NCI:C46105</oboInOwl:hasDbXref>
        <oboInOwl:hasExactSynonym>SETTLE</oboInOwl:hasExactSynonym>
        ...
        <rdfs:label xml:lang="en">thyroid gland spindle epithelial tumor with thymus-like elements</rdfs:label>
    </owl:Class>

The definition in the TSV is:

A thyroid gland carcinoma that is characterized by a lobulated architectural pattern and the presence of a biphasic cellular population composed of spindle epithelial cells and glandular cells.

Are you saying that we should append any oio:hasDbXref (or potentially other predicates) to the definitions?
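To make the question concrete, here is a minimal Python sketch that pulls the label, definition, and hasDbXref values out of a class like the one above. It uses only the standard library's ElementTree and assumes the standard OBO/OWL namespace IRIs; `summarize_classes` and `iri_to_curie` are hypothetical helper names, not functions from mondo-ingest. The OWL excerpt is trimmed (the elided lines are left out) and wrapped in an `rdf:RDF` root so it parses on its own.

```python
import xml.etree.ElementTree as ET

# Standard namespace IRIs for the prefixes used in components/doid.owl.
NS = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "obo": "http://purl.obolibrary.org/obo/",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]

# Trimmed copy of the DOID:0081461 class, wrapped in an rdf:RDF root element.
OWL_SNIPPET = """<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:obo="http://purl.obolibrary.org/obo/"
    xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/DOID_0081461">
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/DOID_3963"/>
    <obo:IAO_0000115>A thyroid gland carcinoma that is characterized by a lobulated architectural pattern and the presence of a biphasic cellular population composed of spindle epithelial cells and glandular cells.</obo:IAO_0000115>
    <oboInOwl:hasDbXref>NCI:C46105</oboInOwl:hasDbXref>
    <oboInOwl:hasExactSynonym>SETTLE</oboInOwl:hasExactSynonym>
    <rdfs:label xml:lang="en">thyroid gland spindle epithelial tumor with thymus-like elements</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def iri_to_curie(iri):
    """Turn an OBO PURL such as .../obo/DOID_0081461 into DOID:0081461."""
    local = iri.rsplit("/", 1)[-1]          # "DOID_0081461"
    prefix, _, ident = local.partition("_")
    return f"{prefix}:{ident}"


def summarize_classes(xml_text):
    """Map each owl:Class CURIE to its label, definition and hasDbXref values."""
    root = ET.fromstring(xml_text)
    summary = {}
    for cls in root.findall("owl:Class", NS):
        curie = iri_to_curie(cls.get(RDF_ABOUT))
        summary[curie] = {
            "label": cls.findtext("rdfs:label", namespaces=NS),
            "definition": cls.findtext("obo:IAO_0000115", namespaces=NS),
            "xrefs": [x.text for x in cls.findall("oboInOwl:hasDbXref", NS)],
        }
    return summary
```

Here `summarize_classes(OWL_SNIPPET)["DOID:0081461"]["xrefs"]` returns `['NCI:C46105']`: the class's own cross-reference, which is a different thing from the provenance of its definition.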

matentzn commented 2 months ago

See the definition_evidence column:

    mondo_id	mondo_label	xref	xref_source	original_label	definition	definition_evidence	parents
    ID	LABEL	A oboInOwl:hasDbXref	>A oboInOwl:source SPLIT=|		A IAO:0000115	>A oboInOwl:hasDbXref SPLIT=|	SC %
    MONDO:0971037	thyroid gland spindle epithelial tumor with thymus-like elements	DOID:0081461	MONDO:equivalentTo	thyroid gland spindle epithelial tumor with thymus-like elements	A thyroid gland carcinoma that is characterized by a lobulated architectural pattern and the presence of a biphasic cellular population composed of spindle epithelial cells and glandular cells.	DOID:0081461	MONDO:0015075

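A minimal Python sketch of what the proposed layout amounts to, assuming the slurp file is a plain TSV whose second row holds the ROBOT template strings shown above. `slurp_row` and `write_slurp_tsv` are hypothetical names, not mondo-ingest functions, and defaulting xref_source to MONDO:equivalentTo only mirrors the example row. The key point is that definition_evidence just repeats the source term's CURIE, and its template string (>A oboInOwl:hasDbXref) annotates the definition axiom from the preceding column.

```python
import csv
import io

COLUMNS = ["mondo_id", "mondo_label", "xref", "xref_source", "original_label",
           "definition", "definition_evidence", "parents"]

# ROBOT template row; original_label is not emitted, so its cell stays empty.
TEMPLATE = {
    "mondo_id": "ID",
    "mondo_label": "LABEL",
    "xref": "A oboInOwl:hasDbXref",
    "xref_source": ">A oboInOwl:source SPLIT=|",
    "original_label": "",
    "definition": "A IAO:0000115",
    "definition_evidence": ">A oboInOwl:hasDbXref SPLIT=|",
    "parents": "SC %",
}


def slurp_row(mondo_id, mondo_label, source_curie, source_label, definition,
              parent, xref_source="MONDO:equivalentTo"):
    """Hypothetical helper: one migration row whose definition cites its source."""
    return {
        "mondo_id": mondo_id,
        "mondo_label": mondo_label,
        "xref": source_curie,
        "xref_source": xref_source,
        "original_label": source_label,
        "definition": definition,
        # Provenance: the term the definition was copied from (only if one exists).
        "definition_evidence": source_curie if definition else "",
        "parents": parent,
    }


def write_slurp_tsv(rows):
    """Serialize header row, ROBOT template row, then data rows as TSV."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerow(TEMPLATE)
    writer.writerows(rows)
    return buf.getvalue()


LABEL = "thyroid gland spindle epithelial tumor with thymus-like elements"
example = slurp_row(
    "MONDO:0971037", LABEL, "DOID:0081461", LABEL,
    "A thyroid gland carcinoma that is characterized by a lobulated architectural "
    "pattern and the presence of a biphasic cellular population composed of "
    "spindle epithelial cells and glandular cells.",
    "MONDO:0015075",
)
print(write_slurp_tsv([example]))
```

Running it prints the three rows above. When ROBOT builds the template, DOID:0081461 should end up as a hasDbXref annotation on the IAO:0000115 axiom rather than on the class itself.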
joeflack4 commented 2 months ago

Ahhh, I see. That will be very easy to add.

Although note: I see it has SPLIT=|. The way the migrate pipeline currently works, if I add this, I think there will only ever be 1 CURIE in that column, the same CURIE as in the xref column. I assume that is OK.

matentzn commented 2 months ago

No harm with the SPLIT=|; it's just a reflex to make it future-proof.
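Why SPLIT=| is harmless with a single value can be seen in a tiny Python sketch. The helper names are hypothetical, and the mondo-ingest code may build this column differently. Joining one CURIE with "|" yields the bare CURIE, so today's cells look exactly as they would without SPLIT, while several evidence CURIES would still fit the declared format later. `from_split_cell` roughly mirrors how ROBOT splits such a cell.

```python
def to_split_cell(curies):
    """Join CURIEs for a ROBOT template column declared with SPLIT=|.

    Empty values are skipped and duplicates dropped (first-seen order kept),
    so a single evidence CURIE gives exactly the same cell as without SPLIT.
    """
    return "|".join(dict.fromkeys(c for c in curies if c))


def from_split_cell(cell):
    """Inverse: the individual values a SPLIT=| cell stands for."""
    return [c for c in cell.split("|") if c]


# Today: one CURIE, identical to the xref column.
print(to_split_cell(["DOID:0081461"]))
# Hypothetical future case with a second evidence CURIE.
print(to_split_cell(["DOID:0081461", "NCI:C46105", "DOID:0081461"]))
```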