Closed: putmantime closed this issue 1 year ago
A little more digging with pandas turned up some more:
OMIM:100650 {'biolink:Gene', 'biolink:Disease'}
OMIM:106150 {'biolink:Gene', 'biolink:Disease'}
OMIM:106180 {'biolink:Gene', 'biolink:Disease'}
OMIM:107670 {'biolink:Disease', 'biolink:Gene'}
OMIM:109690 {'biolink:Disease', 'biolink:Gene'}
OMIM:118425 {'biolink:Gene', 'biolink:Disease'}
OMIM:120120 {'biolink:Gene', 'biolink:Disease'}
OMIM:124080 {'biolink:Disease', 'biolink:Gene'}
OMIM:126452 {'biolink:Disease', 'biolink:Gene'}
OMIM:131240 {'biolink:Gene', 'biolink:Disease'}
OMIM:134637 {'biolink:Gene', 'biolink:Disease'}
OMIM:139320 {'biolink:Gene', 'biolink:Disease'}
OMIM:139360 {'biolink:Disease', 'biolink:Gene'}
OMIM:142830 {'biolink:Disease', 'biolink:Gene'}
OMIM:147545 {'biolink:Disease', 'biolink:Gene'}
OMIM:147575 {'biolink:Gene', 'biolink:Disease'}
OMIM:152390 {'biolink:Gene', 'biolink:Disease'}
OMIM:158105 {'biolink:Disease', 'biolink:Gene'}
OMIM:162080 {'biolink:Disease', 'biolink:Gene'}
OMIM:163729 {'biolink:Gene', 'biolink:Disease'}
OMIM:168820 {'biolink:Disease', 'biolink:Gene'}
OMIM:173360 {'biolink:Gene', 'biolink:Disease'}
OMIM:173470 {'biolink:Gene', 'biolink:Disease'}
OMIM:176797 {'biolink:Disease', 'biolink:Gene'}
OMIM:176943 {'biolink:Gene', 'biolink:Disease'}
OMIM:182100 {'biolink:Gene', 'biolink:Disease'}
OMIM:188830 {'biolink:Gene', 'biolink:Disease'}
OMIM:191160 {'biolink:Gene', 'biolink:Disease'}
OMIM:191170 {'biolink:Gene', 'biolink:Disease'}
OMIM:217050 {'biolink:Gene', 'biolink:Disease'}
OMIM:300265 {'biolink:Gene', 'biolink:Disease'}
OMIM:600020 {'biolink:Disease', 'biolink:Gene'}
OMIM:600098 {'biolink:Gene', 'biolink:Disease'}
OMIM:600700 {'biolink:Gene', 'biolink:Disease'}
OMIM:600985 {'biolink:Disease', 'biolink:Gene'}
OMIM:601130 {'biolink:Disease', 'biolink:Gene'}
OMIM:601373 {'biolink:Gene', 'biolink:Disease'}
OMIM:601410 {'biolink:Disease', 'biolink:Gene'}
OMIM:601465 {'biolink:Gene', 'biolink:Disease'}
OMIM:602421 {'biolink:Gene', 'biolink:Disease'}
OMIM:602686 {'biolink:Disease', 'biolink:Gene'}
OMIM:603013 {'biolink:Disease', 'biolink:Gene'}
OMIM:603324 {'biolink:Gene', 'biolink:Disease'}
OMIM:603372 {'biolink:Gene', 'biolink:Disease'}
OMIM:603517 {'biolink:Gene', 'biolink:Disease'}
OMIM:603615 {'biolink:Disease', 'biolink:Gene'}
OMIM:604124 {'biolink:Gene', 'biolink:Disease'}
OMIM:605204 {'biolink:Gene', 'biolink:Disease'}
OMIM:606989 {'biolink:Gene', 'biolink:Disease'}
OMIM:607093 {'biolink:Gene', 'biolink:Disease'}
OMIM:607585 {'biolink:Gene', 'biolink:Disease'}
OMIM:607759 {'biolink:Gene', 'biolink:Disease'}
OMIM:608537 {'biolink:Gene', 'biolink:Disease'}
OMIM:613733 {'biolink:Disease', 'biolink:Gene'}
OMIM:615538 {'biolink:Gene', 'biolink:Disease'}
OMIM:616902 {'biolink:Gene', 'biolink:Disease'}
OMIM:617352 {'biolink:Gene', 'biolink:Disease'}
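For anyone who wants to repeat this kind of check, here is a minimal sketch of one way to find IDs that carry more than one category. It is not the code used for the listing above. It assumes a KGX-style nodes TSV with `id` and `category` columns, the function name `find_multi_category_ids` is made up for illustration, and it uses only the standard library instead of pandas so it runs anywhere. The second ID in the sample is a placeholder.

```python
import csv
import io
from collections import defaultdict


def find_multi_category_ids(tsv_text):
    """Return {node id: set of categories} for ids seen with more than one category."""
    categories = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Assumption: a category cell may hold several pipe-separated values.
        for cat in row["category"].split("|"):
            if cat:
                categories[row["id"]].add(cat)
    return {node_id: cats for node_id, cats in categories.items() if len(cats) > 1}


# Tiny illustrative input; EXAMPLE:1 is a placeholder id, not a real node.
sample = (
    "id\tcategory\n"
    "OMIM:100650\tbiolink:Gene\n"
    "OMIM:100650\tbiolink:Disease\n"
    "EXAMPLE:1\tbiolink:Gene\n"
)

conflicts = find_multi_category_ids(sample)
for node_id, cats in sorted(conflicts.items()):
    print(node_id, sorted(cats))
```

Collecting categories into a set per ID makes the check independent of row order and of whether the duplicates come from separate rows or from a single multi-valued cell.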
@putmantime, I assume you want us to call a spade a spade: if an OMIM entry is classified as a Gene, then we just call it a Gene and ignore any additional classification (such as "disease")?
The ingest uses mim2gene mappings, which have a column indicating when a MIM id is considered a Gene. I suppose using these mappings to assert "Gene" in an overriding fashion would likely resolve the issue. Unless I hear otherwise, I'll proceed with that understanding.
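Roughly what I have in mind, as a sketch only and not the ingest code. It assumes mim2gene is tab-separated with the MIM number in the first column and the entry type in the second, and that the entry-type strings include `gene` and `gene/phenotype`. Whether `gene/phenotype` should also override is a policy choice I'm assuming here. The function names and the sample MIM numbers are hypothetical.

```python
# Assumption: these entry-type strings mark a MIM id as a gene.
GENE_ENTRY_TYPES = frozenset({"gene", "gene/phenotype"})


def parse_entry_types(mim2gene_text):
    """Map 'OMIM:<number>' to its entry type from mim2gene-style text."""
    entry_types = {}
    for line in mim2gene_text.splitlines():
        # Skip blank lines and '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        entry_types["OMIM:" + fields[0].strip()] = fields[1].strip()
    return entry_types


def resolve_category(node_id, asserted, entry_types):
    """Collapse asserted categories to one, letting a mim2gene gene entry win."""
    if entry_types.get(node_id) in GENE_ENTRY_TYPES:
        return "biolink:Gene"
    if len(asserted) == 1:
        return next(iter(asserted))
    # Anything still ambiguous is surfaced rather than silently picked.
    raise ValueError(f"ambiguous categories for {node_id}: {sorted(asserted)}")


# Placeholder MIM numbers, for illustration only.
mim2gene_sample = "# MIM Number\tMIM Entry Type\n000001\tgene\n000002\tphenotype\n"
types = parse_entry_types(mim2gene_sample)
print(resolve_category("OMIM:000001", {"biolink:Gene", "biolink:Disease"}, types))
```

Raising on the leftover ambiguous cases keeps any ID that mim2gene does not settle visible instead of quietly assigning it a category.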
The slight complication here is that the OMIM gene_to_phenotype.py ingest doesn't seem to actually upload the Gene nodes themselves. I'll have to clarify where the gene nodes are being loaded into the graph (@kevinschaper... they must be loaded separately, somewhere?)
I removed the gene node creation (last week, I think?). For OMIM genes, we really want an edge-only ingest here, with the edges wired to HGNC (and MONDO). I'm less sure what we need to do about the nodes that were being captured as NucleicAcidEntity / heritable_phenotypic_marker, but we want real nodes for them with more than just an ID. Ideally that would happen in its own ingest, assuming that we won't have other nodes for them that we can map to.
We can easily ensure that the OMIM ingest only outputs Gene subjects. On the HGNC mappings, OMIM mim2gene only provides the HGNC gene symbols. Do we have any map available of HGNC gene symbol to HGNC ID around?
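If such a map isn't already lying around, building one should be straightforward. A minimal sketch, assuming an HGNC export in TSV form with `hgnc_id` and `symbol` columns (the column names, function names, and sample rows are assumptions or placeholders, not something taken from the ingest):

```python
import csv
import io


def build_symbol_to_hgnc(tsv_text):
    """Build {gene symbol: HGNC id} from TSV text with 'hgnc_id' and 'symbol' columns."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["symbol"]: row["hgnc_id"]
        for row in reader
        if row.get("symbol") and row.get("hgnc_id")
    }


def to_hgnc_id(symbol, symbol_to_hgnc):
    """Return the HGNC id for a symbol, or None when it cannot be mapped."""
    if not symbol:
        return None
    return symbol_to_hgnc.get(symbol.strip())


# Placeholder rows; neither the ids nor the symbols are real HGNC records.
hgnc_sample = "hgnc_id\tsymbol\nHGNC:0000001\tGENEA\nHGNC:0000002\tGENEB\n"
lookup = build_symbol_to_hgnc(hgnc_sample)
print(to_hgnc_id("GENEA", lookup))
print(to_hgnc_id("NOT_A_SYMBOL", lookup))
```

Returning `None` for unmapped symbols makes it easy to collect the entries that cannot be wired to HGNC and would need separate handling.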
@putmantime, @kevinschaper thinks that this issue is a red herring. I've reassigned it to you and him for (re-)discussion and closure and/or clarification. I'll also put it in the 'Icebox' of zenhub tracking.
My best guess is that this is resolved by making OMIM an edge-only ingest. I think we likely need to add a node ingest to pick up everything that can't be mapped to HGNC, but we can take care of that in #255.
This same problem also showed up when using our new mapping strategy, but this case is safe to close.
In our OMIM ingest, some nodes are categorized as both gene and disease.
MATCH (n:`biolink:Disease`:`biolink:Gene`) RETURN count(n) as count
returns 19 nodes
MATCH (n:`biolink:Disease`:`biolink:Gene`) RETURN n.id
n.id
"OMIM:107670"
"OMIM:109690"
"OMIM:124080"
"OMIM:126452"
"OMIM:139360"
"OMIM:142830"
"OMIM:147545"
"OMIM:158105"
"OMIM:162080"
"OMIM:168820"
"OMIM:176797"
"OMIM:600020"
"OMIM:600985"
"OMIM:601130"
"OMIM:601410"
"OMIM:602686"
"OMIM:603013"
"OMIM:603615"
"OMIM:613733"