monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Gene associations for EDS2 not correct #784

cmungall closed 5 years ago

cmungall commented 5 years ago

See https://github.com/monarch-initiative/mondo/issues/760

However, the xref issues in this ticket don't fully explain:

https://beta.monarchinitiative.org/disease/OMIM:130010#gene

Here we see 3 genes associated. The one with the strongest-seeming evidence and the most sources is COL5A1, but that's actually wrong: this gene is for EDS1!

cmungall commented 5 years ago

Trying to debug in sg-data, but this isn't giving me any genes?

https://scigraph-data.monarchinitiative.org/scigraph/graph/neighbors/OMIM%3A130010?depth=1&blankNodes=true&direction=BOTH&entail=false
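For anyone repeating this check with other identifiers, the neighbors URL above can be rebuilt programmatically. This is a minimal sketch using only the Python standard library; the helper name `neighbors_url` is hypothetical, and the base URL and query parameters are simply copied from the request quoted in this thread rather than taken from any SciGraph documentation.

```python
from urllib.parse import quote, urlencode

# Base endpoint as it appears in this thread (not verified against SciGraph docs).
SCIGRAPH_NEIGHBORS = "https://scigraph-data.monarchinitiative.org/scigraph/graph/neighbors/"


def neighbors_url(curie, depth=1):
    """Build a neighbors query URL for a CURIE such as 'OMIM:130010'.

    Hypothetical helper: it only reproduces the URL pattern quoted above.
    """
    params = {
        "depth": depth,
        "blankNodes": "true",
        "direction": "BOTH",
        "entail": "false",
    }
    # safe="" forces the colon in the CURIE to be percent-encoded as %3A.
    return SCIGRAPH_NEIGHBORS + quote(curie, safe="") + "?" + urlencode(params)


if __name__ == "__main__":
    print(neighbors_url("OMIM:130010"))
    print(neighbors_url("MONDO:0019568"))
```

Building the URL from the CURIE makes it easy to run the same query against both the OMIM identifier and its MONDO equivalent.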

cmungall commented 5 years ago

hmm, I thought SG automatically went over equiv nodes?

Anyway, querying https://scigraph-data.monarchinitiative.org/scigraph/graph/neighbors/MONDO%3A0019568?depth=1&blankNodes=true&direction=BOTH&entail=false returns (excerpt):

{
  "sub": "HGNC:2209",
  "obj": "MONDO:0019568",
  "pred": "RO:0002607",
  "meta": {
    "equivalentOriginalNodeSource": [
      "https://www.ncbi.nlm.nih.gov/gene/1289"
    ],
    "isDefinedBy": [
      "https://data.monarchinitiative.org/ttl/ctd.ttl"
    ],
    "lbl": [
      "is marker for"
    ],
    "equivalentOriginalNodeTarget": [
      "http://purl.obolibrary.org/obo/MESH_C536195"
    ]
  }
},
{
  "sub": "HGNC:2210",
  "obj": "MONDO:0019568",
  "pred": "RO:0003303",
  "meta": {
    "equivalentOriginalNodeSource": [
      "http://omim.org/entry/120190"
    ],
    "isDefinedBy": [
      "https://data.monarchinitiative.org/ttl/omim.ttl"
    ],
    "lbl": [
      "causes condition"
    ],
    "equivalentOriginalNodeTarget": [
      "http://omim.org/entry/130010"
    ]
  }
},

this is actually correct, see https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/2210

So it seems either the UI is pulling from a different version of the data OR this is a data bug introduced in going from SG->Solr?
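To compare what the graph asserts with what the UI shows, it can help to group the gene edges by the ingest file that defined them. The following is a rough sketch, not part of dipper or SciGraph: `edges_by_source` is a hypothetical helper, and `EDGES` holds only the two edges quoted above, trimmed to the fields the helper reads.

```python
from collections import defaultdict

# The two edges quoted in this thread, reduced to the fields used below.
EDGES = [
    {
        "sub": "HGNC:2209",
        "obj": "MONDO:0019568",
        "pred": "RO:0002607",
        "meta": {
            "isDefinedBy": ["https://data.monarchinitiative.org/ttl/ctd.ttl"],
            "lbl": ["is marker for"],
        },
    },
    {
        "sub": "HGNC:2210",
        "obj": "MONDO:0019568",
        "pred": "RO:0003303",
        "meta": {
            "isDefinedBy": ["https://data.monarchinitiative.org/ttl/omim.ttl"],
            "lbl": ["causes condition"],
        },
    },
]


def edges_by_source(edges, disease_id):
    """Group (gene, predicate label) pairs by the file named in isDefinedBy."""
    grouped = defaultdict(list)
    for edge in edges:
        if edge["obj"] != disease_id:
            continue  # skip edges that point at some other node
        meta = edge.get("meta", {})
        # Fall back to the predicate CURIE when no label is present.
        label = meta.get("lbl", [edge["pred"]])[0]
        for source in meta.get("isDefinedBy", ["unknown"]):
            # Keep only the file name, e.g. "omim.ttl".
            grouped[source.rsplit("/", 1)[-1]].append((edge["sub"], label))
    return dict(grouped)


if __name__ == "__main__":
    for source, pairs in edges_by_source(EDGES, "MONDO:0019568").items():
        print(source, pairs)
```

A source that contributes an unexpected gene in a listing like this is a natural place to start looking for a bad mapping.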

kshefchek commented 5 years ago

you can see the evidence graphs here

looks like Orphanet is the culprit; I think this should be fixed in the next data release

kshefchek commented 5 years ago

fixed with the new release