monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

MGI alleles without genomic position #919

Open pnrobinson opened 4 years ago

pnrobinson commented 4 years ago

The anx allele (http://www.informatics.jax.org/allele/MGI:1856657) is a locus for which the actual genetic variant is not known. When you go from https://beta.monarchinitiative.org/variant/MGI:1856657 to the gene page, the gene page is empty: https://beta.monarchinitiative.org/variant/MGI:1856657#gene

Probably we want to filter the alleles in MGI according to whether they are associated with a gene or not and adjust the behavior of our UI. There is code in phenol that filters out genetic markers that are not associated with genes that could be used as a basis.
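A minimal sketch of the kind of filter proposed above, assuming each allele record carries the type of the marker it is attached to. The `Allele` class, the `alleles_with_gene` function, and the `EX:` identifiers are hypothetical and used only for illustration; this is not the phenol code and not dipper's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Allele:
    """Hypothetical allele record; field names are assumptions."""
    allele_id: str
    marker_id: Optional[str]    # marker the allele is attached to, if any
    marker_type: Optional[str]  # e.g. "SO:gene"


def alleles_with_gene(alleles, gene_types=frozenset({"SO:gene"})):
    """Keep only alleles whose marker is typed as a gene."""
    return [a for a in alleles if a.marker_type in gene_types]


example_alleles = [
    # anx: the marker is a phenotypic locus, not a characterized gene
    Allele("MGI:1856657", "MGI:88029", "SO:heritable_phenotypic_marker"),
    # made-up allele attached to a made-up gene
    Allele("EX:allele-2", "EX:gene-2", "SO:gene"),
    # made-up allele with no marker at all
    Allele("EX:allele-3", None, None),
]

kept = alleles_with_gene(example_alleles)
print([a.allele_id for a in kept])
```

With these example records only `EX:allele-2` is kept. Whether such alleles should be dropped entirely or just displayed differently in the UI is a separate decision.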

kshefchek commented 4 years ago

I think there's something odd going on with the routing, but I'm not able to reproduce it in my browser. In theory you should never be routed to https://beta.monarchinitiative.org/variant/MGI:1856657, because this is a gene in our db.

The page corresponding to http://www.informatics.jax.org/allele/MGI:1856657: https://beta.monarchinitiative.org/gene/MGI:1856657

It has one allele, the wild type (an allele is not a variant, but we conflate these in our Neo4j indexes, which is possibly an issue): https://beta.monarchinitiative.org/variant/MGI:5907294

The allele is associated with anx according to MGI (http://www.informatics.jax.org/allele/MGI:5907294), so this filtering would need to happen at the dipper level if it is desired.

pnrobinson commented 4 years ago

@kshefchek -- let's zoom about this, that sounds like a weird bug. Note that anx is not a gene, it's a locus, and imho MGI and also NCBI Gene are confusing here. It is a little hard to filter out.

kshefchek commented 4 years ago

Yes let's zoom on this! I'm looking at this a little closer now and it's an interesting case:

In dipper (replacing IDs with labels)

MGI:1856657 a GENO:variant allele, SO:sequence_alteration ;
    GENO:sequence_derives_from MGI:5651334 ;
    owl:sameAs MGI:88029 .

MGI:88029 a SO:heritable_phenotypic_marker ;
    rdfs:subClassOf SO:gene ;
    owl:equivalentClass NCBIGene:11743 .

MGI:5907294 a GENO:reference allele ;
    GENO:is_reference_allele_of MGI:88029 .

Because there's an owl:sameAs between MGI:1856657 and MGI:88029, we merge these two identifiers in SciGraph, and I think the choice of which ID is picked is arbitrary. I have mixed feelings about merging identifiers in the same namespace, even if (in my untrained perspective) these two concepts do look identical.

After the clique merge we end up with a concept that we're calling a variant allele, sequence alteration, heritable phenotypic marker, and a subclass of a gene. I think calling this a heritable phenotypic marker is the most correct.
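To illustrate the effect described above, here is a rough sketch of a clique merge over owl:sameAs edges using a simple union-find. It is not SciGraph's actual implementation; the `merge_cliques` function and its leader rule (the first identifier seen on an edge wins) are assumptions chosen to show how the surviving ID can depend on something as incidental as edge order. The rdfs:subClassOf statement is carried along as a plain label for simplicity.

```python
from collections import defaultdict


def merge_cliques(same_as, annotations):
    """Merge nodes connected by sameAs edges; pool their annotations."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in same_as:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # assumed rule: left-hand ID becomes leader

    merged = defaultdict(set)
    for node, labels in annotations.items():
        merged[find(node)] |= set(labels)
    leader = {node: find(node) for node in list(parent)}
    return leader, dict(merged)


annotations = {
    "MGI:1856657": {"GENO:variant allele", "SO:sequence_alteration"},
    "MGI:88029": {"SO:heritable_phenotypic_marker", "rdfs:subClassOf SO:gene"},
    "MGI:5907294": {"GENO:reference allele"},
}

leader, merged = merge_cliques([("MGI:1856657", "MGI:88029")], annotations)
leader_rev, merged_rev = merge_cliques([("MGI:88029", "MGI:1856657")], annotations)

print(leader["MGI:88029"], sorted(merged[leader["MGI:88029"]]))
print(leader_rev["MGI:1856657"])
```

Either way the merged node carries all four labels; only the identifier that survives changes with the edge order.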

Would be great to get @mbrush's perspective on this as well

kshefchek commented 4 years ago

Possibly also related: https://github.com/monarch-initiative/dipper/issues/519