Provide more metadata on variants/genotypes

monarch-initiative / dipper

Data Ingestion Pipeline for Monarch

https://dipper.readthedocs.io/en/latest/

BSD 3-Clause "New" or "Revised" License

Provide more metadata on variants/genotypes #724

Open cmungall opened 9 years ago

cmungall commented 9 years ago

Compare

http://www.informatics.jax.org/allele/MGI:3894845
http://beta.monarchinitiative.org/gene/MGI:3894845
- requires linked xref to MGI
- type "gene trapped allele"
- etc

May require loading some geno relationships into golr

cmungall commented 9 years ago

Also for variants,

We certainly have some of this info in SG, but are not showing it (e.g. position).

We have previously discussed running our own computes on variants (I think with @nlwashington), but I don't recall the outcome, and it doesn't appear to be tracked, so this ticket may be a good place to discuss.

nlwashington commented 9 years ago

This is a good ticket. We'll specify here the additional items to add into the golr schema for genotypes and their parts. We can do it in two ways:

1. a "genotype closure" which gets all the parts of the genotype in a flat list. but this isn't helpful if you want to show these parts as specific columns in a view
2. the genotype partonomy split into different columns, at least:
   - *_genotype_id | label | type
   - *_genomic_background_id | label | type
   - *_variant_id | label | type
   - *_gene_id | label | type
   - * (possibly other parts of the genotype partonomy? variant loci? GVC? or even the "closure"?)
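To make the two options concrete, here is a minimal sketch in Python. It assumes that `*_genotype_id | label | type` expands to three fields per part (`*_genotype_id`, `*_genotype_label`, `*_genotype_type`) and that `*` is a prefix such as `subject`. The function names, the input shape, and the `EX:` identifiers are hypothetical placeholders, not the actual golr schema or dipper code.

```python
from typing import Dict, List, Optional

# Parts of the genotype partonomy listed above (assumed order).
PARTS = ("genotype", "genomic_background", "variant", "gene")
ATTRS = ("id", "label", "type")


def flatten_genotype(
    prefix: str, parts: Dict[str, Dict[str, str]]
) -> Dict[str, Optional[str]]:
    """Option 2: one column per part and attribute, e.g. subject_variant_id."""
    doc: Dict[str, Optional[str]] = {}
    for part in PARTS:
        node = parts.get(part, {})
        for attr in ATTRS:
            # Missing parts still get a column, with None as the value.
            doc[f"{prefix}_{part}_{attr}"] = node.get(attr)
    return doc


def genotype_closure(parts: Dict[str, Dict[str, str]]) -> List[str]:
    """Option 1: a flat list of part ids, with no information about roles."""
    return [node["id"] for node in parts.values() if "id" in node]


# Hypothetical input: placeholder ids, not real records.
example = {
    "genotype": {"id": "EX:genotype1", "label": "example genotype", "type": "EX:GenotypeType"},
    "variant": {"id": "EX:variant1", "label": "example variant", "type": "EX:VariantType"},
}
columns = flatten_genotype("subject", example)
closure = genotype_closure(example)
```

The flat closure is enough for "find anything that involves this variant" queries, while the per-part columns are what a view needs in order to show the variant, gene, and background separately.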

For variants, it's really important for us to add their pathogenicity (quality) calls, which we don't yet. We don't have a good way to create the "intermediate" kinds yet (like suspected pathogenic != benign). How to tag?

What else do you want to see, @cmungall ?

kshefchek commented 8 years ago

For this release should we simply remove the overview tab altogether (also on genes) since we're moving the compare tab to the initial open tab? Or would this go in the jumbotron?

kltm commented 8 years ago

Just a note, but overloading the jumbotron will eventually lead to (more) layout breakage. BS3 didn't really design that as an arbitrary data container.

nlwashington commented 8 years ago

well, we absolutely need to have these kinds of things for all pages:

- link to external site for primary identifier (MGI, OMIM, etc.)
- equivalent ids
- alt ids (deprecated) - ideally with relationship to them but okay if not
- synonyms
- definition / description
- comments (maybe)
- if it is an instance, then list its rdf:type. this is particularly important for things like variants.

i don't think we want to overload the jumbotron with all that info. i think the overview tab still has this kind of information. maybe call it "info" or something instead?

kshefchek commented 8 years ago

Is there somewhere I should be looking for this information other than /graph/{id} and /vocabulary/id/{id}? We pull much of this information but it only shows up if it's populated on SciGraph, see: http://duckworth.crbs.ucsd.edu:9000/scigraph/vocabulary/id/ClinVarVariant:88756

I can add types to variant and any other pages.
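For reference, a minimal sketch of how the two lookups mentioned above could be built and checked. It assumes the base URL from the example link, and the record field names passed to `missing_fields` are hypothetical since the actual SciGraph response shape is not shown in this thread.

```python
from typing import Dict, List
from urllib.parse import quote


def scigraph_urls(base_url: str, curie: str) -> Dict[str, str]:
    """Build the /graph/{id} and /vocabulary/id/{id} URLs for a CURIE."""
    base = base_url.rstrip("/")
    # Keep the colon in the CURIE readable; escape anything else unsafe.
    encoded = quote(curie, safe=":")
    return {
        "graph": f"{base}/graph/{encoded}",
        "vocabulary": f"{base}/vocabulary/id/{encoded}",
    }


def missing_fields(record: dict, wanted: List[str]) -> List[str]:
    """Return the wanted fields that are absent or empty in a record."""
    return [name for name in wanted if not record.get(name)]


urls = scigraph_urls(
    "http://duckworth.crbs.ucsd.edu:9000/scigraph/", "ClinVarVariant:88756"
)
# Hypothetical record and field names, only to show the check.
gaps = missing_fields(
    {"labels": ["example label"], "synonyms": []},
    ["labels", "synonyms", "definitions"],
)
```

A check like `missing_fields` would make it easy to tell "not displayed" apart from "not populated in SciGraph" for a given page.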

jmcmurry commented 7 years ago

@kshefchek @TomConlin as this is an old ticket, just wanted to follow up to see whether this ticket has in the meantime gotten easier to tackle.

kshefchek commented 5 years ago

Bumping this ticket: it would be great to ingest allele type when this is available (for example, from MGI). I would like to find all cases of haploinsufficiency in mice, but I am not able to write this query without knowing whether an allele is a knockout.

justaddcoffee commented 5 years ago

Could/should we break this ticket into a few tickets? Seems like this might actually be a few separate items. I could start on improving the MGI ingest, then the ClinVar ingest - any others?

kshefchek commented 5 years ago

That's a good idea; there are many subtasks involved here.

justaddcoffee commented 5 years ago

@kshefchek maybe we could chat about this today-ish (or whenever) if you have time? Ping me if you get a minute today

mbrush commented 5 years ago

Feel free to tag me if there are any modeling or ontology-related questions/needs for this work - e.g. additional terms needed in GENO, choosing best terms from SO, etc.

justaddcoffee commented 5 years ago

About ingesting haploinsufficiency data, I pinged Harold Drabkin on Seth's advice, since he's an MGI informatics guy who might know where haploinsufficiency data lives in MGI (if it is in there at all)