Closed pnrobinson closed 5 years ago
Hmm, the hgvs allele doesn't really work here. Maybe something more generic? We could also do better with evidence. @mbrush, what should be included here?
One of the problems we have with model organism phenopackets is they will necessarily represent populations rather than individuals. e.g. there can be variable penetrance & expressivity of a phenotype for a given genotype.
For mouse we'd want background and genotype e.g.
Model | Allelic Composition | Genetic Background |
---|---|---|
hm1 | Fbn1<sup>tm1Hcd</sup>/Fbn1<sup>tm1Hcd</sup> | involves: 129S1/Sv * 129X1/SvJ * C57BL/6J |
ht2 Disease Model | Fbn1<sup>tm1Hcd</sup>/Fbn1<sup>+</sup> | involves: 129S1/Sv * 129X1/SvJ * C57BL/6J |
cx3 Disease Model | Fbn1<sup>tm1Hcd</sup>/Fbn1<sup>+</sup> Tgfb2<sup>tm1Doe</sup>/Tgfb2<sup>+</sup> | involves: 129P2/OlaHsd * 129S1/Sv * 129X1/SvJ * C57BL/6J |
So your current example
"variants": [{
    "hgvsAllele": {
        "id": "",
        "hgvs": "Fbn1tm1Hcd"
    },
    "genotype": {
        "id": "GENO:0000135",
        "label": "heterozygous"
    }
}]
might be better modeled like this?
message Variant {
    oneof allele {
        HgvsAllele hgvs_allele = 2;
        VcfAllele vcf_allele = 3;
        SpdiAllele spdi_allele = 4;
        IscnAllele iscn_allele = 5;
        MouseAllele mouse_allele = 8;
    }
    // Genotype of the alleles using GENO ontology
    OntologyClass genotype = 6;
    // For mice the background of the variant is also required
    // e.g. involves: 129S1/Sv * 129X1/SvJ * C57BL/6J
    // see http://www.informatics.jax.org/allele/MGI:3690325#phenotypes for examples
    string background = 7;
}

// See http://informatics.jax.org/mgihome/nomen/
// To encode the allele Fbn1<sup>tm1Hcd</sup>
message MouseAllele {
    string id = 1;
    // e.g., Fbn1
    string gene = 2;
    // The allele_code should be used for the allele name or lab code, which is written
    // in superscript according to the International Committee on Standardized Genetic
    // Nomenclature for Mice
    // e.g. tm1Hcd
    string allele_code = 3;
}
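A minimal sketch of how a symbol written as in the comment above (gene followed by the allele in `<sup>` tags) could be split into the proposed `gene` and `allele_code` fields. The helper name `parse_mouse_allele` is hypothetical, and the `alleleCode` key assumes the usual lowerCamelCase JSON form of `allele_code`; neither is part of the schema.

```python
import re
from typing import Dict

# Hypothetical pattern: gene symbol, then the allele name or lab code in <sup> tags.
_ALLELE_PATTERN = re.compile(r"^(?P<gene>[^<>]+)<sup>(?P<code>[^<>]+)</sup>$")


def parse_mouse_allele(symbol: str) -> Dict[str, str]:
    """Split e.g. 'Fbn1<sup>tm1Hcd</sup>' into MouseAllele-style fields."""
    match = _ALLELE_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a gene<sup>allele</sup> symbol: {symbol!r}")
    return {
        "id": "",  # left empty, as in the examples in this thread
        "gene": match.group("gene"),
        "alleleCode": match.group("code"),  # assumed JSON name for allele_code
    }


print(parse_mouse_allele("Fbn1<sup>tm1Hcd</sup>"))
```

Symbols without a superscript part are rejected rather than guessed at, so the caller has to decide how to handle them.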
So using the above the example would become:
ht2
"variants": [{
    "mouseAllele": {
        "id": "",
        "gene": "Fbn1",
        "alleleCode": "tm1Hcd"
    },
    "genotype": {
        "id": "GENO:0000135",
        "label": "heterozygous"
    },
    "background": "involves: 129S1/Sv * 129X1/SvJ * C57BL/6J"
}]
hm1
"variants": [{
    "mouseAllele": {
        "id": "",
        "gene": "Fbn1",
        "alleleCode": "tm1Hcd"
    },
    "genotype": {
        "id": "GENO:0000136",
        "label": "homozygous"
    },
    "background": "involves: 129S1/Sv * 129X1/SvJ * C57BL/6J"
}]
cx3
"variants": [{
    "mouseAllele": {
        "id": "",
        "gene": "Fbn1",
        "alleleCode": "tm1Hcd"
    },
    "genotype": {
        "id": "GENO:0000135",
        "label": "heterozygous"
    },
    "background": "involves: 129P2/OlaHsd * 129S1/Sv * 129X1/SvJ * C57BL/6J"
},
{
    "mouseAllele": {
        "id": "",
        "gene": "Tgfb2",
        "alleleCode": "tm1Doe"
    },
    "genotype": {
        "id": "GENO:0000135",
        "label": "heterozygous"
    },
    "background": "involves: 129P2/OlaHsd * 129S1/Sv * 129X1/SvJ * C57BL/6J"
}]
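To make the mapping from an allelic composition to the `variants` list explicit, here is a rough sketch in Python. The function `variants_for_model`, the tuple layout and the `alleleCode` key (assumed lowerCamelCase JSON form of `allele_code`) are illustrative assumptions; only the GENO IDs and the background string come from the examples above. Loci carrying two different mutant alleles are not covered by the examples, so the sketch refuses them.

```python
from typing import Dict, List, Tuple

HETEROZYGOUS = {"id": "GENO:0000135", "label": "heterozygous"}
HOMOZYGOUS = {"id": "GENO:0000136", "label": "homozygous"}
WILD_TYPE = "+"


def variants_for_model(loci: List[Tuple[str, str, str]], background: str) -> List[Dict]:
    """Build one variant per mutant locus.

    Each locus is (gene, allele code on one chromosome, allele code on the other),
    e.g. ("Fbn1", "tm1Hcd", "+") for Fbn1tm1Hcd/Fbn1+.
    """
    variants = []
    for gene, first, second in loci:
        mutant_codes = [code for code in (first, second) if code != WILD_TYPE]
        if not mutant_codes:
            continue  # wild type on both chromosomes: nothing to report
        if first == second:
            genotype = HOMOZYGOUS
        elif WILD_TYPE in (first, second):
            genotype = HETEROZYGOUS
        else:
            raise ValueError(f"two different mutant alleles at {gene}: not handled in this sketch")
        variants.append({
            "mouseAllele": {"id": "", "gene": gene, "alleleCode": mutant_codes[0]},
            "genotype": dict(genotype),  # copy so the variants do not share one dict
            "background": background,
        })
    return variants


cx3 = variants_for_model(
    [("Fbn1", "tm1Hcd", "+"), ("Tgfb2", "tm1Doe", "+")],
    "involves: 129P2/OlaHsd * 129S1/Sv * 129X1/SvJ * C57BL/6J",
)
```

Note that the background ends up repeated on every variant, which is a consequence of putting `background` on `Variant` rather than on the subject.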
@mellybelly
> One of the problems we have with model organism phenopackets is they will necessarily represent populations rather than individuals. e.g. there can be variable penetrance & expressivity of a phenotype for a given genotype.
Perhaps, but not always. For example, IMPC will have an equal number of males and females for any given knockout. These could be represented as a cohort, with each mouse's specific phenotypes recorded in a distinct phenopacket. This allows for phenotypic variability for a genotype. For a more nebulous 'mouse model', i.e. an amalgam, we ought to have a frequency associated with the phenotype. Currently this can be represented as an ontology term in the Phenotype.modifiers field.
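As a rough illustration of the cohort idea, the sketch below treats each mouse as its own phenopacket-like dict and derives a per-phenotype frequency across the cohort. The field names (`phenotypes`, `type`, `id`), the subject IDs and the `EX:` term IDs are placeholders chosen for the example, not real schema fields or ontology terms.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, List


def phenotype_frequencies(cohort: List[Dict]) -> Dict[str, Fraction]:
    """Fraction of animals in the cohort that show each phenotype term."""
    counts: Counter = Counter()
    for packet in cohort:
        # A set, so a term listed twice for one animal is counted once.
        counts.update({phenotype["type"]["id"] for phenotype in packet["phenotypes"]})
    return {term: Fraction(n, len(cohort)) for term, n in counts.items()}


# Four mice with the same genotype but variable phenotypes (placeholder term IDs).
cohort = [
    {"subject": "mouse-1", "phenotypes": [{"type": {"id": "EX:0001"}}]},
    {"subject": "mouse-2", "phenotypes": [{"type": {"id": "EX:0001"}}, {"type": {"id": "EX:0002"}}]},
    {"subject": "mouse-3", "phenotypes": [{"type": {"id": "EX:0001"}}]},
    {"subject": "mouse-4", "phenotypes": []},
]
frequencies = phenotype_frequencies(cohort)
```

With individual-level packets the frequency is derived from the data; for an amalgamated model it would have to be asserted directly, for instance as a modifier term.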
Hi Jules, I like your suggestion for the mouse allele message. Should we implement it and revise the documentation accordingly? @mellybelly -- note that IMPC could represent its data as a cohort of Phenopackets, and we should talk to Terry about whether this is of interest/relevant.
I have made a PR for the current status of the documentation -- @julesjacobsen I do not want to get things too mixed up, please let me know how we should proceed to implement the mouseAllele class? https://github.com/phenopackets/phenopacket-schema/pull/86
Given you like my suggestion I'll implement it and push to master for you to pull and then we can merge your changes.
See commit: 8a3114054df6ee27259600222ca8205c8917f1aa
Just another thought: MouseAllele is a pretty shabby name. Perhaps RodentAllele would be more accurate, but what's the acronym for the nomenclature committee? The guidelines seem to have been agreed by both the mouse and rat communities.
I would get input from Cindy and Mary
Haven't looked at your schema closely but does it handle transgene alleles?
I'd opt for a more generic scheme here that is extensible to any nomenclature, e.g. a tuple of string and nomenclature. I'd use gene IDs for genes
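One way to read the "tuple of string and nomenclature" suggestion, sketched as a Python dataclass. The class name `NomenclatureAllele`, its field names, the nomenclature label and the gene ID value are all hypothetical placeholders, not a proposal for the actual schema.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class NomenclatureAllele:
    """An allele symbol plus the nomenclature it should be interpreted under."""
    symbol: str                    # the allele written in its native nomenclature
    nomenclature: str              # label of the naming system (placeholder values here)
    gene_id: Optional[str] = None  # gene referenced by ID rather than by symbol


allele = NomenclatureAllele(
    symbol="Fbn1<sup>tm1Hcd</sup>",
    nomenclature="mouse-nomenclature",  # hypothetical label
    gene_id="MGI:XXXXXXX",              # placeholder, not a real identifier
)
as_json_ready = asdict(allele)
```

The trade-off is that the schema stays small and extensible, but the structure inside `symbol` is no longer validated by the message definition.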
I believe it can, when used as part of the gene field.
http://www.informatics.jax.org/mgihome/nomen/gene.shtml#transg
Cynthia and Carol are having a look through this on Thursday.
I think I commented on this in a bit of a hurry before. Looks like @cindyJax has taken a look as I see some other tickets. I chatted very briefly with Terry last week.
I don't have much to add beyond what Cindy already posted in her tickets (thanks!) and what @mellybelly said earlier in this ticket: "One of the problems we have with model organism phenopackets is they will necessarily represent populations rather than individuals. e.g. there can be variable penetrance & expressivity of a phenotype for a given genotype." - I'd go further, in fact the example here is for an allele, not a population!
I think it's worthwhile to experiment with extending phenopackets beyond the scope of representing individual human patients, but I would be more comfortable doing this on a branch rather than master. I am worried about consequences of both broadening the scope of what "subject" means in phenopackets as well as baking in assumptions about how model organism databases model complex genotypes.
I think we can leave model organisms for v2. IMPC in principle has individual level phenotypes and this was the original thought. In principle, we should have another message type, such as Model, that would use the elements appropriately.
Somehow I missed this whole thread before I posted the other tickets (sorry about the number, trying to be thorough). We (at MGI) had discussed the issue at hand - how do mouse populations relate to individual patient data? Much of what you have pulled together so far is quite human-centric.
After many discussions, the decision is to table the mouse/model phenopacket until the VR group have finalized their model so that we can represent the variants in VR
I added a new test class (branch mgi_model) that creates a phenopacket to describe this mouse model (http://www.informatics.jax.org/allele/MGI:3690325). I think we need to add a new variant type. I would suggest we ask Judy, Carol, Terry and others what they think would work best. Do we have any other suggestions? @mellybelly @cmungall @julesjacobsen