Open mbrush opened 7 years ago
The AGR groups annotate disease data to a variety of entities, including 'fish', strains, genotypes, alleles, transgenes/constructs, and genes. Ideally, we would like an ontology that encompasses all of these entities. A 'fish' encompasses genotype + environment; it is used in the disease annotation files and will eventually be used in the phenotype annotation files.
@sbello GENO should cover all of these entities - although there are some nuances around how 'fish' and 'constructs' might map to concepts in GENO. We should discuss to ensure GENO meets AGR needs here.
Also note that GENO was developed in part based on the modeling we did at ZFIN ;-)
One very specific modeling requirement, from @dosumis, concerns transgene representation to accommodate data from the VirtualFlyBrain and FlyBase projects. Essentially:
Transgene representation needs to account for the perspective where a single transgenic insertion can generate more than one variant allele.
These requirements are particularly relevant for things like enhancer traps, which can integrate into (and create a variant allele of) an existing gene and also usurp its enhancer to drive expression of some transgenic product, e.g. the fly enhancer trap variant Scer\GAL4(Bx-MS1096). This is further documented in @dosumis's doc here.
This is not a fly-specific issue. To give one major example, the mouse phenotype consortium generates knock-ins for each targeted gene. These can be considered alleles of the knocked-in marker (lacZ), which capture the regulatory sequences of the targeted gene, as well as knock-out (null?) alleles of the targeted gene. Even if you don't treat the transgene as an allele of the gene it carries, you still need to take into account its nature as an artificial gene encompassing the inserted ORF + regulatory sequences from its genomic neighborhood, and keep that separate from the effect on the targeted gene.
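To make the "one insertion, more than one variant allele" requirement concrete, here is a minimal Python sketch. It is not GENO's model or any group's schema: the class names, fields, and the gene name `GeneX` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class VariantAllele:
    """Hypothetical record: one allele created by an insertion."""
    gene: str    # the gene this is considered an allele of
    effect: str  # free-text description, not a controlled term


@dataclass
class TransgenicInsertion:
    """Hypothetical record: a single insertion event and every allele it creates."""
    label: str
    alleles: List[VariantAllele] = field(default_factory=list)

    def genes_affected(self) -> List[str]:
        # Distinct genes for which this one insertion yields an allele.
        return sorted({a.gene for a in self.alleles})


# Knock-in case described above; "GeneX" is a placeholder, not a real gene.
knock_in = TransgenicInsertion(
    label="lacZ knock-in at GeneX (hypothetical)",
    alleles=[
        VariantAllele(gene="lacZ", effect="marker capturing GeneX regulatory sequences"),
        VariantAllele(gene="GeneX", effect="knock-out (null?) of the targeted gene"),
    ],
)
```

The point of the sketch is only that the insertion is a separate object from its alleles, so the effect on the targeted gene and the artificial gene formed by the inserted ORF can be recorded independently.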
WormBase has the following variation types:
- Engineered_allele
- Allele
- SNP
- Confirmed_SNP
- Predicted_SNP
- RFLP, with attached tags of Reference_strain_digest and Polymorphic_strain_digest
- Transposon_insertion, with cross reference to ?Transposon_family
- Natural_variant
Engineered_allele is the tag we use for CRISPR alleles and transgenic insertions into known, defined genetic loci.
For transgenic insertions into noncoding or undefined regions, we only assign a transgene name and keep them in the transgene class.
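As a rough illustration of the rule described above, the sketch below encodes the listed tags as a Python enum and applies the "defined locus vs. noncoding/undefined region" decision. This is an assumption-laden sketch, not the WormBase schema: the enum, the function name `variation_type_for_insertion`, and the boolean input are hypothetical, and the tag list is simply my reading of the types named in this comment.

```python
from enum import Enum
from typing import Optional


class VariationType(Enum):
    """Variation tags as named in this thread (hypothetical encoding)."""
    ENGINEERED_ALLELE = "Engineered_allele"
    ALLELE = "Allele"
    SNP = "SNP"
    CONFIRMED_SNP = "Confirmed_SNP"
    PREDICTED_SNP = "Predicted_SNP"
    RFLP = "RFLP"
    TRANSPOSON_INSERTION = "Transposon_insertion"
    NATURAL_VARIANT = "Natural_variant"


def variation_type_for_insertion(in_defined_locus: bool) -> Optional[VariationType]:
    """Return the variation tag for a transgenic insertion, if it gets one.

    Insertions into known, defined loci are tagged Engineered_allele.
    Insertions into noncoding or undefined regions get no variation tag
    and would be kept in the transgene class only (returned here as None).
    """
    return VariationType.ENGINEERED_ALLELE if in_defined_locus else None
```

For example, `variation_type_for_insertion(False)` returns `None`, reflecting that such an insertion only receives a transgene name.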
You should be able to access our variation schema here: http://www.wormbase.org/tools/schema/run (search for 'Variation' in the Class box).
Karen
It seems there have been ongoing discussions and ideas about genotype/variant representation, with use cases and requirements coming from Monarch and other efforts (e.g. JAX, MGI, MPD, AGR, etc.).
I'd like to collect info on these efforts and any requirements or thoughts about genotype data representation, querying, operations, rendering in applications, etc., ideally in advance of the Feb 28 'Genotype Representation' session at the Monarch All Hands.
@cmungall @mellybelly @pnrobinson @sbello, others - please jot some thoughts down if you have a chance. Thanks!