monarch-initiative / GENO-ontology

Repository for representing genotypes and their association with phenotypes

Collecting use cases/requirements for genotype data representation and use #29

Open mbrush opened 7 years ago

mbrush commented 7 years ago

Seems that there have been discussions/ideas happening about genotype/variant representation - with use cases/requirements coming from Monarch and other efforts (e.g. JAX, MGI, MPD, AGR, etc).

I'd like to collect info on these efforts and any requirements/thoughts about genotype data representation, querying, operations, and rendering in applications, etc. - ideally in advance of the Feb 28 'Genotype Representation' session at the Monarch All Hands.

@cmungall @mellybelly @pnrobinson @sbello, others - please jot some thoughts down if you have a chance. Thanks!

sbello commented 7 years ago

The AGR groups annotate disease data to a variety of entities, including 'fish', strains, genotypes, alleles, transgenes/constructs and genes. Ideally, we would like an ontology that encompasses all of these entities. 'Fish' encompasses genotype + environment, and is used in the disease annotation files and eventually in the phenotype annotation files.

mbrush commented 7 years ago

@sbello GENO should cover all of these entities - although there are some nuances around how 'fish' and 'constructs' might map to concepts in GENO. We should discuss to ensure GENO meets AGR needs here.

mellybelly commented 7 years ago

also note that GENO was developed in part based on the modeling we did at ZFIN ;-)

mbrush commented 7 years ago

An example of a very specific modeling requirement comes from @dosumis. It relates to transgene representation, to accommodate data from the VirtualFlyBrain and FlyBase projects. Essentially:

  1. Transgene representation needs to account for the perspective in which a single transgenic insertion can generate more than one variant allele (where the causative variant may depend on the design/perspective of a given G2P experiment).
  2. Transgene representation must recognize contributions to a particular transgene from both a transgenic construct and the endogenous genome.

These requirements are particularly relevant for things like enhancer traps, which can integrate into (and create a variant allele of) an existing gene, and also usurp its enhancer to drive expression of some transgenic product. One example is the fly enhancer trap variant Scer\GAL4(Bx-MS1096). Further documented in @dosumis's doc here.
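To make the two requirements concrete, here is a minimal Python sketch of one possible data shape. It is not GENO's actual model: the class names (`SequenceContribution`, `VariantAllele`, `TransgenicInsertion`), the `source` labels, the insertion name and the `Bx[MS1096]` symbol are all hypothetical, and the gene assignments are simply read off the allele symbol quoted above for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SequenceContribution:
    """One piece of sequence that a variant allele is built from."""
    source: str   # hypothetical labels: "construct" or "endogenous_genome"
    feature: str  # free-text description of the contributed sequence


@dataclass
class VariantAllele:
    symbol: str
    gene: str  # the gene this is treated as an allele of
    contributions: List[SequenceContribution] = field(default_factory=list)


@dataclass
class TransgenicInsertion:
    """A single insertion event that may give rise to several variant alleles."""
    name: str
    alleles: List[VariantAllele] = field(default_factory=list)

    def alleles_of(self, gene: str) -> List[VariantAllele]:
        # Requirement 1: pick the allele relevant to a given experiment's perspective.
        return [a for a in self.alleles if a.gene == gene]


def build_enhancer_trap_example() -> TransgenicInsertion:
    # Illustrative values only; names other than the quoted symbol are assumptions.
    disrupted = VariantAllele(
        symbol="Bx[MS1096]",  # hypothetical symbol for the disrupted endogenous gene
        gene="Bx",
        contributions=[
            SequenceContribution("endogenous_genome", "Bx locus"),
            SequenceContribution("construct", "inserted construct sequence"),
        ],
    )
    driver = VariantAllele(
        symbol=r"Scer\GAL4(Bx-MS1096)",
        gene=r"Scer\GAL4",
        contributions=[
            # Requirement 2: both sources contribute to the same transgene.
            SequenceContribution("construct", "GAL4 coding sequence"),
            SequenceContribution("endogenous_genome", "captured Bx enhancer"),
        ],
    )
    return TransgenicInsertion(name="MS1096 insertion", alleles=[disrupted, driver])
```

With this shape, a phenotype attributed to loss of the endogenous gene and an expression pattern attributed to the GAL4 driver can each point at a different allele of the same insertion.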

dosumis commented 7 years ago

Transgene representation needs to account for perspective where a single transgenic insertion can generate more than one variant allele.

This is not a fly-specific issue. To give one major example, the mouse phenotype consortium generates knock-ins for each targeted gene. These can be considered alleles of the knocked-in marker (lacZ), which capture the regulatory sequences of the targeted gene, as well as knock-out (null?) alleles of the targeted gene. Even if you don't treat the transgene as an allele of the gene it carries, you still need to take into account its nature as an artificial gene encompassing the inserted ORF plus regulatory sequences from its genomic neighborhood, and keep that separate from the effect on the targeted gene.

kyook commented 7 years ago

WormBase has the following variation types:

- Engineered_allele
- Allele
- SNP
- Confirmed_SNP
- Predicted_SNP
- RFLP, with attached tags of Reference_strain_digest and Polymorphic_strain_digest
- Transposon_insertion, with cross reference to ?Transposon_family
- Natural_variant

Engineered_allele is the tag we use for CRISPR alleles and transgenic insertions into known, defined genetic loci.

For transgenic insertions into noncoding or undefined regions, we only assign a transgene name and keep them in the transgene class.
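The rule in the two paragraphs above could be sketched roughly as follows. This is only an illustration of the decision as described here, not WormBase code or schema; the `Origin` enum and the `gets_engineered_allele_tag` function are hypothetical names.

```python
from enum import Enum


class Origin(Enum):
    # Hypothetical labels for how a variation was made.
    CRISPR = "crispr"
    TRANSGENIC_INSERTION = "transgenic_insertion"


def gets_engineered_allele_tag(origin: Origin, in_defined_locus: bool) -> bool:
    """Return True if the variation would carry the Engineered_allele tag.

    Assumed reading of the rule: CRISPR alleles always get the tag;
    transgenic insertions get it only when they land in a known, defined
    genetic locus. Otherwise they stay in the transgene class with just
    a transgene name.
    """
    if origin is Origin.CRISPR:
        return True
    return in_defined_locus
```

For example, `gets_engineered_allele_tag(Origin.TRANSGENIC_INSERTION, in_defined_locus=False)` is `False`, matching the "transgene name only" case.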

You should be able to access our variation schema here: http://www.wormbase.org/tools/schema/run (search for 'Variation' in the Class box).

Karen
