monarch-initiative / monarch-app

Monarch Initiative website and API
https://monarchinitiative.org/
BSD 3-Clause "New" or "Revised" License

Confirm that IMPC genotypes are all present in the graph #771

Open kevinschaper opened 2 months ago

kevinschaper commented 2 months ago

As part of our genotypes and variants super ticket, we identified that we want to confirm that our pipeline, which brings in genotypes from the Alliance of Genome Resources (originally from MGI), is bringing in all of the genotypes represented in IMPC.

Here are genotype-phenotype associations from IMPC: http://ftp.ebi.ac.uk/pub/databases/impc/all-data-releases/latest/results/genotype-phenotype-assertions-ALL.csv.gz

That might be the best file to look at? Maybe we should also confirm that the genotype-to-phenotype associations are all present, so that we know whether we should import directly from IMPC.

madanucd commented 1 week ago

The following files were used to produce the results below:

  1. Monarch Knowledge Graph (KG) Release:
    monarch-kg.tar.gz (Aug 12, 2024 release)

  2. IMPC Data Release:
    genotype-phenotype-assertions-ALL.csv.gz (Jun 13, 2024 release)

Although the IMPC file is described as genotype-phenotype assertions, it lacks explicit genotype identifiers or names. It does provide detailed information on markers, alleles, and strains, which are used for mapping and identification within the Monarch KG. The following results summarize the findings:

Summary of Findings

| Source | Nodes (Unique) | Edges | Edges (Unique) |
|---|---|---|---|
| Monarch_KG | 77,539 | 383,210 | 377,229 |
| IMPC | 8,263 | 67,619 | 43,776 |
| Common | 3,965 | 30,315 | 20,225 |
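
Counts like these can be read as set operations over node identifiers and edge pairs. The following is a minimal sketch of that idea, not the script used for the table: it assumes each edge has already been reduced to a `(subject key, phenotype ID)` tuple, and the `EX:` identifiers are hypothetical placeholders rather than values from either release. The thread does not say how the non-unique "Edges" count in the Common row was defined, so the sketch only computes the unique overlaps.

```python
from typing import Iterable

# An edge is a (subject key, phenotype ID) pair; how the subject key is
# derived from markers, alleles, and strains is an assumption left open here.
Edge = tuple[str, str]


def summarize(edges: Iterable[Edge]) -> dict[str, int]:
    """Count unique subject nodes, total edge rows, and unique edges."""
    rows = list(edges)
    return {
        "nodes_unique": len({subject for subject, _ in rows}),
        "edges": len(rows),
        "edges_unique": len(set(rows)),
    }


def overlap(kg_edges: Iterable[Edge], impc_edges: Iterable[Edge]) -> dict[str, int]:
    """Count subject nodes and unique edges present in both sources."""
    kg, impc = set(kg_edges), set(impc_edges)
    kg_nodes = {subject for subject, _ in kg}
    impc_nodes = {subject for subject, _ in impc}
    return {
        "nodes_unique": len(kg_nodes & impc_nodes),
        "edges_unique": len(kg & impc),
    }


# Toy data with hypothetical identifiers (not taken from either release).
kg_edges = [
    ("EX:geno1", "EX:pheno1"),
    ("EX:geno1", "EX:pheno1"),  # duplicate row
    ("EX:geno2", "EX:pheno2"),
]
impc_edges = [
    ("EX:geno1", "EX:pheno1"),
    ("EX:geno1", "EX:pheno4"),
    ("EX:geno3", "EX:pheno3"),
]

kg_summary = summarize(kg_edges)
shared = overlap(kg_edges, impc_edges)
print(kg_summary)
print(shared)
```

The result depends entirely on how the subject key is built: an exact-match key will undercount the overlap whenever the two sources spell the same strain or allele differently.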

Challenges in Genotype Identification

The absence of genotype details poses significant challenges, particularly given the complexity of nomenclature for genotypes, alleles, strains, and genetic compositions. For instance, strain names can vary significantly depending on the vendor, as illustrated below:

This variability is not limited to strain names; allelic compositions and genetic backgrounds also often include vendor-specific abbreviations appended to their names.
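
One way to surface this kind of variability is to group names that differ only by a trailing vendor lab code. The sketch below is an illustration under stated assumptions, not part of the analysis above: the `LAB_CODES` list is a small, non-exhaustive sample (J for The Jackson Laboratory, Tac for Taconic, Crl for Charles River), and the helper names are hypothetical. Substrains from different vendors can be genetically distinct, so the groups are only candidates for manual review, not confirmed equivalences.

```python
from collections import defaultdict

# Illustrative, non-exhaustive sample of vendor lab codes (an assumption).
LAB_CODES = ("Tac", "Crl", "J")


def strip_lab_code(name: str) -> str:
    """Remove one trailing lab code from a strain name, if present."""
    name = name.strip()
    # Try longer codes first so a short code never shadows a longer one.
    for code in sorted(LAB_CODES, key=len, reverse=True):
        if name.endswith(code) and len(name) > len(code):
            return name[: -len(code)]
    return name


def group_variants(names: list[str]) -> dict[str, set[str]]:
    """Group strain names that share the same name once the lab code is removed."""
    groups: dict[str, set[str]] = defaultdict(set)
    for name in names:
        groups[strip_lab_code(name)].add(name)
    return dict(groups)


groups = group_variants(["C57BL/6NJ", "C57BL/6NTac", "C57BL/6NCrl", "C57BL/6N"])
print(groups)
```

Suffix stripping is deliberately crude; it would not handle lab codes embedded in allele symbols or multi-part genetic backgrounds, which would need a dedicated nomenclature parser.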

madanucd commented 1 week ago

@kevinschaper 👆