Open kevinschaper opened 3 months ago
The following files are considered for the alias results mentioned below:
- Monarch Knowledge Graph (KG) release: `monarch-kg.tar.gz` (Aug 12, 2024 release)
- IMPC data release: `genotype-phenotype-assertions-ALL.csv.gz` (Jun 13, 2024 release)
Although the IMPC file is described as genotype-phenotype assertions, it contains no explicit genotype identifiers or names. It does provide detailed information on markers, alleles, and strains, which were used for mapping and identification within the Monarch KG. The results are summarized below:
| Source | Nodes (Unique) | Edges | Edges (Unique) |
|---|---|---|---|
| Monarch_KG | 77,539 | 383,210 | 377,229 |
| IMPC | 8,263 | 67,619 | 43,776 |
| Common | 3,965 | 30,315 | 20,225 |
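For anyone wanting to reproduce counts of this shape, here is a minimal sketch of how node and edge tallies could be computed from two edge lists. The comment above does not say how the numbers were derived, so the function names and the definition of "common" (IMPC rows whose subject/object pair also appears in the KG) are assumptions, not the method actually used.

```python
def summarize(edges):
    """Count unique nodes, total edges, and unique edges.

    `edges` is a list of (subject, object) tuples; duplicates are allowed.
    """
    unique_edges = set(edges)
    nodes = {node for edge in unique_edges for node in edge}
    return {
        "nodes_unique": len(nodes),
        "edges": len(edges),
        "edges_unique": len(unique_edges),
    }


def summarize_common(kg_edges, impc_edges):
    """Summarize IMPC edges whose (subject, object) pair also occurs in the KG.

    Assumption: "common" means exact pair matches after ID mapping.
    """
    kg = set(kg_edges)
    shared = [edge for edge in impc_edges if edge in kg]
    return summarize(shared)
```

The gap between "Edges" and "Edges (Unique)" in such a tally comes from repeated subject/object pairs, which is why both the raw list length and the set size are reported.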
The absence of genotype details poses significant challenges, particularly given the complexity of nomenclature for genotypes, alleles, strains, and genetic compositions. For instance, strain names can vary significantly depending on the vendor.
This variability is not limited to strain names; allelic compositions and genetic backgrounds also often include vendor-specific abbreviations appended to their names.
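One way to see how much of the variability is due to vendor suffixes is to group names by a normalized key. The sketch below is illustrative only: the suffix list (`Crl`, `Tac`, `Hsd`) is a hypothetical, incomplete set of vendor lab codes, and stripping them deliberately ignores real substrain differences, so the grouping is a diagnostic aid rather than an identity mapping.

```python
import re
from collections import defaultdict

# Hypothetical, incomplete list of vendor lab codes; extend as needed.
LAB_CODES = ("Crl", "Tac", "Hsd")
_SUFFIX = re.compile(r"(?:%s)$" % "|".join(LAB_CODES))


def normalize_strain(name):
    """Strip one trailing vendor lab code from a strain name, if present."""
    return _SUFFIX.sub("", name.strip())


def group_strains(names):
    """Group strain names that share the same normalized form."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize_strain(name)].add(name)
    return dict(groups)
```

Groups containing more than one original name are candidates for manual review before being treated as the same strain.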
@kevinschaper 👆
As part of our genotypes and variants super ticket, we identified that we want to confirm that our pipeline, which brings in genotypes from the Alliance of Genome Resources (originally from MGI), is bringing in all of the genotypes represented in IMPC.
Here are genotype-phenotype associations from IMPC: http://ftp.ebi.ac.uk/pub/databases/impc/all-data-releases/latest/results/genotype-phenotype-assertions-ALL.csv.gz
That might be the best file to look at? Maybe we should also confirm that the genotype-to-phenotype associations are all present, so that we know whether we should import directly from IMPC.
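A rough sketch of that check, assuming the IMPC CSV has `allele_accession_id` and `mp_term_id` columns and that the KG edges file is a KGX-style TSV with `subject`, `predicate`, and `object` columns (all of these names are assumptions to verify against the actual files). Which identifier to match on is itself an open question, so the column names are parameters.

```python
import csv
import io


def impc_pairs(csv_text, id_col="allele_accession_id", term_col="mp_term_id"):
    """Collect (entity ID, phenotype term) pairs from IMPC CSV text.

    Column names are assumed; rows with an empty ID or term are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        (row[id_col], row[term_col])
        for row in reader
        if row[id_col] and row[term_col]
    }


def kg_pairs(tsv_text):
    """Collect (subject, object) pairs from KGX-style edge TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {(row["subject"], row["object"]) for row in reader}


def missing_from_kg(impc, kg):
    """Return IMPC pairs that have no matching edge in the KG, sorted."""
    return sorted(impc - kg)
```

For the real files, the text could be read with `gzip.open(path, "rt")` for the IMPC download and from the edges TSV extracted from the KG tarball. A non-empty result from `missing_from_kg` would suggest associations that the Alliance-based pipeline is not bringing in.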