monarch-initiative / genophenocorr

Genotype Phenotype Correlation
https://monarch-initiative.github.io/genophenocorr/stable
MIT License

Fail if the phenopacket has variant in mismatching genome build #84

Closed: ielis closed 5 months ago

ielis commented 8 months ago

We need to handle bug #83, where ingest fails because a variant is on the hg19 genome build while the app uses hg38.

Background

The PhenopacketVariantCoordinateFinder is responsible for turning a GenomicInterpretation from the Phenopacket Schema into VariantCoordinates. The vcf_record field of the GenomicInterpretation has a genome_assembly subfield that should contain the build of the variant in a usable format.

In the case of CNVs that use VRS elements, we can use the sequence_id to test if we're on the right build. The example in the phenopacket docs lists an allele with sequence_id==NC_000010.11. This RefSeq identifier corresponds to chr10 in the GRCh38.p13 build. We know this from the assembly report tables that are in our code base. Inspecting both tables, we find this contig only in GRCh38.p13 (chr10 in GRCh37.p13 corresponds to NC_000010.10, note the difference in version).
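
The sequence_id check could look roughly like the sketch below. This is only an illustration, not the code in the repo: `REFSEQ_CONTIGS`, `builds_with_contig` and `check_sequence_id` are hypothetical names, the table is a one-contig excerpt standing in for the real assembly report tables, and stripping a `refseq:` prefix is an assumption about how the identifier may be written.

```python
# Hypothetical one-contig excerpt of the assembly report tables:
# RefSeq accession of chr10 in each build (values taken from the text above).
REFSEQ_CONTIGS = {
    'GRCh37.p13': {'NC_000010.10'},
    'GRCh38.p13': {'NC_000010.11'},
}


def builds_with_contig(sequence_id: str) -> set[str]:
    """Return identifiers of all builds whose table lists the given accession."""
    # Assumption: the accession may carry a `refseq:` namespace prefix.
    accession = sequence_id.strip().removeprefix('refseq:')
    return {
        build for build, contigs in REFSEQ_CONTIGS.items()
        if accession in contigs
    }


def check_sequence_id(sequence_id: str, build_identifier: str) -> None:
    """Raise ValueError if the accession is not a contig of the expected build."""
    if build_identifier not in builds_with_contig(sequence_id):
        raise ValueError(
            f'Contig {sequence_id} was not found in build {build_identifier}'
        )
```

Because the accession version differs between the builds, a GRCh37 variant fails the check against GRCh38.p13 instead of being silently ingested.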

PhenopacketVariantCoordinateFinder, the parsing code, knows about GenomeBuild (field self._build), which has an identifier property. The property takes one of the following values: {'GRCh37.p13', 'GRCh38.p13'}. Therefore, we can match the identifier against the variant's build to check that the variant uses the right build.

Definition of done

| GenomeBuild.identifier | Phenopacket |
| --- | --- |
| GRCh37.p13 | grch37, GRCh37, GRCh37.p13, hg19, HG19, ... |
| GRCh38.p13 | grch38, GRCh38, GRCh38.p13, hg38, HG38, ... |

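
A minimal sketch of how the mapping above might be checked, assuming case-insensitive matching is acceptable. `ASSEMBLY_ALIASES`, `normalize_assembly` and `check_build` are hypothetical names and not part of the existing code base. Only the labels listed in the table are covered; whatever the "..." stands for is left out.

```python
from typing import Optional

# Lower-cased labels from the table, keyed by GenomeBuild.identifier.
ASSEMBLY_ALIASES = {
    'GRCh37.p13': {'grch37', 'grch37.p13', 'hg19'},
    'GRCh38.p13': {'grch38', 'grch38.p13', 'hg38'},
}


def normalize_assembly(genome_assembly: str) -> Optional[str]:
    """Map a phenopacket genome_assembly label to a GenomeBuild.identifier."""
    key = genome_assembly.strip().lower()
    for identifier, aliases in ASSEMBLY_ALIASES.items():
        if key in aliases:
            return identifier
    return None  # unknown label


def check_build(genome_assembly: str, build_identifier: str) -> None:
    """Raise ValueError if the variant's build is unknown or mismatching."""
    normalized = normalize_assembly(genome_assembly)
    if normalized is None:
        raise ValueError(f'Unknown genome assembly: {genome_assembly!r}')
    if normalized != build_identifier:
        raise ValueError(
            f'Variant is on {normalized} but the analysis uses {build_identifier}'
        )
```

With this, a call such as `check_build('hg19', 'GRCh38.p13')` raises a ValueError with an explicit message, which is the failure mode this issue asks for.
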
pnrobinson commented 8 months ago

There are phenopackets with structural variants for which we only have the label, and not the contig. This is going to be the case for everything that was identified before GS. I do not think that the CNVs should be required to have the contig.

ielis commented 8 months ago

Can you please include an example of these cases?

We may run into issues with such variants. I am not sure how to perform functional annotation without contig info. I think VEP won't talk to us.

ielis commented 6 months ago

Related to #120