As a developer, I want to calculate functional consequence scores using VEP annotations for each variant in a credible set because these scores will inform how damaging the variants are, which is important for prioritising genes.
Background
We have functional consequence data from VEP in the variant index object. The goal for L2G is to assess the functional impact of these variants and use these scores to prioritise genes at GWAS loci.
We have assigned scores to these consequences to quantify how damaging each variant may be, and now need to implement new features that incorporate these scores into our set of functional genomics features. As described in https://github.com/opentargets/issues/issues/3552, we only want to consider protein-coding genes when calculating the neighbourhood features.
An accurate definition of the features is available in the features specifications document, and has been agreed with @addramir.
Tasks
[x] Write a method in the VariantIndex class that extracts VEP information
[x] New features to add:
vepMaximum: Max VEP score per gene across all variants in a given credible set
vepMaximumNeighbourhood: Max VEP score per gene across all variants, relative to the mean of that score across all genes in the vicinity
vepMean: Mean VEP score per gene, weighted by posterior probabilities across the credible set
vepMeanNeighbourhood: Mean VEP score per gene across all variants, relative to the mean of that score across all genes in the vicinity
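
To make the four definitions above concrete, here is a minimal sketch in plain Python, not the actual implementation. It rests on several assumptions that the features specifications document may define differently: the input is one row per (variant, gene) pair within a single credible set; the field names `geneId`, `biotype`, `vepScore` and `posteriorProbability` are hypothetical; `vepMean` is a normalised weighted mean; "relative to" is read as a ratio; and the neighbourhood mean is taken over protein-coding genes only.

```python
from collections import defaultdict


def vep_features(rows):
    """Compute the four VEP features per gene for one credible set (sketch)."""
    # Group the (variant, gene) rows by gene.
    by_gene = defaultdict(list)
    for row in rows:
        by_gene[row["geneId"]].append(row)

    features = {}
    for gene_id, gene_rows in by_gene.items():
        total_pp = sum(r["posteriorProbability"] for r in gene_rows)
        weighted = sum(r["vepScore"] * r["posteriorProbability"] for r in gene_rows)
        features[gene_id] = {
            "biotype": gene_rows[0]["biotype"],
            "vepMaximum": max(r["vepScore"] for r in gene_rows),
            # Assumption: weighted mean normalised by the summed posterior probability.
            "vepMean": weighted / total_pp if total_pp > 0 else 0.0,
        }

    # Assumption: the neighbourhood mean only uses protein-coding genes.
    coding = [f for f in features.values() if f["biotype"] == "protein_coding"]
    for name in ("vepMaximum", "vepMean"):
        local_mean = sum(f[name] for f in coding) / len(coding) if coding else 0.0
        for f in features.values():
            # Assumption: "relative to" means a ratio; 0.0 when the mean is zero.
            f[name + "Neighbourhood"] = f[name] / local_mean if local_mean > 0 else 0.0
    return features


# Toy data, invented purely for illustration.
rows = [
    {"variantId": "v1", "geneId": "G1", "biotype": "protein_coding", "vepScore": 1.0, "posteriorProbability": 0.5},
    {"variantId": "v2", "geneId": "G1", "biotype": "protein_coding", "vepScore": 0.5, "posteriorProbability": 0.5},
    {"variantId": "v1", "geneId": "G2", "biotype": "protein_coding", "vepScore": 0.5, "posteriorProbability": 0.5},
    {"variantId": "v2", "geneId": "G3", "biotype": "lncRNA", "vepScore": 1.0, "posteriorProbability": 0.5},
]
features = vep_features(rows)
```

With this toy input, G1 gets a `vepMaximum` of 1.0 and a `vepMean` of 0.75. The lncRNA gene G3 still receives neighbourhood values, but it does not contribute to the neighbourhood mean. Whether non-coding genes should receive neighbourhood features at all is a design choice this sketch leaves open.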