rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

Masks with unknown annotations do not work #541

Open Jeremy37 opened 1 month ago

Jeremy37 commented 1 month ago

I am trying to use REGENIE to do burden/set tests with different masks based on variant predicted deleteriousness - much as you did in your main UK Biobank exomes paper. For example, I thought that I could define a masks file (--mask-def) as follows:

mask_lof LoF
mask_missense_all LoF,missense_0del,missense_1del,missense_5del
mask_missense_1del LoF,missense_1del,missense_5del
mask_missense_5del LoF,missense_5del

I then have a variant annotations file (--anno-file) such as:

22:45525550:C:T FBLN1 LoF
22:45550555:A:G FBLN1 missense_1_del
22:45577085:G:A FBLN1 missense_0_del
22:45518718:A:G FBLN1 missense_0_del
22:45527861:C:G FBLN1 missense_5_del

Here, I defined the variant's annotation based on the consequence (e.g. missense) and the number of predictors that scored the variant as pathogenic.

When running REGENIE, I get this warning:

 * masks            : [data/regenie/regenie_masks.tsv] n_masks = 4
WARNING: Detected 3 masks with unknown annotations.

REGENIE runs to completion and provides lines in the output for these masks along with allele frequency thresholds. However, the variants used in these masks (e.g. mask_missense_all), as listed in the *.snplist file, include ONLY LoF variants, not variants with the other annotations defined for the mask (such as missense_0del). That is, all outputs for the other masks are identical to those for the LoF mask.

What am I doing wrong here? Is there a more suitable way to define the masks and annotations for this use case? Thanks very much for your help!

Ojami commented 1 month ago

You have missense_1_del in the anno-file but missense_1del in the mask file. An underscore (_) is missing from the mask file labels, no?

Jeremy37 commented 3 weeks ago

Embarrassed to say that I think that's it! I was sure I had checked for identical annotation values, but apparently it was a mistake. Thank you.
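
For anyone who hits the same warning, a mismatch like this can be caught before running REGENIE by comparing the labels in the two files. The following is a minimal sketch, not part of REGENIE: it assumes both files are whitespace-delimited as in the examples above (mask name followed by a comma-separated annotation list, and variant, gene, annotation), and the function names are made up for illustration.

```python
# Sample contents copied from the post above; in practice, read the lines
# from the files passed to --mask-def and --anno-file.
MASK_DEF = """\
mask_lof LoF
mask_missense_all LoF,missense_0del,missense_1del,missense_5del
mask_missense_1del LoF,missense_1del,missense_5del
mask_missense_5del LoF,missense_5del
"""

ANNO = """\
22:45525550:C:T FBLN1 LoF
22:45550555:A:G FBLN1 missense_1_del
22:45577085:G:A FBLN1 missense_0_del
22:45518718:A:G FBLN1 missense_0_del
22:45527861:C:G FBLN1 missense_5_del
"""


def mask_annotations(mask_lines):
    """Map each mask name to the set of annotation labels it lists."""
    masks = {}
    for line in mask_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        masks[fields[0]] = set(fields[1].split(","))
    return masks


def anno_labels(anno_lines):
    """Collect the annotation labels (assumed third column) in the anno file."""
    labels = set()
    for line in anno_lines:
        fields = line.split()
        if len(fields) >= 3:
            labels.add(fields[2])
    return labels


def unknown_annotations(mask_lines, anno_lines):
    """Return {mask name: sorted labels that never appear in the anno file}."""
    known = anno_labels(anno_lines)
    report = {}
    for name, labels in mask_annotations(mask_lines).items():
        missing = sorted(labels - known)
        if missing:
            report[name] = missing
    return report


if __name__ == "__main__":
    report = unknown_annotations(MASK_DEF.splitlines(), ANNO.splitlines())
    for name, missing in report.items():
        print(f"{name}: not found in anno file: {', '.join(missing)}")
```

With the sample data from this thread, the check flags three of the four masks, which is consistent with the "Detected 3 masks with unknown annotations" warning: only LoF is spelled identically in both files.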