rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

Aggregate tests when REF is actually the minor allele #454

Open dianacornejo opened 10 months ago

dianacornejo commented 10 months ago

Hi @joellembatchou,

I'm running regenie on UKB data with the burden/SKAT-O options.

I was wondering if there's a workaround to include variants where the REF allele is actually the minor allele (so the variant is rare with respect to REF).

See the example below

| CHROM | ID | REF | ALT | ALT_FREQS |
| --- | --- | --- | --- | --- |
| 22 | 22:17887300:C:A | C | A | 0.996944 |

In this case the variant will be dropped from the burden test because the ALT allele frequency is above the threshold (e.g. 0.01). The variant is still rare if we count the C alleles instead (frequency 1 - 0.996944 = 0.003056), and it would make it into the analysis with the same threshold.

What would be the best way to include those variants in my analysis? Is it possible to give regenie a parameter to account for these cases? If not, would it be hard to add something like this to the code?

Thank you so much for your response.

joellembatchou commented 10 months ago

Hi,

Yes, this is a consequence of how REGENIE uses AAF (alternate allele frequency) rather than MAF (minor allele frequency) to determine which variants go into the mask. One workaround is to force the REF allele to be the major allele in the genotype file. For example, this can be done in PLINK2 using --maj-ref 'force'.

Cheers, Joelle
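
To make the AAF versus MAF distinction concrete, here is a minimal Python sketch using the variant from the question. The function names and the inclusive `<=` comparison are assumptions for illustration only; this is not regenie's code, and regenie's exact cutoff handling may differ.

```python
def minor_allele_freq(aaf: float) -> float:
    """MAF is the smaller of the ALT frequency and the REF frequency."""
    return min(aaf, 1.0 - aaf)


def passes_cutoff(freq: float, cutoff: float = 0.01) -> bool:
    """Hypothetical rare-variant filter: keep variants at or below the cutoff."""
    return freq <= cutoff


# Variant 22:17887300:C:A from the question, where ALT (A) is the major allele.
aaf = 0.996944

kept_using_aaf = passes_cutoff(aaf)                     # filter on ALT frequency
kept_using_maf = passes_cutoff(minor_allele_freq(aaf))  # filter on minor allele frequency

# Swapping REF and ALT (the effect of forcing the major allele to be REF)
# turns the ALT frequency into 1 - AAF, so the AAF-based filter now keeps it.
aaf_after_swap = 1.0 - aaf
kept_after_swap = passes_cutoff(aaf_after_swap)

print(kept_using_aaf, kept_using_maf, kept_after_swap)
```

With an AAF-based cutoff the variant is excluded, while the same cutoff applied to MAF, or to the AAF after swapping REF and ALT, keeps it.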

koido commented 10 months ago

Hi @joellembatchou,

I'd like to ask some questions related to @dianacornejo's question. I believe the answers will also be helpful for @dianacornejo.

I have two questions.

First, if we want to use MAF for building masks, I think the easiest way is to supply MAF instead of AAF through --aaf-file. Would this cause any problems in determining which variants go into the mask?

Second, once the variants in the mask are determined, does the gene-based test (such as the burden test) count the minor (rare) allele, or always the ALT allele?

Many thanks for providing such excellent software. I appreciate your kind help.

Best, Masaru

joellembatchou commented 9 months ago

Hi,

Right, you can use --aaf-file to control which variants go into which masks. However, the burden test by default uses the max of the ALT allele counts across sites, so a variant whose minor allele is REF will throw off the resulting burden mask (i.e. many of the entries will be 2).

The SKATO/ACATV tests will not be affected by this, though.

Cheers, Joelle
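
The effect on the burden mask described above can be sketched with toy data. This is a simplified illustration in Python, assuming complete genotypes coded as ALT allele counts (0, 1 or 2); `burden_max` is a hypothetical helper, not a regenie function.

```python
def burden_max(alt_counts_by_variant):
    """Per-sample maximum of ALT allele counts across the variants in a mask."""
    return [max(per_sample) for per_sample in zip(*alt_counts_by_variant)]


# ALT allele counts for six samples (toy data).
rare_alt = [0, 0, 1, 0, 0, 0]   # ALT is the minor allele, as the mask expects
ref_minor = [2, 2, 2, 1, 2, 2]  # REF is the minor allele, so most samples carry 2 ALT copies

# Mask built with the REF-minor variant left as is: almost every entry becomes 2.
mask_unflipped = burden_max([rare_alt, ref_minor])

# Recode the second variant so that the minor allele is counted (2 - ALT count),
# which is what swapping REF and ALT in the genotype file achieves.
ref_minor_recoded = [2 - g for g in ref_minor]
mask_recoded = burden_max([rare_alt, ref_minor_recoded])

print(mask_unflipped)  # [2, 2, 2, 1, 2, 2]
print(mask_recoded)    # [0, 0, 1, 1, 0, 0]
```

In the first mask a single REF-minor variant swamps the signal from the rare ALT carriers; after recoding, the mask again marks only the carriers of a rare allele.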

git-jemiller commented 1 month ago

Hi Joelle,

I'm also trying to address the same issue. Does the PLINK file I'm using as input for step 1 need to have the major allele as REF for all variants, or can I just use --maj-ref 'force' on the input to step 2?

Thanks, Jason

joellembatchou commented 1 month ago

Hi,

In step 1, which allele is chosen as REF/ALT will not affect the final results. In step 2 with burden tests, the allele coding does matter, since variants are collapsed into a burden score. It also matters for single-variant tests, where the sign of the effect size estimate flips depending on which allele is REF. If you want to force the major allele to be REF, you can indeed use --maj-ref 'force' in PLINK2.

Cheers, Joelle
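
The sign flip for single-variant tests can be seen in a small Python sketch. It uses a plain least-squares slope on made-up data as a stand-in for the association model, so it only illustrates the effect of allele coding and is not regenie's actual regression; `ols_slope` and the data are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x for a simple one-predictor regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Toy data: genotype coded as ALT allele count, and a quantitative phenotype.
alt_counts = [0, 0, 1, 1, 2, 2]
phenotype = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]

# Same genotypes with REF and ALT swapped: the count of the other allele.
swapped_counts = [2 - g for g in alt_counts]

beta_alt = ols_slope(alt_counts, phenotype)
beta_swapped = ols_slope(swapped_counts, phenotype)

# The two estimates have equal magnitude and opposite sign.
print(beta_alt, beta_swapped)
```

The magnitude of the estimate is unchanged by the swap, so the results are equivalent as long as the effect allele is tracked when interpreting the sign.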