Open dianacornejo opened 10 months ago
Hi,
Yes, this is a consequence of how REGENIE uses AAF instead of MAF to determine which variants go into the mask. One workaround is to force the REF allele to be the major allele in the genotype file. For example, this can be done in PLINK2 using --maj-ref 'force'.
Cheers, Joelle
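To make the AAF/MAF distinction concrete, here is a minimal Python sketch of frequency-based mask selection. It is not REGENIE's code: the variant IDs, the 0.01 cutoff, and the inclusive comparison are all assumptions chosen for illustration.

```python
# Illustrative only: variant IDs, cutoff, and the "<=" comparison are assumptions,
# not taken from REGENIE's implementation.

def maf(aaf):
    """Minor allele frequency derived from the ALT allele frequency (AAF)."""
    return min(aaf, 1.0 - aaf)

def passes_aaf_cutoff(aaf, cutoff=0.01):
    """Selection on the ALT allele frequency, as described in this thread."""
    return aaf <= cutoff

def passes_maf_cutoff(aaf, cutoff=0.01):
    """Selection on the minor allele frequency, regardless of which allele is ALT."""
    return maf(aaf) <= cutoff

# Hypothetical variants: (id, ALT allele frequency)
variants = [
    ("var_alt_rare", 0.004),  # ALT is the rare allele
    ("var_ref_rare", 0.996),  # REF is the rare allele
    ("var_common", 0.30),
]

selected_by_aaf = [vid for vid, freq in variants if passes_aaf_cutoff(freq)]
selected_by_maf = [vid for vid, freq in variants if passes_maf_cutoff(freq)]

print(selected_by_aaf)
print(selected_by_maf)
```

The variant whose rare allele is REF is dropped under the AAF rule but kept under the MAF rule, which is the situation raised in the original question. Swapping REF and ALT for that variant would give it an AAF of about 0.004, so it would pass either rule.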
Hi @joellembatchou,
I'd like to ask some questions related to @dianacornejo's question. I believe the answers will also be helpful for @dianacornejo.
I have two questions.
First, if we want to use MAF for building masks, I think the easiest way is to supply MAF values instead of AAF via --aaf-file. Does this cause any problems in determining which variants go into the mask?
Second, after determining variants in the mask, does the gene-based test (such as the burden test) always count the minor (rare) allele or ALT allele?
Many thanks for providing such excellent software. I appreciate your kind help.
Best, Masaru
Hi,
Right, you can use --aaf-file to force which variants go into which masks. However, the burden test uses the max of ALT allele counts across sites by default, so a variant whose minor allele is REF will throw off the resulting burden mask (i.e. many of the entries will be 2).
The SKATO/ACATV tests will not be affected by this, though.
Cheers, Joelle
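The effect on the burden mask can be illustrated with a small Python sketch. It assumes only the "max of ALT allele counts across sites" rule described above; the genotype matrix and helper names are hypothetical, and this is not REGENIE's implementation.

```python
# Illustrative only: a toy genotype matrix and the "max across sites" rule
# described in this thread. Names and data are hypothetical.

# Rows are samples, columns are variants; entries count ALT alleles (0, 1, or 2).
# In the last column the ALT allele is the major allele, so most samples carry 2.
genotypes = [
    [0, 0, 2],
    [1, 0, 2],
    [0, 0, 2],
    [0, 1, 1],
]

def burden_max(rows):
    """Burden score per sample: the maximum ALT allele count across variants."""
    return [max(row) for row in rows]

def flip_column(rows, col):
    """Swap REF and ALT for one variant by recoding its counts as 2 - count."""
    return [[2 - g if j == col else g for j, g in enumerate(row)] for row in rows]

mask_raw = burden_max(genotypes)
mask_major_ref = burden_max(flip_column(genotypes, 2))

print(mask_raw)        # dominated by the variant whose ALT allele is common
print(mask_major_ref)  # after recoding so the rare allele is the one counted
```

With the original coding, nearly every sample gets a burden score of 2 from the third variant alone, which hides the carriers of the genuinely rare alleles. After recoding that variant so its rare allele is counted, the mask again separates carriers from non-carriers.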
Hi Joelle,
I'm also trying to address the same issue. I was wondering whether the PLINK file I'm using as input for step 1 needs to have the major allele as REF for all variants, or whether I can just use --maj-ref 'force' for the input to step 2?
Thanks, Jason
Hi,
In step 1, which allele is chosen as REF/ALT will not affect the final results. In step 2 with burden tests, however, the allele coding will matter, since variants are collapsed into a burden score. It also matters for single-variant tests, since you will get a flipped effect size estimate depending on which allele is REF. If you want to force the major allele to be REF, you can indeed use --maj-ref 'force' in PLINK2.
Cheers, Joelle
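The sign flip for single-variant tests can be seen in a minimal Python sketch that regresses a phenotype on allele dosage under both codings. The data are made up, and a plain least-squares slope stands in for the actual test statistic, so treat it as an illustration of the coding issue rather than of what REGENIE computes.

```python
# Illustrative only: made-up dosages and phenotypes, and a simple least-squares
# slope as a stand-in for a single-variant effect size estimate.

def allele_freq(dosages):
    """Frequency of the counted allele, given per-sample dosages in {0, 1, 2}."""
    return sum(dosages) / (2 * len(dosages))

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical variant where the ALT allele is the major allele.
alt_dosage = [2, 2, 1, 2, 2, 1, 2, 2]
phenotype = [1.0, 1.2, 2.1, 0.9, 1.1, 1.9, 1.0, 0.8]

aaf = allele_freq(alt_dosage)

# Recode so the minor allele is counted, mirroring the effect on allele counts
# of making the major allele REF.
minor_dosage = [2 - g for g in alt_dosage] if aaf > 0.5 else list(alt_dosage)

beta_alt = slope(alt_dosage, phenotype)
beta_minor = slope(minor_dosage, phenotype)

print(aaf, beta_alt, beta_minor)
```

The two estimates have the same magnitude and opposite signs, because recoding the dosage as 2 minus the count reverses the direction of the predictor without changing its variance.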
Hi @joellembatchou,
So I'm running regenie for UKB data while using the burden/skat-o options.
I was wondering if there's any workaround to include variants in which the REF allele is actually the minor allele (and this constitutes a rare variant).
See the example below
So in this case, this variant will be eliminated from the burden test since the ALT allele has a frequency over a certain threshold (e.g. 0.01), but the variant is still rare if we count the C alleles instead, and it would make it into our analysis with the same threshold.
What would be the best way to include those variants in my analysis? Is it possible to give a parameter to REGENIE to account for these cases? If not, would it be hard to add something like this to the code?
Thank you so much for your response.