Need to set --rare-mac to a large value to get robust interaction results when interactor is rare

I'm doing an interaction test in the UK Biobank between a particular candidate gene's rare-variant burden (100 carriers out of 500,000) and each other gene's rare-variant burden. (I specified the candidate gene's burden as a column in the covariate file and used --interaction for convenience.)

I got hugely inflated results for some genes (like p < 1e-50), even though the overall inflation was minimal. All of the top "hit" genes I checked had only two carriers of rare variants in both the candidate gene and the "hit" gene, but because the number of rare-variant carriers in the "hit" gene was > 1000 (which is the --rare-mac threshold), it uses the sandwich estimator rather than the more robust heteroskedastic linear model that's used for genes below the --rare-mac threshold.

I solved the problem by setting --rare-mac 999999 - all of the "hit" genes went away and there were no true positive hits. What do you think of scrapping the sandwich estimator and using the heteroskedastic linear model for all variants regardless of frequency? You could set --rare-mac to infinity by default, in other words.

rgcgithub / regenie

Need to set --rare-mac to a large value to get robust interaction results when interactor is rare #417