Open Xuemin-Wang opened 2 months ago
I checked the missingness of samples and variants. Per-sample missing genotype rates ranged from 1.397% to 5.215%, and 16,543,401 of the 18,879,304 variants with a MAC >= 5 had a missing genotype rate < 1%.
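For anyone wanting to reproduce the variant-level part of this check, here is a minimal sketch. It assumes the missingness report came from plink2 --missing, which writes a .vmiss file with an F_MISS column. The function name and the file names in the usage comment are hypothetical.

```shell
# Count variants whose missing-genotype rate is below a threshold.
# Assumes a plink2 --missing report (.vmiss) with a header row that
# includes an F_MISS column; the column is located by name.
count_low_missing() {
  # $1 = .vmiss file, $2 = threshold (e.g. 0.01 for 1%)
  awk -v thr="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "F_MISS") col = i; next }
    col && $col + 0 < thr { n++ }
    END { print n + 0 }
  ' "$1"
}

# Hypothetical usage (output prefix "miss" is made up):
#   plink2 --pfile ../bgen/final624.mac5 --missing --out miss
#   count_low_missing miss.vmiss 0.01
```

Looking the column up by name rather than by position keeps the sketch working if the report is written with a different column set.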
Hi,
Could you please run step 2 with --write-samples to get the list of the 624 sample IDs used in the analysis, then pass that file to PLINK when applying the MAC 5 filter to the "../bgen/final" PGEN fileset?
Cheers, Joelle
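The suggestion above could be sketched roughly as follows. Several things here are assumptions rather than confirmed details: that --write-samples writes a headerless FID/IID file named <out>_<phenotype>.regenie.ids, the placeholder phenotype name PHENO_NAME, and the output prefix ../bgen/final.analysed.mac5. The step 2 options are copied from the step 2 log posted in this thread. The script only prints the commands unless DRY_RUN=0 is set.

```shell
# Sketch of the suggested check. Prints the commands by default;
# set DRY_RUN=0 to actually execute them (requires regenie and plink2).
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

PHENO=PHENO_NAME   # hypothetical: replace with the phenotype column in ../pheno.txt
OUT=../regenie/res/step2_final_ADD
# Assumed naming of the --write-samples output: <out>_<phenotype>.regenie.ids
IDS_FILE=${OUT}_${PHENO}.regenie.ids

# 1) Re-run step 2 with the options from the posted log, plus --write-samples.
run regenie --step 2 --pgen ../bgen/final --minMAC 1 --test additive \
  --bsize 1000 --phenoFile ../pheno.txt --covarFile ../covariates.txt \
  --covarColList SEX,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10 \
  --pred ../regenie/step_1_out/step_1_pred.list --bt --firth --approx \
  --firth-se --use-null-firth ../regenie/step_1_out/step_1_firth.list \
  --af-cc --gz --write-samples --out "$OUT"

# 2) Apply the MAC 5 filter in plink2 using exactly those samples, and
#    write the list of passing variant IDs so they can be counted.
run plink2 --pfile ../bgen/final --keep "$IDS_FILE" \
  --set-missing-var-ids '@:#' --mac 5 --write-snplist \
  --out ../bgen/final.analysed.mac5
```

Comparing the number of IDs in the resulting .snplist file with the number of variants in the regenie output would show whether the two tools agree once they use the same samples.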
Dear REGENIE developers,
I'm using REGENIE v3.5.gz on 237 cases and 387 controls. Genotypes of my samples were jointly called from WGS data, yielding 50,811,891 variants. REGENIE reported that 44,639,814 variants were dropped due to low MAC (default threshold 5; "Number of ignored tests due to low MAC : 44639814"), leaving 6,172,077 variants in the output file. To check whether that many variants really had a MAC < 5, I first prefiltered the variants with plink2 as shown below and found that 18,879,304 variants had a MAC >= 5.
plink2 \
  --pfile ../bgen/final \
  --keep ../qcfiles/eur_samples_to_keep.txt \
  --set-missing-var-ids @:# \
  --mac 5 \
  --make-pgen \
  --out ../bgen/final624.mac5
I re-ran regenie on the prefiltered genotypes (../bgen/final624.mac5) and found that about two thirds of the variants were still dropped, leaving 6,085,552 variants in the output file. The ignored variants had a MAC of 5 or above according to PLINK, so they should not have been dropped from the test.
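One thing that may be worth ruling out when comparing counts like these is that MAC is specific to the set of samples being counted and to how missing genotypes are treated. The sketch below uses the textbook definition (the smaller of the two allele counts among non-missing genotypes). It is not taken from regenie's or PLINK's source, and the genotypes are toy values invented purely for illustration.

```shell
# Minor allele count within a chosen sample subset, skipping missing genotypes.
mac_in_subset() {
  # $1 = table of "sampleID ALT_count" (0/1/2, or NA if missing)
  # $2 = sample IDs to keep, one per line
  awk '
    NR == FNR { keep[$1] = 1; next }              # first file: IDs to keep
    ($1 in keep) && $2 != "NA" { alt += $2; n++ } # second file: genotypes
    END { ref = 2 * n - alt; m = (alt < ref) ? alt : ref; print m + 0 }
  ' "$2" "$1"
}

# Toy data, made up for illustration (not from this study).
geno=$(mktemp); all=$(mktemp); sub=$(mktemp)
printf '%s\n' 's1 0' 's2 1' 's3 2' 's4 NA' 's5 1' 's6 0' > "$geno"
printf '%s\n' s1 s2 s3 s4 s5 s6 > "$all"
printf '%s\n' s1 s2 s4 s6 > "$sub"

mac_in_subset "$geno" "$all"   # all six samples: prints 4
mac_in_subset "$geno" "$sub"   # four-sample subset: prints 1
rm -f "$geno" "$all" "$sub"
```

The same variant passes a hypothetical MAC >= 4 filter in the full toy set but fails it in the subset, which is why filtering in one tool and testing in another only gives matching counts when both use the same samples.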
The autosomal variants used in step 1 were LD-pruned and filtered with --maf 0.02 --hwe 1e-6 in PLINK. Variants in the MHC region were excluded from the step 1 prediction, as shown below.
Pruning to remove highly correlated SNPs:
plink2 \
  --pfile ../bgen/final \
  --exclude range ../qcfiles/mhc_range.txt \
  --keep ../qcfiles/eur_samples_to_keep.txt \
  --set-missing-var-ids @:# \
  --rm-dup exclude-all \
  --maf 0.02 \
  --hwe 1e-6 \
  --indep-pairwise 500 50 0.1 \
  --out ../bgen/final.step1
Generating the regenie step 1 data:
plink2 \
  --pfile ../bgen/final \
  --chr 1-22 \
  --keep ../qcfiles/eur_samples_to_keep.txt \
  --extract ../bgen/final.step1.prune.in \
  --set-missing-var-ids @:# \
  --make-pgen \
  --out ../bgen/final.step1.variants
Here's the script to run step 1:
regenie \
  --step 1 \
  --pgen ../bgen/final.step1.variants \
  --phenoFile ../pheno.txt \
  --covarFile ../covariates.txt \
  --covarColList SEX,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10 \
  --bsize 1000 \
  --lowmem \
  --lowmem-prefix $TMPDIR/regenie_tmp_preds_all \
  --bt \
  --write-null-firth \
  --out ../regenie/step_1_out/step_1
And here are the first few lines of the step 2 log file.
Copyright (c) 2020-2024 Joelle Mbatchou, Andrey Ziyatdinov and Jonathan Marchini.
Distributed under the MIT License.
Compiled with Boost Iostream library.
Using Intel MKL with Eigen.
Log of output saved in file : ../regenie/res/step2_final_ADD.log
Options in effect:
  --step 2 \
  --pgen ../bgen/final \
  --minMAC 1 \
  --test additive \
  --bsize 1000 \
  --phenoFile ../pheno.txt \
  --covarFile ../covariates.txt \
  --covarColList SEX,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10 \
  --pred ../regenie/step_1_out/step_1_pred.list \
  --bt \
  --firth \
  --approx \
  --firth-se \
  --use-null-firth ../regenie/step_1_out/step_1_firth.list \
  --af-cc \
  --gz \
  --out ../regenie/res/step2_final_ADD
Association testing mode with fast multithreading using OpenMP
Would you be able to help me out? Please let me know if other info is required for debugging.
Many thanks, Patrick