AnnaFurtjes closed 3 years ago
Hi,
Regenie does not perform additional QC checks for step 1; it only looks in the genotype file for variants whose IDs match those in the --extract file.
One way to better assess what is going on would be to run the following (example for PLINK BED, where the 2nd column of the .bim file is the variant ID):
grep -wFf extract.file <( cut -f2 geno.bim ) | wc -l
where 'geno.bim' is the .bim file from the set of PLINK files you pass to Regenie and 'extract.file' is the file you pass to --extract in Regenie (for PGEN you would use the 3rd column of the .pvar file, and for BGEN you could use a .bgi index file to get the variant IDs).
Does that show "587,583" or "571,257"?
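If the count comes out as the smaller number, it may help to list exactly which IDs from the extract file are absent from the .bim file. Below is a minimal sketch, not part of Regenie: 'missing_ids' is a hypothetical helper name, and it assumes the extract file has one variant ID per line (first column, no header) and that the .bim file is whitespace-delimited with the ID in column 2.

```shell
# Hypothetical helper: print IDs listed in the extract file but not found in the .bim file.
# $1 = .bim file (variant ID assumed to be in column 2)
# $2 = extract file (assumed: one variant ID per line, in column 1, no header)
missing_ids() {
    awk 'NR == FNR { seen[$2] = 1; next }        # first file: remember every .bim variant ID
         NF && !($1 in seen) { print $1 }        # second file: print IDs never seen in the .bim
        ' "$1" "$2"
}

# Example usage, assuming both files are in the current directory:
# missing_ids geno.bim extract.file > missing.ids
# wc -l < missing.ids
```

Unlike grep -w, this compares whole IDs exactly, so an ID such as rs123 would not be counted as present just because the .bim file contains rs123:A:G.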
Hi, Thanks so much for getting back to me about this!
It does indeed show 571,257. I will look into why my geno.bim file is missing those SNPs. Thanks!
Thank you for creating this great tool!
I have a question regarding SNP selection in step 1. I am using a snplist containing 587,583 SNPs with the --extract option to tell regenie which SNPs to keep. It runs fine, but the log file indicates that it only considers 571,257 SNPs. I wondered if you had an answer as to why regenie excludes ~16,000 SNPs that originally survived my quality control? Does it perform checks on SNP IDs in the background?
Thanks so much for taking the time to read this!