meyer-lab-cshl / plinkQC

R package for quality control of plink genetic datasets

Issue with data pruning for ancestry estimation #54

Open safamajeed opened 8 months ago

safamajeed commented 8 months ago

I'm trying to estimate ancestry using the plink tutorial by Hannah Meyer. I've reached the "Prune study data" step, but when I run:

plink2 --bfile $qcdir/$name.no_ac_gt_snps \
    --exclude range $refdir/$highld \
    --indep-pairwise 50 5 0.2 \
    --allow-extra-chr \
    --out $qcdir/$name.no_ac_gt_snps

I get the following error:

13362315 variants remaining after main filters.
Error: --indep-pairwise requires unique variant IDs. (--set-all-var-ids and/or --rm-dup may help.)
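The error above means that some IDs in the second column of the .bim file occur on more than one row. As a rough way to see what that looks like, the sketch below counts repeated IDs in a toy .bim file and then recounts after rebuilding each ID from chromosome, position and the two allele columns. The five rows are invented for illustration and are not taken from the dataset in this issue; on real data the same awk commands could be pointed at the actual .bim file.

```shell
# Toy .bim file (columns: chr, variant ID, cM, bp, allele 1, allele 2).
# These rows are made up for illustration only.
bim=$(mktemp)
cat > "$bim" <<'EOF'
1 rs100 0 1000 A G
1 . 0 2000 C T
1 . 0 3000 G A
2 rs200 0 1500 T C
2 rs200 0 1500 T G
EOF

# Count the distinct IDs that occur on more than one row.
dup_ids=$(awk '{ n[$2]++ } END { c = 0; for (id in n) if (n[id] > 1) c++; print c }' "$bim")

# Same count after rebuilding every ID as chr:pos:allele1:allele2.
rebuilt_dup_ids=$(awk '{ n[$1 ":" $4 ":" $5 ":" $6]++ } END { c = 0; for (id in n) if (n[id] > 1) c++; print c }' "$bim")

echo "duplicated IDs as stored: $dup_ids"
echo "duplicated IDs after rebuilding: $rebuilt_dup_ids"
rm -f "$bim"
```

In the toy file both the missing ID "." and "rs200" are repeated, while the rebuilt IDs are all distinct. Rows that share chromosome, position and both alleles would still collide after rebuilding.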

I've tried adding --set-missing-var-ids '@:#$r,$a' (which runs, but I still get the same error) and --rm-dup list (which fails with "Error: 286705 duplicate IDs with inconsistent genotype data or variant"). If I overwrite all variant IDs with --set-all-var-ids, every SNP ID is replaced and the next step can no longer match the variants:

plink2 --bfile $qcdir/$name.no_ac_gt_snps \
    --extract $qcdir/$name.no_ac_gt_snps.prune.in \
    --make-bed \
    --allow-extra-chr \
    --out $qcdir/$name.pruned

--extract: 0 variants remaining.
Error: No variants remaining after main filters.

I'm using iOS, PLINK v2.00a4.4LM 64-bit Intel
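
One possible explanation for the empty --extract result is that the new IDs were only generated during the pruning run, so the .prune.in list holds rewritten IDs while the fileset on disk still has the original ones. A minimal sketch of one way to keep the two steps consistent is given below. It assumes the renamed variants are first written to a new fileset, and that both later steps read from it. The function name, its arguments and the .uniq suffix are hypothetical, and the ID template '@:#:$r:$a' is an assumption rather than something taken from the tutorial. Whether --rm-dup force-first is an acceptable way to resolve leftover duplicates depends on the data.

```shell
# Sketch only: the function name, argument order and ".uniq" suffix are hypothetical.
# Usage (not executed here):
#   prune_with_unique_ids "$qcdir/$name.no_ac_gt_snps" "$refdir/$highld" "$qcdir/$name"
prune_with_unique_ids() {
    in_prefix=$1    # input fileset prefix
    highld=$2       # file with high-LD ranges to exclude
    out_prefix=$3   # prefix for all output files

    # Step 1: rewrite every variant ID, drop remaining duplicates
    # (keeping the first of each), and save the result as a new fileset.
    plink2 --bfile "$in_prefix" \
        --set-all-var-ids '@:#:$r:$a' \
        --rm-dup force-first \
        --allow-extra-chr \
        --make-bed \
        --out "$out_prefix.uniq" || return 1

    # Step 2: prune using the fileset that already carries the new IDs.
    plink2 --bfile "$out_prefix.uniq" \
        --exclude range "$highld" \
        --indep-pairwise 50 5 0.2 \
        --allow-extra-chr \
        --out "$out_prefix.uniq" || return 1

    # Step 3: extract from the same fileset, so the IDs in .prune.in match.
    plink2 --bfile "$out_prefix.uniq" \
        --extract "$out_prefix.uniq.prune.in" \
        --make-bed \
        --allow-extra-chr \
        --out "$out_prefix.pruned"
}
```

Because the rewritten IDs no longer match rsIDs, any later step that matches variants against a reference panel by ID would presumably need the reference renamed with the same template.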