kauwelab / PolyRiskScore

PRSKB is a website and command-line interface tool for calculating polygenic risk scores using GWA studies from the NHGRI-EBI Catalog.

Falsely claiming duplicated SNPs #408

Closed by tantrev 1 year ago

tantrev commented 2 years ago

Greetings,

I am trying to run PRSKB on my genome with this command:

./runPrsCLI.sh -f /mnt/c/Users/tantr/Downloads/trevor.sorted.vcf/trevor.sorted.vcf -o trevor_test.no_impute.tsv -h 1 -r hg19 -c 0.05 -p EUR

and the command keeps failing, telling me:

Found multiple lines for single SNP .. Please consolidate into a single line in the input file and run again.
ERROR DURING CREATION OF FILTERED INPUT FILE... Quitting

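However, when I count the unique chromosome/position pairs with this command:

cut -f 1,2 trevor.sorted.vcf | sort | uniq -c | wc -l

I get 4847675.

And when I count all lines with this command:

cut -f 1,2 trevor.sorted.vcf | wc -l

I again get 4847675, so no position appears to be duplicated.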

Any idea what's going on? Thank you in advance.

mpage21 commented 2 years ago

Hi @tantrev, could you send us a few lines from your VCF file so that we can try to replicate the issue? Also, could you check that all the SNP IDs in column 3 (if present) are unique as well?
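
One way to run the column 3 check is sketched below. It is only an illustration: `dup_ids` is a hypothetical helper name, not part of PRSKB, and it assumes a tab-delimited VCF whose header lines start with `#`. It prints every ID that occurs on more than one data line, together with its count. In VCF a missing ID is written as `.`, so a file without rsIDs would show `.` with a very large count here. That might explain an error naming the SNP `.`, although nothing in this thread confirms it.

```shell
# Hypothetical helper (not part of PRSKB): list values in column 3 (ID)
# that occur on more than one data line of a tab-delimited VCF.
dup_ids() {
  # Skip header lines (starting with '#'), count each ID,
  # then print "count<TAB>id" for IDs seen more than once, largest count first.
  awk -F'\t' '!/^#/ { n[$3]++ }
              END   { for (id in n) if (n[id] > 1) print n[id] "\t" id }' "$1" |
    sort -k1,1nr
}
```

For example, `dup_ids trevor.sorted.vcf | head` would show the most frequently repeated IDs. Unlike the `cut -f 1,2` counts above, this skips the header lines and looks at the ID column rather than at chromosome and position.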