simonhmartin / genomics_general

General tools for genomic analyses.

error with popgenWindows.py: "All populations must be represented by at least one sample." #107

Closed: mayjean21 closed this issue 1 year ago

mayjean21 commented 1 year ago

Hi Simon,

I am having an issue with popgenWindows.py. I am trying to calculate pi, Fst and Dxy (as seen in your example). I use:

popgenWindows.py -g shrike2.6.3.geno.gz -o shrike2.6.3.Fst.Dxy.pi.csv.gz -f phased -w 20000 -m 10000 -s 20000 -p PundPyt -p NyerPyt -p NyerMak -p PundMak -p kivu --popsFile pop.info

And I get the error:

Traceback (most recent call last):
  File "/Volumes/Mayjean/14_ManhattanPlot/./popgenWindows.py", line 275, in <module>
    for p in popInds: assert len(p) >= 1, "All populations must be represented by at least one sample."
AssertionError: All populations must be represented by at least one sample.
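The assertion checks that every population named with -p ends up with at least one sample. The sketch below is a simplified illustration, not the actual popgenWindows.py code: the helper group_samples is hypothetical, and it assumes samples are assigned to populations purely by exact name matching against the populations file. Using names from this issue, it shows how a -p name that never appears in the file yields an empty sample list and trips the same assertion.

```python
# Simplified illustration of why the assertion fires; NOT the actual
# popgenWindows.py source. Assumes exact name matching between -p values
# and the population column of the pops file.

def group_samples(requested_pops, sample_to_pop):
    """Return one list of sample names per requested population, in order."""
    return [[s for s, p in sample_to_pop.items() if p == pop] for pop in requested_pops]

# A subset of the pop.info contents from this issue (sample -> population).
sample_to_pop = {"1_1": "popL", "3_1": "popS", "10_1": "popC"}

# Population names passed with -p on the command line.
requested = ["PundPyt", "NyerPyt", "NyerMak", "PundMak", "kivu"]

popInds = group_samples(requested, sample_to_pop)
print(popInds)  # every list is empty: no sample is labelled "PundPyt", "kivu", etc.

try:
    for p in popInds:
        assert len(p) >= 1, "All populations must be represented by at least one sample."
except AssertionError as err:
    print("AssertionError:", err)
```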

Could this be due to an error in my pop.info file? This is what it currently looks like: one column is the sample name (which matches the VCF and geno files) and the other is the population name, following the format in your example:

1_1 popL
2_1 popL
3_1 popS
4_1 popS
5_1 popS
6_1 popL
7_1 popL
8_1 popL
9_1 popL
10_1 popC
11_1 popC
12_1 popC
13_1 popC
14_1 popC
15_1 popC
16_1 popL

I hope I can get your assistance. Thank you!

simonhmartin commented 1 year ago

Hi, the population names in your populations file need to match the population names in your command.
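Because the -p names must match the population column of the populations file, a quick pre-flight check can catch a mismatch before a long run. This is a hypothetical helper, not part of genomics_general (check_pops.py, parse_pops and missing_pops are illustrative names), and it assumes the whitespace-separated, two-column layout shown in pop.info above.

```python
import sys


def parse_pops(lines):
    """Map population name -> list of samples from 'sample population' lines.

    Assumes whitespace-separated columns; blank or one-column lines are skipped.
    """
    pops = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        sample, pop = fields[0], fields[1]
        pops.setdefault(pop, []).append(sample)
    return pops


def read_pops_file(path):
    """Read and parse a populations file from disk."""
    with open(path) as handle:
        return parse_pops(handle)


def missing_pops(requested, pops):
    """Return the requested population names that have no samples in the file."""
    return [name for name in requested if not pops.get(name)]


if __name__ == "__main__":
    # Hypothetical usage: python check_pops.py pop.info popL popS popC
    pops = read_pops_file(sys.argv[1])
    missing = missing_pops(sys.argv[2:], pops)
    if missing:
        print("No samples found for:", ", ".join(missing))
        print("Populations in file:", ", ".join(sorted(pops)))
        sys.exit(1)
    print("All requested populations have at least one sample.")
```

With the pop.info shown above, passing the original names (PundPyt, NyerPyt, NyerMak, PundMak, kivu) would report all five as missing and list popC, popL and popS as the populations actually present. Either using -p popL -p popS -p popC in the popgenWindows.py command or relabelling the second column of the file would satisfy this check.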

mayjean21 commented 1 year ago

Ah oops, I missed that out. Thank you very much!