ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

Persite Fst - 'float' object is not iterable #36

Closed erikenbody closed 3 years ago

erikenbody commented 3 years ago

Describe the bug

Hi there,

I receive an error when running pixy with a window size of 1, using the same input and the same command that run fine with a larger window size. In other words, the per-site estimates won't calculate, but the windowed estimates do. I get the same error with another dataset (a different VCF), which made me wonder whether this could be a true bug.

Is it possible that sites without data are causing the problem?
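To make the "sites without data" question concrete, here is a minimal sketch of a per-site FST calculation. It uses a Hudson-style estimator in the form given by Bhatia et al. (2013), chosen only because it is short; it is not claimed to be what pixy computes, and the function name `hudson_fst_site` and the allele-count inputs are made up for illustration. The point is that a single site with no genotypes, or with no variation, has no defined FST, so a per-site run hits these edge cases far more often than a windowed run does.

```python
import math


def hudson_fst_site(ac1, ac2):
    """Per-site Hudson-style FST for one biallelic site (illustrative only).

    ac1, ac2: (ref_count, alt_count) allele counts in populations 1 and 2.
    Returns NaN when the estimate is undefined.
    """
    n1, n2 = sum(ac1), sum(ac2)
    # A population with fewer than two sampled alleles (e.g. all genotypes
    # missing at this site) cannot contribute a within-population term.
    if n1 < 2 or n2 < 2:
        return math.nan

    p1, p2 = ac1[1] / n1, ac2[1] / n2
    numerator = (
        (p1 - p2) ** 2
        - p1 * (1 - p1) / (n1 - 1)
        - p2 * (1 - p2) / (n2 - 1)
    )
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    # Sites fixed for the same allele in both populations give 0/0.
    if denominator == 0:
        return math.nan
    return numerator / denominator


fixed_difference = hudson_fst_site((10, 0), (0, 10))  # populations fixed for different alleles
invariant_site = hudson_fst_site((10, 0), (10, 0))    # no variation at all
missing_site = hudson_fst_site((0, 0), (8, 2))        # no data in population 1
```

In a windowed calculation the numerators and denominators are usually summed across sites before dividing, so a few undefined sites are absorbed; with a window size of 1 every such site has to be handled explicitly.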

The pixy command and error message

pixy --stats fst \
  --vcf $VCF \
  --chromosomes $INTERVAL \
  --window_size 1 \
  --populations $POPS_FILE \
  --bypass_invariant_check yes \
  --output_folder output_pixy1.0_PERSITE \
  --output_prefix pixy_${INTERVAL}_persite_nomaf

stdout:

[pixy] pixy 1.0.0.beta1
[pixy] See documentation at https://pixy.readthedocs.io/en/latest/

[pixy] Validating VCF and input parameters...
[pixy] Checking write access...OK
[pixy] Checking CPU configuration...OK
[pixy] Checking for invariant sites...OK
[pixy] Checking chromosome data...OK
[pixy] Checking intervals/sites...OK
[pixy] Checking sample data...OK
[pixy] All initial checks past!

[pixy] Preparing for calculation of summary statistics: fst
[pixy] Data set contains 2 population(s), 1 chromosome(s), and 27 sample(s)
[pixy] Window size: 1 bp

[pixy] Started calculations at 19:04:30 on 2021-05-28
[pixy] Using 1 out of 20 available CPU cores

[pixy] Processing chromosome/contig independent_chr...
[pixy] Calculating statistics for region independent_chr:1-19330666...

error:

TypeError: 'float' object is not iterable
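For readers unfamiliar with this message: it is a generic Python error raised whenever code tries to loop over, sum, or otherwise iterate a single float. The sketch below shows one way that can happen only in the single-site case. It is purely hypothetical and is not taken from pixy's source; the helper `site_values` is invented for illustration.

```python
def site_values(window):
    """Hypothetical helper, not pixy code.

    Returns a list for multi-site windows, but (by mistake) a bare float
    when the window holds exactly one site.
    """
    if len(window) == 1:
        return window[0]  # bug: scalar instead of a one-element list
    return list(window)


# A multi-site window works as expected.
windowed_total = sum(site_values([0.25, 0.5]))

# A single-site window triggers the same message reported in this issue.
try:
    sum(site_values([0.25]))
except TypeError as err:
    message = str(err)
```

Whether pixy's failure had this shape is an assumption; the sketch only illustrates why an input can succeed in windowed mode and fail with a window size of 1.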

OS information LINUX (HPC)

Sample files

My VCF and population file are standard and include invariant sites and biallelic SNPs. I am happy to send them along if that would help. I suspect formatting is not the problem, given that the dataset runs in windowed mode.

Thank you for any ideas! Other than this, the code is a breeze and works great, thank you!

Erik

ksamuk commented 3 years ago

Hi Erik, sorry for the delay here. You've probably moved on to a different tool, but I will have a chance to look at this in the next week or so, and will try to diagnose the issue.

erikenbody commented 3 years ago

Thanks Kieran! Happy to send a subset of my data if that would be helpful to you. I'd like to get this working so I can keep everything (windowed and per-site) within pixy. Cheers!

ksamuk commented 3 years ago

Hi Erik, I've pushed a fix for this in the new version of pixy (1.2.0.beta1), which should now be up on conda-forge. If you have a minute, it would be great if you could give it a whirl and let me know if it's working for you.

erikenbody commented 3 years ago

Brilliant, thanks for the push. I checked it just now. Sadly, I now receive the following error:

TypeError: bad number of dimensions: expected 3; found 4

And I confirmed it still runs in windowed mode. Can shoot you the dataset still if you like, it is quite small.

Thank you! Erik

ksamuk commented 3 years ago

Darn, sorry about that! OK, yes, please do send me the dataset if possible. You can post it here or send it to ksamuk@gmail.com

ksamuk commented 3 years ago

Hi Erik, thanks for sending me your VCF for troubleshooting. I've updated how genotype data are handled for the FST calculations, and the calculations now work for your data as well as the test data. This update, pixy 1.2.2.beta1, is now on conda-forge. When you have a moment, it would be great if you could update your version and give it a try.

Also, the single site calculations are generally a lot slower, so you might want to make use of the --n_cores argument.
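As a rough sketch of what that might look like, the snippet below rebuilds the per-site command quoted earlier in this thread with `--n_cores` added. The other flags are copied from that command and may differ between pixy versions. The file names, the output folder, and the choice to leave one core free are assumptions for illustration, and `build_pixy_command` is a hypothetical helper, not part of pixy.

```python
import os
import shlex


def build_pixy_command(vcf, interval, pops_file, n_cores):
    """Assemble a per-site FST pixy invocation as an argument list (illustrative)."""
    return [
        "pixy",
        "--stats", "fst",
        "--vcf", vcf,
        "--chromosomes", interval,
        "--window_size", "1",
        "--populations", pops_file,
        "--bypass_invariant_check", "yes",
        "--n_cores", str(n_cores),
        "--output_folder", "output_pixy_persite",        # assumed folder name
        "--output_prefix", f"pixy_{interval}_persite",   # assumed prefix
    ]


# Assumption: use all but one of the cores the machine reports.
available = os.cpu_count() or 1
n_cores = max(1, available - 1)

cmd = build_pixy_command("input.vcf.gz", "independent_chr", "populations.txt", n_cores)

# Print the command for review; pass `cmd` to subprocess.run to execute it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

On a shared HPC node, the core count should match what the scheduler actually allocated rather than what `os.cpu_count()` reports.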

erikenbody commented 3 years ago

Thanks so much @ksamuk - I really appreciate your troubleshooting! Works great now with 1.2.2.beta1 (and fast!) and results mirror my own calculations.

Cheers