ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

Fst estimation error #42

Gandasegui closed this issue 3 years ago

Gandasegui commented 3 years ago

Hello,

I am running pixy on a small dataset of 18 individuals, 1273 scaffolds and around 200,000 SNPs. I am trying to estimate pi, dxy and FST in the same run, with the 18 individuals divided into different groups. While pi and dxy work well, FST estimation stops at scaffold number 550, so it is not estimated genome-wide.

The error is reported as: KeyError: ('fst', in 'nDi.2.2.scaf00550')

The command I used is:

pixy --stats pi fst dxy \
  --vcf D_immitis.cohort.ALL.10K.vcf.gz \
  --n_cores 44 \
  --output_prefix aus_usa \
  --populations aus_usa.txt \
  --window_size 10000

I do not think the problem is in my populations file or my dataset, because pi and dxy are estimated correctly. Do you know of any solution for this? Would you like me to send any files?
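One quick way to rule out the populations file is to confirm that every sample it lists also appears in the VCF header. The pixy documentation describes the populations file as headerless and tab-separated, with sample names (exactly as in the VCF) in the first column and population names in the second. The sketch below assumes that layout; the function names are hypothetical helpers, not part of pixy, and the two-column check may be stricter than pixy itself.

```python
import gzip
from typing import Iterable


def vcf_samples(lines: Iterable[str]) -> list[str]:
    """Return sample names from the '#CHROM' header line of a VCF."""
    for line in lines:
        if line.startswith("#CHROM"):
            # Columns 0-8 are fixed (CHROM ... FORMAT); sample columns follow.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached data records without seeing the header line
    raise ValueError("no #CHROM header line found")


def read_populations(lines: Iterable[str]) -> dict[str, str]:
    """Parse a headerless, tab-separated 'sample<TAB>population' file."""
    pops = {}
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {n}: expected 2 tab-separated columns, got {len(fields)}")
        sample, pop = fields
        pops[sample] = pop
    return pops


def check_populations(vcf_lines: Iterable[str], pop_lines: Iterable[str]) -> list[str]:
    """Return sample IDs listed in the populations file but absent from the VCF."""
    samples = set(vcf_samples(vcf_lines))
    pops = read_populations(pop_lines)
    return sorted(s for s in pops if s not in samples)


def check_files(vcf_path: str, pop_path: str) -> list[str]:
    """File-based wrapper: reads a bgzipped/gzipped VCF and a plain-text populations file."""
    with gzip.open(vcf_path, "rt") as vcf, open(pop_path) as popfile:
        return check_populations(vcf, popfile)
```

For example, `check_files("D_immitis.cohort.ALL.10K.vcf.gz", "aus_usa.txt")` would return an empty list if every listed sample is present in the VCF.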

Thanks in advance.

ksamuk commented 3 years ago

Hi! That sounds like a potential bug in the FST function, please send me your VCF (or a subset of it that reproduces the error), along with your populations file. You can send a dropbox/drive link to ksamuk@gmail.com, or link them here. Thanks!

Gandasegui commented 3 years ago

Hi! I have created a drive folder with a README file that explains the analysis. I have also attached the original vcf file, population files and pixy output. This is the link:

https://drive.google.com/drive/folders/1zyy4B_dcn1XSqZ0E2AFqBr3PrsY3ZNia?usp=sharing

Thanks in advance.

Gandasegui commented 3 years ago

Hi,

Could you have a look at my files? Do you think this is a bug, or is there a problem with my VCF file?

Thanks

ksamuk commented 3 years ago

Hi @Gandasegui, I've been working on this, but it has been a little harder to track down. There should be a fix incoming in the next few days. So far, I don't think there is anything wrong with your VCF.

Gandasegui commented 3 years ago

Thanks @ksamuk

ksamuk commented 3 years ago

Hi there, thanks for posting your files, very helpful for tracking this down. I've posted a fix to conda-forge (pixy 1.2.4.beta1), can you try upgrading to the new version and seeing if it solves your problem?

Gandasegui commented 3 years ago

Hi @ksamuk,

I am now getting errors when trying to install pixy and htslib in the same environment. Either package installs fine if it is installed first, but installing the second one afterwards always gives an error. Could this be due to the modification you made? Is there any alternative?

Actually, if I install htslib first and subsequently pixy, I get this error message:

The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

ksamuk commented 3 years ago

Hi there, this looks like a conda issue or one with your local environment and unrelated to pixy, but the quick fix is to make a new environment, install pixy, and then htslib. Hope that helps!

Gandasegui commented 3 years ago

Thanks a lot @ksamuk! Pixy is now working with the following message printed on the terminal:

[pixy] NOTE: The following chromosomes/scaffolds did not have sufficient data to estimate FST: nDi.2.2.scaf00550, nDi.2.2.scaf00586, nDi.2.2.scaf00610, nDi.2.2.scaf00721, nDi.2.2.scaf00757, nDi.2.2.scaf00761, nDi.2.2.scaf00781, nDi.2.2.scaf00838, nDi.2.2.scaf00871, nDi.2.2.scaf00932, nDi.2.2.scaf01066, nDi.2.2.scaf01127, nDi.2.2.scaf01165, nDi.2.2.scaf01166, nDi.2.2.scaf01168, nDi.2.2.scaf01193, nDi.2.2.scaf01201, nDi.2.2.scaf01207, nDi.2.2.scaf01220
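The change in behaviour (a `KeyError` on `nDi.2.2.scaf00550` before the fix, a NOTE listing that scaffold and others afterwards) is consistent with a common pattern: skip scaffolds that produced no FST values and report them, rather than indexing a result that was never created. The sketch below only illustrates that general pattern; it is not pixy's code, and the `(scaffold, window_start, fst)` input layout and the `summarize_fst` name are assumptions.

```python
from collections import defaultdict


def summarize_fst(window_results, scaffolds):
    """Group per-window FST values by scaffold, skipping scaffolds without any.

    window_results: iterable of (scaffold, window_start, fst) tuples, where fst
        is None when a window had too little data (hypothetical layout).
    scaffolds: every scaffold name that was processed.
    Returns (dict of scaffold -> [(window_start, fst), ...], list of skipped scaffolds).
    """
    per_scaffold = defaultdict(list)
    for scaffold, start, fst in window_results:
        if fst is not None:
            per_scaffold[scaffold].append((start, fst))

    # Indexing a plain dict with a scaffold that never produced a value would
    # raise KeyError; a membership test (which does not create keys) lets us
    # collect those scaffolds and report them instead.
    skipped = [s for s in scaffolds if s not in per_scaffold]
    if skipped:
        print("NOTE: insufficient data to estimate FST for: " + ", ".join(skipped))
    return dict(per_scaffold), skipped
```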

ksamuk commented 3 years ago

Excellent, that is working as expected! Thanks for your help troubleshooting this. Let me know if you run into any other issues.