ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

Pi calculation error - ValueError: Cannot convert non-finite values (NA or inf) to integer #26

Closed: mishaploid closed this issue 3 years ago

mishaploid commented 3 years ago

Running into what appears to be a numpy/pandas issue when attempting to calculate pi, which may be related to the size of the input VCF (see https://github.com/pandas-dev/pandas/issues/35227). May also be a consequence of installing via pip. Any advice much appreciated!

Log info:

[pixy] pixy 1.0.0.beta1
[pixy] See documentation at https://pixy.readthedocs.io/en/latest/

[pixy] Validating VCF and input parameters...
[pixy] Checking write access...OK
[pixy] Checking CPU configuration...OK
[pixy] Checking for invariant sites...OK
[pixy] Checking chromosome data...OK
[pixy] Checking intervals/sites...OK
[pixy] Checking sample data...OK
[pixy] All initial checks past!

[pixy] Preparing for calculation of summary statistics: pi
[pixy] Data set contains 18 population(s), 1 chromosome(s), and 242 sample(s)
[pixy] Window size: 10000 bp

[pixy] Started calculations at 09:22:09 on 2021-03-25
[pixy] Using 16 out of 96 available CPU cores

[pixy] Processing chromosome/contig C9...
[pixy] Calculating statistics for region C9:1-63239560...
Traceback (most recent call last):
  File "/home/sdturner/.conda/envs/bo-demography/bin/pixy", line 8, in <module>
    sys.exit(main())
  File "/home/sdturner/.conda/envs/bo-demography/lib/python3.5/site-packages/pixy/__main__.py", line 323, in main
    outsorted[cols] = outsorted[cols].astype('Int64')
  File "/home/sdturner/.conda/envs/bo-demography/lib/python3.5/site-packages/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/home/sdturner/.conda/envs/bo-demography/lib/python3.5/site-packages/pandas/core/generic.py", line 5001, in astype
    **kwargs)
  File "/home/sdturner/.conda/envs/bo-demography/lib/python3.5/site-packages/pandas/core/internals.py", line 3714, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/home/sdturner/.conda/envs/bo-demography/lib/python3.5/site-packages/pandas/core/internals.py", line 3581, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/sdturner/.conda/envs/bo-demography/lib/python3.5/site-packages/pandas/core/internals.py", line 575, in astype
    **kwargs)
  File "/home/sdturner/.conda/envs/bo-demography/lib/python3.5/site-packages/pandas/core/internals.py", line 664, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "/home/sdturner/.conda/envs/bo-demography/lib/python3.5/site-packages/pandas/core/dtypes/cast.py", line 702, in astype_nansafe
    raise ValueError('Cannot convert non-finite values (NA or inf) to '

OS information linux-64

ksamuk commented 3 years ago

Hmm, that's a tough one. The repaired version is now up on conda-forge; would it be possible to try a re-install via conda in your snakemake environment? Alternatively, could you check whether you get this error outside of your custom environment (e.g. in a fresh pixy environment)? Also, a possible quick fix: if this is a DataFrame size limit in pandas, try reducing the chunk size to something like '--chunk_size 50000'. Let me know how it goes!

mishaploid commented 3 years ago

Thanks so much for the quick reply! Apologies, I forgot to share my original command, but I did already use '--chunk_size 50000' for the previous run. I'm currently running in a fresh pixy environment and will share the outcome :)

ksamuk commented 3 years ago

Sounds good! You can actually go even lower on the chunk size (e.g. 10000, the same as the window size), although there is a performance decrease.

mishaploid commented 3 years ago

Success with the fresh environment! Seems like it's an issue with the snakemake env. Thanks!
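
Note for anyone landing here with the same traceback: one possible explanation, not confirmed in this thread, is the pandas version in the failing environment. The paths in the traceback point to Python 3.5 and a single-file `pandas/core/internals.py`, which suggests a pandas release older than 0.24, the release that introduced the nullable `Int64` dtype. On such a version, `astype('Int64')` may have been treated as a plain NumPy `int64` cast, which cannot hold missing values. The sketch below uses a made-up column (`counts` is a hypothetical stand-in, not pixy's actual output) to show the difference between the two casts on pandas 0.24 or newer:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for an output column of counts with one missing value.
counts = pd.Series([12.0, np.nan, 7.0])


def try_cast(series, dtype):
    """Return the cast series, or the ValueError raised by the cast."""
    try:
        return series.astype(dtype)
    except ValueError as err:
        return err


# Plain NumPy integer dtype: there is no integer representation for NaN.
plain = try_cast(counts, "int64")

# pandas nullable integer dtype (assumes pandas >= 0.24): NaN becomes a missing value.
nullable = try_cast(counts, "Int64")

print("pandas version:", pd.__version__)
print("int64 cast ->", repr(plain))
print("Int64 cast ->", repr(nullable))
```

If the `Int64` cast also fails in your environment, checking `pd.__version__` there and comparing it with a fresh pixy environment is a reasonable first step.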