qiime2 / q2-feature-table

QIIME 2 plugin supporting operations on feature tables.
BSD 3-Clause "New" or "Revised" License

qiime feature-table rarefy --p-with-replacement: sum(pvals[:-1]) > 1.0 #245

Open nick-youngblut opened 3 years ago

nick-youngblut commented 3 years ago

Bug Description

Running qiime feature-table rarefy --p-with-replacement sometimes generates the error: sum(pvals[:-1]) > 1.0

This is likely a float rounding issue.

Steps to reproduce the behavior

The counts per sample for my feature table are:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
2381744 2763866 2855986 2929877 2987576 4566374 

...so this is not a problem that only occurs with a very small sample size (e.g., n = 1 or n = 10).

Computation Environment

thermokarst commented 3 years ago

Thanks for reporting, @nick-youngblut. I think this error might be originating in the biom package. Would you mind running this little bit of Python code (using the offending data) to see if you can recreate it in pure biom?

import qiime2
import biom

artifact = qiime2.Artifact.load('table.qza')
table = artifact.view(biom.Table)

table.subsample(500000, axis='sample', by_id=False, with_replacement=True)

nick-youngblut commented 3 years ago

I can't seem to reproduce the error, so it appears to occur rarely.

thermokarst commented 3 years ago

Thanks @nick-youngblut. I don't think this issue can be resolved in this QIIME 2 plugin, since the rarefy method just wraps biom. I'll keep this open for now in case you find a more reliable test case. Thanks!

nick-youngblut commented 3 years ago

I ran into the issue again, and I was able to confirm that the issue is caused by biom:

Python 3.6.10 | packaged by conda-forge | (default, Apr 24 2020, 16:42:08)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import qiime2
>>> import biom
>>> artifact = qiime2.Artifact.load('otu.qza')
>>> table = artifact.view(biom.Table)
>>> table.subsample(500000, axis='sample', by_id=False, with_replacement=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/ebio/abt3_projects/test_project/bin/llmgp/.snakemake/conda/5b653c1a/lib/python3.6/site-packages/biom/table.py", line 2824, in subsample
    _subsample(data, n, with_replacement)
  File "biom/_subsample.pyx", line 53, in biom._subsample._subsample
  File "mtrand.pyx", line 4214, in numpy.random.mtrand.RandomState.multinomial
ValueError: sum(pvals[:-1]) > 1.0

I guess that I should post the issue on https://github.com/biocore/biom-format
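
For anyone reading the traceback above, here is a minimal sketch of where the message comes from. The last frame is NumPy's RandomState.multinomial, which rejects a probability vector whose leading entries sum to more than 1. The helper name try_multinomial and the example probability vectors are made up for illustration. How biom builds the vector it passes to NumPy is not shown here.

```python
import numpy as np


def try_multinomial(n, pvals, seed=0):
    """Draw n samples from a multinomial; return (draws, error_message).

    Hypothetical helper for illustration only, not part of biom or QIIME 2.
    """
    rng = np.random.RandomState(seed)
    try:
        return rng.multinomial(n, pvals), None
    except ValueError as err:
        return None, str(err)


# A well-formed probability vector: entries sum to 1.
valid = np.array([0.2, 0.3, 0.5])
draws, err = try_multinomial(500000, valid)

# A malformed vector: the first two entries already sum to 1.1,
# so NumPy refuses to sample and raises the ValueError seen above.
too_large = np.array([0.6, 0.5, 0.1])
draws_bad, err_bad = try_multinomial(500000, too_large)

print(draws, err)
print(draws_bad, err_bad)
```

So the error means the per-sample proportions handed to NumPy did not form a valid probability vector, which would be consistent with a normalization problem upstream of the call.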

mortonjt commented 7 months ago

See the link above. @nick-youngblut, is it possible that you had fractional values in the biom table? Rounding to ints seems to have resolved this issue.
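
A rough sketch of how one might check for fractional values and round them, using plain NumPy arrays. The function names has_fractional_values and round_counts and the example counts are hypothetical. This only illustrates the rounding workaround mentioned above and is not the fix that went into biom.

```python
import numpy as np


def has_fractional_values(values):
    """Return True if any entry is not a whole number (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    return bool(np.any(values != np.round(values)))


def round_counts(values):
    """Round every entry to the nearest integer count (hypothetical helper)."""
    return np.round(np.asarray(values, dtype=float)).astype(np.int64)


# Made-up counts for one sample; two of them are fractional.
counts = np.array([12.0, 7.4, 0.0, 3.25])

fractional_before = has_fractional_values(counts)
rounded = round_counts(counts)
fractional_after = has_fractional_values(rounded)

print(fractional_before, rounded, fractional_after)
```

For a biom.Table, the nonzero values should be reachable through table.matrix_data.data, since matrix_data is the underlying SciPy sparse matrix. Treat that as an assumption to verify against your biom version.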

wasade commented 7 months ago

Thank you, @mortonjt, for opening the issue on the biom-format tracker. I was unaware of this edge case. We'll look at getting it addressed in the next release.

wasade commented 6 months ago

This issue was addressed in https://github.com/biocore/biom-format/pull/961 and it may make sense to close this issue.

As a general comment, please do consider opening issues when appropriate with affected projects so problems can be resolved in a timely manner.