qiime2 / q2-diversity

BSD 3-Clause "New" or "Revised" License
4 stars 45 forks source link

beta_phylogenetic_alt: data must be symmetric and cannot contain NaNs. #165

Closed sjanssen2 closed 7 years ago

sjanssen2 commented 7 years ago

Last of the following three command exists with (I am using qiime2-2017.9):

Traceback (most recent call last):
  File "/home/sjanssen/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/q2cli/commands.py", line 218, in __call__
    results = action(**arguments)
  File "<decorator-gen-258>", line 2, in beta_phylogenetic_alt
  File "/home/sjanssen/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/qiime2/sdk/action.py", line 201, in callable_wrapper
    output_types, provenance)
  File "/home/sjanssen/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/qiime2/sdk/action.py", line 334, in _callable_executor_
    output_views = callable(**view_args)
  File "/home/sjanssen/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/q2_diversity/_beta/_method.py", line 110, in beta_phylogenetic_alt
    variance_adjusted=variance_adjusted, bypass_tips=bypass_tips)
  File "/home/sjanssen/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/unifrac/_methods.py", line 103, in unweighted
    variance_adjusted, 1.0, bypass_tips, threads)
  File "unifrac/_api.pyx", line 90, in unifrac._api.ssu
  File "/home/sjanssen/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/skbio/stats/distance/_base.py", line 107, in __init__
    self._validate(data, ids)
  File "/home/sjanssen/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/skbio/stats/distance/_base.py", line 872, in _validate
    "Data must be symmetric and cannot contain NaNs.")
skbio.stats.distance._base.DistanceMatrixError: Data must be symmetric and cannot contain NaNs.

Plugin error from diversity:

  Data must be symmetric and cannot contain NaNs.

See above for debug info.

qiime tools import --input-path input.biom --type FeatureTable[Frequency] --source-format BIOMV210Format --output-path input

qiime tools import --input-path reference.tree --output-path reference_tree.qza --type Phylogeny[Rooted]

qiime diversity beta-phylogenetic-alt --i-table input.qza --i-phylogeny reference_tree.qza --p-metric unweighted_unifrac --p-n-jobs 10 --o-distance-matrix unweighted_unifrac

How comes that the plugin supposed to produce a skbio DistanceMatrix object violates it's properties? Am I doing something wrong here, or is that a real bug?

input.biom.zip reference.tree.zip

sjanssen2 commented 7 years ago

hm maybe more a https://github.com/biocore/unifrac question @wasade @ElDeveloper ?

wasade commented 7 years ago

The tree and table produce a valid distance matrix when run directly against ssu which is the underlying code used by beta-phylogenetic-alt. The resulting matrix is almost entirely zero as to be expected with an almost fully dense table and unweighted UniFrac, and it loads into an skbio.DistanceMatrix without error.

Have you tested this against the implementation in beta-phylogenetic?

sjanssen2 commented 7 years ago

I did and got back this distance matrix (after exporting it from the q2 artifact):

beta_closedref_unweighted_unifrac.txt

sjanssen2 commented 7 years ago

Same with good old qiime1

wasade commented 7 years ago

I cannot reproduce the same beta_closedref_unweighted_unifrac.txt result. However, the results I obtain using the provided inputs with ssu directly, beta-phylogenetic (2017.9), beta-phylogenetic-alt (2017.9), and QIIME v191 are all consistent.

It is possible there is a pathological edge case here as the input is quite unusual in that the matrix is nearly entirely dense (and unweighted UniFrac is qualitative).

Stefan, perhaps we should just discuss this after lab meeting tomorrow?

ebolyen commented 7 years ago

It looks like this is potentially a bug in beta-phylogenetic-alt. I'm going to rename the title and label as such. Let us know what you find!

wasade commented 7 years ago

On a followup discussion with @sjanssen2, the issue stems from requesting more threads than there are targets for. This is an edge case in so much as requesting a large number of threads for a very small number of samples should be uncommon, but it is an important edge case to handle appropriately nonetheless.

ElDeveloper commented 7 years ago

Should this issue be closed in favor of #166 then?

ebolyen commented 7 years ago

@ElDeveloper, looks like it, that's a prettier issue. Thanks @sjanssen2!