qiime2 / q2-longitudinal

QIIME 2 plugin for paired sample comparisons
BSD 3-Clause "New" or "Revised" License

`pairwise-distances` fails when samples without state, group, or individual are in metadata but not distance matrix #172

Closed by gregcaporaso 2 years ago

gregcaporaso commented 2 years ago

Bug Description

When running pairwise-differences and pairwise-distances, I notice a discrepancy in how samples that are present in the metadata but not the data (e.g., alpha diversity for pairwise-differences, or a distance matrix for pairwise-distances) are handled. These "extra samples" in the metadata are ignored by pairwise-differences, but can cause a failure with pairwise-distances if they don't contain information in the state, group, and individual columns. Ignoring these "extra samples," as pairwise-differences does, is more convenient as it allows the user to maintain a single metadata file, even if there are some samples in there that aren't relevant for the pairwise-* analyses. As it stands, I have to comment out or remove the "extra samples" from my metadata file to use it with pairwise-distances.

Steps to reproduce the behavior

  1. Download the three files used in the q2-longitudinal tutorial.
  2. Edit the metadata file to add a few extra samples. All columns other than sample-id can be left blank for these. For example, the end of my modified metadata file looks like:
    10249.C057.11SS y   277 Cesarean    bd  eb  C   9   9.1 Female  57
    control-1
    control-2
    control-3
  3. Run the following command - it will succeed.
    qiime longitudinal pairwise-differences \
    --m-metadata-file ecam-sample-metadata.tsv \
    --m-metadata-file shannon.qza \
    --p-metric shannon \
    --p-group-column delivery \
    --p-state-column month \
    --p-state-1 0 \
    --p-state-2 12 \
    --p-individual-id-column studyid \
    --p-replicate-handling random \
    --o-visualization pairwise-differences-w-controls.qzv
  4. Run the following command - it will fail with the following error message:

    
    qiime longitudinal pairwise-distances \
    --i-distance-matrix unweighted_unifrac_distance_matrix.qza \
    --m-metadata-file ecam-sample-metadata-w-controls.tsv \
    --p-group-column delivery \
    --p-state-column month \
    --p-state-1 0 \
    --p-state-2 12 \
    --p-individual-id-column studyid \
    --p-replicate-handling random \
    --o-visualization pairwise-distances-w-controls.qzv
    Plugin error from longitudinal:
    
    State 0.0 is not represented by any members of nan group in metadata. Consider using a different group_column or state value.

Debug info has been saved to /var/folders/jp/_7s6840d2gdf_gkg1291p7x00000gn/T/qiime2-q2cli-err-e4aw2h6b.log


**Expected behavior**
`pairwise-distances` should ignore those extra samples. 
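
As a rough illustration of the behavior requested here, the sketch below restricts a metadata table to the sample IDs that actually appear in the distance matrix before any validation runs. It is not q2-longitudinal code: the helper name `restrict_to_matrix_ids`, the toy sample IDs, and the toy column values are all hypothetical.

```python
import pandas as pd


def restrict_to_matrix_ids(metadata: pd.DataFrame, matrix_ids) -> pd.DataFrame:
    """Keep only the metadata rows whose sample ID is in the distance matrix.

    Hypothetical helper for illustration; not part of q2-longitudinal.
    """
    shared = metadata.index.intersection(pd.Index(matrix_ids))
    return metadata.loc[shared]


# Toy metadata: two real samples plus one "extra sample" with blank columns.
metadata = pd.DataFrame(
    {
        "delivery": ["Cesarean", "Vaginal", None],
        "month": [0.0, 12.0, None],
        "studyid": ["57", "57", None],
    },
    index=pd.Index(["s1", "s2", "control-1"], name="sample-id"),
)

# IDs that would come from the distance matrix (assumed values).
matrix_ids = ["s1", "s2"]

filtered = restrict_to_matrix_ids(metadata, matrix_ids)
print(list(filtered.index))
```

With the extra sample dropped up front, the group column no longer contains an empty value, so a per-group state check would not encounter a `nan` group.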

**Computation Environment**
- OS: macOS Big Sur
- QIIME 2 version: 2022.2.1
- q2-longitudinal version: 2022.2.0

**Full traceback from the above error**

    Traceback (most recent call last):
      File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2cli/commands.py", line 339, in __call__
        results = action(**arguments)
      File "", line 2, in pairwise_distances
      File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
        outputs = self._callable_executor_(scope, callable_args,
      File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 453, in _callable_executor_
        ret_val = self._callable(output_dir=temp_dir, **view_args)
      File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_longitudinal/_longitudinal.py", line 101, in pairwise_distances
        _validate_input_values(metadata, None, individual_id_column, group_column,
      File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_longitudinal/_utilities.py", line 59, in _validate_input_values
        _validate_state_in_dataframe(
      File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_longitudinal/_utilities.py", line 68, in _validate_state_in_dataframe
        raise ValueError(
    ValueError: State 0.0 is not represented by any members of nan group in metadata. Consider using a different group_column or state value.

lizgehret commented 2 years ago

Hey @gregcaporaso! I took a look at this with @ebolyen today - this should be resolvable by filtering down the group_column to only include rows with data in them (i.e. non-empty rows). I'll link a PR here once I have a working draft with those changes!
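
For readers following along, the filtering idea described above might look roughly like the pandas sketch below. This is only an assumption about the general approach, not the code in the linked PR: the helper name `drop_incomplete_rows` and the toy data are hypothetical, and the column names are borrowed from the commands in the report.

```python
import pandas as pd


def drop_incomplete_rows(metadata: pd.DataFrame, columns) -> pd.DataFrame:
    """Drop rows that are empty in any of the given columns.

    Hypothetical helper for illustration; not the actual fix.
    """
    return metadata.dropna(subset=list(columns))


# Toy metadata with one blank "extra sample" row.
metadata = pd.DataFrame(
    {
        "delivery": ["Cesarean", "Vaginal", None],
        "month": [0.0, 12.0, None],
        "studyid": ["57", "58", None],
    },
    index=pd.Index(["s1", "s2", "control-1"], name="sample-id"),
)

# Before filtering, the set of groups includes a missing value, which is
# the kind of "nan group" named in the error message.
has_missing_group_before = any(pd.isna(g) for g in metadata["delivery"].unique())

complete = drop_incomplete_rows(metadata, ["delivery", "month", "studyid"])

# After filtering, only real groups remain.
has_missing_group_after = any(pd.isna(g) for g in complete["delivery"].unique())

print(has_missing_group_before, has_missing_group_after)
print(list(complete.index))
```

Filtering on all three columns (group, state, individual) rather than the group column alone is a design choice made for this sketch; the comment above only mentions the group column.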

lizgehret commented 2 years ago

PR with fix can be found here: #174