qiime2 / q2-diversity

BSD 3-Clause "New" or "Revised" License

BUG: more descriptive error message for too few samples retained in 'alpha-group-significance' visualization #346

Closed lizgehret closed 9 months ago

lizgehret commented 12 months ago

Bug Description: When running alpha-group-significance, this is one of the possible errors that can be produced:

[Screenshot of the error message: Screen Shot 2023-12-01 at 3 43 08 PM]

The language in this message implies that the error stems from the contents of the metadata file (i.e., that it doesn't contain columns meeting requirements such as containing categorical data, not being empty, etc.). However, a forum user reached out with this issue (forum post link), and examination of their data showed that the actual cause was a sampling depth that retained only 2 samples. As a result, the visualizer looked up metadata for just those two samples, and that subset didn't meet the stated requirements: the column in question had a different grouping for each of the two samples, so all of its values were seen as unique.

This error message should include details regarding the sample IDs that are being looked up in the metadata file, and the language should be changed to include the possibility that one's chosen sampling depth may be too deep, causing too few samples to be retained for the visualizer.

Steps to reproduce the behavior:

  1. Download the files shared from the OP in the linked forum post above.

  2. In a 2023.9 QIIME 2 Amplicon environment, run the following command (in the same dir as the above files):

    qiime diversity alpha-group-significance \
    --i-alpha-diversity shannon_vector.qza \
    --m-metadata-file Testmetadata.tsv \
    --o-visualization shannon_vector.qzv \
    --verbose

    (Error message from above will be produced)

  3. Unzip the shannon_vector.qza artifact and examine the alpha-diversity.tsv file contained within the data directory. The following two IDs will be the only ones present: 98039, W190020

  4. Open Testmetadata.tsv and examine the contents of the metadata associated with the two IDs above. These do not meet the stated requirements of the visualizer because within the Location column, the value associated with sample 98039 is zoo and the value associated with sample W190020 is wild.

These don't satisfy the following requirement:

There must be at least one metadata column that doesn't consist of unique values.
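Steps 3 and 4 can also be done programmatically. The following is a minimal sketch using only the Python standard library, not the code q2-diversity actually runs. The helper names `retained_sample_ids` and `groupable_columns` are hypothetical. It assumes the `.qza` archive contains a single `data/alpha-diversity.tsv` member with a header row, and it checks only two of the requirements (a column must not be all-unique and must not hold a single value).

```python
import csv
import io
import zipfile


def retained_sample_ids(qza_path):
    """Return the sample IDs listed in an alpha-diversity artifact.

    Assumption: the archive has one member ending in
    'data/alpha-diversity.tsv', with the sample ID in the first column
    and a header in the first row.
    """
    with zipfile.ZipFile(qza_path) as archive:
        member = next(name for name in archive.namelist()
                      if name.endswith("data/alpha-diversity.tsv"))
        text = archive.read(member).decode("utf-8")
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    next(rows)  # skip the header row
    return [row[0] for row in rows if row]


def groupable_columns(metadata, sample_ids):
    """Return the columns that could be used to group the given samples.

    `metadata` maps sample ID -> {column name: value}. A column is kept
    only if, among the given samples, its non-empty values are neither
    all unique nor all identical.
    """
    all_columns = sorted({col for sid in sample_ids for col in metadata[sid]})
    usable = []
    for col in all_columns:
        values = [metadata[sid].get(col, "") for sid in sample_ids]
        values = [v for v in values if v != ""]
        if 1 < len(set(values)) < len(values):
            usable.append(col)
    return usable


# The two retained samples from this report: one 'zoo', one 'wild'.
metadata = {
    "98039": {"Location": "zoo"},
    "W190020": {"Location": "wild"},
}
print(groupable_columns(metadata, ["98039", "W190020"]))  # prints []
```

Under these two rules, no column can qualify when only two samples are retained: the two values are either identical (a single value) or different (all unique). This is why the sampling depth, rather than the metadata file itself, was the root cause here.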

Expected behavior: The error message should include something like 'the contents of the metadata file associated with the samples present in the alpha-diversity metric' and should include the sample IDs in the error message. It is probably worth rewriting the message to clearly state that this error can result from either a metadata file that doesn't meet the viz requirements, or samples whose associated metadata doesn't meet the requirements (with a suggestion to check the diversity metric to ensure an appropriate sampling depth was selected).
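As a rough illustration of the requested behavior, the message could be assembled along these lines. This is only a sketch: the function name `no_usable_columns_message`, the `max_listed` parameter, and the exact wording are hypothetical and are not taken from the q2-diversity source.

```python
def no_usable_columns_message(sample_ids, max_listed=10):
    """Build an illustrative error message that names the looked-up samples.

    Hypothetical helper: lists at most `max_listed` sample IDs so the
    message stays readable when many samples are involved.
    """
    ids = sorted(sample_ids)
    listed = ", ".join(ids[:max_listed])
    if len(ids) > max_listed:
        listed += ", ... ({} more)".format(len(ids) - max_listed)
    return (
        "The contents of the metadata file associated with the samples "
        "present in the alpha-diversity metric do not satisfy this "
        "visualizer's requirements. Samples looked up ({}): {}. "
        "This can happen when the metadata file itself doesn't meet the "
        "requirements, or when too few samples were retained. Check the "
        "diversity metric to ensure an appropriate sampling depth was "
        "selected.".format(len(ids), listed)
    )


print(no_usable_columns_message(["98039", "W190020"]))
```

Capping the number of listed IDs is a design choice made for this sketch; for the two-sample case reported here, both IDs would appear in full.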

lizgehret commented 12 months ago

@hagenjp I'm assigning this one to you to work on when you have time. Let me know if you have any questions!