qiime2 / q2-composition

BSD 3-Clause "New" or "Revised" License

BUG: tabulate viz does not handle single reference level values or default dummy coding #115

Closed lizgehret closed 1 year ago

lizgehret commented 1 year ago

Bug Description

When running ancombc in q2-composition, if the --p-reference-levels parameter includes only a single column::value pair, or is left blank, the tabulate visualizer renders the intercept groups incorrectly.

For the single column::value pair, the paragraph tag that contains the text 'Groups used to define the intercept: ...' renders the column::value pair as a split, comma-separated string. Screenshot example:

[Screenshot: Screen Shot 2023-04-05 at 11 32 10 AM]

For the case where --p-reference-levels is left blank, the default dummy coding column::value pair is not included in the 'Groups used to define the intercept...' tag; the tag is simply left blank. Screenshot example:

[Screenshot: Screen Shot 2023-04-05 at 11 32 20 AM]
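Both symptoms are consistent with a formatting step that joins the reference levels without checking their type. The snippet below is only a minimal sketch of that guess: `render_intercept_groups` is a hypothetical stand-in, not the actual q2-composition code, and the `subject::subject-1` pair is made up for illustration.

```python
# Hypothetical stand-in for the visualizer's formatting step.
# This is NOT the real q2-composition implementation.
def render_intercept_groups(reference_levels):
    # str.join iterates over its argument: a list yields whole pairs,
    # but a bare string yields one character at a time.
    return ", ".join(reference_levels)


# Two pairs passed as a list render as intended.
multi = render_intercept_groups(["bodysite::tongue", "subject::subject-1"])

# A single pair passed as a bare string gets split apart.
single = render_intercept_groups("bodysite::tongue")

# No reference levels at all render as an empty string.
blank = render_intercept_groups([])

print(multi)
print(single)
print(repr(blank))
```

If the real code behaves like this sketch, wrapping a lone pair in a list and filling in the default pair before joining would address both cases.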

Steps to reproduce the behavior

The example data can be used for qiime composition ancombc (either the single or the multi formula group data works, since the table and metadata files are the same).

To produce the examples above, either of these configurations for ancombc can be run (for single and missing reference levels, respectively):

  qiime composition ancombc \
    --i-table table.qza \
    --m-metadata-file metadata.tsv \
    --p-formula bodysite \
    --p-reference-levels 'bodysite::tongue' \
    --o-differentials dataloaf.qza

  qiime composition ancombc \
    --i-table table.qza \
    --m-metadata-file metadata.tsv \
    --p-formula bodysite \
    --o-differentials dataloaf.qza

Expected behavior

The tabulate visualizer should produce the following format for the chosen reference levels:

Here, formulacolumn1 and formulacolumn2 refer to the chosen column(s) from the formula parameter, and alphabetizedvalue1 and alphabetizedvalue2 refer to the default dummy coding intercept within each column (which corresponds to the value that sorts first alphabetically for any categorical column).
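As a rough illustration of the expected behavior, the sketch below builds the intercept text from the formula columns, the values present in each column, and any user-supplied reference levels. Everything here is an assumption made for illustration: the `intercept_groups` helper, its arguments, the bodysite values, and the ", " separator are hypothetical, and the default is taken to be the value that sorts first alphabetically.

```python
def intercept_groups(formula_columns, column_values, reference_levels=None):
    """Return the 'column::value' pairs that define the intercept.

    Hypothetical helper, not part of q2-composition.
    """
    if reference_levels is None:
        reference_levels = []
    elif isinstance(reference_levels, str):
        # Wrap a lone pair so it is not iterated character by character.
        reference_levels = [reference_levels]

    # Map each user-chosen column to its reference value.
    chosen = dict(pair.split("::", 1) for pair in reference_levels)

    pairs = []
    for column in formula_columns:
        # Assumed default: the value that sorts first alphabetically.
        value = chosen.get(column, sorted(column_values[column])[0])
        pairs.append(f"{column}::{value}")
    return ", ".join(pairs)


# Illustrative values only; the real metadata may differ.
values = {"bodysite": ["tongue", "gut", "left palm", "right palm"]}

explicit = intercept_groups(["bodysite"], values, "bodysite::tongue")
default = intercept_groups(["bodysite"], values)

print(explicit)
print(default)
```

With this approach the single-pair and blank cases go through the same code path as the multi-pair case, so the visualizer never has to special-case them.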

Computation Environment