Closed lizgehret closed 3 months ago
To make this easier to experiment with I am testing now with the moving pictures tutorial sample metadata and feature table. @ebolyen suggested that escaping with back-ticks might work here. That seems to get further, but is still erroring.
First, without back-ticks:
$ qiime composition ancombc --i-table table.qza --m-metadata-file sample-metadata.tsv --p-formula 'body-site' --o-differentials body-site-ancombc.qza --verbose
Traceback (most recent call last):
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2cli/commands.py", line 352, in __call__
results = action(**arguments)
File "<decorator-gen-206>", line 2, in ancombc
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self._callable_executor_(scope, callable_args,
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in _callable_executor_
output_views = self._callable(**view_args)
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 39, in ancombc
return _ancombc(
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 104, in _ancombc
_column_validation(meta, 'formula', term)
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 76, in _column_validation
raise ValueError('Value provided in the `%s` parameter was not found'
ValueError: Value provided in the `formula` parameter was not found in any of the metadata columns. Please make sure to only include values that are present within the metadata columns.
Value that was not found as a metadata column: "body"
Plugin error from composition:
Value provided in the `formula` parameter was not found in any of the metadata columns. Please make sure to only include values that are present within the metadata columns.
Value that was not found as a metadata column: "body"
See above for debug info.
And next with back-ticks:
$ qiime composition ancombc --i-table table.qza --m-metadata-file sample-metadata.tsv --p-formula '`body-site`' --o-differentials body-site-ancombc.qza --verbose
/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/core/cache.py:435: UserWarning: Your temporary cache was found to be in an inconsistent state. It has been recreated.
warnings.warn(
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.
Command: run_ancombc.R --inp_abundances_path ./input.biom.tsv --inp_metadata_path ./input.map.txt --formula `body-site` --p_adj_method holm --prv_cut 0.1 --lib_cut 0 --reference_levels --neg_lb False --tol 1e-05 --max_iter 100 --conserve False --alpha 0.05 --output_loaf /var/folders/jp/_7s6840d2gdf_gkg1291p7x00000gn/T/q2-DataLoafPackageDirFmt-xdnc8oyk
── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0 ✔ purrr 1.0.0
✔ tibble 3.1.8 ✔ dplyr 1.0.10
✔ tidyr 1.2.1 ✔ stringr 1.5.0
✔ readr 2.1.3 ✔ forcats 0.5.2
── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
R version 4.2.2 (2022-10-31)
'ancombc' is deprecated
Use 'ancombc2' instead
`tax_level` is not speficified
No agglomeration will be performed
Otherwise, please speficy `tax_level` by one of the following:
Species
Error in eval(predvars, data, env) : object 'body-site' not found
Calls: ancombc ... model.matrix.default -> model.frame -> model.frame.default -> eval -> eval
7: eval(predvars, data, env)
6: eval(predvars, data, env)
5: model.frame.default(object, data, xlev = xlev)
4: model.frame(object, data, xlev = xlev)
3: model.matrix.default(formula(paste0("~", formula)), data = meta_data)
2: model.matrix(formula(paste0("~", formula)), data = meta_data)
1: ancombc(data = data, formula = formula, p_adj_method = p_adj_method,
prv_cut = prv_cut, lib_cut = lib_cut, neg_lb = neg_lb, tol = tol,
max_iter = max_iter, conserve = conserve, alpha = alpha)
Traceback (most recent call last):
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 153, in _ancombc
run_commands([cmd])
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 30, in run_commands
subprocess.run(cmd, check=True)
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_ancombc.R', '--inp_abundances_path', './input.biom.tsv', '--inp_metadata_path', './input.map.txt', '--formula', '`body-site`', '--p_adj_method', 'holm', '--prv_cut', '0.1', '--lib_cut', '0', '--reference_levels', '', '--neg_lb', 'False', '--tol', '1e-05', '--max_iter', '100', '--conserve', 'False', '--alpha', '0.05', '--output_loaf', '/var/folders/jp/_7s6840d2gdf_gkg1291p7x00000gn/T/q2-DataLoafPackageDirFmt-xdnc8oyk']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2cli/commands.py", line 352, in __call__
results = action(**arguments)
File "<decorator-gen-206>", line 2, in ancombc
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self._callable_executor_(scope, callable_args,
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in _callable_executor_
output_views = self._callable(**view_args)
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 39, in ancombc
return _ancombc(
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 155, in _ancombc
raise Exception('An error was encountered while running ANCOM-BC'
Exception: An error was encountered while running ANCOM-BC in R (return code 1), please inspect stdout and stderr to learn more.
Plugin error from composition:
An error was encountered while running ANCOM-BC in R (return code 1), please inspect stdout and stderr to learn more.
See above for debug info.
I get the same error if I don't include the single quotes on the command line, but instead escape the back ticks:
qiime composition ancombc --i-table table.qza --m-metadata-file sample-metadata.tsv --p-formula \`body-site\` --o-differentials body-site-ancombc.qza --verbose
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.
Command: run_ancombc.R --inp_abundances_path ./input.biom.tsv --inp_metadata_path ./input.map.txt --formula `body-site` --p_adj_method holm --prv_cut 0.1 --lib_cut 0 --reference_levels --neg_lb False --tol 1e-05 --max_iter 100 --conserve False --alpha 0.05 --output_loaf /var/folders/jp/_7s6840d2gdf_gkg1291p7x00000gn/T/q2-DataLoafPackageDirFmt-wiab843a
── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0 ✔ purrr 1.0.0
✔ tibble 3.1.8 ✔ dplyr 1.0.10
✔ tidyr 1.2.1 ✔ stringr 1.5.0
✔ readr 2.1.3 ✔ forcats 0.5.2
── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
R version 4.2.2 (2022-10-31)
'ancombc' is deprecated
Use 'ancombc2' instead
`tax_level` is not speficified
No agglomeration will be performed
Otherwise, please speficy `tax_level` by one of the following:
Species
Error in eval(predvars, data, env) : object 'body-site' not found
Calls: ancombc ... model.matrix.default -> model.frame -> model.frame.default -> eval -> eval
7: eval(predvars, data, env)
6: eval(predvars, data, env)
5: model.frame.default(object, data, xlev = xlev)
4: model.frame(object, data, xlev = xlev)
3: model.matrix.default(formula(paste0("~", formula)), data = meta_data)
2: model.matrix(formula(paste0("~", formula)), data = meta_data)
1: ancombc(data = data, formula = formula, p_adj_method = p_adj_method,
prv_cut = prv_cut, lib_cut = lib_cut, neg_lb = neg_lb, tol = tol,
max_iter = max_iter, conserve = conserve, alpha = alpha)
Traceback (most recent call last):
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 153, in _ancombc
run_commands([cmd])
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 30, in run_commands
subprocess.run(cmd, check=True)
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_ancombc.R', '--inp_abundances_path', './input.biom.tsv', '--inp_metadata_path', './input.map.txt', '--formula', '`body-site`', '--p_adj_method', 'holm', '--prv_cut', '0.1', '--lib_cut', '0', '--reference_levels', '', '--neg_lb', 'False', '--tol', '1e-05', '--max_iter', '100', '--conserve', 'False', '--alpha', '0.05', '--output_loaf', '/var/folders/jp/_7s6840d2gdf_gkg1291p7x00000gn/T/q2-DataLoafPackageDirFmt-wiab843a']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2cli/commands.py", line 352, in __call__
results = action(**arguments)
File "<decorator-gen-206>", line 2, in ancombc
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self._callable_executor_(scope, callable_args,
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in _callable_executor_
output_views = self._callable(**view_args)
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 39, in ancombc
return _ancombc(
File "/Users/gregcaporaso/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_composition/_ancombc.py", line 155, in _ancombc
raise Exception('An error was encountered while running ANCOM-BC'
Exception: An error was encountered while running ANCOM-BC in R (return code 1), please inspect stdout and stderr to learn more.
Plugin error from composition:
An error was encountered while running ANCOM-BC in R (return code 1), please inspect stdout and stderr to learn more.
See above for debug info.
Happy to try something else out - just let me know if I can do anything to help!
(EDIT: Also, I'm not blocked on this as I'm just changing the -
to an _
in my metadata file, which is an easy work-around.)
Hey @gregcaporaso thanks for this detailed troubleshooting! I did some testing on my end, and I'm seeing the same object 'body-site' not found
error that you are when attempting to use back ticks to escape a column containing -
. I talked with @ebolyen about this today, and formulaic should allow for special character escaping with back ticks (referenced in the docs here). I'll keep playing around with the formatting and see if I can find something that works!
Worth noting that the escaping did seem to work as the error is now:
Error in eval(predvars, data, env) : object 'body-site' not found
Which means it passed through our check for metadata, which also means that the column should exist. So something else weird must be going on.
Adding notes here for future reference:
The issue is originating from Huang's implementation of ancombc, and occurs because of a default behavior within R. When the phyloseq object is handed into ancombc (or any accepted data input, such as a TreeSummarizedExperiment, etc), the columns containing special characters are replaced with '.' (e.g. so body-site would become body.site). This ends up causing a situation where even though we've successfully character escaped within the formula parameter, ancombc can't find the column name in question (i.e. body-site != body.site). There isn't an easy way to get around this on our end.
A couple of options that @ebolyen and I discussed:
I could submit a PR to Huang's repo, but this would have to be something that he is okay with (and the R user base), since this would require over-riding the default behavior in R of changing special characters to '.'
Another option would be to modify the output on our end, so that we are also changing special characters to '.' - this could be problematic if someone has multiple special characters in their columns though (e.g. 'body-site' and 'body+site' as two separate columns would end up with the same output - 'body.site').
We're going to consider this a known issue related to interfacing with R, and we're not going to try to fix it.
The formula parser within ancombc currently doesn't treat (+-/*) as regular characters, because the formula needs to handle those characters as operators. This can be inconvenient for users who have metadata columns that contain those characters, since those characters will act as term separators and will unintentionally split up metadata column names.
Example from @gregcaporaso:
Character escaping would be nice to add, but not sure if this is possible through the current formula parser. I am currently looking into this and will aim to add this to our 2022.11 patch release if there is a way to handle with current machinery.