Closed ShaiberAlon closed 5 years ago
I ran this again with the latest R version:
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
And, sadly, I still get the same error.
@adw96, do you have an idea regarding where this error is coming from? Maybe some R expert in your lab could help me get to the bottom of this?
@ShaiberAlon, when I ran the enrichment test using your TEST-PACKAGE.tar.gz, this is how things went:
First try:
anvi-get-enriched-functions-per-pan-group -p PAN.db \
> -g GENOMES.db \
> -o Functional_enrichment_2_groups.txt \
> --category-variable light \
> --annotation-source COG_FUNCTION
Genomes storage .............................................: Initialized (storage hash: hash0cde9439)
Num genomes in storage ......................................: 31
Num genomes will be used ....................................: 31
Pan DB ......................................................: Initialized: PAN.db (v. 13)
Gene cluster homogeneity estimates ..........................: Functional: [NO]; Geometric: [NO]; Combined: [NO]
* Gene clusters are initialized for all 7383 gene clusters in the database.
Config Error: The following R packages are required in order to run this program, but are
missing: qvalue. You can install these packages using conda by running the
following commands: "conda install -c bioconda bioconductor-qvalue"
I first tried to install bioconductor-qvalue through the R terminal, and of course it wasn't available. So I tried conda, and this is how it went:
## Package Plan ##
environment location: /Users/meren/miniconda3
added / updated specs:
- bioconductor-qvalue
The following packages will be downloaded:
package | build
---------------------------|-----------------
_r-mutex-1.0.0 | anacondar_1 2 KB
bioconductor-qvalue-2.8.0 | 0 2.7 MB bioconda
ca-certificates-2019.8.28 | 0 133 KB
certifi-2019.9.11 | py36_0 154 KB
conda-4.7.12 | py36_0 3.0 MB
curl-7.61.1 | ha441bb4_0 122 KB
icu-58.2 | h4b95b61_1 10.1 MB
jpeg-9b | he5867d9_2 201 KB
libgcc-4.8.5 | hdbeacc1_10 250 KB
libpng-1.6.37 | ha441bb4_0 262 KB
libtiff-4.0.10 | hcb84e12_2 394 KB
openssl-1.0.2t | h1de35cc_1 2.0 MB
pcre-8.43 | h0a44026_0 185 KB
r-3.3.1 | r3.3.1_0 620 B
r-base-3.3.1 | 0 47.2 MB
r-boot-1.3_18 | r3.3.1_0 576 KB
r-class-7.3_14 | r3.3.1_0 82 KB
r-cluster-2.0.4 | r3.3.1_0 472 KB
r-codetools-0.2_14 | r3.3.1_0 45 KB
r-colorspace-1.2_6 | r3.3.1_0 375 KB
r-dichromat-2.0_0 | r3.3.1_2 146 KB
r-digest-0.6.9 | r3.3.1_0 113 KB
r-foreign-0.8_66 | r3.3.1_0 225 KB
r-ggplot2-2.1.0 | r3.3.1_0 2.0 MB bioconda
r-gtable-0.2.0 | r3.3.1_0 57 KB
r-kernsmooth-2.23_15 | r3.3.1_0 81 KB
r-labeling-0.3 | r3.3.1_2 40 KB
r-lattice-0.20_33 | r3.3.1_0 697 KB
r-magrittr-1.5 | r3.3.1_2 155 KB
r-mass-7.3_45 | r3.3.1_0 1.0 MB
r-matrix-1.2_6 | r3.3.1_0 3.1 MB
r-mgcv-1.8_12 | r3.3.1_0 1.9 MB
r-munsell-0.4.3 | r3.3.1_0 130 KB
r-nlme-3.1_128 | r3.3.1_0 2.0 MB
r-nnet-7.3_12 | r3.3.1_0 98 KB
r-plyr-1.8.4 | r3.3.1_0 738 KB
r-rcolorbrewer-1.1_2 | r3.3.1_3 28 KB
r-rcpp-0.12.5 | r3.3.1_0 2.2 MB
r-recommended-3.3.1 | r3.3.1_0 767 B
r-reshape2-1.4.1 | r3.3.1_2 103 KB
r-rpart-4.1_10 | r3.3.1_0 863 KB
r-scales-0.4.1 | r3.3.1_1 204 KB bioconda
r-spatial-7.3_11 | r3.3.1_0 121 KB
r-stringi-1.1.1 | r3.3.1_0 10.8 MB
r-stringr-1.1.0 | r3.3.1_0 113 KB bioconda
r-survival-2.39_4 | r3.3.1_0 4.5 MB
------------------------------------------------------------
Total: 99.5 MB
The following NEW packages will be INSTALLED:
_r-mutex pkgs/r/osx-64::_r-mutex-1.0.0-anacondar_1
bioconductor-qval~ bioconda/osx-64::bioconductor-qvalue-2.8.0-0
curl pkgs/main/osx-64::curl-7.61.1-ha441bb4_0
icu pkgs/main/osx-64::icu-58.2-h4b95b61_1
jpeg pkgs/main/osx-64::jpeg-9b-he5867d9_2
libgcc pkgs/main/osx-64::libgcc-4.8.5-hdbeacc1_10
libpng pkgs/main/osx-64::libpng-1.6.37-ha441bb4_0
libtiff pkgs/main/osx-64::libtiff-4.0.10-hcb84e12_2
pcre pkgs/main/osx-64::pcre-8.43-h0a44026_0
r pkgs/r/osx-64::r-3.3.1-r3.3.1_0
r-base pkgs/r/osx-64::r-base-3.3.1-0
r-boot pkgs/r/osx-64::r-boot-1.3_18-r3.3.1_0
r-class pkgs/r/osx-64::r-class-7.3_14-r3.3.1_0
r-cluster pkgs/r/osx-64::r-cluster-2.0.4-r3.3.1_0
r-codetools pkgs/r/osx-64::r-codetools-0.2_14-r3.3.1_0
r-colorspace pkgs/r/osx-64::r-colorspace-1.2_6-r3.3.1_0
r-dichromat pkgs/r/osx-64::r-dichromat-2.0_0-r3.3.1_2
r-digest pkgs/r/osx-64::r-digest-0.6.9-r3.3.1_0
r-foreign pkgs/r/osx-64::r-foreign-0.8_66-r3.3.1_0
r-ggplot2 bioconda/osx-64::r-ggplot2-2.1.0-r3.3.1_0
r-gtable pkgs/r/osx-64::r-gtable-0.2.0-r3.3.1_0
r-kernsmooth pkgs/r/osx-64::r-kernsmooth-2.23_15-r3.3.1_0
r-labeling pkgs/r/osx-64::r-labeling-0.3-r3.3.1_2
r-lattice pkgs/r/osx-64::r-lattice-0.20_33-r3.3.1_0
r-magrittr pkgs/r/osx-64::r-magrittr-1.5-r3.3.1_2
r-mass pkgs/r/osx-64::r-mass-7.3_45-r3.3.1_0
r-matrix pkgs/r/osx-64::r-matrix-1.2_6-r3.3.1_0
r-mgcv pkgs/r/osx-64::r-mgcv-1.8_12-r3.3.1_0
r-munsell pkgs/r/osx-64::r-munsell-0.4.3-r3.3.1_0
r-nlme pkgs/r/osx-64::r-nlme-3.1_128-r3.3.1_0
r-nnet pkgs/r/osx-64::r-nnet-7.3_12-r3.3.1_0
r-plyr pkgs/r/osx-64::r-plyr-1.8.4-r3.3.1_0
r-rcolorbrewer pkgs/r/osx-64::r-rcolorbrewer-1.1_2-r3.3.1_3
r-rcpp pkgs/r/osx-64::r-rcpp-0.12.5-r3.3.1_0
r-recommended pkgs/r/osx-64::r-recommended-3.3.1-r3.3.1_0
r-reshape2 pkgs/r/osx-64::r-reshape2-1.4.1-r3.3.1_2
r-rpart pkgs/r/osx-64::r-rpart-4.1_10-r3.3.1_0
r-scales bioconda/osx-64::r-scales-0.4.1-r3.3.1_1
r-spatial pkgs/r/osx-64::r-spatial-7.3_11-r3.3.1_0
r-stringi pkgs/r/osx-64::r-stringi-1.1.1-r3.3.1_0
r-stringr bioconda/osx-64::r-stringr-1.1.0-r3.3.1_0
r-survival pkgs/r/osx-64::r-survival-2.39_4-r3.3.1_0
The following packages will be UPDATED:
ca-certificates conda-forge::ca-certificates-2019.6.1~ --> pkgs/main::ca-certificates-2019.8.28-0
certifi conda-forge::certifi-2019.6.16-py36_1 --> pkgs/main::certifi-2019.9.11-py36_0
conda conda-forge::conda-4.7.10-py36_0 --> pkgs/main::conda-4.7.12-py36_0
openssl conda-forge::openssl-1.0.2r-h1de35cc_0 --> pkgs/main::openssl-1.0.2t-h1de35cc_1
Then I got this error:
anvi-get-enriched-functions-per-pan-group -p PAN.db \
> -g GENOMES.db \
> -o Functional_enrichment_2_groups.txt \
> --category-variable light \
> --annotation-source COG_FUNCTION
Genomes storage .............................................: Initialized (storage hash: hash0cde9439)
Num genomes in storage ......................................: 31
Num genomes will be used ....................................: 31
Pan DB ......................................................: Initialized: PAN.db (v. 13)
Gene cluster homogeneity estimates ..........................: Functional: [NO]; Geometric: [NO]; Combined: [NO]
* Gene clusters are initialized for all 7383 gene clusters in the database.
Traceback for debugging
================================================================================
File "/Users/meren/github/anvio/bin/anvi-get-enriched-functions-per-pan-group", line 69, in <module>
main(args)
File "/Users/meren/github/anvio/bin/anvi-get-enriched-functions-per-pan-group", line 38, in main
s.functional_enrichment_stats()
File "/Users/meren/github/anvio/anvio/summarizer.py", line 333, in functional_enrichment_stats
ret_val = utils.run_command(["Rscript", "-e", "library('%s')" % lib], log_file)
File "/Users/meren/github/anvio/anvio/utils.py", line 396, in run_command
raise ConfigError("command was terminated")
================================================================================
Config Error: command was terminated
Then, this is what R says:
(base) (anvio-master) meren ~/Downloads/TEST-PACKAGE $ R
dyld: Library not loaded: @rpath/libicuuc.54.dylib
Referenced from: /Users/meren/miniconda3/lib/R/lib/libR.dylib
Reason: image not found
Abort trap: 6
As a result, I first ran this:
conda uninstall r r-base
And now am back to this:
anvi-get-enriched-functions-per-pan-group -p PAN.db \
> -g GENOMES.db \
> -o Functional_enrichment_2_groups.txt \
> --category-variable light \
> --annotation-source COG_FUNCTION
Genomes storage .............................................: Initialized (storage hash: hash0cde9439)
Num genomes in storage ......................................: 31
Num genomes will be used ....................................: 31
Pan DB ......................................................: Initialized: PAN.db (v. 13)
Gene cluster homogeneity estimates ..........................: Functional: [NO]; Geometric: [NO]; Combined: [NO]
* Gene clusters are initialized for all 7383 gene clusters in the database.
Traceback for debugging
================================================================================
File "/Users/meren/github/anvio/bin/anvi-get-enriched-functions-per-pan-group", line 69, in <module>
main(args)
File "/Users/meren/github/anvio/bin/anvi-get-enriched-functions-per-pan-group", line 38, in main
s.functional_enrichment_stats()
File "/Users/meren/github/anvio/anvio/summarizer.py", line 342, in functional_enrichment_stats
', '.join(['"%s"' % package_dict[i] for i in missing_packages])))
================================================================================
Config Error: The following R packages are required in order to run this program, but are
missing: qvalue. You can install these packages using conda by running the
following commands: "conda install -c bioconda bioconductor-qvalue"
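The traceback above suggests the check works by asking Rscript to load each required library and treating a failed command as a missing package. A minimal sketch of that idea follows; the function names and the injectable `run` argument are illustrative assumptions, not anvi'o's actual code.

```python
import subprocess


def library_check_command(package):
    """Build the Rscript command that tries to load one R package."""
    return ["Rscript", "-e", "library('%s')" % package]


def find_missing_r_packages(packages, run=None):
    """Return the packages whose load command exits with a non-zero status.

    `run` takes a command list and returns an exit code. It is injectable
    (a hypothetical design choice) so the logic can be tried without R.
    """
    if run is None:
        def run(command):
            return subprocess.run(
                command,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            ).returncode
    return [p for p in packages if run(library_check_command(p)) != 0]


# Stand-in runner that pretends only qvalue is missing (illustration only).
def fake_run(command):
    return 1 if "qvalue" in command[-1] else 0


print(find_missing_r_packages(["qvalue", "tidyverse", "optparse"], run=fake_run))
```

Calling `find_missing_r_packages` without `run` would invoke the real Rscript found on the PATH, which is also why a broken R install (as in the dyld error above) shows up as a terminated command rather than as a missing package.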
R version is this:
(base) (anvio-master) meren ~/Downloads/TEST-PACKAGE $ R
R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin18.2.0 (64-bit)
I hope this is useful.
@meren, do you mind installing the latest version of R and trying again, to see whether the qvalue package then installs properly?
I installed it using conda after installing the latest R and it went ok.
@ShaiberAlon, I took your advice and installed the latest version:
(base) (anvio-master) meren ~/Downloads/TEST-PACKAGE $ R
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Then I got the same missing-package error:
anvi-get-enriched-functions-per-pan-group -p PAN.db \
> -g GENOMES.db \
> -o Functional_enrichment_2_groups.txt \
> --category-variable light \
> --annotation-source COG_FUNCTION
Genomes storage .............................................: Initialized (storage hash: hash0cde9439)
Num genomes in storage ......................................: 31
Num genomes will be used ....................................: 31
Pan DB ......................................................: Initialized: PAN.db (v. 13)
Gene cluster homogeneity estimates ..........................: Functional: [NO]; Geometric: [NO]; Combined: [NO]
* Gene clusters are initialized for all 7383 gene clusters in the database.
Config Error: The following R packages are required in order to run this program, but are
missing: qvalue. You can install these packages using conda by running the
following commands: "conda install -c bioconda bioconductor-qvalue"
Trying to install it with conda gave me this output, and I nope'd the F-out:
conda install -c bioconda bioconductor-qvalue
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /Users/meren/miniconda3
added / updated specs:
- bioconductor-qvalue
The following NEW packages will be INSTALLED:
_r-mutex pkgs/r/osx-64::_r-mutex-1.0.0-anacondar_1
bioconductor-qval~ bioconda/osx-64::bioconductor-qvalue-2.8.0-0
curl pkgs/main/osx-64::curl-7.61.1-ha441bb4_0
icu pkgs/main/osx-64::icu-58.2-h4b95b61_1
jpeg pkgs/main/osx-64::jpeg-9b-he5867d9_2
libgcc pkgs/main/osx-64::libgcc-4.8.5-hdbeacc1_10
libiconv pkgs/main/osx-64::libiconv-1.15-hdd342a3_7
libpng pkgs/main/osx-64::libpng-1.6.37-ha441bb4_0
libtiff pkgs/main/osx-64::libtiff-4.0.10-hcb84e12_2
libxml2 pkgs/main/osx-64::libxml2-2.9.9-hf6e021a_1
pcre pkgs/main/osx-64::pcre-8.43-h0a44026_0
r pkgs/r/osx-64::r-3.3.1-r3.3.1_0
r-base pkgs/r/osx-64::r-base-3.3.1-0
r-boot pkgs/r/osx-64::r-boot-1.3_18-r3.3.1_0
r-class pkgs/r/osx-64::r-class-7.3_14-r3.3.1_0
r-cluster pkgs/r/osx-64::r-cluster-2.0.4-r3.3.1_0
r-codetools pkgs/r/osx-64::r-codetools-0.2_14-r3.3.1_0
r-colorspace pkgs/r/osx-64::r-colorspace-1.2_6-r3.3.1_0
r-dichromat pkgs/r/osx-64::r-dichromat-2.0_0-r3.3.1_2
r-digest pkgs/r/osx-64::r-digest-0.6.9-r3.3.1_0
r-foreign pkgs/r/osx-64::r-foreign-0.8_66-r3.3.1_0
r-ggplot2 bioconda/osx-64::r-ggplot2-2.1.0-r3.3.1_0
r-gtable pkgs/r/osx-64::r-gtable-0.2.0-r3.3.1_0
r-kernsmooth pkgs/r/osx-64::r-kernsmooth-2.23_15-r3.3.1_0
r-labeling pkgs/r/osx-64::r-labeling-0.3-r3.3.1_2
r-lattice pkgs/r/osx-64::r-lattice-0.20_33-r3.3.1_0
r-magrittr pkgs/r/osx-64::r-magrittr-1.5-r3.3.1_2
r-mass pkgs/r/osx-64::r-mass-7.3_45-r3.3.1_0
r-matrix pkgs/r/osx-64::r-matrix-1.2_6-r3.3.1_0
r-mgcv pkgs/r/osx-64::r-mgcv-1.8_12-r3.3.1_0
r-munsell pkgs/r/osx-64::r-munsell-0.4.3-r3.3.1_0
r-nlme pkgs/r/osx-64::r-nlme-3.1_128-r3.3.1_0
r-nnet pkgs/r/osx-64::r-nnet-7.3_12-r3.3.1_0
r-plyr pkgs/r/osx-64::r-plyr-1.8.4-r3.3.1_0
r-rcolorbrewer pkgs/r/osx-64::r-rcolorbrewer-1.1_2-r3.3.1_3
r-rcpp pkgs/r/osx-64::r-rcpp-0.12.5-r3.3.1_0
r-recommended pkgs/r/osx-64::r-recommended-3.3.1-r3.3.1_0
r-reshape2 pkgs/r/osx-64::r-reshape2-1.4.1-r3.3.1_2
r-rpart pkgs/r/osx-64::r-rpart-4.1_10-r3.3.1_0
r-scales bioconda/osx-64::r-scales-0.4.1-r3.3.1_1
r-spatial pkgs/r/osx-64::r-spatial-7.3_11-r3.3.1_0
r-stringi pkgs/r/osx-64::r-stringi-1.1.1-r3.3.1_0
r-stringr bioconda/osx-64::r-stringr-1.1.0-r3.3.1_0
r-survival pkgs/r/osx-64::r-survival-2.39_4-r3.3.1_0
zstd pkgs/main/osx-64::zstd-1.3.7-h5bba6e5_0
Proceed ([y]/n)? n
CondaSystemExit: Exiting.
Instead I started an R shell, and did this, which was a smooth sail and solved that complaint:
install.packages("BiocManager")
BiocManager::install("qvalue")
And then this is what happened, which is the error you are stuck with :)
Genomes storage .............................................: Initialized (storage hash: hash0cde9439)
Num genomes in storage ......................................: 31
Num genomes will be used ....................................: 31
Pan DB ......................................................: Initialized: PAN.db (v. 13)
Gene cluster homogeneity estimates ..........................: Functional: [NO]; Geometric: [NO]; Combined: [NO]
* Gene clusters are initialized for all 7383 gene clusters in the database.
Category ....................................................: light
Functional annotation source ................................: COG_FUNCTION
Exclude ungrouped ...........................................: False
Functional occurrence summary ...............................: /var/folders/x5/gt4031w53fs63csv1fp0r_3w0000gn/T/tmp0w795f9d
Config Error: It looks like something went wrong during the functional enrichment analysis. We
don't know what happened, but this log file could contain some clues:
/var/folders/x5/gt4031w53fs63csv1fp0r_3w0000gn/T/tmpzjniu0xx
(base) (anvio-master) meren ~/Downloads/TEST-PACKAGE $ cat /var/folders/x5/gt4031w53fs63csv1fp0r_3w0000gn/T/tmpzjniu0xx
# DATE: 04 Oct 19 10:21:09
# CMD LINE: anvi-run-enrichment-analysis.R --input /var/folders/x5/gt4031w53fs63csv1fp0r_3w0000gn/T/tmp0w795f9d --output Functional_enrichment_2_groups.txt
Parsed with column specification:
cols(
COG_FUNCTION = col_character(),
function_accession = col_character(),
gene_clusters_ids = col_character(),
associated_groups = col_character(),
p_HL = col_double(),
p_LL = col_double(),
N_HL = col_double(),
N_LL = col_double()
)
Error: Column `function_accession` can't be modified because it's a grouping variable
Execution halted
So it is reproducible!
Thank you @meren!
@adw96 , it is officially not only me.... HELP??
HELP??
I read it like this:
Poor @adw96. She has a million things to do. DON'T WORRY, AMY, WE WILL BE FINE :')))))
Would it be possible to post just the input file used for this command?
anvi-run-enrichment-analysis.R --input /var/folders/x5/gt4031w53fs63csv1fp0r_3w0000gn/T/tmp0w795f9d --output Functional_enrichment_2_groups.txt
Hi @mooreryan ,
Here is the input: functional_enrichment_input_2_groups.txt
We have now merged to master, so if you are on master, you can do:
anvi-run-enrichment-analysis.R --input functional_enrichment_input_2_groups.txt --output output-file
Thank you!
Interesting...so I just tried running the command with the data you sent and it ran without error.
It runs everywhere except our lab. It is time, guys.
[person.fire() for person in http://merenlab.org/people]
It may be something weird in your R dependencies...could you list the packages and the versions that are currently loaded?
Here is a little Rscript that you can run which will load the same packages as anvi-run-enrichment-analysis.R and then tell you which versions of all the packages that it loads:
https://gist.github.com/mooreryan/b0bbb7388c14324b7b2fed2612a7a362
Thank you, @mooreryan :) Here is the output:
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.2.1 ✔ purrr 0.3.2
✔ tibble 2.1.3 ✔ dplyr 0.8.3
✔ tidyr 1.0.0 ✔ stringr 1.4.0
✔ readr 1.3.1 ✔ forcats 0.4.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
Attaching package: ‘magrittr’
The following object is masked from ‘package:purrr’:
set_names
The following object is masked from ‘package:tidyr’:
extract
"Package","Version"
"qvalue","2.16.0"
"magrittr","1.5"
"forcats","0.4.0"
"stringr","1.4.0"
"dplyr","0.8.3"
"purrr","0.3.2"
"readr","1.3.1"
"tidyr","1.0.0"
"tibble","2.1.3"
"ggplot2","3.2.1"
"tidyverse","1.2.1"
"optparse","1.6.4"
"stats","3.6.1"
"graphics","3.6.1"
"grDevices","3.6.1"
"utils","3.6.1"
"datasets","3.6.1"
"methods","3.6.1"
"base","3.6.1"
You do have a couple of different libs than I do.
dplyr: You have 0.8.3, I have 0.8.1. Inside a running Docker container, I upgraded to 0.8.3, but it still worked for me, so it probably isn't dplyr.
tidyr: You have 1.0.0, I have 0.8.3. When I upgraded to 1.0.0, the script broke and I got the same error as you all did! Will continue checking the others.
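The comparison above (list the package versions in a working and a failing environment, then look only at the ones that differ) can be sketched as follows. This is an illustration in Python rather than the R gist linked earlier; the helper names are made up, and the "working" listing contains only the two versions quoted in this thread.

```python
import csv
import io


def parse_versions(csv_text):
    """Parse '"Package","Version"' CSV text into a {package: version} dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Package"]: row["Version"] for row in reader}


def version_differences(working, failing):
    """Return {package: (working_version, failing_version)} for shared packages that differ."""
    shared = sorted(set(working) & set(failing))
    return {
        name: (working[name], failing[name])
        for name in shared
        if working[name] != failing[name]
    }


# Versions taken from the listing and comments in this thread.
failing = parse_versions("\n".join([
    '"Package","Version"',
    '"qvalue","2.16.0"',
    '"dplyr","0.8.3"',
    '"tidyr","1.0.0"',
    '"readr","1.3.1"',
]))
working = parse_versions("\n".join([
    '"Package","Version"',
    '"dplyr","0.8.1"',
    '"tidyr","0.8.3"',
]))

for name, (ok, bad) in version_differences(working, failing).items():
    print(f"{name}: works with {ok}, fails with {bad}")
```

Each differing package is then a candidate to upgrade one at a time in the working environment, which is how tidyr was singled out here.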
Yep, so I have discovered that the problem is caused by something in the tidyr package that changed somewhere between 0.8.3 and 1.0.0.
Okay, so I've figured out the problem. It's here: https://github.com/merenlab/anvio/blob/2c9179f037b43502901582ad7eea61f5dbfc3131/bin/anvi-run-enrichment-analysis.R#L81
If you're running version 1.0.0 of tidyr, then that line needs to change to nest_legacy %>%, but if you are running an earlier version, it needs to stay nest. That function got new syntax in the update.
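The rule described here (use nest_legacy when the tidyr major version is 1 or higher, otherwise nest) can be written as a small version check. The sketch below is in Python purely for illustration; the real fix lives in the R script, and `nest_function_for` is a hypothetical name.

```python
def nest_function_for(tidyr_version: str) -> str:
    """Return the name of the tidyr nesting function to use for a version string."""
    major = int(tidyr_version.split(".")[0])
    # tidyr 1.0.0 changed the syntax of nest(); nest_legacy() keeps the old behaviour.
    return "nest_legacy" if major >= 1 else "nest"


for version in ("0.8.3", "1.0.0"):
    print(version, "->", nest_function_for(version))
```

Comparing only the major version is a simplification that fits the two versions discussed in this thread.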
OMG! Awesome, @mooreryan!!! Thank you VERY much :)
I think we should switch to nest_legacy, and ask people to update their tidyr versions if we hear about this.
@ShaiberAlon, @adw96, is this agreeable?
Actually, I'm about to open a pull request addressing this....
:+1:
By the way, I tested Ryan's solution in #1249:
(anvio-master) (base) meren ~/Downloads/TEST-PACKAGE $ anvi-get-enriched-functions-per-pan-group -p PAN.db \
> -g GENOMES.db \
> -o Functional_enrichment_2_groups.txt \
> --category-variable light \
> --annotation-source COG_FUNCTION
Genomes storage .............................................: Initialized (storage hash: hash0cde9439)
Num genomes in storage ......................................: 31
Num genomes will be used ....................................: 31
Pan DB ......................................................: Initialized: PAN.db (v. 13)
Gene cluster homogeneity estimates ..........................: Functional: [NO]; Geometric: [NO]; Combined: [NO]
* Gene clusters are initialized for all 7383 gene clusters in the database.
Category ....................................................: light
Functional annotation source ................................: COG_FUNCTION
Exclude ungrouped ...........................................: False
Functional occurrence summary ...............................: /var/folders/x5/gt4031w53fs63csv1fp0r_3w0000gn/T/tmpv_fkk210
Functional enrichment summary log file: .....................: /var/folders/x5/gt4031w53fs63csv1fp0r_3w0000gn/T/tmplce2sbn7
Functional enrichment summary ...............................: Functional_enrichment_2_groups.txt
🥇
Dear @meren @ShaiberAlon @mooreryan
amy.isback()
I'm OOO for 3 days and this is what happens?! Unbelievable...
@mooreryan -- great work finding and solving this -- thank you. I didn't know nest changed between tidyr versions. I am shocked to find that nest <- nest_legacy is the tidyverse sanctioned solution, but since it is and you are an outstanding programmer I agree this is the right fix (ref #1249).
If we have future problems with this approach I will rethink this (for future @adw96: look at pack and chop), but for now I think this is great. tidyr 1.0.0 was released less than a month ago, so this was a dangerous time to use a previously very stable package. MEREN I PROMISE THIS WILL NEVER HAPPEN AGAIN PLEASE DON'T FIRE ME
Amy
TODO(Amy) Does being explicit about the order of input arguments solve ambiguities between nest and nest_legacy? Investigate.
If there is anyone to fire it is always me! :)
In fact we were discussing yesterday how much we appreciate working with you, @mooreryan, @xvazquezc, and others who are willing to share their expertise with us. We are very thankful for your time.
Re-opening this issue, because I now get an error when I run the functional enrichment on our multiple group (i.e. more than two) example from our pangenomic tutorial:
When running this:
anvi-get-enriched-functions-per-pan-group -p PROCHLORO/Prochlorococcus_Pan-PAN.db \
-g PROCHLORO-GENOMES.db \
--category clade \
--annotation-source COG_FUNCTION \
-o PROCHLORO-PAN-enriched-functions-clade.txt \
--functional-occurrence-table-output PROCHLORO-functions-occurrence.txt
I get this:
Config Error: It looks like something went wrong during the functional enrichment analysis. We
don't know what happened, but this log file could contain some clues:
/var/folders/4n/gwkhlcx13cg04n64tybzyshr0000gn/T/tmp5lrmcx9n
And here is the aforementioned log file (/var/folders/4n/gwkhlcx13cg04n64tybzyshr0000gn/T/tmp5lrmcx9n):
# DATE: 16 Oct 19 07:36:36
# CMD LINE: anvi-script-run-functional-enrichment-stats --input /var/folders/4n/gwkhlcx13cg04n64tybzyshr0000gn/T/tmp5xfe51yd --output PROCHLORO-PAN-enriched-functions-clade.txt
Warning message:
package ‘optparse’ was built under R version 3.5.1
Warning messages:
1: package ‘ggplot2’ was built under R version 3.5.1
2: package ‘tibble’ was built under R version 3.5.1
3: package ‘tidyr’ was built under R version 3.5.1
4: package ‘readr’ was built under R version 3.5.1
5: package ‘purrr’ was built under R version 3.5.1
6: package ‘dplyr’ was built under R version 3.5.1
7: package ‘stringr’ was built under R version 3.5.1
tidyr major version >= 1. Using nest_legacy.
Parsed with column specification:
cols(
COG_FUNCTION = col_character(),
function_accession = col_character(),
gene_clusters_ids = col_character(),
associated_groups = col_character(),
p_LL_IV = col_double(),
p_HL_I = col_double(),
p_LL_III = col_double(),
p_LL_II = col_double(),
p_LL_I = col_double(),
p_HL_II = col_double(),
N_LL_IV = col_double(),
N_HL_I = col_double(),
N_LL_III = col_double(),
N_LL_II = col_double(),
N_LL_I = col_double(),
N_HL_II = col_double()
)
Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 15984 rows:
* 7993, 10657, 11989, 13321
* 7994, 10658, 11990, 13322
* 7995, 10659, 11991, 13323
* 7996, 10660, 11992, 13324
* 7997, 10661, 11993, 13325
* 7998, 10662, 11994, 13326
* 7999, 10663, 11995, 13327
* 8000, 10664, 11996, 13328
* 8001, 10665, 11997, 13329
* 8002, 10666, 11998, 13330
* 8003, 10667, 11999, 13331
* 8004, 10668, 12000, 13332
* 8005, 10669, 12001, 13333
* 8006, 10670, 12002, 13334
* 8007, 10671, 12003, 13335
* 8008, 10672, 12004, 13336
* 8009, 10673, 12005, 13337
* 8010, 10674, 12006, 13338
* 8011, 10675, 12007, 13339
* 8012, 10676, 12008, 13340
* 8013, 10677, 12009, 13341
* 8014, 10678, 12010, 13342
* 8015, 10679, 12011, 13343
* 8016, 10680, 12012, 13344
* 8017, 10681, 12013, 13345
* 8018, 10682, 12014, 13346
* 8019, 10683, 12015, 13347
* 8020, 10684, 12016, 13348
* 8021, 10685, 12017, 13349
* 8022, 10686, 12018, 13350
* 8023, 10687, 12019, 13351
* 8024, 10688, 12020,
In addition: Warning message:
Expected 2 pieces. Additional pieces discarded in 15984 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
Execution halted
Here is the input file for the R script (i.e. the aforementioned /var/folders/4n/gwkhlcx13cg04n64tybzyshr0000gn/T/tmp5xfe51yd):
functional_enrichment_input_5_groups.txt
So you can run it like this:
anvi-script-run-functional-enrichment-stats --input functional_enrichment_input_5_groups.txt \
--output functional_enrichment_output_5_groups.txt
Notice that this five group example is pretty bad, because one of the groups has only one member, but I also tested this with the following input:
COG_FUNCTION function_accession gene_clusters_ids associated_groups p_LL_IV p_HL_I p_LL_III p_LL_II p_LL_I p_HL_II N_LL_IV N_HL_I N_LL_III N_LL_II N_LL_I N_HL_II
Deoxyribose-phosphate aldolase COG0274 GC_00001115, GC_00002224, GC_00003647, GC_00003952 1 1 1 1 1 110 20 20 10 25 17
function2 FAKE_ID GC_00001115, GC_00002224, GC_00003647, GC_00003952 1 0.1 0 0.2 1 1 10 20 20 10 25 17
(small_multi_group_example.txt)
And this:
anvi-script-run-functional-enrichment-stats --input small_multi_group_example.txt \
--output small_output.txt
And it also fails in the same way.
THIS IS MY FAULT FOR NOT TESTING WITH THE MULTIPLE GROUPS AFTER THE CHANGES. SORRY!
I found the problem: the names of the groups have the format LL_I, LL_II, etc., and the _ is messing up the way the R script parses the group names. If I remove the _ then things work, so for example:
COG_FUNCTION function_accession gene_clusters_ids associated_groups p_LLIV p_HLI p_LLIII p_LLII p_LLI p_HLII N_LLIV N_HLI N_LLIII N_LLII N_LLI N_HLII
Deoxyribose-phosphate aldolase COG0274 GC_00001115, GC_00002224, GC_00003647, GC_00003952 1 1 1 1 1 110 20 20 10 25 17
function2 FAKE_ID GC_00001115, GC_00002224, GC_00003647, GC_00003952 1 0.1 0 0.2 1 1 10 20 20 10 25 17
(small_multi_group_example_fixed.txt)
Run:
anvi-script-run-functional-enrichment-stats --input small_multi_group_example_fixed.txt \
--output small_output.txt
Works!
So we need to fix this. The test should definitely be ok with group names having _ in them. We should also be explicit about this (for example, if we are not ok with spaces then we should mention that), and I can add a sanity check in the python part to see if names are illegal and raise a useful error.
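A sanity check of the kind proposed here could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function name, the choice of a space as the only forbidden character (the thread mentions spaces only as a possibility), and the use of ValueError so the example stays self-contained.

```python
def check_group_names(names, forbidden_characters=" "):
    """Raise a descriptive error if any group name contains a forbidden character.

    `forbidden_characters` is a hypothetical default; adjust it to whatever
    the downstream script really cannot handle.
    """
    problems = {}
    for name in names:
        bad = sorted(set(name) & set(forbidden_characters))
        if bad:
            problems[name] = bad
    if problems:
        details = "; ".join(
            "'%s' contains %s" % (name, ", ".join(repr(c) for c in bad))
            for name, bad in problems.items()
        )
        raise ValueError("Some group names are not allowed: %s" % details)


# Names with underscores pass; a name with a space is rejected.
check_group_names(["LL_I", "LL_II", "HL_I"])
try:
    check_group_names(["LL_I", "low light"])
except ValueError as error:
    print(error)
```

Running the check before the R script is called means the user sees which names are the problem instead of a generic pointer to a log file.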
@mooreryan, @adw96, if one of you has a chance to take a look and fix this, I would greatly appreciate that! My R fluency is not good enough for that...
I see the problem. It's in this line: https://github.com/merenlab/anvio/blob/df0a36849a24f5af29a18f7f9a0495d791fe1493/sandbox/anvi-script-run-functional-enrichment-stats#L127
It's not separating the type column correctly, because it assumes a single _ separating type and group. But it looks like in the original data, group had an _ in the name.
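To illustrate the difference, here is a small Python sketch (not the R code from the script) that contrasts keeping the first two underscore-separated pieces with splitting on the first underscore only. The function names are hypothetical; the column names come from the log above.

```python
def naive_split(column_name):
    """Split on every underscore and keep two pieces, discarding the rest."""
    pieces = column_name.split("_")
    return pieces[0], pieces[1]


def split_type_and_group(column_name):
    """Split on the first underscore only, so the group may itself contain '_'."""
    kind, group = column_name.split("_", 1)
    return kind, group


columns = ["p_LL_IV", "p_HL_I", "p_LL_III", "p_LL_II", "p_LL_I", "p_HL_II"]

print(sorted(set(naive_split(c) for c in columns)))
print([split_type_and_group(c) for c in columns])
```

With the naive split, the four LL columns collapse to the same (type, group) pair, which would be consistent with the "unique combination of keys" error and the "Additional pieces discarded" warning in the log.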
This indeed solved it. Thank you very much @mooreryan !
When I run anvi-get-enriched-functions-per-pan-group, I get the following error:
And the aforementioned log file includes this error:
Potential solution - R version
I am using:
So this could be an issue with my local version, but if so, then maybe we should add something to check the R version, or at least include a message that we require a certain minimal version of R.
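A version check like the one suggested here might be sketched as follows. It parses the first line of the banner that R prints, as seen elsewhere in this thread. The function names are hypothetical, and the minimum version in the usage lines is an arbitrary example, not a requirement anyone has stated.

```python
import re


def parse_r_version(banner):
    """Extract (major, minor, patch) from a banner such as 'R version 3.6.1 (...)'."""
    match = re.search(r"R version (\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("Could not find an R version in: %r" % banner)
    return tuple(int(part) for part in match.groups())


def r_is_recent_enough(banner, minimum):
    """Compare the parsed version against a (major, minor, patch) minimum."""
    return parse_r_version(banner) >= minimum


# The minimum below is an assumed value for illustration only.
assumed_minimum = (3, 6, 0)
for banner in (
    'R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"',
    'R version 3.6.1 (2019-07-05) -- "Action of the Toes"',
):
    print(parse_r_version(banner), r_is_recent_enough(banner, assumed_minimum))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings, where "3.10.0" would sort before "3.6.1".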
I will try installing the latest R version and test this again.
Reproducing this
To reproduce this, you can download the following data package: https://drive.google.com/file/d/1crwvvDpK_AqC2ngivcZfETOyj7brDhPL/view?usp=sharing
Uncompress the data folder and cd into it:
And then run the enrichment test: