liberjul / CONSTAXv2

MIT License

problems running constax #3

Open ramiroricardo opened 3 years ago

ramiroricardo commented 3 years ago

Dear all,

I have been trying to test CONSTAX, but I keep running into a problem that I think might be related to the pathfile, and I have not been able to find a solution. I installed constax in a conda environment using the commands given in the instructions, and I have constax v2.0.9 installed on an Ubuntu 18.04.5 LTS server.

I have tried to run constax both with and without specifying the pathfile located in the conda environment (the result is the same). My command looks like this:

constax \
--num_threads 10 \
--mem 32000 \
--db /database/UNITE/sh_general_release_04.02.2020/sh_general_release_dynamic_04.02.2020.fasta \
--train \
--input /database/UNITE/unite_test_query.fasta \
--input /database/UNITE/unite_test_query.fasta \
--isolates /database/UNITE/unite_test_isos.fasta \
--trainfile /database/UNITE/training_files \
--tax /database/UNITE/taxonomy_assignements \
--output /database/UNITE/taxonomy_assignements \
--make_plot \
--conf 0.8 \
--make_plot \
--pathfile /biotools/miniconda3/envs/constax2/opt/constax-2.0.9/pathfile.txt

and I get the following output:

Welcome to CONSTAX version 2.0.9 build 0 - The CONSensus TAXonomy classifier
This software is distributed under MIT License
© Copyright 2020, Julian A. Liber, Gian M. N. Benucci & Gregory M. Bonito
https://github.com/liberjul/CONSTAXv2
https://constax.readthedocs.io/

Please cite us as:
CONSTAX2: Improved taxonomic classification of environmental DNA markers
Julian Aaron Liber, Gregory Bonito, Gian Maria Niccolò Benucci
bioRxiv 2021.02.15.430803; doi: https://doi.org/10.1101/2021.02.15.430803
Training, with output to /database/UNITE/training_files...
Pathfile input not found in local directory ...
Pathfile input not found at /biotools/miniconda3/envs/constax2/pkgs/constax-2.0.9-0/opt/constax-2.0.9/pathfile.txt ...
Pathfile input not found at /biotools/miniconda3/envs/constax2/pkgs/constax-2.0.9-placeholder/opt/constax-2.0.9/pathfile.txt ...
Pathfile input found at /biotools/miniconda3/envs/constax2/opt/constax-2.0.9/pathfile.txt ...
All needed executables exist.
SINTAX: vsearch
RDP: classifier
CONSTAX: /biotools/miniconda3/envs/constax2/opt/constax-2.0.9
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 208: [: -gt: unary operator expected
Memory size: 32000mb
Importing subscripts from /biotools/miniconda3/envs/constax2/opt/constax-2.0.9

____________________________________________________________________
Reformatting database

UNITE format detected

Reference database FASTAs formatted in 1.404342263 seconds...

        Training Taxonomy

        Adding Full Lineage

Database formatting complete
____________________________________________________________________

__________________________________________________________________________
Training SINTAX Classifier
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 247: vsearch: command not found
__________________________________________________________________________
Training UTAX Classifier
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 263: : command not found
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 266: : command not found
__________________________________________________________________________
Training RDP Classifier
Error: Unable to access jarfile classifier
__________________________________________________________________________
Assigning taxonomy to OTU's representative sequences
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 334: vsearch: command not found
sed: can't read /database/UNITE/taxonomy_assignements/otu_taxonomy.sintax: No such file or directory
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 350: : command not found
Error: Unable to access jarfile classifier
__________________________________________________________________________
Comparing to Isolates
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 373: makeblastdb: command not found
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 375: blastn: command not found
rm: cannot remove '/database/UNITE/taxonomy_assignements/unite_test_isos__BLAST.n*': No such file or directory
Combining Taxonomies
Traceback (most recent call last):
  File "/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/CombineTaxonomy.py", line 532, in <module>
    open(file_name,"r")
FileNotFoundError: [Errno 2] No such file or directory: '/database/UNITE/taxonomy_assignements/otu_taxonomy.rdp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/CombineTaxonomy.py", line 534, in <module>
    raise FileNotFoundError(F"{classifier.upper()} file could not be opened.")
FileNotFoundError: RDP file could not be opened.
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 413: Rscript: command not found

Perhaps I am making some simple mistake?

Thanks for any help.

liberjul commented 3 years ago

Hi ramiroricardo,

There are two things I have noticed that might need to be fixed: 1) For conda-installed constax, you must use the -b or --blast flag, unless you intend to use the UTAX implementation, which requires a separate download. See here for more details.

2) For some reason, the vsearch, classifier, makeblastdb, blastn, and Rscript commands are not working when executed. Try running those commands outside the constax script to see if they are valid. If they work at all, do they work only when the environment is activated?
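A minimal bash sketch of the check suggested in point 2: for each tool name taken from the "command not found" lines of the log, it reports whether the current shell can resolve it on its PATH. The function name `check_tools` is hypothetical and not part of CONSTAX. Running it once with the constax2 environment activated and once without should show whether the tools are only reachable after activation.

```shell
#!/usr/bin/env bash
# Sketch: report which external tools used by CONSTAX resolve on the PATH
# of the current shell. check_tools is a hypothetical helper, not part of CONSTAX.

check_tools() {
  local missing=0 tool path
  for tool in "$@"; do
    # `command -v` prints the resolved path and succeeds,
    # or prints nothing and fails if the name cannot be found.
    if path=$(command -v "$tool"); then
      printf 'found:   %-12s %s\n' "$tool" "$path"
    else
      printf 'MISSING: %s\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  # Exit status = number of tools that could not be resolved.
  return "$missing"
}

# Names from the "command not found" lines in the log, plus java, whose
# launcher prints the "Unable to access jarfile" message.
check_tools vsearch classifier makeblastdb blastn Rscript java ||
  echo "$? tool(s) not found on PATH"
```

A non-zero return value equals the number of tools that could not be found, so the same function can also be used as a guard at the top of a batch script before calling constax.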

ramiroricardo commented 3 years ago

Hi @liberjul,

Thanks for your quick reply. When I run the same code, but with the --blast flag, I get essentially the same output:

Welcome to CONSTAX version 2.0.9 build 0 - The CONSensus TAXonomy classifier
This software is distributed under MIT License
© Copyright 2020, Julian A. Liber, Gian M. N. Benucci & Gregory M. Bonito
https://github.com/liberjul/CONSTAXv2
https://constax.readthedocs.io/

Please cite us as:
CONSTAX2: Improved taxonomic classification of environmental DNA markers
Julian Aaron Liber, Gregory Bonito, Gian Maria Niccolò Benucci
bioRxiv 2021.02.15.430803; doi: https://doi.org/10.1101/2021.02.15.430803
Training, with output to /database/UNITE/training_files...
Pathfile input not found in local directory ...
Pathfile input not found at /biotools/miniconda3/envs/constax2/pkgs/constax-2.0.9-0/opt/constax-2.0.9/pathfile.txt ...
Pathfile input not found at /biotools/miniconda3/envs/constax2/pkgs/constax-2.0.9-placeholder/opt/constax-2.0.9/pathfile.txt ...
Pathfile input found at /biotools/miniconda3/envs/constax2/opt/constax-2.0.9/pathfile.txt ...
All needed executables exist.
SINTAX: vsearch
RDP: classifier
CONSTAX: /biotools/miniconda3/envs/constax2/opt/constax-2.0.9
Memory size: 32000mb
Importing subscripts from /biotools/miniconda3/envs/constax2/opt/constax-2.0.9

____________________________________________________________________
Reformatting database

UNITE format detected

Reference database FASTAs formatted in 1.504352509 seconds...

        Training Taxonomy

        Adding Full Lineage

Database formatting complete
____________________________________________________________________

__________________________________________________________________________
Training SINTAX Classifier
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 247: vsearch: command not found
__________________________________________________________________________
Training BLAST Classifier
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 258: makeblastdb: command not found
__________________________________________________________________________
Training RDP Classifier
Error: Unable to access jarfile classifier
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 322: blastn: command not found
__________________________________________________________________________
Assigning taxonomy to OTU's representative sequences
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 334: vsearch: command not found
sed: can't read /database/UNITE/taxonomy_assignements/otu_taxonomy.sintax: No such file or directory
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 347: blastn: command not found
Error: Unable to access jarfile classifier
__________________________________________________________________________
Comparing to Isolates
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 373: makeblastdb: command not found
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 375: blastn: command not found
rm: cannot remove '/database/UNITE/taxonomy_assignements/unite_test_isos__BLAST.n*': No such file or directory
Combining Taxonomies
Traceback (most recent call last):
  File "/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/CombineTaxonomy.py", line 532, in <module>
    open(file_name,"r")
FileNotFoundError: [Errno 2] No such file or directory: '/database/UNITE/taxonomy_assignements/otu_taxonomy.rdp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/CombineTaxonomy.py", line 534, in <module>
    raise FileNotFoundError(F"{classifier.upper()} file could not be opened.")
FileNotFoundError: RDP file could not be opened.
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 410: Rscript: command not found

About your second question: none of the tools worked outside of the environment. Note that I was running this directly in the terminal, not as a bash script. I had previously run it inside a bash script, with the same errors, after adding the following before calling constax:

source /biotools/miniconda3/etc/profile.d/conda.sh
conda activate constax2

Inside the environment, all tools appeared to work, as they produced the expected output when called with -help, with the exception of Rscript. I have now installed R 4.0.3 into the environment and Rscript works. However, even after this installation, I get the following output:

Welcome to CONSTAX version 2.0.9 build 0 - The CONSensus TAXonomy classifier
This software is distributed under MIT License
© Copyright 2020, Julian A. Liber, Gian M. N. Benucci & Gregory M. Bonito
https://github.com/liberjul/CONSTAXv2
https://constax.readthedocs.io/

Please cite us as:
CONSTAX2: Improved taxonomic classification of environmental DNA markers
Julian Aaron Liber, Gregory Bonito, Gian Maria Niccolò Benucci
bioRxiv 2021.02.15.430803; doi: https://doi.org/10.1101/2021.02.15.430803
Training, with output to /database/UNITE/training_files...
Pathfile input not found in local directory ...
Pathfile input not found at /biotools/miniconda3/envs/constax2/pkgs/constax-2.0.9-0/opt/constax-2.0.9/pathfile.txt ...
Pathfile input not found at /biotools/miniconda3/envs/constax2/pkgs/constax-2.0.9-placeholder/opt/constax-2.0.9/pathfile.txt ...
Pathfile input found at /biotools/miniconda3/envs/constax2/opt/constax-2.0.9/pathfile.txt ...
All needed executables exist.
SINTAX: vsearch
RDP: classifier
CONSTAX: /biotools/miniconda3/envs/constax2/opt/constax-2.0.9
Memory size: 32000mb
Importing subscripts from /biotools/miniconda3/envs/constax2/opt/constax-2.0.9

____________________________________________________________________
Reformatting database

UNITE format detected

Reference database FASTAs formatted in 1.523612179 seconds...

        Training Taxonomy

        Adding Full Lineage

Database formatting complete
____________________________________________________________________

__________________________________________________________________________
Training SINTAX Classifier
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 247: vsearch: command not found
__________________________________________________________________________
Training BLAST Classifier
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 258: makeblastdb: command not found
__________________________________________________________________________
Training RDP Classifier
Error: Unable to access jarfile classifier
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 322: blastn: command not found
__________________________________________________________________________
Assigning taxonomy to OTU's representative sequences
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 334: vsearch: command not found
sed: can't read /database/UNITE/taxonomy_assignements/otu_taxonomy.sintax: No such file or directory
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 347: blastn: command not found
Error: Unable to access jarfile classifier
__________________________________________________________________________
Comparing to Isolates
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 373: makeblastdb: command not found
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 375: blastn: command not found
rm: cannot remove '/database/UNITE/taxonomy_assignements/unite_test_isos__BLAST.n*': No such file or directory
Combining Taxonomies
Traceback (most recent call last):
  File "/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/CombineTaxonomy.py", line 532, in <module>
    open(file_name,"r")
FileNotFoundError: [Errno 2] No such file or directory: '/database/UNITE/taxonomy_assignements/otu_taxonomy.rdp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/CombineTaxonomy.py", line 534, in <module>
    raise FileNotFoundError(F"{classifier.upper()} file could not be opened.")
FileNotFoundError: RDP file could not be opened.

Gian77 commented 3 years ago

Hello @ramiroricardo,

Can you please post the contents of your pathfile.txt? Also, it should be placed in your working directory, not in the conda environment. See details at https://constax.readthedocs.io/en/latest/tutorial1.html
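Following the suggestion to keep pathfile.txt in the working directory, here is a minimal bash sketch that writes one there in the same `export NAME=value` format as the default pathfile installed with conda. The helper `write_pathfile` is hypothetical. It resolves vsearch and classifier to absolute paths when they are on PATH (so run it with the environment activated) and otherwise falls back to the bare names used in the default file. Whether CONSTAX expects a bare command name or a full path for RDPPATH is not settled in this thread, so treat the resulting file as a starting point.

```shell
#!/usr/bin/env bash
# Sketch: write pathfile.txt into the current working directory using the
# same "export NAME=value" lines as the default pathfile shown in this thread.
# write_pathfile is a hypothetical helper, not part of CONSTAX.

write_pathfile() {
  local constax_dir=$1   # directory holding the CONSTAX scripts (CONSTAXPATH)
  local sintax rdp
  # Use absolute paths when the tools are on PATH (run with the environment
  # activated); otherwise fall back to the bare names the default file uses.
  sintax=$(command -v vsearch || echo vsearch)
  rdp=$(command -v classifier || echo classifier)
  {
    echo "export SINTAXPATH=$sintax"
    echo "export RDPPATH=$rdp"
    echo "export CONSTAXPATH=$constax_dir"
  } > pathfile.txt
}
```

For example, from the directory you run constax in: `write_pathfile /biotools/miniconda3/envs/constax2/opt/constax-2.0.9 && cat pathfile.txt`.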

ramiroricardo commented 3 years ago

Hi @Gian77,

Thanks for your reply.

My pathfile, which was in the conda environment folders, contains the following:

export SINTAXPATH=vsearch
export RDPPATH=classifier
export CONSTAXPATH=/biotools/miniconda3/envs/constax2/opt/constax-2.0.9

Note that I did not modify it; it came with the conda environment when I created it.

If I create a pathfile in the working directory containing:

CONSTAXPATH=/biotools/miniconda3/envs/constax2/opt/constax-2.0.9
RDPPATH=/biotools/miniconda3/envs/constax2/bin/classifier
SINTAXPATH=/biotools/miniconda3/envs/constax2/bin/vsearch

and run:

constax \
--num_threads 10 \
--mem 32000 \
--db /database/UNITE/sh_general_release_04.02.2020/sh_general_release_dynamic_04.02.2020.fasta \
--train \
--input /database/UNITE/unite_test_query.fasta \
--input /database/UNITE/unite_test_query.fasta \
--isolates /database/UNITE/unite_test_isos.fasta \
--trainfile /database/UNITE/training_files \
--tax /database/UNITE/taxonomy_assignements \
--output /database/UNITE/taxonomy_assignements \
--blast \
--make_plot \
--conf 0.8 \
--make_plot \
--pathfile /database/UNITE/pathfile.txt

I get this:

Welcome to CONSTAX version 2.0.9 build 0 - The CONSensus TAXonomy classifier
This software is distributed under MIT License
© Copyright 2020, Julian A. Liber, Gian M. N. Benucci & Gregory M. Bonito
https://github.com/liberjul/CONSTAXv2
https://constax.readthedocs.io/

Please cite us as:
CONSTAX2: Improved taxonomic classification of environmental DNA markers
Julian Aaron Liber, Gregory Bonito, Gian Maria Niccolò Benucci
bioRxiv 2021.02.15.430803; doi: https://doi.org/10.1101/2021.02.15.430803
Training, with output to /database/UNITE/training_files...
Pathfile input not found in local directory ...
Pathfile input not found at /biotools/miniconda3/envs/constax2/pkgs/constax-2.0.9-0/opt/constax-2.0.9/pathfile.txt ...
Pathfile input not found at /biotools/miniconda3/envs/constax2/pkgs/constax-2.0.9-placeholder/opt/constax-2.0.9/pathfile.txt ...
Pathfile input found at /biotools/miniconda3/envs/constax2/opt/constax-2.0.9/pathfile.txt ...
All needed executables exist.
SINTAX: vsearch
RDP: classifier
CONSTAX: /biotools/miniconda3/envs/constax2/opt/constax-2.0.9
Memory size: 32000mb
Importing subscripts from /biotools/miniconda3/envs/constax2/opt/constax-2.0.9

____________________________________________________________________
Reformatting database

UNITE format detected

Reference database FASTAs formatted in 1.753519917 seconds...

        Training Taxonomy

        Adding Full Lineage

Database formatting complete
____________________________________________________________________

__________________________________________________________________________
Training SINTAX Classifier
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 247: vsearch: command not found
__________________________________________________________________________
Training BLAST Classifier
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 258: makeblastdb: command not found
__________________________________________________________________________
Training RDP Classifier
Error: Unable to access jarfile classifier
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 322: blastn: command not found
__________________________________________________________________________
Assigning taxonomy to OTU's representative sequences
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 334: vsearch: command not found
sed: can't read /database/UNITE/taxonomy_assignements/otu_taxonomy.sintax: No such file or directory
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 347: blastn: command not found
Error: Unable to access jarfile classifier
__________________________________________________________________________
Comparing to Isolates
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 373: makeblastdb: command not found
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 375: blastn: command not found
rm: cannot remove '/database/UNITE/taxonomy_assignements/unite_test_isos__BLAST.n*': No such file or directory
Combining Taxonomies
Traceback (most recent call last):
  File "/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/CombineTaxonomy.py", line 532, in <module>
    open(file_name,"r")
FileNotFoundError: [Errno 2] No such file or directory: '/database/UNITE/taxonomy_assignements/otu_taxonomy.rdp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/CombineTaxonomy.py", line 534, in <module>
    raise FileNotFoundError(F"{classifier.upper()} file could not be opened.")
FileNotFoundError: RDP file could not be opened.
/biotools/miniconda3/envs/constax2/opt/constax-2.0.9/constax_no_inputs.sh: line 410: Rscript: command not found

Gian77 commented 3 years ago

Besides that, you have `--make_plot` twice in your command. Do you have vsearch in the same conda environment?

Please post the output of:

conda info --envs
conda activate <your constax environment>
conda list
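A complementary check that does not depend on activation is to look directly in the environment's bin/ directory, using the prefix printed by `conda info --envs`. The bash sketch below (the function name `check_env_bin` is hypothetical) separates "not installed" from "installed but not on PATH", which is the distinction the "command not found" errors above leave open.

```shell
#!/usr/bin/env bash
# Sketch: check whether executables exist in a conda environment's bin/
# directory, independently of whether that environment is activated.
# check_env_bin is a hypothetical helper; the prefix comes from `conda info --envs`.

check_env_bin() {
  local prefix=$1 tool missing=0
  shift
  for tool in "$@"; do
    if [ -x "$prefix/bin/$tool" ]; then
      echo "present:  $prefix/bin/$tool"
    else
      echo "absent:   $tool"
      missing=$((missing + 1))
    fi
  done
  # Report whether this environment's bin/ is on the PATH of the current shell.
  case ":$PATH:" in
    *":$prefix/bin:"*) echo "on PATH:  yes" ;;
    *)                 echo "on PATH:  no" ;;
  esac
  # Exit status = number of tools absent from bin/.
  return "$missing"
}
```

For example: `check_env_bin /biotools/miniconda3/envs/constax2 vsearch classifier makeblastdb blastn Rscript`. Tools reported as present while "on PATH" says no would point at activation rather than installation.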

ramiroricardo commented 3 years ago

Hi @Gian77,

Thanks for pointing out the duplicated --make_plot, but removing the duplicate has no effect on the output.

Here are the outputs of the commands that you asked for:

conda info --envs

# conda environments:
#
base                  *  /biotools/miniconda3
abricate                 /biotools/miniconda3/envs/abricate
aspera                   /biotools/miniconda3/envs/aspera
assemblers               /biotools/miniconda3/envs/assemblers
bowtie2                  /biotools/miniconda3/envs/bowtie2
checkm                   /biotools/miniconda3/envs/checkm
constax2                 /biotools/miniconda3/envs/constax2
coverm                   /biotools/miniconda3/envs/coverm
fastp                    /biotools/miniconda3/envs/fastp
instrain                 /biotools/miniconda3/envs/instrain
iqtree                   /biotools/miniconda3/envs/iqtree
panacota                 /biotools/miniconda3/envs/panacota
parallel-fastq-dump      /biotools/miniconda3/envs/parallel-fastq-dump
prodigal                 /biotools/miniconda3/envs/prodigal
py2                      /biotools/miniconda3/envs/py2
quast                    /biotools/miniconda3/envs/quast
samtools                 /biotools/miniconda3/envs/samtools
trimal                   /biotools/miniconda3/envs/trimal

conda activate constax2
conda list

# packages in environment at /biotools/miniconda3/envs/constax2:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
_r-mutex                  1.0.1               anacondar_1    conda-forge
binutils_impl_linux-64    2.35.1               h193b22a_2    conda-forge
binutils_linux-64         2.35                h67ddf6f_30    conda-forge
blast                     2.5.0                hc0b0e79_3    bioconda
boost                     1.75.0           py39h5472131_0    conda-forge
boost-cpp                 1.75.0               hc6e9bd1_0    conda-forge
bwidget                   1.9.14               ha770c72_0    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.17.1               h7f98852_1    conda-forge
ca-certificates           2020.12.5            ha878542_0    conda-forge
cairo                     1.16.0            h6cf1ce9_1008    conda-forge
certifi                   2020.12.5        py39hf3d152e_1    conda-forge
constax                   2.0.9                hdfd78af_0    bioconda
curl                      7.76.1               h979ede3_1    conda-forge
fontconfig                2.13.1            hba837de_1005    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
fribidi                   1.0.10               h36c2ea0_0    conda-forge
gcc_impl_linux-64         9.3.0               h70c0ae5_19    conda-forge
gcc_linux-64              9.3.0               hf25ea35_30    conda-forge
gettext                   0.19.8.1          h0b5b191_1005    conda-forge
gfortran_impl_linux-64    9.3.0               hc4a2995_19    conda-forge
gfortran_linux-64         9.3.0               hdc58fab_30    conda-forge
graphite2                 1.3.13            h58526e2_1001    conda-forge
gsl                       2.6                  he838d99_2    conda-forge
gxx_impl_linux-64         9.3.0               hd87eabc_19    conda-forge
gxx_linux-64              9.3.0               h3fbe746_30    conda-forge
harfbuzz                  2.8.0                h83ec7ef_1    conda-forge
icu                       68.1                 h58526e2_0    conda-forge
jpeg                      9d                   h36c2ea0_0    conda-forge
kernel-headers_linux-64   2.6.32              h77966d4_13    conda-forge
krb5                      1.17.2               h926e7f8_0    conda-forge
ld_impl_linux-64          2.35.1               hea4e1c9_2    conda-forge
libblas                   3.9.0                8_openblas    conda-forge
libcblas                  3.9.0                8_openblas    conda-forge
libcurl                   7.76.1               hc4aaa36_1    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.3                  h58526e2_2    conda-forge
libgcc-devel_linux-64     9.3.0               h7864c58_19    conda-forge
libgcc-ng                 9.3.0               h2828fa1_19    conda-forge
libgfortran-ng            9.3.0               hff62375_19    conda-forge
libgfortran5              9.3.0               hff62375_19    conda-forge
libglib                   2.68.1               h3e27bee_0    conda-forge
libgomp                   9.3.0               h2828fa1_19    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
liblapack                 3.9.0                8_openblas    conda-forge
libnghttp2                1.43.0               h812cca2_0    conda-forge
libopenblas               0.3.12          pthreads_h4812303_1    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libssh2                   1.9.0                ha56f1ee_6    conda-forge
libstdcxx-devel_linux-64  9.3.0               hb016644_19    conda-forge
libstdcxx-ng              9.3.0               h6de172a_19    conda-forge
libtiff                   4.2.0                hdc55705_1    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libwebp-base              1.2.0                h7f98852_2    conda-forge
libxcb                    1.13              h7f98852_1003    conda-forge
libxml2                   2.9.10               h72842e0_4    conda-forge
lz4-c                     1.9.3                h9c3ff4c_0    conda-forge
make                      4.3                  hd18ef5c_1    conda-forge
ncurses                   6.2                  h58526e2_4    conda-forge
numpy                     1.20.2           py39hdbf815f_0    conda-forge
openjdk                   8.0.282              h7f98852_0    conda-forge
openssl                   1.1.1k               h7f98852_0    conda-forge
pandas                    1.2.4            py39hde0f152_0    conda-forge
pango                     1.48.4               hb8ff022_0    conda-forge
pcre                      8.44                 he1b5a44_0    conda-forge
pcre2                     10.36                h032f7d1_1    conda-forge
pip                       21.0.1             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
python                    3.9.2           hffdb5ce_0_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.9                      1_cp39    conda-forge
pytz                      2021.1             pyhd8ed1ab_0    conda-forge
r-base                    4.0.3                h349a78a_8    conda-forge
rdptools                  2.0.3                hdfd78af_1    bioconda
readline                  8.0                  he28a2e2_2    conda-forge
sed                       4.8                  he412f7d_0    conda-forge
setuptools                49.6.0           py39hf3d152e_3    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
sqlite                    3.35.4               h74cdb3f_0    conda-forge
sysroot_linux-64          2.12                h77966d4_13    conda-forge
tk                        8.6.10               h21135ba_1    conda-forge
tktable                   2.10                 hb7b940f_3    conda-forge
tzdata                    2021a                he74cb21_0    conda-forge
vsearch                   2.17.0               h95f258a_1    bioconda
wheel                     0.36.2             pyhd3deb0d_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.7.0                h7f98852_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h7f98852_1    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-libxt                1.2.1                h7f98852_2    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.11            h516909a_1010    conda-forge
zstd                      1.4.9                ha95c52a_0    conda-forge

Thanks again for your help

liberjul commented 3 years ago

Hi @ramiroricardo,

It appears that the environment was not activated when the subscripts were called, so I updated the constax_wrapper.py script to activate the conda environment in subprocesses. You can replace the script in /biotools/miniconda3/envs/constax2/opt/constax-2.0.9/ with the new one from https://github.com/liberjul/CONSTAXv2/blob/master/constax_wrapper.py. It should already be symbolically linked into your binary directory and should run without additional steps.

If you rerun constax after replacing this script, does it work? Please show the output if not.

Julian
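To illustrate why activation matters for subscripts: a child process only sees the PATH it inherits. The bash sketch below (hypothetical helper `run_in_env`) prepends an environment's bin/ directory to PATH for a single command, so that command and any bash subscripts it starts can resolve vsearch, blastn, and similar tools. It is only an approximation of `conda activate` and not necessarily what the updated constax_wrapper.py does: it skips the environment's activation hooks, which packages such as openjdk or r-base may rely on.

```shell
#!/usr/bin/env bash
# Sketch: run one command with a conda environment's bin/ directory first on
# PATH. run_in_env is a hypothetical helper. It only approximates
# `conda activate`: activation hooks in $prefix/etc/conda/activate.d are NOT run.

run_in_env() {
  local prefix=$1
  shift
  # The assignment applies to this command only and is exported to it, so
  # any child processes (e.g. bash subscripts) inherit the modified PATH.
  PATH="$prefix/bin:$PATH" "$@"
}
```

For example: `run_in_env /biotools/miniconda3/envs/constax2 constax --blast <other options>`. Recent conda releases also offer `conda run -n constax2 <command>` for running a command inside an environment without activating it interactively; check that your installed conda version supports it.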

ramiroricardo commented 3 years ago

Hi all,

Sorry I have not replied in a while; I only managed to get back to this now. I reinstalled constax with conda and am now using constax version 2.0.13. I have also tried replacing constax_wrapper.py, but whether I keep the constax_wrapper.py installed by conda or use the new one, I get the same result. So when I run:

constax \
--num_threads 10 \
--mem 32000 \
--db /database/UNITE/sh_general_release_04.02.2020/sh_general_release_dynamic_04.02.2020.fasta \
--train \
--input /database/UNITE/unite_test_query.fasta \
--isolates /database/UNITE/unite_test_isos.fasta \
--trainfile /database/UNITE/training_files \
--tax /database/UNITE/taxonomy_assignements \
--output /database/UNITE/taxonomy_assignements \
--blast \
--make_plot \
--conf 0.8 \
--pathfile /database/UNITE/pathfile.txt

I get the following errors:

Welcome to CONSTAX version 2.0.13 build 0 - The CONSensus TAXonomy classifier
This software is distributed under MIT License
© Copyright 2021, Julian A. Liber, Gian M. N. Benucci & Gregory M. Bonito
https://github.com/liberjul/CONSTAXv2
https://constax.readthedocs.io/

Please cite us as:
CONSTAX2: Improved taxonomic classification of environmental DNA markers
Julian Aaron Liber, Gregory Bonito, Gian Maria Niccolò Benucci
bioRxiv 2021.02.15.430803; doi: https://doi.org/10.1101/2021.02.15.430803
Overwritting previous classification...
Overwritting previous taxonomy assignments...
Performing training and overwritting training files...
Pathfile input not found in local directory ...
Pathfile input not found at /biotools/miniconda3/envs/constax2/pkgs/constax-2.0.13-0/opt/constax-2.0.13/pathfile.txt ...
Pathfile input not found at /biotools/miniconda3/envs/constax2/pkgs/constax-2.0.13-placeholder/opt/constax-2.0.13/pathfile.txt ...
Pathfile input found at /biotools/miniconda3/envs/constax2/opt/constax-2.0.13/pathfile.txt ...
All needed executables exist.
SINTAX: vsearch
RDP: classifier
CONSTAX: /biotools/miniconda3/envs/constax2/opt/constax-2.0.13
Memory size: 32000mb
Importing subscripts from /biotools/miniconda3/envs/constax2/opt/constax-2.0.13

____________________________________________________________________
Reformatting database

UNITE format detected

Reference database FASTAs formatted in 0.9438644399999999 seconds...

        Training Taxonomy

        Adding Full Lineage

Database formatting complete
____________________________________________________________________

__________________________________________________________________________
Training SINTAX Classifier
__________________________________________________________________________
Training BLAST Classifier
__________________________________________________________________________
Training RDP Classifier
Error: Unable to access jarfile classifier
__________________________________________________________________________
Assigning taxonomy to OTU's representative sequences
__________________________________________________________________________
Comparing to Isolates
Combining Taxonomies
bash: activate: No such file or directory
Welcome to CONSTAX version 2.0.13 build 0 - The CONSensus TAXonomy classifier
This software is distributed under MIT License
© Copyright 2021, Julian A. Liber, Gian M. N. Benucci & Gregory M. Bonito
https://github.com/liberjul/CONSTAXv2
https://constax.readthedocs.io/

Please cite us as:
CONSTAX2: Improved taxonomic classification of environmental DNA markers
Julian Aaron Liber, Gregory Bonito, Gian Maria Niccolò Benucci
bioRxiv 2021.02.15.430803; doi: https://doi.org/10.1101/2021.02.15.430803
Overwritting previous classification...
Overwritting previous taxonomy assignments...
Performing training and overwritting training files...
Pathfile input not found in local directory ...
Pathfile input not found at /biotools/miniconda3/envs/constax2/pkgs/constax-2.0.13-0/opt/constax-2.0.13/pathfile.txt ...
Pathfile input not found at /biotools/miniconda3/envs/constax2/pkgs/constax-2.0.13-placeholder/opt/constax-2.0.13/pathfile.txt ...
Pathfile input found at /biotools/miniconda3/envs/constax2/opt/constax-2.0.13/pathfile.txt ...
All needed executables exist.
SINTAX: vsearch
RDP: classifier
CONSTAX: /biotools/miniconda3/envs/constax2/opt/constax-2.0.13
Memory size: 32000mb
Importing subscripts from /biotools/miniconda3/envs/constax2/opt/constax-2.0.13

____________________________________________________________________
Reformatting database

UNITE format detected

Reference database FASTAs formatted in 0.8888717220000001 seconds...

        Training Taxonomy

        Adding Full Lineage

Database formatting complete
____________________________________________________________________

__________________________________________________________________________
Training SINTAX Classifier
/biotools/miniconda3/envs/constax2/opt/constax-2.0.13/constax_no_inputs.sh: line 252: vsearch: command not found
__________________________________________________________________________
Training BLAST Classifier
/biotools/miniconda3/envs/constax2/opt/constax-2.0.13/constax_no_inputs.sh: line 263: makeblastdb: command not found
__________________________________________________________________________
Training RDP Classifier
Error: Unable to access jarfile classifier
/biotools/miniconda3/envs/constax2/opt/constax-2.0.13/constax_no_inputs.sh: line 327: blastn: command not found
__________________________________________________________________________
Assigning taxonomy to OTU's representative sequences
/biotools/miniconda3/envs/constax2/opt/constax-2.0.13/constax_no_inputs.sh: line 339: vsearch: command not found
sed: can't read /database/UNITE/taxonomy_assignements/otu_taxonomy.sintax: No such file or directory
/biotools/miniconda3/envs/constax2/opt/constax-2.0.13/constax_no_inputs.sh: line 352: blastn: command not found
Error: Unable to access jarfile classifier
__________________________________________________________________________
Comparing to Isolates
/biotools/miniconda3/envs/constax2/opt/constax-2.0.13/constax_no_inputs.sh: line 378: makeblastdb: command not found
/biotools/miniconda3/envs/constax2/opt/constax-2.0.13/constax_no_inputs.sh: line 380: blastn: command not found
rm: cannot remove '/database/UNITE/taxonomy_assignements/unite_test_isos__BLAST.n*': No such file or directory
Combining Taxonomies
Traceback (most recent call last):
  File "/biotools/miniconda3/envs/constax2/opt/constax-2.0.13/CombineTaxonomy.py", line 565, in <module>
    open(file_name,"r")
FileNotFoundError: [Errno 2] No such file or directory: '/database/UNITE/taxonomy_assignements/otu_taxonomy.rdp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/biotools/miniconda3/envs/constax2/opt/constax-2.0.13/CombineTaxonomy.py", line 567, in <module>
    raise FileNotFoundError(F"{classifier.upper()} file could not be opened.")
FileNotFoundError: RDP file could not be opened.
/biotools/miniconda3/envs/constax2/opt/constax-2.0.13/constax_no_inputs.sh: line 421: Rscript: command not found

If you have any idea what could be causing this, please let me know; I would like to keep trying to solve it. Thanks for your help!
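In the second pass of this log (the one that starts after `bash: activate: No such file or directory`), vsearch, makeblastdb, blastn and Rscript are all reported as `command not found`, which may mean the environment's executables were not on `PATH` for that pass. A quick check, run from the same shell in which constax is launched, is to ask Python's `shutil.which` whether each tool resolves. This is a minimal sketch: the tool list is taken from the error lines above (an assumption, not read from CONSTAX's pathfile), and `missing_tools` is a hypothetical helper, not part of CONSTAX.

```python
import shutil

# Tools reported as "command not found" in the log above (assumed list,
# not taken from CONSTAX's own pathfile or dependency checks).
REQUIRED_TOOLS = ["vsearch", "makeblastdb", "blastn", "Rscript"]


def missing_tools(tools):
    """Return the tools that shutil.which() cannot resolve on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Running it once with the environment activated (`conda activate constax2`) and once without can show whether activation is what makes the difference.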

ramiroricardo commented 3 years ago

Hi all,

I have tried to run the same code on a different virtual machine, and now it does seem to run. I am not sure what the problem was on the other VM. However, I am getting an error at the end. I am not sure, but it seems to me that constax is running twice, which might be causing that final error.
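One way to check the "running twice" impression is to save the terminal output (for example by appending `2>&1 | tee constax.log` to the command) and look for repeated start-up banners, together with the line printed just before each repeat. In the logs in this thread, the second banner comes right after `bash: activate: No such file or directory`. A minimal sketch, assuming the banner text seen in these logs; `count_runs`, `find_restarts` and `constax.log` are hypothetical names.

```python
BANNER = "Welcome to CONSTAX version"


def count_runs(log_text):
    """Count how many times the CONSTAX start-up banner appears in a captured log."""
    return sum(1 for line in log_text.splitlines() if line.startswith(BANNER))


def find_restarts(log_text):
    """For every banner after the first, return (1-based line number, preceding line)."""
    lines = log_text.splitlines()
    starts = [i for i, line in enumerate(lines) if line.startswith(BANNER)]
    # Every index in starts[1:] is at least 1, so lines[i - 1] always exists.
    return [(i + 1, lines[i - 1]) for i in starts[1:]]
```

For example, `find_restarts(open("constax.log").read())` lists where each extra run begins and what was printed immediately before it.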

So I ran:

constax \
--num_threads 10 \
--mem 32000 \
--db constax2/UNITE/sh_general_release_04.02.2020/sh_general_release_dynamic_04.02.2020.fasta \
--train \
--input constax2/UNITE/unite_test_query.fasta \
--input constax2/UNITE/unite_test_query.fasta \
--isolates constax2/UNITE/unite_test_isos.fasta \
--trainfile constax2/UNITE/training_files \
--tax constax2/UNITE/taxonomy_assignements \
--output constax2/UNITE/taxonomy_assignements \
--blast \
--make_plot \
--conf 0.8

and got:

Welcome to CONSTAX version 2.0.13 build 0 - The CONSensus TAXonomy classifier
This software is distributed under MIT License
© Copyright 2021, Julian A. Liber, Gian M. N. Benucci & Gregory M. Bonito
https://github.com/liberjul/CONSTAXv2
https://constax.readthedocs.io/

Please cite us as:
CONSTAX2: Improved taxonomic classification of environmental DNA markers
Julian Aaron Liber, Gregory Bonito, Gian Maria Niccolò Benucci
bioRxiv 2021.02.15.430803; doi: https://doi.org/10.1101/2021.02.15.430803
Training, with output to constax2/UNITE/training_files...
Pathfile input not found in local directory ...
Pathfile input not found at /software/miniconda3/envs/constax2/pkgs/constax-2.0.13-0/opt/constax-2.0.13/pathfile.txt ...
Pathfile input not found at /software/miniconda3/envs/constax2/pkgs/constax-2.0.13-placeholder/opt/constax-2.0.13/pathfile.txt ...
Pathfile input found at /software/miniconda3/envs/constax2/opt/constax-2.0.13/pathfile.txt ...
All needed executables exist.
SINTAX: vsearch
RDP: classifier
CONSTAX: /software/miniconda3/envs/constax2/opt/constax-2.0.13
Memory size: 32000mb
Importing subscripts from /software/miniconda3/envs/constax2/opt/constax-2.0.13

____________________________________________________________________
Reformatting database

UNITE format detected

Reference database FASTAs formatted in 0.6641772109999999 seconds...

        Training Taxonomy

        Adding Full Lineage

Database formatting complete
____________________________________________________________________

__________________________________________________________________________
Training SINTAX Classifier
__________________________________________________________________________
Training BLAST Classifier

Building a new DB, current time: 06/24/2021 16:03:20
New DB name:   /home/rramiro/constax2/UNITE/training_files/sh_general_release_dynamic_04.02.2020__BLAST
New DB title:  constax2/UNITE/training_files/sh_general_release_dynamic_04.02.2020__RDP_trained.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 47741 sequences in 2.98552 seconds.
__________________________________________________________________________
Training RDP Classifier
edu.msu.cme.rdp.classifier.train.NameRankDupException: Error: duplicate taxon name and rank in the taxonomy file.
cylindrium      genus   2
cenangiopsis    genus   2
brevicollum     genus   2
cryptococcus    genus   2
aleurina        genus   2

        at edu.msu.cme.rdp.classifier.train.TreeFactory.creatTaxidMap(TreeFactory.java:126)
        at edu.msu.cme.rdp.classifier.train.TreeFactory.<init>(TreeFactory.java:61)
        at edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker.<init>(ClassifierTraineeMaker.java:63)
        at edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker.main(ClassifierTraineeMaker.java:170)
        at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:77)
RDP training error, redoing with duplicate taxa
Importing subscripts from /software/miniconda3/envs/constax2/opt/constax-2.0.13

____________________________________________________________________
Reformatting database

UNITE format detected

Reference database FASTAs formatted in 0.658936341 seconds...

        Training Taxonomy

        Adding Full Lineage

Database formatting complete
____________________________________________________________________

RDP training error overcome, continuing with classification after SINTAX is retrained
__________________________________________________________________________
Assigning taxonomy to OTU's representative sequences
__________________________________________________________________________
Comparing to Isolates

Building a new DB, current time: 06/24/2021 16:13:00
New DB name:   /home/rramiro/constax2/UNITE/taxonomy_assignements/unite_test_isos__BLAST
New DB title:  constax2/UNITE/taxonomy_assignements/isolates_formatted.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 10 sequences in 0.00118494 seconds.
Combining Taxonomies

____________________________________________________________________
Reformatting RDP file

        Done

Reformatting SINTAX file

        Done

Reformatting BLAST file

        Done

Reformatting isolate result file

        Done

Generating consensus taxonomy & combined taxonomy table

        Done

Generating classification counts & summary table

        Done

____________________________________________________________________

                                                                                                                                                                   V1
1                                                                                                                                                              OTU_ID
2 Entoloma_vindobonense|JX454802|SH1569086.08FU|refs|k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Agaricales;f__Entolomataceae;g__Entoloma;s__Entoloma_vindobonense
3     Entolomataceae_sp|FR682185|SH1569069.08FU|reps|k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Agaricales;f__Entolomataceae;g__unidentified;s__Entolomataceae_sp
4    Entoloma_pallescens|UDB025007|SH1569094.08FU|reps|k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Agaricales;f__Entolomataceae;g__Entoloma;s__Entoloma_pallescens
5   Entolomataceae_sp|UDB0729740|SH1569083.08FU|reps|k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Agaricales;f__Entolomataceae;g__unidentified;s__Entolomataceae_sp
6    Entoloma_byssisedum|UDB015478|SH1569062.08FU|refs|k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Agaricales;f__Entolomataceae;g__Entoloma;s__Entoloma_byssisedum
       V2              V3               V4           V5               V6
1 Kingdom          Phylum            Class        Order           Family
2 Fungi_1 Basidiomycota_1 Agaricomycetes_1 Agaricales_1 Entolomataceae_1
3 Fungi_1 Basidiomycota_1 Agaricomycetes_1 Agaricales_1 Entolomataceae_1
4 Fungi_1 Basidiomycota_1 Agaricomycetes_1 Agaricales_1 Entolomataceae_1
5 Fungi_1 Basidiomycota_1 Agaricomycetes_1 Agaricales_1 Entolomataceae_1
6 Fungi_1 Basidiomycota_1 Agaricomycetes_1 Agaricales_1 Entolomataceae_1
          V7                     V8                             V9
1      Genus                Species                        Isolate
2 Entoloma_1 Entoloma vindobonense  Entoloma_vindobonense|JX454802
3 Entoloma_1                            Entolomataceae_sp|FR682185
4 Entoloma_1   Entoloma pallescens   Entoloma_pallescens|UDB025007
5                                     Entolomataceae_sp|UDB0729740
6 Entoloma_1   Entoloma byssisedum   Entoloma_byssisedum|UDB015478
                 V10                 V11
1 Isolate_percent_id Isolate_query_cover
2            100.000                 100
3            100.000                 100
4            100.000                 100
5            100.000                 100
6            100.000                 100
   user  system elapsed
  0.003   0.000   0.003
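The RDP training step in the first run above stops with `NameRankDupException`, listing genus names (cylindrium, cenangiopsis, brevicollum, cryptococcus, aleurina) that occur twice at the genus rank; CONSTAX then logs "RDP training error, redoing with duplicate taxa" and carries on. To see which reference entries are involved, one could scan the UNITE FASTA headers for genus names that sit under more than one parent lineage. This is a hypothetical helper, assuming the header layout visible in the results table above (`name|accession|SH code|refs or reps|k__...;...;s__...`) and assuming that such homonyms are what RDP is flagging; it is not how CONSTAX handles the error internally.

```python
from collections import defaultdict


def parse_lineage(header):
    """Return [(rank_prefix, name), ...] from the last '|' field of a UNITE-style header."""
    taxonomy = header.strip().lstrip(">").split("|")[-1]
    lineage = []
    for field in taxonomy.split(";"):
        prefix, _, name = field.partition("__")
        if name:
            lineage.append((prefix, name))
    return lineage


def find_homonyms(headers, rank="g"):
    """Map names at `rank` that occur under more than one parent lineage to those lineages."""
    parents = defaultdict(set)
    for header in headers:
        lineage = parse_lineage(header)
        for i, (prefix, name) in enumerate(lineage):
            # Skip UNITE's "unidentified" placeholders (assumption: not what RDP flags).
            if prefix == rank and name.lower() != "unidentified":
                # Compare case-insensitively, since the RDP message lists names in lower case.
                parents[name.lower()].add(tuple(n for _, n in lineage[:i]))
    return {name: lineages for name, lineages in parents.items() if len(lineages) > 1}


def headers_from_fasta(path):
    """Collect the '>' header lines of a FASTA file."""
    with open(path) as fasta:
        return [line for line in fasta if line.startswith(">")]
```

For example, `find_homonyms(headers_from_fasta("sh_general_release_dynamic_04.02.2020.fasta"))` would return each repeated genus name with the distinct lineages it appears under.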
bash: activate: No such file or directory
Welcome to CONSTAX version 2.0.13 build 0 - The CONSensus TAXonomy classifier
This software is distributed under MIT License
© Copyright 2021, Julian A. Liber, Gian M. N. Benucci & Gregory M. Bonito
https://github.com/liberjul/CONSTAXv2
https://constax.readthedocs.io/

Please cite us as:
CONSTAX2: Improved taxonomic classification of environmental DNA markers
Julian Aaron Liber, Gregory Bonito, Gian Maria Niccolò Benucci
bioRxiv 2021.02.15.430803; doi: https://doi.org/10.1101/2021.02.15.430803
Overwritting previous classification...
Overwritting previous taxonomy assignments...
Performing training and overwritting training files...
Pathfile input not found in local directory ...
Pathfile input not found at /software/miniconda3/envs/constax2/pkgs/constax-2.0.13-0/opt/constax-2.0.13/pathfile.txt ...
Pathfile input not found at /software/miniconda3/envs/constax2/pkgs/constax-2.0.13-placeholder/opt/constax-2.0.13/pathfile.txt ...
Pathfile input found at /software/miniconda3/envs/constax2/opt/constax-2.0.13/pathfile.txt ...
All needed executables exist.
SINTAX: vsearch
RDP: classifier
CONSTAX: /software/miniconda3/envs/constax2/opt/constax-2.0.13
Memory size: 32000mb
Importing subscripts from /software/miniconda3/envs/constax2/opt/constax-2.0.13

____________________________________________________________________
Reformatting database

UNITE format detected

Reference database FASTAs formatted in 0.667302839 seconds...

        Training Taxonomy

        Adding Full Lineage

Database formatting complete
____________________________________________________________________

__________________________________________________________________________
Training SINTAX Classifier
vsearch v2.17.1_linux_x86_64, 115.3GB RAM, 20 cores
https://github.com/torognes/vsearch

Reading file constax2/UNITE/training_files/sh_general_release_dynamic_04.02.2020__UTAX.fasta 100%
26124492 nt in 47741 seqs, min 141, max 3526, avg 547
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Writing UDB file 100%
__________________________________________________________________________
Training BLAST Classifier

Building a new DB, current time: 06/24/2021 16:13:17
New DB name:   /home/rramiro/constax2/UNITE/training_files/sh_general_release_dynamic_04.02.2020__BLAST
New DB title:  constax2/UNITE/training_files/sh_general_release_dynamic_04.02.2020__RDP_trained.fasta
Sequence type: Nucleotide
Deleted existing Nucleotide BLAST database named /home/rramiro/constax2/UNITE/training_files/sh_general_release_dynamic_04.02.2020__BLAST
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 47741 sequences in 3.02434 seconds.
__________________________________________________________________________
Training RDP Classifier
edu.msu.cme.rdp.classifier.train.NameRankDupException: Error: duplicate taxon name and rank in the taxonomy file.
cylindrium      genus   2
cenangiopsis    genus   2
brevicollum     genus   2
cryptococcus    genus   2
aleurina        genus   2

        at edu.msu.cme.rdp.classifier.train.TreeFactory.creatTaxidMap(TreeFactory.java:126)
        at edu.msu.cme.rdp.classifier.train.TreeFactory.<init>(TreeFactory.java:61)
        at edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker.<init>(ClassifierTraineeMaker.java:63)
        at edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker.main(ClassifierTraineeMaker.java:170)
        at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:77)
RDP training error, redoing with duplicate taxa
Importing subscripts from /software/miniconda3/envs/constax2/opt/constax-2.0.13

____________________________________________________________________
Reformatting database

UNITE format detected

Reference database FASTAs formatted in 0.654153012 seconds...

        Training Taxonomy

        Adding Full Lineage

Database formatting complete
____________________________________________________________________

RDP training error overcome, continuing with classification after SINTAX is retrained
vsearch v2.17.1_linux_x86_64, 115.3GB RAM, 20 cores
https://github.com/torognes/vsearch

Reading file constax2/UNITE/training_files/sh_general_release_dynamic_04.02.2020__UTAX.fasta 100%
26124492 nt in 47741 seqs, min 141, max 3526, avg 547
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Writing UDB file 100%
__________________________________________________________________________
Assigning taxonomy to OTU's representative sequences
vsearch v2.17.1_linux_x86_64, 115.3GB RAM, 20 cores
https://github.com/torognes/vsearch

Reading UDB file constax2/UNITE/training_files/sintax.db 100%
Reorganizing data in memory 100%
Creating bitmaps 100%
Parsing abundances 100%
26124492 nt in 47741 seqs, min 141, max 3526, avg 547
Classifying sequences 100%
Classified 10 of 10 sequences (100.00%)
__________________________________________________________________________
Comparing to Isolates

Building a new DB, current time: 06/24/2021 16:22:46
New DB name:   /home/rramiro/constax2/UNITE/taxonomy_assignements/unite_test_isos__BLAST
New DB title:  constax2/UNITE/taxonomy_assignements/isolates_formatted.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 10 sequences in 0.0012219 seconds.
Combining Taxonomies

____________________________________________________________________
Reformatting RDP file

        Done

Reformatting SINTAX file

        Done

Reformatting BLAST file

        Done

Reformatting isolate result file

        Done

Generating consensus taxonomy & combined taxonomy table

        Done

Generating classification counts & summary table

        Done

____________________________________________________________________

Loading required package: ggplot2
                                                                                                                                                                   V1
1                                                                                                                                                              OTU_ID
2 Entoloma_vindobonense|JX454802|SH1569086.08FU|refs|k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Agaricales;f__Entolomataceae;g__Entoloma;s__Entoloma_vindobonense
3     Entolomataceae_sp|FR682185|SH1569069.08FU|reps|k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Agaricales;f__Entolomataceae;g__unidentified;s__Entolomataceae_sp
4    Entoloma_pallescens|UDB025007|SH1569094.08FU|reps|k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Agaricales;f__Entolomataceae;g__Entoloma;s__Entoloma_pallescens
5   Entolomataceae_sp|UDB0729740|SH1569083.08FU|reps|k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Agaricales;f__Entolomataceae;g__unidentified;s__Entolomataceae_sp
6    Entoloma_byssisedum|UDB015478|SH1569062.08FU|refs|k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Agaricales;f__Entolomataceae;g__Entoloma;s__Entoloma_byssisedum
       V2              V3               V4           V5               V6
1 Kingdom          Phylum            Class        Order           Family
2 Fungi_1 Basidiomycota_1 Agaricomycetes_1 Agaricales_1 Entolomataceae_1
3 Fungi_1 Basidiomycota_1 Agaricomycetes_1 Agaricales_1 Entolomataceae_1
4 Fungi_1 Basidiomycota_1 Agaricomycetes_1 Agaricales_1 Entolomataceae_1
5 Fungi_1 Basidiomycota_1 Agaricomycetes_1 Agaricales_1 Entolomataceae_1
6 Fungi_1 Basidiomycota_1 Agaricomycetes_1 Agaricales_1 Entolomataceae_1
          V7                     V8                             V9
1      Genus                Species                        Isolate
2 Entoloma_1 Entoloma vindobonense  Entoloma_vindobonense|JX454802
3 Entoloma_1                            Entolomataceae_sp|FR682185
4 Entoloma_1   Entoloma pallescens   Entoloma_pallescens|UDB025007
5                                     Entolomataceae_sp|UDB0729740
6 Entoloma_1   Entoloma byssisedum   Entoloma_byssisedum|UDB015478
                 V10                 V11
1 Isolate_percent_id Isolate_query_cover
2            100.000                 100
3            100.000                 100
4            100.000                 100
5            100.000                 100
6            100.000                 100
   user  system elapsed
  0.002   0.000   0.003
Error in `$<-.data.frame`(`*tmp*`, Classifier, value = c("RDP", "BLAST",  :
  replacement has 28 rows, data has 11
Calls: $<- -> $<-.data.frame
Execution halted

Also, in the tutorial, you show a consensus_taxonomy.txt file, which I don't see in my results. Was this replaced by the constax_taxonomy.txt output?

Best, Ramiro

liberjul commented 3 years ago

Hi Ramiro,

I am currently working on a fix for some of the pathfile issues that seem to be present. I also need to update the tutorial to reflect that we renamed consensus_taxonomy.txt to constax_taxonomy.txt.
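
If you have downstream scripts written against the tutorial, one way to cope with the rename is to look for either file name. This is a minimal Python sketch. `find_taxonomy_table` is a hypothetical helper, not part of CONSTAX, and it assumes that both names would be written to the `--output` directory:

```python
from pathlib import Path
from typing import Optional

# Current output name first, then the older name used in the tutorial.
OUTPUT_NAMES = ("constax_taxonomy.txt", "consensus_taxonomy.txt")


def find_taxonomy_table(output_dir: str) -> Optional[Path]:
    """Hypothetical helper: return the first taxonomy table found, preferring the current name."""
    for name in OUTPUT_NAMES:
        candidate = Path(output_dir) / name
        if candidate.is_file():
            return candidate
    return None
```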

It probably ran twice because an error was detected and then handled within the script. I will work on preventing the double run in that case.

Thank you again for the feedback. I will post another update shortly so you can run it on your local machine.

Julian

liberjul commented 3 years ago

It appears that the double run was not caused by the RDP duplicate-taxa error, but by something else. Could you upload the contents of the log file found in your working directory? It should be named log_constax2_<year>-<month>-<day>_<hr>-<min>-<sec>.txt
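
The log name encodes the time of the run, so when several logs pile up in the working directory, the newest one can be picked by parsing that timestamp. This is a minimal Python sketch. It assumes the `log_constax2_YYYY-MM-DD_HH-MM-SS.txt` pattern described above, and `latest_constax_log` is a hypothetical helper, not part of CONSTAX:

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# Matches names like log_constax2_2021-06-24_17-27-05.txt (pattern assumed from this thread).
LOG_RE = re.compile(r"^log_constax2_(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.txt$")


def log_timestamp(name: str) -> Optional[datetime]:
    """Return the timestamp encoded in a CONSTAX log file name, or None if it doesn't match."""
    match = LOG_RE.match(name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")


def latest_constax_log(directory: str = ".") -> Optional[Path]:
    """Hypothetical helper: return the most recent CONSTAX log in `directory`, if any."""
    candidates = []
    for path in Path(directory).iterdir():
        stamp = log_timestamp(path.name)
        if stamp is not None:
            candidates.append((stamp, path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

Running `print(latest_constax_log())` from the directory where constax was started should print the path of the newest log.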

I'll also look into the error in the plotting script.

ramiroricardo commented 3 years ago

Hi Julian,

I was just testing, and the --make_plot option appears to be what makes it run twice. If I run constax without it, it runs once with no errors.

I am attaching two log files, from runs with and without this option. The later one is without the option.

log_constax2_2021-06-24_17-27-05.txt log_constax2_2021-06-24_17-28-57.txt

liberjul commented 3 years ago

Thanks for sending those. I fixed the R script (new version below) and will push a new release once I get these path issues worked out.

###############################################
#       Taxonomy Assignment Comparison        #
#               Gian MN Benucci               #
#             benucci[at]msu.edu              #
###############################################

if(!require(ggplot2)){
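  # Set a CRAN mirror explicitly; non-interactive Rscript runs otherwise fail with
  # "trying to use CRAN without setting a mirror"
  install.packages("ggplot2", repos = "https://cloud.r-project.org")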
  library(ggplot2)
}

args <- commandArgs(trailingOnly=TRUE)
output_dir <- args[1]
blast <- as.logical(args[2])
format <- args[3]

comb_tax = read.table(paste(output_dir, "combined_taxonomy.txt", sep=""), header=TRUE, row.names=1, sep="\t")
head(comb_tax)
system.time(comb_tax[comb_tax==''|comb_tax==' ']<-NA)

sapply(comb_tax, function(x) sum(is.na(x))) -> unassigned_comb_tax
comb_tax_df <- as.data.frame(unassigned_comb_tax)
if (format == "UNITE"){
  if (blast){
    comb_tax_df$Classifier <- rep(c("RDP", "BLAST", "SINTAX", "CONSTAX"), 7)

    comb_tax_df$Rank <- row.names(comb_tax_df)
    comb_tax_df$Assigned <- sqrt((comb_tax_df$unassigned_comb_tax -nrow(comb_tax))^2)
    comb_tax_df

    # Levels must match the labels assigned above, or factor() turns unmatched labels into NA
    comb_tax_df$Classifier <- factor(comb_tax_df$Classifier, levels = c("RDP","BLAST","SINTAX","CONSTAX"))
  } else {
    comb_tax_df$Classifier <- rep(c("RDP", "SINTAX", "UTAX", "CONSTAX"), 7)

    comb_tax_df$Rank <- row.names(comb_tax_df)
    comb_tax_df$Assigned <- sqrt((comb_tax_df$unassigned_comb_tax -nrow(comb_tax))^2)
    comb_tax_df

    comb_tax_df$Classifier <- factor(comb_tax_df$Classifier, levels = c("RDP","UTAX","SINTAX","CONSTAX"))
  }
} else {
  # Assumes one column per rank for each of the 4 classifiers, as in the UNITE branch (4 x 7 = 28)
  rank_count <- ncol(comb_tax) / 4
  if (blast){
    comb_tax_df$Classifier <- rep(c("RDP", "BLAST", "SINTAX", "CONSENSUS"), rank_count)

    comb_tax_df$Rank <- row.names(comb_tax_df)
    comb_tax_df$Assigned <- sqrt((comb_tax_df$unassigned_comb_tax -nrow(comb_tax))^2)
    comb_tax_df

    comb_tax_df$Classifier <- factor(comb_tax_df$Classifier, levels = c("RDP","BLAST","SINTAX","CONSENSUS"))
  } else {
    comb_tax_df$Classifier <- rep(c("RDP", "SINTAX", "UTAX", "CONSENSUS"), rank_count)

    comb_tax_df$Rank <- row.names(comb_tax_df)
    comb_tax_df$Assigned <- sqrt((comb_tax_df$unassigned_comb_tax -nrow(comb_tax))^2)
    comb_tax_df

    comb_tax_df$Classifier <- factor(comb_tax_df$Classifier, levels = c("RDP","UTAX","SINTAX","CONSENSUS"))
  }
}

pdf(paste(output_dir, "TaxonomicAssignmentComparison_plot.pdf", sep=""))
ggplot(comb_tax_df, aes(x = Rank, y = Assigned, fill= Classifier)) +
  geom_bar(stat = "identity") +
  scale_x_discrete(limits=comb_tax_df$Rank) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1),
        panel.grid=element_blank(),
        panel.background=element_blank()) +
  #scale_fill_manual(values=mycols) +
  ggtitle("Taxonomy Assignments Comparison") +
  labs(x="Taxonomic Ranks", y="Number of classified OTUs") +
  theme(axis.text.x = element_text(vjust=0.5, size=8)) +
  theme(axis.text.y = element_text(hjust=0.5, size=8)) +
  theme(plot.title = element_text(size = 15, face = "bold", hjust = 0.5))
dev.off()
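
The R error earlier in this thread ("replacement has 28 rows, data has 11") comes from assigning a label vector whose length does not match the number of columns that were actually counted. The same per-column counting can be sketched in Python with a guard that derives the rank count from the column count instead of hard-coding it. This is a minimal sketch, not the CONSTAX implementation. It makes three assumptions: that combined_taxonomy.txt is tab-separated, that its header row includes a name for the OTU ID column, and that its columns cycle through the classifiers rank by rank in the order used by the `rep()` call above. `count_assigned`, `classifier_labels` and `CLASSIFIERS_BLAST` are hypothetical names.

```python
import csv
from typing import List, Tuple

# Classifier order within each rank, mirroring the rep() call in the R script (assumption).
CLASSIFIERS_BLAST = ["RDP", "BLAST", "SINTAX", "CONSTAX"]


def count_assigned(path: str) -> Tuple[List[str], List[int]]:
    """Count non-blank cells per column of a tab-separated table whose first column is the OTU ID."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)[1:]  # drop the OTU ID column
        counts = [0] * len(header)
        for row in reader:
            # Blank or whitespace-only cells count as unassigned, like '' and ' ' in the R script
            for i, value in enumerate(row[1:len(header) + 1]):
                if value.strip():
                    counts[i] += 1
    return header, counts


def classifier_labels(n_columns: int, classifiers: List[str]) -> List[str]:
    """Repeat the classifier names once per rank, deriving the rank count from the column count."""
    if n_columns % len(classifiers) != 0:
        # A hard-coded rank count that does not match the table is what produces
        # errors like "replacement has 28 rows, data has 11" in R.
        raise ValueError(
            f"{n_columns} columns cannot be split evenly across {len(classifiers)} classifiers"
        )
    return classifiers * (n_columns // len(classifiers))
```

For example, `header, counts = count_assigned("taxonomy_assignements/combined_taxonomy.txt")` followed by `classifier_labels(len(header), CLASSIFIERS_BLAST)` either returns one label per column or fails with a clear message, instead of failing later during plotting.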

mtva0001 commented 2 years ago

Hi,

I have a similar issue, but my log file shows slightly different error messages:

/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_no_inputs.sh: line 127: blastn: command not found
Traceback (most recent call last):
  File "/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/check_input_names.py", line 8, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_no_inputs.sh: line 343: vsearch: command not found
sed: taxonomy_assignements//otu_taxonomy.sintax: No such file or directory
Traceback (most recent call last):
  File "/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/split_inputs.py", line 20, in <module>
    with open(args.input, "r") as ifile:
FileNotFoundError: [Errno 2] No such file or directory: ''
/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_noinputs.sh: line 360: blastn: command not found
rm: *.fasta: No such file or directory
The operation couldn’t be completed. Unable to locate a Java Runtime.
Please visit http://www.java.com for information on installing Java.

rm: : No such file or directory
Traceback (most recent call last):
  File "/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/CombineTaxonomy.py", line 15, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

This is odd, because outside of CONSTAX I can run all of the tools and modules mentioned.
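
When tools work in an interactive shell but a pipeline reports `command not found` or `ModuleNotFoundError`, one common cause is that the pipeline sees a different `PATH` or a different Python interpreter than your shell does. A quick comparison is to run a small check inside and outside the activated environment. This is a minimal Python sketch. The tool list is taken from the errors above (`classifier` is the RDP command name CONSTAX reports in its terminal output), and `check_environment` is a hypothetical helper. It only shows what this interpreter and its `PATH` can see. It does not reproduce how CONSTAX launches its own subprocesses.

```python
import importlib.util
import shutil
import sys
from typing import Dict

# Executables and Python modules named in the error log above.
EXECUTABLES = ["vsearch", "blastn", "makeblastdb", "java", "classifier"]
PY_MODULES = ["numpy", "pandas"]


def check_environment() -> Dict[str, object]:
    """Report which interpreter is running and which tools/modules it can see."""
    return {
        "python": sys.executable,
        # shutil.which returns the full path, or None if the command is not on PATH
        "executables": {name: shutil.which(name) for name in EXECUTABLES},
        # find_spec returns None if the module cannot be imported by this interpreter
        "modules": {name: importlib.util.find_spec(name) is not None for name in PY_MODULES},
    }


if __name__ == "__main__":
    report = check_environment()
    print("Python interpreter:", report["python"])
    for name, path in report["executables"].items():
        print(f"{name:12s} {path or 'NOT FOUND on PATH'}")
    for name, ok in report["modules"].items():
        print(f"{name:12s} {'importable' if ok else 'NOT importable'}")
```

If the interpreter path or the tool paths differ between the two runs, the tools are probably not being resolved from the CONSTAX environment.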

This is the command I used:

constax --num_threads 8 --db sh_general_release_dynamic_10.05.2021.fasta --trainfile training_files/ --input otus.fasta --tax taxonomy_assignements/ --output taxonomy_assignements/ --conf 0.8 --blast

And this is what the terminal prints:

Overwritting previous classification...
Overwritting previous taxonomy assignments...
Classifying without training...
SINTAX executable does not match the executable used to generate the training files, if SINTAX error occurs, change your executable or use -t flag.
Using the user-supplied pathfile at /opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/pathfile.txt
All needed executables exist.
SINTAX: vsearch
RDP: classifier
CONSTAX: /opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0
Memory size: 32000mb


Assigning taxonomy to OTU's representative sequences
Input FASTA:


Combining Taxonomies

In the end, the BLAST files in the taxonomy_assignements folder are empty. The rdp_train.out file says: The operation couldn’t be completed. Unable to locate a Java Runtime.

So I suspect the issue is related to my Apple M1 chip. Is there any solution for this?

liberjul commented 2 years ago

Hi @mtva0001, thanks for reaching out with this issue. Something is wrong with how paths are being interpreted, but I don't currently have a fix for this. One possible but imperfect workaround is to install CONSTAX in the base environment, but I understand if you don't want to do this. I'll try to get back to you soon with some potential fixes.

liberjul commented 2 years ago

Hi @mtva0001, I may have a fix, which involves changes to the constax_wrapper.py script. You can directly overwrite the script located at /opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_wrapper.py with the newest pushed version, downloaded from https://raw.githubusercontent.com/liberjul/CONSTAXv2/master/constax_wrapper.py. Let me know if this works!
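
When applying a manual patch like this, it is worth keeping a copy of the installed script so the change can be undone. This is a minimal Python sketch, assuming you have write access to the environment directory. `replace_with_backup` and `fetch` are hypothetical helpers, and the `.bak` suffix is an arbitrary choice:

```python
import shutil
import urllib.request
from pathlib import Path

# Raw-file URL from the comment above.
WRAPPER_URL = "https://raw.githubusercontent.com/liberjul/CONSTAXv2/master/constax_wrapper.py"


def fetch(url: str) -> bytes:
    """Download a file's raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def replace_with_backup(target: str, new_contents: bytes) -> Path:
    """Copy `target` to `target.bak`, then overwrite `target` in place with `new_contents`."""
    target_path = Path(target)
    backup = target_path.with_name(target_path.name + ".bak")
    shutil.copy2(target_path, backup)  # copy2 also preserves the original's metadata
    target_path.write_bytes(new_contents)
    return backup
```

For example, `replace_with_backup("/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_wrapper.py", fetch(WRAPPER_URL))`. Adjust the path to your own environment. Because the target is overwritten in place rather than recreated, it keeps its existing file permissions.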

mtva0001 commented 2 years ago

Thanks a lot for your quick help! I did as you suggested; the output is different this time, but it still does not run properly:

Command I ran:

constax --num_threads 8 --db sh_general_release_dynamic_10.05.2021.fasta --trainfile ./training_files/ --input otus.fasta --tax ./taxonomy_assignements/ --output ./taxonomy_assignements/ --conf 0.8 --blast --make_plot

The log file:

usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]] [-e pattern] [-f file] [--binary-files=value] [--color=when] [--context[=num]] [--directories=action] [--label] [--line-buffered] [--null] [pattern] [file ...]
vsearch v2.16.0_macos_x86_64, 16.0GB RAM, 8 cores
https://github.com/torognes/vsearch

Reading file ./training_files//sh_general_release_dynamic_10.05.2021__UTAX.fasta 100%
29627932 nt in 58440 seqs, min 140, max 4921, avg 507
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Writing UDB file 100%
/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_no_inputs.sh: line 267: makeblastdb: command not found
/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_no_inputs.sh: line 331: blastn: command not found
Traceback (most recent call last):
  File "/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/check_input_names.py", line 8, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
vsearch v2.16.0_macos_x86_64, 16.0GB RAM, 8 cores
https://github.com/torognes/vsearch

Reading UDB file ./training_files//sintax.db 100%
Reorganizing data in memory 100%
Creating bitmaps 100%
Parsing abundances 100%
29627932 nt in 58440 seqs, min 140, max 4921, avg 507

Fatal error: Unable to open file for reading ()
Traceback (most recent call last):
  File "/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/split_inputs.py", line 20, in <module>
    with open(args.input, "r") as ifile:
FileNotFoundError: [Errno 2] No such file or directory: ''
/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_noinputs.sh: line 360: blastn: command not found
rm: *.fasta: No such file or directory
The operation couldn’t be completed. Unable to locate a Java Runtime.
Please visit http://www.java.com for information on installing Java.

rm: : No such file or directory
Traceback (most recent call last):
  File "/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/CombineTaxonomy.py", line 15, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'
Loading required package: ggplot2
Error in contrib.url(repos, type) : trying to use CRAN without setting a mirror
Calls: install.packages -> startsWith -> contrib.url
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
  there is no package called ‘ggplot2’
Execution halted

liberjul commented 2 years ago

Hi @mtva0001, I pushed new versions of the constax_wrapper.py and constax_no_inputs.sh scripts. At least one of your errors could be traced to macOS-incompatible grep -P commands, which have been replaced. Try downloading these scripts, replacing the copies in your /opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/ directory, and let me know how it goes.
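
The `usage: grep ...` line in the earlier log is consistent with this. macOS ships BSD grep, which does not accept `-P` (Perl-compatible regular expressions). GNU grep accepts `-P` when it is built with PCRE support. This is a minimal Python sketch for checking whether a given `grep` supports `-P` before running scripts that rely on it. `grep_supports_pcre` is a hypothetical helper:

```python
import shutil
import subprocess


def grep_supports_pcre(grep: str = "grep") -> bool:
    r"""Return True if `grep -P` runs and matches a simple Perl-style pattern (\d)."""
    if shutil.which(grep) is None:
        return False
    result = subprocess.run(
        [grep, "-P", r"\d"],
        input="a1\n",
        capture_output=True,
        text=True,
    )
    # Exit code 0 means a line matched; a grep that rejects -P exits with an error code instead
    return result.returncode == 0


if __name__ == "__main__":
    print("grep -P supported:", grep_supports_pcre())
```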

Also, it would be helpful to see the packages installed both inside and outside your CONSTAX environment. You can print these with conda list and upload them as files, because they may be very long.
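
Once both listings are saved as text files (for example, by redirecting `conda list` to a file inside and outside the environment), you can compare the package names to see what the environment is missing. This is a minimal Python sketch. It assumes the usual `conda list` text layout, where comment lines start with `#` and the package name is the first column. The helper names are hypothetical:

```python
from typing import Set


def conda_package_names(listing: str) -> Set[str]:
    """Extract package names from `conda list` text output (comment lines start with '#')."""
    names = set()
    for line in listing.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        names.add(line.split()[0])  # first whitespace-separated field is the package name
    return names


def missing_from_env(inside: str, outside: str) -> Set[str]:
    """Packages present in the outside listing but absent from the environment listing."""
    return conda_package_names(outside) - conda_package_names(inside)
```

For example, `missing_from_env(open("conda_list.txt").read(), open("conda_list_outside.txt").read())`, using the file names attached later in this thread.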

mtva0001 commented 2 years ago

Hi! Thanks again for your quick response! This time I got this:

vsearch v2.16.0_macos_x86_64, 16.0GB RAM, 8 cores
https://github.com/torognes/vsearch

Reading file ./training_files//sh_general_release_dynamic_10.05.2021__UTAX.fasta 100%
29627932 nt in 58440 seqs, min 140, max 4921, avg 507
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Writing UDB file 100%
/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_no_inputs.sh: line 267: makeblastdb: command not found
/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_no_inputs.sh: line 331: blastn: command not found
Traceback (most recent call last):
  File "/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/check_input_names.py", line 8, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
vsearch v2.16.0_macos_x86_64, 16.0GB RAM, 8 cores
https://github.com/torognes/vsearch

Reading UDB file ./training_files//sintax.db 100%
Reorganizing data in memory 100%
Creating bitmaps 100%
Parsing abundances 100%
29627932 nt in 58440 seqs, min 140, max 4921, avg 507

Fatal error: Unable to open file for reading ()
Traceback (most recent call last):
  File "/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/split_inputs.py", line 20, in <module>
    with open(args.input, "r") as ifile:
FileNotFoundError: [Errno 2] No such file or directory: ''
/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_noinputs.sh: line 360: blastn: command not found
rm: *.fasta: No such file or directory
The operation couldn’t be completed. Unable to locate a Java Runtime.
Please visit http://www.java.com for information on installing Java.

rm: : No such file or directory
Traceback (most recent call last):
  File "/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/CombineTaxonomy.py", line 15, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'
Loading required package: ggplot2
Error in contrib.url(repos, type) : trying to use CRAN without setting a mirror
Calls: install.packages -> startsWith -> contrib.url
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
  there is no package called ‘ggplot2’
Execution halted

Conda lists: conda_list.txt, conda_list_outside.txt

liberjul commented 2 years ago

Hello @mtva0001, I made an additional edit to the constax_wrapper.py script that will hopefully fix the module load errors. I am still not sure why blastn, java, and other non-Python commands fail, but I hope to figure it out soon. I also updated the ComparisonBars.R script to fix the package installation error. Update the script using the same link as above: https://github.com/liberjul/CONSTAXv2/issues/3#issuecomment-950051122

mtva0001 commented 2 years ago

Hi,

Sorry for the late reply. I just tried it, but I get the same errors:

/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_no_inputs.sh: line 256: vsearch: command not found
/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_no_inputs.sh: line 267: makeblastdb: command not found
/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_no_inputs.sh: line 331: blastn: command not found
Traceback (most recent call last):
  File "/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/check_input_names.py", line 8, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_no_inputs.sh: line 343: vsearch: command not found
sed: ./taxonomy_assignements//otu_taxonomy.sintax: No such file or directory
Traceback (most recent call last):
  File "/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/split_inputs.py", line 20, in <module>
    with open(args.input, "r") as ifile:
FileNotFoundError: [Errno 2] No such file or directory: ''
/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/constax_noinputs.sh: line 360: blastn: command not found
rm: *.fasta: No such file or directory
The operation couldn’t be completed. Unable to locate a Java Runtime.
Please visit http://www.java.com for information on installing Java.

rm: : No such file or directory
Traceback (most recent call last):
  File "/opt/anaconda3/envs/CONSTAX/opt/constax-2.0.15-0/CombineTaxonomy.py", line 15, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

YingtongAamandaWu commented 1 year ago

Dear all,

I have a similar issue. I was running:

constax \
--num_threads 12 \
--mem 32000 \
--db /home/groups/fukamit/ytwu/Wu_CA_oak_pilot/02_Analysis/itsxpress_dada2_FungalTraits/sh_general_release_dynamic_29.11.2022.fasta \
--train \
--input /home/groups/fukamit/ytwu/Wu_CA_oak_pilot/02_Analysis/itsxpress_dada2_FungalTraits/ASV_nochim_fun.fasta \
--isolates /home/groups/fukamit/ytwu/Wu_CA_oak_pilot/02_Analysis/itsxpress_dada2_FungalTraits/sh_general_release_dynamic_29.11.2022.fasta \
--trainfile training_files/ \
--tax taxonomy_assignements/ \
--output taxonomy_assignements/ \
--conf 0.8 \
--blast \
--pathfile pathfile.txt

And here is the error from the log file:

Traceback (most recent call last):
  File "/home/groups/fukamit/ytwu/software/miniconda3/envs/constax/opt/constax-2.0.18-0/CombineTaxonomy.py", line 576, in <module>
    raise ValueError("Input file not in RDP format. Please Reformat As Below:\nOTU### _ Root rootrank 1.0 Fungi Kingdom 0.98 Zygomycota Phylum 0.05 Zygomycota_Incertae_sedis Class 0.05 Mucorales Order 0.04 Syncephalastraceae Family 0.01 Fennellomyces Genus 0.01 Fennellomyces linderi Species 0.01")
ValueError: Input file not in RDP format. Please Reformat As Below:
OTU### _ Root rootrank 1.0 Fungi Kingdom 0.98 Zygomycota Phylum 0.05 Zygomycota_Incertae_sedis Class 0.05 Mucorales Order 0.04 Syncephalastraceae Family 0.01 Fennellomyces Genus 0.01 Fennellomyces linderi Species 0.01

I will greatly appreciate some help! Thank you in advance.

YingtongAamandaWu commented 1 year ago

[Screenshot of the FASTA file]

This is what my FASTA file looks like. Is the problem caused by how I named the sequences? Should I rename them all to OTU_xx?

liberjul commented 1 year ago

Hi @YingtongAamandaWu,

I don't believe the sequence headers are the issue; instead, the RDP classification output may not be consistent with the expected format. Can you upload taxonomy_assignements/otu_taxonomy.rdp?
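
To check a file like otu_taxonomy.rdp yourself, a rough format check is possible. This is a minimal Python sketch, not CONSTAX's own validation logic. It assumes the tab-separated layout suggested by the example in the error message above: a sequence ID, one extra field that may be empty, then repeating (name, rank, confidence) triplets. `check_rdp_line` and `check_rdp_file` are hypothetical helpers.

```python
from typing import List


def check_rdp_line(line: str) -> bool:
    """True if a line looks like: ID <tab> extra field <tab> (name, rank, confidence) triplets."""
    fields = line.rstrip("\r\n").split("\t")
    triplets = fields[2:]
    if len(fields) < 5 or len(triplets) % 3 != 0:
        return False
    for i in range(0, len(triplets), 3):
        try:
            confidence = float(triplets[i + 2])
        except ValueError:
            return False
        if not 0.0 <= confidence <= 1.0:
            return False
    return True


def check_rdp_file(path: str) -> List[str]:
    """Return a list of problems found in an RDP classification output file."""
    problems = []
    seen = 0
    with open(path) as handle:
        for number, line in enumerate(handle, 1):
            if not line.strip():
                continue
            seen += 1
            if not check_rdp_line(line):
                problems.append(f"line {number}: unexpected format")
    if seen == 0:
        # An empty file usually means RDP classification (or training) did not produce output
        problems.append("file is empty; RDP classification (or training) probably failed")
    return problems
```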

YingtongAamandaWu commented 1 year ago

@liberjul

Thanks for the timely response. On my end, otu_taxonomy.rdp is an empty file (screenshot attached). I am sending the whole taxonomy_assignements folder and the log file so that you can check the details. Thank you again! By the way, rdp_train.out is also empty on my end.

taxonomy_assignements.zip log_constax2_2022-12-30_13-01-03.txt

liberjul commented 1 year ago

Hi @YingtongAamandaWu,

It appears that one of the files produced by RDP during training was not present at the time of classification. You will need to retrain the classifier using -t/--train. If you trained the classifiers earlier, it is possible that the RDP training failed; this is usually due to insufficient memory. I have not yet trained on the newest release, so I am unsure of the memory requirement, but I would estimate that 64 GB would be sufficient.
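
A retraining run with a larger memory allowance could be assembled as shown below. This is a minimal Python sketch that only builds the argument list. All flags are ones used earlier in this thread. `--mem` appears to be in megabytes, since the terminal output earlier in the thread reports `Memory size: 32000mb`. The file names, thread count and the 64000 MB default are placeholders to adjust, and `constax_retrain_command` is a hypothetical helper:

```python
import shlex
from typing import List


def constax_retrain_command(db: str, query: str, mem_mb: int = 64000, threads: int = 8) -> List[str]:
    """Build a constax command line that retrains the classifiers (--train) with more memory."""
    return [
        "constax",
        "--num_threads", str(threads),
        "--mem", str(mem_mb),  # megabytes (assumption based on "Memory size: 32000mb")
        "--db", db,
        "--train",
        "--input", query,
        "--trainfile", "training_files/",
        "--tax", "taxonomy_assignements/",
        "--output", "taxonomy_assignements/",
        "--conf", "0.8",
        "--blast",
    ]


if __name__ == "__main__":
    cmd = constax_retrain_command("sh_general_release_dynamic_29.11.2022.fasta", "ASV_nochim_fun.fasta")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To execute it, pass the list to `subprocess.run(cmd, check=True)`, which raises an error if constax exits with a non-zero status.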

YingtongAamandaWu commented 1 year ago

Yes, that was exactly the cause. I used 8 GB at first; with 128 GB it ran. Thank you for your help!