liberjul / CONSTAXv2

MIT License

issue with constax :( #15

Open mashalcopperman opened 3 months ago

mashalcopperman commented 3 months ago

Hi there, I'm having an issue I hope you can help with. The checks passed, but there is an error during the training process. I attached the log file (let me know if you can access it all right).

log_constax2_2024-03-18_13-13-54.txt

(constax2) [copperm2@dev-amd20 Cecilia]$ constax -d /mnt/home/copperm2/Databases/sh_general_release_dynamic_s_all_25.07.2023.fasta -t -f /mnt/home/copperm2/Databases/trainfiles/UNITE --mem 128 -n 16 -i outputs/14_constax_euk/train.fasta -b
Welcome to CONSTAX version 2.0.19 build 0 - The CONSensus TAXonomy classifier
This software is distributed under MIT License
© Copyright 2022, Julian A. Liber, Gian M. N. Benucci & Gregory M. Bonito
https://github.com/liberjul/CONSTAXv2
https://constax.readthedocs.io/

Please cite us as:
CONSTAX2: Improved taxonomic classification of environmental DNA markers
Julian Aaron Liber, Gregory Bonito, Gian Maria Niccolò Benucci
Bioinformatics, Volume 37, Issue 21, 1 November 2021, Pages 3941–3943; doi: https://doi.org/10.1093/bioinformatics/btab347
Overwriting previous classification...
Performing training and overwriting training files...
Using the user-supplied pathfile at /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/pathfile.txt
All needed executables exist.
SINTAX: vsearch
RDP: classifier
CONSTAX: /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0
python /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/detect_format.py -d /mnt/home/copperm2/Databases/sh_general_release_dynamic_s_all_25.07.2023.fasta 2>&1
UNITE
Memory size: 128mb
python /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/FormatRefDB.py -d /mnt/home/copperm2/Databases/sh_general_release_dynamic_s_all_25.07.2023.fasta -t /mnt/home/copperm2/Databases/trainfiles/UNITE -f UNITE -p /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0
Importing subscripts from /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0


Reformatting database

UNITE format detected

Reference database FASTAs formatted in 2.970266915 seconds...

Training Taxonomy

Adding Full Lineage

Database formatting complete



Training SINTAX Classifier
vsearch -makeudb_usearch /mnt/home/copperm2/Databases/trainfiles/UNITE/sh_general_release_dynamic_s_all_25.07.2023UTAX.fasta -output /mnt/home/copperm2/Databases/trainfiles/UNITE/sintax.db
^[____
Training BLAST Classifier
makeblastdb -in /mnt/home/copperm2/Databases/trainfiles/UNITE/sh_general_release_dynamic_s_all_25.07.2023__RDP_trained.fasta -dbtype nucl -out /mnt/home/copperm2/Databases/trainfiles/UNITE/sh_general_release_dynamic_s_all_25.07.2023__BLAST

Building a new DB, current time: 03/18/2024 13:17:43
New DB name: /mnt/home/copperm2/Databases/trainfiles/UNITE/sh_general_release_dynamic_s_all_25.07.2023__BLAST
New DB title: /mnt/home/copperm2/Databases/trainfiles/UNITE/sh_general_release_dynamic_s_all_25.07.2023__RDP_trained.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 326727 sequences in 80.4233 seconds.


Training RDP Classifier
classifier train -o /mnt/home/copperm2/Databases/trainfiles/UNITE/. -s /mnt/home/copperm2/Databases/trainfiles/UNITE/sh_general_release_dynamic_s_all_25.07.2023__RDP_trained.fasta -t /mnt/home/copperm2/Databases/trainfiles/UNITE/sh_general_release_dynamic_s_all_25.07.2023__RDP_taxonomy_trained.txt -Xmx > rdp_train.out 2>&1
cp /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/rRNAClassifier.properties /mnt/home/copperm2/Databases/trainfiles/UNITE/


Assigning taxonomy to OTU's representative sequences
python /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/check_input_names.py -i outputs/14_constax_euk/train.fasta
vsearch -sintax -db /mnt/home/copperm2/Databases/trainfiles/UNITE/sintax.db -tabbedout ./taxonomy_assignments/otu_taxonomy.sintax -strand both -sintax_cutoff 0.8 -threads 16
sed -i'' -e 's|([0-1][.][0-9]{2}|&00|g' ./taxonomy_assignments/otu_taxonomy.sintax
python /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/split_inputs.py -i
Input FASTA:

./taxonomyassignments/blast.out
blastn -query *.fasta -db /mnt/home/copperm2/Databases/trainfiles/UNITE/ -num_threads 16 -outfmt 7 qacc sacc evalue bitscore pident qcovs -max_target_seqs 10 >> ./taxonomy_assignments/blast.out
python /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/blast_to_df.py -i ./taxonomy_assignments/blast.out -o ./taxonomy_assignments/otu_taxonomy.blast -d /mnt/home/copperm2/Databases/sh_general_release_dynamic_s_all_25.07.2023.fasta -t /mnt/home/copperm2/Databases/trainfiles/UNITE -f UNITE
classifier classify --conf 0.8 --format allrank --train_propfile /mnt/home/copperm2/Databases/trainfiles/UNITE/rRNAClassifier.properties -o ./taxonomy_assignments/otu_taxonomy.rdp -Xmx
Command Error: Failed to find input file ""
usage: [options] [,idmappingfile] ...
-b,--bootstrap_outfile the output file containing the number of matching assignments out of 100 bootstraps for major ranks. Default is null
-c,--conf assignment confidence cutoff used to determine the assignment count for each taxon. Range [0-1], Default is 0.8.
-d,--metadata the tab delimited metadata file for the samples, with first row containing attribute name and first column containing the sample name
-f,--format tab-delimited output format: [allrank|fixrank|biom|filterbyconf|db]. Default is allRank.
    allrank: outputs the results for all ranks applied for each sequence: seqname, orientation, taxon name, rank, conf, ...
    fixrank: only outputs the results for fixed ranks in order: domain, phylum, class, order, family, genus
    biom: outputs rich dense biom format if OTU or metadata provided
    filterbyconf: only outputs the results for major ranks as in fixrank, results below the confidence cutoff were bin to a higher rank unclassified_node
    db: outputs the seqname, trainset_no, tax_id, conf.
-g,--gene 16srrna, fungallsu, fungalits_warcup, fungalits_unite. Default is 16srrna. This option can be overwritten by -t option
-h,--hier_outfile tab-delimited output file containing the assignment count for each taxon in the hierarchical format. Default is null.
-m,--biomFile the input clluster biom file. The classification result will replace the taxonomy of the corresponding cluster id.
-o,--outputFile tab-delimited text output file for classification assignment.
-q,--queryFile legacy option, no longer needed
-s,--shortseq_outfile the output file containing the sequence names that are too short to be classified
-t,--train_propfile property file containing the mapping of the training files if not using the default. Note: the training files and the property file should be in the same directory.
-w,--minWords minimum number of words for each bootstrap trial. Default(maximum) is 1/8 of the words of each sequence. Minimum is 5


Combining Taxonomies
python /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/CombineTaxonomy.py -c 0.8 -o ./outputs/ -x ./taxonomy_assignments/ -b -e 1.0 -m 10 -p 0.0 -f UNITE -d /mnt/home/copperm2/Databases/sh_general_release_dynamic_s_all_25.07.2023.fasta -t /mnt/home/copperm2/Databases/trainfiles/UNITE -i False --hl null --iso_qc 75 --iso_id 1 --hl_qc 75 --hl_id 1 -s false -n false


# packages in environment at /mnt/home/copperm2/miniconda3/envs/constax2:
#
# Name                    Version      Build                  Channel
_libgcc_mutex             0.1          conda_forge            conda-forge
_openmp_mutex             4.5          2_gnu                  conda-forge
blast                     2.5.0        hc0b0e79_3             bioconda
boost                     1.80.0       py311h59ea3da_4        conda-forge
boost-cpp                 1.80.0       h75c5d50_0             conda-forge
bzip2                     1.0.8        h7f98852_4             conda-forge
ca-certificates           2022.12.7    ha878542_0             conda-forge
constax                   2.0.19       pyhdfd78af_0           bioconda
icu                       70.1         h27087fc_0             conda-forge
ld_impl_linux-64          2.39         hcc3a1bd_1             conda-forge
libblas                   3.9.0        16_linux64_openblas    conda-forge
libcblas                  3.9.0        16_linux64_openblas    conda-forge
libffi                    3.4.2        h7f98852_5             conda-forge
libgcc-ng                 12.2.0       h65d4601_19            conda-forge
libgfortran-ng            12.2.0       h69a702a_19            conda-forge
libgfortran5              12.2.0       h337968e_19            conda-forge
libgomp                   12.2.0       h65d4601_19            conda-forge
liblapack                 3.9.0        16_linux64_openblas    conda-forge
libnsl                    2.0.0        h7f98852_0             conda-forge
libopenblas               0.3.21       pthreads_h78a6416_3    conda-forge
libsqlite                 3.40.0       h753d276_0             conda-forge
libstdcxx-ng              12.2.0       h46fd767_19            conda-forge
libuuid                   2.32.1       h7f98852_1000          conda-forge
libzlib                   1.2.13       h166bdaf_4             conda-forge
ncurses                   6.3          h27087fc_1             conda-forge
numpy                     1.24.1       pypi_0                 pypi
openjdk                   8.0.332      h166bdaf_0             conda-forge
openssl                   3.0.7        h0b41bf4_1             conda-forge
pandas                    1.5.2        pypi_0                 pypi
pip                       22.3.1       pyhd8ed1ab_0           conda-forge
python                    3.11.0       he550d4f_1_cpython     conda-forge
python-dateutil           2.8.2        pyhd8ed1ab_0           conda-forge
python-tzdata             2024.1       pyhd8ed1ab_0           conda-forge
python_abi                3.11         3_cp311                conda-forge
pytz                      2022.7.1     pyhd8ed1ab_0           conda-forge
rdptools                  2.0.3        hdfd78af_1             bioconda
readline                  8.1.2        h0f457ee_0             conda-forge
setuptools                66.0.0       pyhd8ed1ab_0           conda-forge
six                       1.16.0       pyh6c4a22f_0           conda-forge
tk                        8.6.12       h27826a3_0             conda-forge
tzdata                    2022g        h191b570_0             conda-forge
vsearch                   2.22.1       hf1761c0_0             bioconda
wheel                     0.38.4       pyhd8ed1ab_0           conda-forge
xz                        5.2.6        h166bdaf_0             conda-forge
zlib                      1.2.13       h166bdaf_4             conda-forge
zstd                      1.5.2        h6239696_4             conda-forge


(constax2) [copperm2@dev-amd20 Cecilia]$ constax -d /mnt/home/copperm2/Databases/sh_general_release_dynamic_s_all_25.07.2023.fasta -t -f /mnt/home/copperm2/Databases/trainfiles/UNITE --mem 128 --check -i outputs/14_constax_euk/train.fasta -b
Welcome to CONSTAX version 2.0.19 build 0 - The CONSensus TAXonomy classifier
This software is distributed under MIT License
© Copyright 2022, Julian A. Liber, Gian M. N. Benucci & Gregory M. Bonito
https://github.com/liberjul/CONSTAXv2
https://constax.readthedocs.io/

Please cite us as:
CONSTAX2: Improved taxonomic classification of environmental DNA markers
Julian Aaron Liber, Gregory Bonito, Gian Maria Niccolò Benucci
Bioinformatics, Volume 37, Issue 21, 1 November 2021, Pages 3941–3943; doi: https://doi.org/10.1093/bioinformatics/btab347
Overwriting previous classification...
Performing training and overwriting training files...
Using the user-supplied pathfile at /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/pathfile.txt
All needed executables exist.
SINTAX: vsearch
RDP: classifier
CONSTAX: /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0
python /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/detect_format.py -d /mnt/home/copperm2/Databases/sh_general_release_dynamic_s_all_25.07.2023.fasta 2>&1
UNITE
Memory size: 128mb
All checks passed, rerun without --check flag.

liberjul commented 3 months ago

Hi @mashalcopperman,

The issue I see in the log file is that the inputs didn't get formatted correctly because the Python modules numpy and pandas are missing. You should be able to install them from the command line with the right conda environment activated:

pip install numpy pandas
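
To confirm that the install landed in the environment CONSTAX actually runs in, a quick check along these lines may help. This is only a minimal sketch and not part of CONSTAX: the helper name `check_modules` is hypothetical, and it reports only whether the Python interpreter running the script can locate each module, so it should be run with the `constax2` environment activated.

```python
import importlib.util
import sys


def check_modules(names):
    """Return a dict mapping each module name to True if it can be located."""
    # find_spec returns None when a top-level module is not installed
    # for the interpreter that is running this script.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print the interpreter path to confirm which environment is active.
    print("Python executable:", sys.executable)
    for name, found in check_modules(["numpy", "pandas"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If the printed executable path is not inside the `constax2` environment, `pip` may have installed the modules for a different Python.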

When you run CONSTAX again, you can remove the -t/--train flag, given that the training was successful.
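
As an illustration of that re-run, the sketch below takes the command from the original report and drops the training flag. It is a hedged example rather than a CONSTAX utility: `drop_train_flag` is a hypothetical helper, and it assumes `-t`/`--train` is a bare flag that takes no value, as it appears in the command above.

```python
import shlex


def drop_train_flag(command):
    """Return the command string with any -t/--train argument removed."""
    args = shlex.split(command)
    # Assumption: -t/--train is a boolean flag, so only the flag itself is dropped.
    kept = [arg for arg in args if arg not in ("-t", "--train")]
    return shlex.join(kept)


if __name__ == "__main__":
    # Command copied from the original report in this thread.
    original = (
        "constax -d /mnt/home/copperm2/Databases/"
        "sh_general_release_dynamic_s_all_25.07.2023.fasta -t "
        "-f /mnt/home/copperm2/Databases/trainfiles/UNITE "
        "--mem 128 -n 16 -i outputs/14_constax_euk/train.fasta -b"
    )
    print(drop_train_flag(original))
```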

I hope that helps,

Julian

mashalcopperman commented 3 months ago

Hi Julian, I installed numpy and pandas via pip, but I got the same error. Is there any way you could look at this with me?


-- Sincerely,

Mashal Rahmati

"Hold the Vision, Trust the Process"