jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

configure_nodb.pl not working as expected #415

Closed: nucleoli2 closed this issue 2 years ago

nucleoli2 commented 2 years ago

Fresh install of SqueezeMeta 1.5.0.

Attempted to use configure_nodb.pl with the databases from a running 1.5.0 installation on another machine, to avoid the full database download (data cap).

configure_nodb.pl failed...

Repeated download_databases.pl, which was successful.

This is just an FYI, since everything is now working on both machines (different locations), but it would be good to have configure_nodb.pl working...

Thanks!!

Best regards.

Details: (My comments/explanations prefaced with *****)

*****Used the defaults for the SqueezeMeta 1.5.0 conda/mamba install on this machine, so it is 1.5.0post3, as shown by:

conda search -c fpusan squeezemeta
Loading channels: done
Name Version Build Channel
squeezemeta 1.1.0rc1 pl526r36_0 fpusan
squeezemeta 1.1.0rc2 pl526r36_0 fpusan
squeezemeta 1.1.0rc3 pl526r36_0 fpusan
squeezemeta 1.1.0 pl526r36_0 fpusan
squeezemeta 1.1.1 pl526r36_0 fpusan
squeezemeta 1.2.0 pl526r36_0 fpusan
squeezemeta 1.3.0 pl526r36_0 fpusan
squeezemeta 1.3.1 pl526r36_0 fpusan
squeezemeta 1.4.0 pl526r36_0 fpusan
squeezemeta 1.5.0 py36pl5262r36h7b7c402_0 fpusan
squeezemeta 1.5.0post1 py36pl5262r36h7b7c402_0 fpusan
squeezemeta 1.5.0post2 py36pl5262r36h7b7c402_0 fpusan
squeezemeta 1.5.0post3 py36pl5262r36h7b7c402_0 fpusan
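
*****(Side note: conda search only lists what the fpusan channel offers. To confirm which build is actually installed in this environment, something like the following should work; a minimal sketch, assuming the environment is named SQM150 as in the paths shown later in this report:)

# list the squeezemeta package installed in the SQM150 environment
conda list -n SQM150 squeezemeta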

*****Use the previously downloaded databases from another machine (downloaded 2022-01-11).

*****(And yes, nr.dmnd etc. was present in the target directory BEFORE running configure_nodb.pl.)

configure_nodb.pl /bio1/cer/SqueezeMeta150_Databases
Make sure that /bio1/cer/SqueezeMeta150_Databases contains all the database files (nr.dmnd, etc...)

Downloading and unpacking RDP classifier...
--2022-01-17 15:35:24-- http://silvani.cnb.csic.es/SqueezeMeta//classifier.tar.gz
Resolving silvani.cnb.csic.es (silvani.cnb.csic.es)... 150.244.82.96
Connecting to silvani.cnb.csic.es (silvani.cnb.csic.es)|150.244.82.96|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 154722472 (148M) [application/x-gzip]
Saving to: ‘/home/cer/anaconda3/envs/SQM150/SqueezeMeta/lib/classifier.tar.gz’

100%[=========================>] 154,722,472 1.91MB/s in 88s

2022-01-17 15:36:53 (1.67 MB/s) - ‘/home/cer/anaconda3/envs/SQM150/SqueezeMeta/lib/classifier.tar.gz’ saved [154722472/154722472]

--2022-01-17 15:36:53-- http://silvani.cnb.csic.es/SqueezeMeta//classifier.md5
Resolving silvani.cnb.csic.es (silvani.cnb.csic.es)... 150.244.82.96
Connecting to silvani.cnb.csic.es (silvani.cnb.csic.es)|150.244.82.96|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 33 [application/x-md5]
Saving to: ‘/home/cer/anaconda3/envs/SQM150/SqueezeMeta/lib/classifier.md5’

100%[=========================>] 33 --.-K/s in 0s

2022-01-17 15:36:53 (2.77 MB/s) - ‘/home/cer/anaconda3/envs/SQM150/SqueezeMeta/lib/classifier.md5’ saved [33/33]

classifier/ classifier/classifier.jar classifier/lib/ classifier/lib/TaxonomyTree.jar classifier/lib/commons-cli-1.2.jar classifier/lib/ReadSeq.jar classifier/lib/jfreechart-1.0.13.jar classifier/lib/AlignmentTools.jar classifier/lib/jcommon-1.0.16.jar classifier/lib/commons-io-2.4.jar

Updating configuration... Done
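
*****(configure_nodb.pl only reports "Updating configuration... Done", so one way to see which database path it actually recorded is to grep the SqueezeMeta_conf.pl files shipped inside the conda environment. A hedged sketch; I'm assuming the setting is a variable called $databasepath, so adjust the pattern if it is named differently:)

# show where the installed SqueezeMeta_conf.pl file(s) point for the database location
grep -rn 'databasepath' --include='SqueezeMeta_conf.pl' /home/cer/anaconda3/envs/SQM150/SqueezeMeta/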

*****Now check this install:

test_install.pl

Checking the OS linux OK

Checking that tree is installed tree --help OK

Checking that ruby is installed ruby -h OK

Checking that java is installed java -h OK

Checking that all the required perl libraries are available in this environment perl -e 'use Term::ANSIColor' OK perl -e 'use DBI' OK perl -e 'use DBD::SQLite::Constants' OK perl -e 'use Time::Seconds' OK perl -e 'use Tie::IxHash' OK perl -e 'use Linux::MemInfo' OK perl -e 'use Getopt::Long' OK perl -e 'use File::Basename' OK perl -e 'use DBD::SQLite' OK perl -e 'use Data::Dumper' OK perl -e 'use Cwd' OK perl -e 'use XML::LibXML' OK perl -e 'use XML::Parser' OK perl -e 'use Term::ANSIColor' OK

Checking that all the required python libraries are available in this environment python3 -h OK python3 -c 'import numpy' OK python3 -c 'import scipy' OK python3 -c 'import matplotlib' OK python3 -c 'import dendropy' OK python3 -c 'import pysam' OK python3 -c 'import Bio.Seq' OK python3 -c 'import pandas' OK python3 -c 'import sklearn' OK python3 -c 'import nose' OK python3 -c 'import cython' OK python3 -c 'import future' OK

Checking that all the required R libraries are available in this environment R -h OK R -e 'library(doMC)' OK R -e 'library(ggplot2)' OK R -e 'library(data.table)' OK R -e 'library(reshape2)' OK R -e 'library(pathview)' OK R -e 'library(DASTool)' OK R -e 'library(SQMtools)' OK

Checking that SqueezeMeta is properly configured...
 checking database in /bio1/cer/SqueezeMeta150_Databases
SqueezeMeta_conf.pl says that databases are located in /bio1/cer/SqueezeMeta150_Databases but we can't find nr.db there, or it is corrupted


WARNING: Some SqueezeMeta dependencies could not be found in your environment!
SqueezeMeta_conf.pl says that databases are located in /bio1/cer/SqueezeMeta150_Databases but we can't find nr.db there, or it is corrupted
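
*****(In hindsight, the mismatch is easy to see with a quick listing: the configured directory only holds the db/ and test/ subdirectories, and the actual database files sit one level down in db/. A minimal check, using the paths from this report; the expected output is taken from the directory listing further down in this thread:)

ls /bio1/cer/SqueezeMeta150_Databases
# db  test
ls /bio1/cer/SqueezeMeta150_Databases/db | grep -E '^nr\.'
# nr.dmnd
# nr.md5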

*****?? The database directory contents seem to match those on the machine where SqueezeMeta 1.5.0 runs fine... so just try SqueezeMeta 1.5.0 with the Hadza test data:

SqueezeMeta.pl -m coassembly -p Hadza -s test.samples -f raw -t 40

SqueezeMeta v1.5.0, Dec 2021 - (c) J. Tamames, F. Puente-Sánchez CNB-CSIC, Madrid, SPAIN

Please cite: Tamames & Puente-Sanchez, Frontiers in Microbiology 9, 3349 (2019). doi: https://doi.org/10.3389/fmicb.2018.03349

Run started Mon Jan 17 15:44:48 2022 in coassembly mode
Now creating directories
Reading configuration from /bio1/cer/SqueezeMeta150_Databases/test/Hadza/SqueezeMeta_conf.pl
Reading samples from /bio1/cer/SqueezeMeta150_Databases/test/Hadza/data/00.Hadza.samples
2 samples found: SRR1927149 SRR1929485

Now merging files
[5 seconds]: STEP1 -> RUNNING CO-ASSEMBLY: 01.run_assembly.pl (megahit)
Running assembly with megahit
Running prinseq (Schmieder et al 2011, Bioinformatics 27(6):863-4) for selecting contigs longer than 200
Input and filter stats:
Input sequences: 241,008
Input bases: 212,125,857
Input mean length: 880.16
Good sequences: 241,008 (100.00%)
Good bases: 212,125,857
Good mean length: 880.16
Bad sequences: 0 (0.00%)
Sequences filtered by specified parameters: none
Renaming contigs
Counting length of contigs
Contigs stored in /bio1/cer/SqueezeMeta150_Databases/test/Hadza/results/01.Hadza.fasta
Number of contigs: 241008
[28 minutes, 45 seconds]: STEP2 -> RNA PREDICTION: 02.rnas.pl
Running barrnap (Seeman 2014, Bioinformatics 30, 2068-9) for predicting RNAs: Bacteria
[16:13:33] Can't find database: /bio1/cer/SqueezeMeta150_Databases/bac.hmm
Error running command: /home/cer/anaconda3/envs/SQM150/SqueezeMeta/bin/barrnap --quiet --threads 40 --kingdom bac --reject 0.1 /bio1/cer/SqueezeMeta150_Databases/test/Hadza/intermediate/02.Hadza.maskedrna.fasta --dbdir /bio1/cer/SqueezeMeta150_Databases > /bio1/cer/SqueezeMeta150_Databases/test/Hadza/temp/bac.gff at /home/cer/anaconda3/envs/SQM150/SqueezeMeta/scripts/02.rnas.pl line 54.
Stopping in STEP2 -> 02.rnas.pl. Program finished abnormally

If you don't know what went wrong or want further advice, please look for similar issues in https://github.com/jtamames/SqueezeMeta/issues Feel free to open a new issue if you don't find the answer there. Please add a brief description of the problem and upload the /bio1/cer/SqueezeMeta150_Databases/test/Hadza/syslog file (zip it first)
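
*****(The barrnap failure is the same root cause: the pipeline passed --dbdir /bio1/cer/SqueezeMeta150_Databases, but bac.hmm actually lives in the db/ subdirectory, as the listing later in this thread shows. A quick sanity check along these lines would have confirmed it; the second output line is taken from that listing:)

ls -l /bio1/cer/SqueezeMeta150_Databases/bac.hmm
# ls: cannot access '/bio1/cer/SqueezeMeta150_Databases/bac.hmm': No such file or directory
ls -l /bio1/cer/SqueezeMeta150_Databases/db/bac.hmm
# -rw-r--r--. 1 cer cer 808656 Apr 27 2018 /bio1/cer/SqueezeMeta150_Databases/db/bac.hmm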

*****NOPE... so download SqueezeMeta databases to this machine from scratch:

download_databases.pl /bio1/cer/SqueezeMetat150_Databases_rabbit

Downloading and unpacking test data...

--2022-01-17 16:26:47-- http://silvani.cnb.csic.es/SqueezeMeta//test.tar.gz
Resolving silvani.cnb.csic.es (silvani.cnb.csic.es)... 150.244.82.96
Connecting to silvani.cnb.csic.es (silvani.cnb.csic.es)|150.244.82.96|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3903775708 (3.6G) [application/x-gzip]
Saving to: ‘/bio1/cer/SqueezeMetat150_Databases_rabbit/test.tar.gz’

100%[=========================>] 3,903,775,708 1.53MB/s in 30m 10s

2022-01-17 16:56:57 (2.06 MB/s) - ‘/bio1/cer/SqueezeMetat150_Databases_rabbit/test.tar.gz’ saved [3903775708/3903775708]

--2022-01-17 16:56:57-- http://silvani.cnb.csic.es/SqueezeMeta//test.md5
Resolving silvani.cnb.csic.es (silvani.cnb.csic.es)... 150.244.82.96
Connecting to silvani.cnb.csic.es (silvani.cnb.csic.es)|150.244.82.96|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 33 [application/x-md5]
Saving to: ‘/bio1/cer/SqueezeMetat150_Databases_rabbit/test.md5’

100%[=========================>] 33 --.-K/s in 0s

2022-01-17 16:56:58 (1.92 MB/s) - ‘/bio1/cer/SqueezeMetat150_Databases_rabbit/test.md5’ saved [33/33]

test/ test/test.samples test/raw/ test/raw/SRR1927149_2.fastq.gz test/raw/SRR1929485_1.fastq.gz test/raw/SRR1929485_2.fastq.gz test/raw/SRR1927149_1.fastq.gz

Downloading and unpacking database tarball...
--2022-01-17 16:57:29-- http://silvani.cnb.csic.es/SqueezeMeta//SqueezeMetaDB.tar.gz
Resolving silvani.cnb.csic.es (silvani.cnb.csic.es)... 150.244.82.96
Connecting to silvani.cnb.csic.es (silvani.cnb.csic.es)|150.244.82.96|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 153413390559 (143G) [application/x-gzip]
Saving to: ‘/bio1/cer/SqueezeMetat150_Databases_rabbit/SqueezeMetaDB.tar.gz’

100%[=========================>] 153,413,390,559 19.1MB/s in 2h 9m

2022-01-17 19:07:25 (18.8 MB/s) - ‘/bio1/cer/SqueezeMetat150_Databases_rabbit/SqueezeMetaDB.tar.gz’ saved [153413390559/153413390559]

--2022-01-17 19:07:25-- http://silvani.cnb.csic.es/SqueezeMeta//SqueezeMetaDB.md5
Resolving silvani.cnb.csic.es (silvani.cnb.csic.es)... 150.244.82.96
Connecting to silvani.cnb.csic.es (silvani.cnb.csic.es)|150.244.82.96|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 33 [application/x-md5]
Saving to: ‘/bio1/cer/SqueezeMetat150_Databases_rabbit/SqueezeMetaDB.md5’

100%[=========================>] 33 --.-K/s in 0s

2022-01-17 19:07:26 (2.35 MB/s) - ‘/bio1/cer/SqueezeMetat150_Databases_rabbit/SqueezeMetaDB.md5’ saved [33/33]

db/ db/img/ db/img/img_metadata.tsv db/mito.hmm db/pfam/ db/pfam/Pfam-A.hmm.dat db/pfam/tigrfam2pfam.tsv db/test_data/ db/test_data/637000110.fna db/distributions/ db/distributions/gc_dist.txt db/distributions/td_dist.txt db/distributions/cd_dist.txt db/bac.scg.lookup db/arc.scg.faa db/genome_tree/ db/genome_tree/missing_duplicate_genes_50.tsv db/genome_tree/genome_tree_full.refpkg/ db/genome_tree/genome_tree_full.refpkg/CONTENTS.json db/genome_tree/genome_tree_full.refpkg/genome_tree.tre db/genome_tree/genome_tree_full.refpkg/genome_tree.log db/genome_tree/genome_tree_full.refpkg/genome_tree.fasta db/genome_tree/genome_tree_full.refpkg/phylo_modelEcOyPk.json db/genome_tree/genome_tree_reduced.refpkg/ db/genome_tree/genome_tree_reduced.refpkg/CONTENTS.json db/genome_tree/genome_tree_reduced.refpkg/genome_tree.tre db/genome_tree/genome_tree_reduced.refpkg/genome_tree.log db/genome_tree/genome_tree_reduced.refpkg/genome_tree.fasta db/genome_tree/genome_tree_reduced.refpkg/phylomodelJqWx6.json db/genome_tree/genome_tree.taxonomy.tsv db/genome_tree/genome_tree.metadata.tsv db/genome_tree/genome_tree.derep.txt db/genome_tree/missing_duplicate_genes_97.tsv db/arc.all.faa db/euk.hmm db/taxon_marker_sets.tsv db/arc.hmm db/hmms/ db/hmms/checkm.hmm.ssi db/hmms/phylo.hmm.ssi db/hmms/phylo.hmm db/hmms/checkm.hmm db/bac.scg.faa db/selected_marker_sets.tsv db/arc.scg.lookup db/hmms_ssu/ db/hmms_ssu/createHMMs.py db/hmms_ssu/SSU_archaea.hmm db/hmms_ssu/SSU_euk.hmm db/hmms_ssu/SSU_bacteria.hmm db/marker.hmm db/ReadMe db/.dmanifest db/bac.all.faa db/bacar_marker.hmm db/bac.hmm db/silva.nr_v132.align.md5 db/silva.nr_v132.align db/silva.nr_v132.tax.md5 db/silva.nr_v132.tax db/kegg.db.md5 db/keggdb.dmnd db/nr.dmnd db/nr.md5 db/eggnog.dmnd db/Pfam-A.hmm db/LCA_tax/ db/LCA_tax/parents.txt db/LCA_tax/taxid.db db/LCA_tax/taxid.md5 db/LCA_tax/parents.db db/DB_BUILD_DATE Make sure that /bio1/cer/SqueezeMetat150_Databases_rabbit/db contains all the database files (nr.dmnd, etc...)

Downloading and unpacking RDP classifier...
--2022-01-17 19:58:44-- http://silvani.cnb.csic.es/SqueezeMeta//classifier.tar.gz
Resolving silvani.cnb.csic.es (silvani.cnb.csic.es)... 150.244.82.96
Connecting to silvani.cnb.csic.es (silvani.cnb.csic.es)|150.244.82.96|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 154722472 (148M) [application/x-gzip]
Saving to: ‘/home/cer/anaconda3/envs/SQM150/SqueezeMeta/lib/classifier.tar.gz’

100%[=========================>] 154,722,472 8.89MB/s in 30s

2022-01-17 19:59:15 (4.86 MB/s) - ‘/home/cer/anaconda3/envs/SQM150/SqueezeMeta/lib/classifier.tar.gz’ saved [154722472/154722472]

--2022-01-17 19:59:15-- http://silvani.cnb.csic.es/SqueezeMeta//classifier.md5
Resolving silvani.cnb.csic.es (silvani.cnb.csic.es)... 150.244.82.96
Connecting to silvani.cnb.csic.es (silvani.cnb.csic.es)|150.244.82.96|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 33 [application/x-md5]
Saving to: ‘/home/cer/anaconda3/envs/SQM150/SqueezeMeta/lib/classifier.md5’

100%[=========================>] 33 --.-K/s in 0s

2022-01-17 19:59:15 (2.75 MB/s) - ‘/home/cer/anaconda3/envs/SQM150/SqueezeMeta/lib/classifier.md5’ saved [33/33]

classifier/ classifier/classifier.jar classifier/lib/ classifier/lib/TaxonomyTree.jar classifier/lib/commons-cli-1.2.jar classifier/lib/ReadSeq.jar classifier/lib/jfreechart-1.0.13.jar classifier/lib/AlignmentTools.jar classifier/lib/jcommon-1.0.16.jar classifier/lib/commons-io-2.4.jar

Updating configuration... Done

*****Test the download on this machine:

test_install.pl

Checking the OS linux OK

Checking that tree is installed tree --help OK

Checking that ruby is installed ruby -h OK

Checking that java is installed java -h OK

Checking that all the required perl libraries are available in this environment perl -e 'use Term::ANSIColor' OK perl -e 'use DBI' OK perl -e 'use DBD::SQLite::Constants' OK perl -e 'use Time::Seconds' OK perl -e 'use Tie::IxHash' OK perl -e 'use Linux::MemInfo' OK perl -e 'use Getopt::Long' OK perl -e 'use File::Basename' OK perl -e 'use DBD::SQLite' OK perl -e 'use Data::Dumper' OK perl -e 'use Cwd' OK perl -e 'use XML::LibXML' OK perl -e 'use XML::Parser' OK perl -e 'use Term::ANSIColor' OK

Checking that all the required python libraries are available in this environment python3 -h OK python3 -c 'import numpy' OK python3 -c 'import scipy' OK python3 -c 'import matplotlib' OK python3 -c 'import dendropy' OK python3 -c 'import pysam' OK python3 -c 'import Bio.Seq' OK python3 -c 'import pandas' OK python3 -c 'import sklearn' OK python3 -c 'import nose' OK python3 -c 'import cython' OK python3 -c 'import future' OK

Checking that all the required R libraries are available in this environment R -h OK R -e 'library(doMC)' OK R -e 'library(ggplot2)' OK R -e 'library(data.table)' OK R -e 'library(reshape2)' OK R -e 'library(pathview)' OK R -e 'library(DASTool)' OK R -e 'library(SQMtools)' OK

Checking that SqueezeMeta is properly configured...
 checking database in /bio1/cer/SqueezeMetat150_Databases_rabbit/db
 nr.db OK
 CheckM manifest OK
 LCA_tax DB OK

*****Ok... re-run Hadza on this machine in the 1.5.0 conda environment

SqueezeMeta.pl -m coassembly -p Hadza -s test.samples -f raw -t 40 SqueezeMeta v1.5.0, Dec 2021 - (c) J. Tamames, F. Puente-Sánchez CNB-CSIC, Madrid, SPAIN Please cite: Tamames & Puente-Sanchez, Frontiers in Microbiology 9, 3349 (2019). doi: https://doi.org/10.3389/fmicb.2018.03349 Run started Mon Jan 17 20:09:00 2022 in coassembly mode Now creating directories Reading configuration from /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/SqueezeMeta_conf.pl Reading samples from /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/data/00.Hadza.samples 2 samples found: SRR1929485 SRR1927149 Now merging files [6 seconds]: STEP1 -> RUNNING CO-ASSEMBLY: 01.run_assembly.pl (megahit) Running assembly with megahit Running prinseq (Schmieder et al 2011, Bioinformatics 27(6):863-4) for selecting contigs longer than 200 Input and filter stats: Input sequences: 241,008 Input bases: 212,125,857 Input mean length: 880.16 Good sequences: 241,008 (100.00%) Good bases: 212,125,857 Good mean length: 880.16 Bad sequences: 0 (0.00%) Sequences filtered by specified parameters: none Renaming contigs Counting length of contigs Contigs stored in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/01.Hadza.fasta Number of contigs: 241008 [29 minutes, 43 seconds]: STEP2 -> RNA PREDICTION: 02.rnas.pl Running barrnap (Seeman 2014, Bioinformatics 30, 2068-9) for predicting RNAs: Bacteria Archaea Eukaryote Mitochondrial Running RDP classifier (Wang et al 2007, Appl Environ Microbiol 73, 5261-7) Running Aragorn (Laslett & Canback 2004, Nucleic Acids Res 31, 11-16) for tRNA/tmRNA prediction [31 minutes, 31 seconds]: STEP3 -> ORF PREDICTION: 03.run_prodigal.pl Running prodigal (Hyatt et al 2010, BMC Bioinformatics 11: 119) for predicting ORFs ORFs predicted: 382157 [47 minutes, 20 seconds]: STEP4 -> HOMOLOGY SEARCHES: 04.rundiamond.pl Setting block size for Diamond AVAILABLE (free) RAM memory: 483.47 Gb We will set Diamond block size to 16 (Gb RAM/8, Max 16). 
You can override this setting using the -b option when starting the project, or changing the $blocksize variable in SqueezeMeta_conf.pl taxa COGS Running Diamond (Buchfink et al 2015, Nat Methods 12, 59-60) for KEGG [3 hours, 11 minutes, 58 seconds]: STEP5 -> HMMER/PFAM: 05.run_hmmer.pl Running HMMER3 (Eddy 2009, Genome Inform 23, 205-11) for Pfam [6 hours, 34 minutes, 26 seconds]: STEP6 -> TAXONOMIC ASSIGNMENT: 06.lca.pl Splitting Diamond file Starting multithread LCA in 40 threads Creating /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/06.Hadza.fun3.tax.wranks file Creating /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/06.Hadza.fun3.tax.noidfilter.wranks file [6 hours, 37 minutes, 22 seconds]: STEP7 -> FUNCTIONAL ASSIGNMENT: 07.fun3assign.pl Functional assignment for COGS KEGG PFAM [6 hours, 37 minutes, 50 seconds]: STEP9 -> CONTIG TAX ASSIGNMENT: 09.summarycontigs3.pl Reading /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/06.Hadza.fun3.tax.wranks Writing output to /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/intermediate/09.Hadza.contiglog Reading /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/06.Hadza.fun3.tax.noidfilter.wranks Writing output to /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/intermediate/09.Hadza.contiglog.noidfilter [6 hours, 38 minutes, 58 seconds]: STEP10 -> MAPPING READS: 10.mapsamples.pl Reading samples from /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/data/00.Hadza.samples Metagenomes found: 2 Mapping with Bowtie2 (Langmead and Salzberg 2012, Nat Methods 9(4), 357-9) Creating reference from contigs Working with sample 1: SRR1927149 Getting raw reads Aligning to reference with bowtie Calculating contig coverage Reading contig length from /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/intermediate/01.Hadza.lon Counting with sqm_counter: Opening 40 threads 741741 reads counted 1483481 reads counted 2225221 reads counted 2966961 reads counted 3708701 reads counted 5192181 reads counted 4450441 reads counted 5933921 reads counted 6675661 reads counted 7417401 reads counted 8159141 reads counted 8900881 reads counted 9642621 reads counted 11126101 reads counted 10384361 reads counted 12609581 reads counted 11867841 reads counted 13351321 reads counted 14834801 reads counted 14093061 reads counted 15576541 reads counted 17060021 reads counted 16318281 reads counted 17801761 reads counted 18543501 reads counted 19285241 reads counted 20026981 reads counted 20768721 reads counted 21510461 reads counted 22993941 reads counted 22252201 reads counted 23735681 reads counted 24477421 reads counted 25219161 reads counted 25960901 reads counted 26702641 reads counted 27444381 reads counted 28186121 reads counted 28927861 reads counted 29669601 reads counted Working with sample 2: SRR1929485 Getting raw reads Aligning to reference with bowtie Calculating contig coverage Reading contig length from /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/intermediate/01.Hadza.lon Counting with sqm_counter: Opening 40 threads 237241 reads counted 474481 reads counted 711721 reads counted 948961 reads counted 1186201 reads counted 1423441 reads counted 1660681 reads counted 1897921 reads counted 2135161 reads counted 2372401 reads counted 2609641 reads counted 2846881 reads counted 3084121 reads counted 3321361 reads counted 3558601 reads counted 3795841 reads counted 4033081 reads counted 4270321 reads counted 4507561 reads counted 4744801 reads counted 4982041 reads counted 5219281 reads 
counted 5456521 reads counted 5693761 reads counted 5931001 reads counted 6168241 reads counted 6405481 reads counted 6642721 reads counted 6879961 reads counted 7117201 reads counted 7354441 reads counted 7828921 reads counted 7591681 reads counted 8066161 reads counted 8540641 reads counted 8303401 reads counted 9252361 reads counted 8777881 reads counted 9489601 reads counted 9015121 reads counted Output in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/intermediate/10.Hadza.mapcount [6 hours, 52 minutes, 30 seconds]: STEP11 -> COUNTING TAX ABUNDANCES: 11.mcount.pl [6 hours, 52 minutes, 39 seconds]: STEP12 -> COUNTING FUNCTION ABUNDANCES: 12.funcover.pl
Calculating coverage for functions Reading coverage in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/intermediate/10.Hadza.mapcount Reading rpkm in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/intermediate/10.Hadza.mapcount Now creating cog coverage output in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/12.Hadza.cog.funcover Now creating kegg coverage output in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/12.Hadza.kegg.funcover Now creating cog raw reads output in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/ext_tables/12.Hadza.cog.stamp Now creating kegg raw reads output in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/ext_tables/12.Hadza.kegg.stamp [6 hours, 53 minutes, 2 seconds]: STEP13 -> CREATING GENE TABLE: 13.mergeannot2.pl Creating table in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/13.Hadza.orftable Creating table in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/13.Hadza.orftable Reading GFF in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/03.Hadza.gff Reading Diamond hits Reading COG list Reading KEGG list Reading aa sequences Reading nt sequences Reading rRNA sequences Reading tRNA/tmRNA sequences Reading ORF information Calculating GC content for genes Calculating GC content for RNAs Reading contig information Reading KEGG annotations Reading COGs annotations Reading Pfam annotations Reading RPKMs and Coverages

GENE TABLE CREATED: /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/13.Hadza.orftable

[6 hours, 54 minutes, 15 seconds]: STEP14 -> BINNING: 14.runbinning.pl [7 hours, 7 minutes, 0 seconds]: STEP15 -> DAS_TOOL MERGING: 15.dastool.pl
[7 hours, 8 minutes, 26 seconds]: STEP16 -> BIN TAX ASSIGNMENT: 16.addtax2.pl
[7 hours, 11 minutes, 25 seconds]: STEP17 -> CHECKING BINS: 17.checkM_batch.pl
Evaluating bins with CheckM (Parks et al 2015, Genome Res 25, 1043-55) Creating /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/temp/checkm_batch Reading /home/cer/anaconda3/envs/SQM150/SqueezeMeta/data/alltaxlist.txt Looking for DAS bins in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/bins 60 bins found Bin 1/60: maxbin.001.fasta.contigs.fa.tax Using profile for class rank : Gammaproteobacteria Bin 2/60: maxbin.002.fasta.contigs.fa.tax Using profile for family rank : Succinivibrionaceae Bin 3/60: maxbin.003.fasta.contigs.fa.tax Using profile for genus rank : Prevotella Bin 4/60: maxbin.004.fasta.contigs.fa.tax Using profile for genus rank : Prevotella Bin 5/60: maxbin.005.fasta.contigs.fa.tax Using profile for family rank : Rikenellaceae Bin 6/60: maxbin.006.fasta.contigs.fa.tax Using profile for genus rank : Treponema Bin 7/60: maxbin.007.fasta.contigs.fa.tax Using profile for genus rank : Clostridium Bin 8/60: maxbin.008.fasta.contigs.fa.tax Using profile for genus rank : Prevotella Bin 9/60: maxbin.009.fasta.contigs.fa.tax Using profile for class rank : Clostridia Bin 10/60: maxbin.010.fasta.contigs.fa.tax Using profile for family rank : Oscillospiraceae Using profile for class rank : Clostridia Bin 11/60: maxbin.011.fasta.contigs.fa.tax Using profile for domain rank : Bacteria Bin 12/60: maxbin.013.fasta_sub.contigs.fa.tax Using profile for genus rank : Prevotella Bin 13/60: maxbin.014.fasta_sub.contigs.fa.tax Using profile for genus rank : Prevotella Bin 14/60: maxbin.016.fasta.contigs.fa.tax Using profile for class rank : Clostridia Bin 15/60: maxbin.017.fasta.contigs.fa.tax Using profile for genus rank : Phascolarctobacterium Using profile for family rank : Acidaminococcaceae Bin 16/60: maxbin.018.fasta.contigs.fa.tax Using profile for genus rank : Prevotella Bin 17/60: maxbin.019.fasta.contigs.fa.tax Using profile for family rank : Oscillospiraceae Using profile for class rank : Clostridia Bin 18/60: maxbin.020.fasta_sub.contigs.fa.tax Using profile for family rank : Oscillospiraceae Using profile for class rank : Clostridia Bin 19/60: maxbin.021.fasta.contigs.fa.tax Using profile for family rank : Elusimicrobiaceae Using profile for order rank : Elusimicrobiales Using profile for class rank : Elusimicrobia Using profile for phylum rank : Elusimicrobia Using profile for domain rank : Bacteria Bin 20/60: maxbin.024.fasta.contigs.fa.tax Using profile for class rank : Lentisphaeria Using profile for phylum rank : Lentisphaerae Using profile for domain rank : Bacteria Bin 21/60: maxbin.025.fasta.contigs.fa.tax Using profile for family rank : Rikenellaceae Bin 22/60: maxbin.026.fasta.contigs.fa.tax Using profile for family rank : Selenomonadaceae Using profile for order rank : Selenomonadales Bin 23/60: maxbin.027.fasta_sub.contigs.fa.tax Using profile for genus rank : Prevotella Bin 24/60: maxbin.028.fasta.contigs.fa.tax Using profile for genus rank : Prevotella Bin 25/60: maxbin.030.fasta_sub.contigs.fa.tax Using profile for genus rank : Prevotella Bin 26/60: maxbin.031.fasta_sub.contigs.fa.tax Using profile for class rank : Clostridia Bin 27/60: maxbin.032.fasta.contigs.fa.tax Using profile for order rank : Bacteroidales Bin 28/60: maxbin.033.fasta.contigs.fa.tax Using profile for domain rank : Bacteria Bin 29/60: maxbin.034.fasta.contigs.fa.tax Using profile for family rank : Elusimicrobiaceae Using profile for order rank : Elusimicrobiales Using profile for class rank : Elusimicrobia Using profile for phylum rank : Elusimicrobia Using profile for domain rank 
: Bacteria Bin 30/60: maxbin.035.fasta_sub.contigs.fa.tax Using profile for family rank : Lachnospiraceae Bin 31/60: maxbin.036.fasta_sub.contigs.fa.tax Using profile for genus rank : Prevotella Bin 32/60: maxbin.037.fasta.contigs.fa.tax Using profile for family rank : Rikenellaceae Bin 33/60: maxbin.038.fasta.contigs.fa.tax Using profile for family rank : Prevotellaceae Bin 34/60: maxbin.039.fasta.contigs.fa.tax Using profile for class rank : Spirochaetia Bin 35/60: maxbin.040.fasta.contigs.fa.tax Using profile for class rank : Clostridia Bin 36/60: maxbin.041.fasta_sub.contigs.fa.tax Using profile for phylum rank : Firmicutes Bin 37/60: maxbin.042.fasta.contigs.fa.tax Bin 38/60: maxbin.043.fasta.contigs.fa.tax Using profile for order rank : Bacteroidales Bin 39/60: maxbin.044.fasta_sub.contigs.fa.tax Using profile for genus rank : Treponema Bin 40/60: maxbin.045.fasta.contigs.fa.tax Using profile for class rank : Clostridia Bin 41/60: maxbin.046.fasta_sub.contigs.fa.tax Using profile for family rank : Oscillospiraceae Using profile for class rank : Clostridia Bin 42/60: maxbin.047.fasta.contigs.fa.tax Using profile for genus rank : Escherichia Bin 43/60: maxbin.048.fasta_sub.contigs.fa.tax Using profile for genus rank : Dialister Bin 44/60: metabat2.11.fa_sub.contigs.fa.tax Using profile for genus rank : Prevotella Bin 45/60: metabat2.12.fa_sub.contigs.fa.tax Using profile for genus rank : Prevotella Bin 46/60: metabat2.18.fa.contigs.fa.tax Using profile for family rank : Oscillospiraceae Using profile for class rank : Clostridia Bin 47/60: metabat2.21.fa.contigs.fa.tax Using profile for genus rank : Treponema Bin 48/60: metabat2.23.fa.contigs.fa.tax Using profile for class rank : Clostridia Bin 49/60: metabat2.24.fa_sub.contigs.fa.tax Using profile for genus rank : Prevotella Bin 50/60: metabat2.27.fa.contigs.fa.tax Using profile for genus rank : Dialister Bin 51/60: metabat2.2.fa_sub.contigs.fa.tax Using profile for family rank : Lachnospiraceae Bin 52/60: metabat2.31.fa.contigs.fa.tax Using profile for genus rank : Ruminiclostridium Using profile for family rank : Oscillospiraceae Using profile for class rank : Clostridia Bin 53/60: metabat2.32.fa_sub.contigs.fa.tax Using profile for genus rank : Prevotella Bin 54/60: metabat2.35.fa.contigs.fa.tax Using profile for genus rank : Prevotella Bin 55/60: metabat2.38.fa_sub.contigs.fa.tax Using profile for genus rank : Prevotella Bin 56/60: metabat2.42.fa_sub.contigs.fa.tax Using profile for genus rank : Prevotella Bin 57/60: metabat2.43.fa_sub.contigs.fa.tax Using profile for genus rank : Phascolarctobacterium Using profile for family rank : Acidaminococcaceae Bin 58/60: metabat2.6.fa_sub.contigs.fa.tax Using profile for genus rank : Prevotella Bin 59/60: metabat2.7.fa_sub.contigs.fa.tax Using profile for genus rank : Prevotella Bin 60/60: metabat2.9.fa.contigs.fa.tax Using profile for class rank : Alphaproteobacteria Storing results for DAS in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/intermediate/17.Hadza.checkM [8 hours, 7 minutes, 23 seconds]: STEP18 -> CREATING BIN TABLE: 18.getbins.pl Method:DAS Reading checkM results in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/intermediate/17.Hadza.checkM Looking for bins in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/bins Reading data for bin metabat2.9.fa.contigs
Calculating coverages Creating table in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/18.Hadza.bintable Done!

BIN TABLE CREATED: /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/18.Hadza.bintable

[8 hours, 7 minutes, 43 seconds]: STEP19 -> CREATING CONTIG TABLE: 19.getcontigs.pl Reading taxa for contigs information...done! Reading GC & length... done! Reading number of genes... done! Reading coverages... done! Reading bins... done! Creating contig table...done!

CONTIG TABLE CREATED: /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/19.Hadza.contigtable

[8 hours, 8 minutes, 30 seconds]: STEP20 -> CREATING TABLE OF PATHWAYS IN BINS: 20.minpath.pl Running MinPath (Ye and Doak 2009, PLoS Comput Biol 5(8), e1000465) Running MinPath for kegg: metabat2.9.fa.contigs
Running MinPath for metacyc: metabat2.9.fa.contigs
[8 hours, 10 minutes, 56 seconds]: STEP21 -> MAKING FINAL STATISTICS: 21.stats.pl Output in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/results/21.Hadza.stats

Deleting temporary files in /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/temp [8 hours, 11 minutes, 15 seconds]: FINISHED -> Have fun! For citation purposes, you can find a summary of methods in the file /bio1/cer/SqueezeMetat150_Databases_rabbit/test/Hadza/methods.txt

You can analize your results using the SQMTools R library (see https://github.com/jtamames/SqueezeMeta/wiki/Using-R-to-analyze-your-SQM-results)

*****SqueezeMeta 1.5.0 with the freshly re-downloaded databases runs to completion on this machine without failing.

fpusan commented 2 years ago

Thanks for the detailed report! And I'm glad it ended up working. So what do you get when running ls /bio1/cer/SqueezeMeta150_Databases?

nucleoli2 commented 2 years ago

...yes, I don't see a difference in the db/ contents between this installation and the one that works...

The top two levels are:

SqueezeMeta150_Databases:
drwxrwxr-x. 10 cer cer 4096 Dec 26 18:30 db
drwxrwxr-x. 6 cer cer 133 Jan 17 15:44 test

SqueezeMeta150_Databases/db:

total 244319900
-rw-r--r--. 1 cer cer 67368053 Nov 16 2018 arc.all.faa
-rw-r--r--. 1 cer cer 839378 Apr 27 2018 arc.hmm
-rw-r--r--. 1 cer cer 1324749 Nov 16 2018 arc.scg.faa
-rw-r--r--. 1 cer cer 202175 Nov 16 2018 arc.scg.lookup
-rw-r--r--. 1 cer cer 16173613 Nov 16 2018 bac.all.faa
-rw-r--r--. 1 cer cer 3671082 Apr 27 2018 bacar_marker.hmm
-rw-r--r--. 1 cer cer 808656 Apr 27 2018 bac.hmm
-rw-r--r--. 1 cer cer 249620 Nov 16 2018 bac.scg.faa
-rw-r--r--. 1 cer cer 36700 Nov 16 2018 bac.scg.lookup
-rw-rw-r--. 1 cer cer 56 Dec 26 18:30 DB_BUILD_DATE
drwxrwxr-x. 2 cer cer 63 Jan 13 2015 distributions
-rw-rw-r--. 1 cer cer 3510404936 Dec 26 09:02 eggnog.dmnd
-rw-r--r--. 1 cer cer 1038507 Apr 27 2018 euk.hmm
drwxrwxr-x. 4 cer cer 240 Jan 13 2015 genome_tree
drwxrwxr-x. 2 cer cer 84 Jan 13 2015 hmms
drwxrwxr-x. 2 cer cer 93 Jan 13 2015 hmms_ssu
drwxrwxr-x. 2 cer cer 30 Jan 13 2015 img
-rw-rw-r--. 1 cer cer 2867640353 Dec 26 06:20 keggdb.dmnd
-rw-rw-r--. 1 cer cer 33 Oct 16 2020 kegg.db.md5
drwxrwxr-x. 2 cer cer 76 Dec 26 18:30 LCA_tax
-rw-r--r--. 1 cer cer 15929643 Apr 27 2018 marker.hmm
-rw-r--r--. 1 cer cer 456264 Apr 27 2018 mito.hmm
-rw-rw-r--. 1 cer cer 231400917020 Dec 26 08:50 nr.dmnd
-rw-rw-r--. 1 cer cer 64 Dec 26 09:00 nr.md5
drwxrwxr-x. 2 cer cer 52 Jan 13 2015 pfam
-rw-rw-r--. 1 cer cer 1572192814 Dec 26 09:02 Pfam-A.hmm
-rw-rw-r--. 1 cer cer 335 May 21 2018 ReadMe
-rw-rw-r--. 1 cer cer 81505 Jan 13 2015 selected_marker_sets.tsv
-rw-rw-r--. 1 cer cer 10680384402 Nov 3 2020 silva.nr_v132.align
-rw-rw-r--. 1 cer cer 33 Nov 3 2020 silva.nr_v132.align.md5
-rw-rw-r--. 1 cer cer 22266447 Nov 3 2020 silva.nr_v132.tax
-rw-rw-r--. 1 cer cer 33 Nov 3 2020 silva.nr_v132.tax.md5
-rw-rw-r--. 1 cer cer 21521855 Jan 13 2015 taxon_marker_sets.tsv
drwxrwxr-x. 2 cer cer 27 Jan 13 2015 test_data

fpusan commented 2 years ago

I think you should have run configure_nodb.pl /bio1/cer/SqueezeMeta150_Databases/db

Since /bio1/cer/SqueezeMeta150_Databases/db is the directory that actually contains the nr.dmnd file.

nucleoli2 commented 2 years ago

...good... my error...

Thanks!!
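
(For anyone else trying to avoid the full download because of a data cap, the working recipe that comes out of this thread is roughly the following. A hedged sketch: rsync, the host name and the source path are placeholders; the key point is that configure_nodb.pl must be given the directory that directly contains nr.dmnd, bac.hmm, etc., not its parent.)

# copy the db/ directory from a machine that already has the databases (placeholder host/path)
rsync -avP otherhost:/path/to/SqueezeMeta_databases/db/ /bio1/cer/SqueezeMeta150_Databases/db/
# point configure_nodb.pl at the directory that actually contains the database files
configure_nodb.pl /bio1/cer/SqueezeMeta150_Databases/db
# verify the configuration
test_install.pl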

SanshiroTakahashi commented 2 years ago

mamba create -n SqueezeMeta -c conda-forge -c bioconda -c fpusan squeezemeta
conda activate SqueezeMeta
download_databases.pl /path/to/store/databases
test_install.pl

The manual does not describe the differences between these steps or the reasons for them, but doing this seems to solve the problem. Any exchange that cannot be resolved by the above is probably pointless.