dorton21 closed this issue 2 years ago.
Hi! Does the file /scratch/squeezedb/nr.db exist? If not, where is nr.db located?
nr.db doesn't appear to exist at all. I thought an easy fix might be to delete the database I had downloaded and reinstall it, but the new install didn't include nr.db either. I don't know whether it comes with the database installation or is generated by some script I need to call.
Now I realize...
Did you actually download the database at all? (i.e. run download_databases.pl?)
configure_nodb.pl would write the necessary configuration files, but you still need to have the database downloaded.
I ran download_databases.pl outside of the container in order to separately download the DB.
I see that an nr.dmnd file is generated. Is this file supposed to be converted into nr.db somehow?
Yes, sorry, nr.dmnd is the correct one. Where is it located?
nr.dmnd is inside /scratch/squeezedb/db.
What happens if you run:
/path/to/SqueezeMeta/bin/diamond dbinfo --db /scratch/squeezedb/db/nr.dmnd
That returns the following:
diamond v2.0.8.146 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Database type  Diamond database
Database format version  3
Diamond build  146
Sequences  371327556
Letters  134857927905
Ok, so the database is there and fine.
And what output do you get from running:
/path/to/SqueezeMeta/utils/install_utils/test_install.pl
Running that command gives: "SqueezeMeta_conf.pl says that databases are located in /scratch/squeezedb/db but we can't find nr.db there, or it is corrupted". Like I said before, that file is not there. I don't know whether it needs to be generated or what the deal is. I do, however, have the nr.dmnd file.
The nr.db part is a typo in the error message (I hadn't noticed it before; will patch it). It's really looking for nr.dmnd, which is there, right?
The script internally runs /path/to/SqueezeMeta/bin/diamond dbinfo --db /scratch/squeezedb/db/nr.dmnd and complains if the command fails.
So I'm surprised that it worked when you ran it manually before, but test_install.pl is not working...
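For reference, the check described above can be approximated in shell. This is a minimal sketch, not the actual test_install.pl code: the function name check_nr_db is hypothetical, and it assumes the database sits at <db_dir>/nr.dmnd as in this thread. Unlike the Perl line in the installer, it leaves diamond's output visible, so a failing dbinfo call shows its own error message.

```shell
#!/usr/bin/env bash
# Hypothetical helper approximating the installer's nr.dmnd check.
# diamond's output is NOT redirected to /dev/null, so failures stay visible.
check_nr_db() {
  local diamond_bin="$1"   # e.g. /path/to/SqueezeMeta/bin/diamond
  local db_dir="$2"        # e.g. /scratch/squeezedb/db
  local db="$db_dir/nr.dmnd"

  # A missing or zero-byte file can be reported without calling diamond at all.
  if [ ! -s "$db" ]; then
    echo "missing or empty: $db" >&2
    return 1
  fi

  # Same kind of query the installer runs; a non-zero exit status means failure.
  if ! "$diamond_bin" dbinfo --db "$db"; then
    echo "diamond dbinfo failed for $db" >&2
    return 1
  fi
  echo "nr.dmnd OK"
}
```

Usage: check_nr_db /path/to/SqueezeMeta/bin/diamond /scratch/squeezedb/db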
1) What shell interpreter are you using?
2) Can you manually edit /path/to/SqueezeMeta/utils/install_utils/test_install.pl and change the line
my $ecode = system("$installpath/bin/diamond dbinfo --db $databasepath/nr.dmnd >/dev/null 2>&1");
to
my $ecode = system("$installpath/bin/diamond dbinfo --db $databasepath/nr.dmnd");
and see what the output is then?
Yes, nr.dmnd is in the proper location. For some more perspective: I'm trying to build this in a Singularity container using a CentOS 8 image. I can probably use sed to make the modifications you specified, and I'll report back with the results. Thank you for the continued help.
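The test_install.pl edit can be made non-interactively with sed, for example during the container build. This is a sketch assuming GNU sed (as on a CentOS 8 image) and that the dbinfo line reads exactly as quoted in this thread; unsilence_dbinfo is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop the ">/dev/null 2>&1" redirection from the
# diamond dbinfo call in test_install.pl so diamond's messages are printed.
# The original file is kept as <script>.bak.
unsilence_dbinfo() {
  local script="$1"   # e.g. /path/to/SqueezeMeta/utils/install_utils/test_install.pl
  # '#' is the sed delimiter because the pattern contains '/'.
  sed -i.bak 's#nr\.dmnd >/dev/null 2>&1"#nr.dmnd"#' "$script"
}
```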
Also, let me know which shell interpreter you are using. You can check with echo $0 and echo $SHELL.
Bash is the shell interpreter.
Running Diamond (Buchfink et al 2015, Nat Methods 12, 59-60) for taxa
Error running command: /opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/bin/diamond blastp -q /home/las_djorton/singularity-squeezemeta/Hadza/results/03.Hadza.faa -p 12 -d /scratch/squeezedb/db/nr.dmnd -e 0.001 --id 40 -f tab -b 2.7 --quiet -o /home/las_djorton/singularity-squeezemeta/Hadza/intermediate/04.Hadza.nr.diamond at /opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/scripts/04.rundiamond.pl line 66.
Stopping in STEP4 -> 04.rundiamond.pl. Program finished abnormally
If you don't know what went wrong or want further advice, please look for similar issues in https://github.com/jtamames/SqueezeMeta/issues
Feel free to open a new issue if you don't find the answer there. Please add a brief description of the problem and upload the /home/las_djorton/singularity-squeezemeta/Hadza/syslog file (zip it first)
Died at /opt/conda/envs/SqueezeMeta-2020.11/bin/SqueezeMeta.pl line 1367.
The above is the new error message after an attempt to make the changes you suggested.
But that output is from running SqueezeMeta, and the changes were made in test_install.pl, right?
What is the output of test_install.pl after the changes?
Alright, I did some messing around. It can detect nr.db now, but I have landed on a new error.
Checking that SqueezeMeta is properly configured...
checking database in /scratch/squeezedb/db
nr.db OK
CRITICAL ERROR: Can not find the checkm DATA_CONFIG file in /opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/lib/checkm. This indicates a broken installation. If the error persists after reinstalling from scratch please open an issue at http://github.com/jtamames/SqueezeMeta
Hi again! Sorry for the delay, I was on a much needed holiday break.
That one comes from not running configure_nodb.pl, but it is easy to fix.
Manually create the /opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/lib/checkm/DATA_CONFIG file with the following content:
{"dataRoot": "/scratch/squeezedb/db/", "remoteManifestURL": "https://data.ace.uq.edu.au/public/CheckM_databases/", "manifestType": "CheckM", "localManifestName": ".dmanifest", "remoteManifestName": ".dmanifest"}
In general, "dataRoot" should point to wherever you have the SqueezeMeta databases installed (I copied that path from your previous answers, but double-check just in case).
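If the file has to be created from a script (for example in the container build step), something like the following writes the same JSON with only dataRoot parameterised. This is a sketch; write_checkm_data_config is a hypothetical helper name. Since Singularity images are normally mounted read-only at runtime (compare the later "[Errno 30] Read-only file system" error in this thread), it likely needs to run while the image is still writable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write CheckM's DATA_CONFIG with the JSON given above.
# Only dataRoot is parameterised; the other keys are copied verbatim.
write_checkm_data_config() {
  local checkm_lib="$1"   # e.g. /opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/lib/checkm
  local data_root="$2"    # e.g. /scratch/squeezedb/db/ (where the SqueezeMeta databases live)
  mkdir -p "$checkm_lib"
  printf '{"dataRoot": "%s", "remoteManifestURL": "https://data.ace.uq.edu.au/public/CheckM_databases/", "manifestType": "CheckM", "localManifestName": ".dmanifest", "remoteManifestName": ".dmanifest"}\n' \
    "$data_root" > "$checkm_lib/DATA_CONFIG"
}
```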
Thank you for your continued help. I added the DATA_CONFIG file in the proper spot, and it's now failing on the fourth step.
[2 hours, 39 minutes, 36 seconds]: STEP4 -> HOMOLOGY SEARCHES: 04.rundiamond.pl
Setting block size for Diamond
AVAILABLE (free) RAM memory: 13.10 Gb
We will set Diamond block size to 2.6 (Gb RAM/5, Max 8). You can override this setting using the -b option when starting the project, or changing the $blocksize variable in SqueezeMeta_conf.pl
Running Diamond (Buchfink et al 2015, Nat Methods 12, 59-60) for taxa
Error running command: /opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/bin/diamond blastp -q /root/HadzaTest/results/03.HadzaTest.faa -p 12 -d /scratch/squeezedb/db/nr.dmnd -e 0.001 --id 40 -f tab -b 2.6 --quiet -o /root/HadzaTest/intermediate/04.HadzaTest.nr.diamond at /opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/scripts/04.rundiamond.pl line 66.
Stopping in STEP4 -> 04.rundiamond.pl. Program finished abnormally
It doesn't seem to give any specific error beyond the failure itself.
Try running the command on its own:
/opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/bin/diamond blastp -q /root/HadzaTest/results/03.HadzaTest.faa -p 12 -d /scratch/squeezedb/db/nr.dmnd -e 0.001 --id 40 -f tab -b 2.6 --quiet -o /root/HadzaTest/intermediate/04.HadzaTest.nr.diamond
Singularity local:~> /opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/bin/diamond blastp -q /root/HadzaTest/results/03.HadzaTest.faa -p 12 -d /scratch/squeezedb/db/nr.dmnd -e 0.001 --id 40 -f tab -b 2.6 --quiet -o /root/HadzaTest/intermediate/04.HadzaTest.nr.diamond
Responds with...
Killed
No other errors.
This is the last message in the logs:
2021-10-12 17:04:28 - Assemble contigs from SdBG for k = 119
2021-10-12 17:04:28 - command /opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/bin/megahit/megahit_core assemble -s /root/HadzaTest/data/megahit/tmp/k119/119 -o /root/HadzaTest/data/megahit/intermediate_contigs/k119 -t 8 --min_standalone 300 --prune_level 2 --merge_len 20 --merge_similar 0.95 --cleaning_rounds 5 --disconnect_ratio 0.1 --low_local_ratio 0.2 --cleaning_rounds 5 --min_depth 2 --bubble_level 2 --max_tip_len 182.0 --is_final_round
2021-10-12 17:04:32 - b'INFO main_assemble.cpp : 129 - Loading succinct de Bruijn graph: /root/HadzaTest/data/megahit/tmp/k119/119Done. Time elapsed: 3.936642'
2021-10-12 17:04:32 - b'INFO main_assemble.cpp : 133 - Number of Edges: 389706867; K value: 119'
2021-10-12 17:04:32 - b'INFO main_assemble.cpp : 140 - Number of CPU threads: 8'
2021-10-12 17:04:49 - b'INFO assembly/sdbg_pruning.cpp : 160 - Removing tips with length less than 2; Accumulated tips removed: 3408; time elapsed: 0.9663'
2021-10-12 17:04:50 - b'INFO assembly/sdbg_pruning.cpp : 160 - Removing tips with length less than 4; Accumulated tips removed: 8760; time elapsed: 1.0998'
2021-10-12 17:04:52 - b'INFO assembly/sdbg_pruning.cpp : 160 - Removing tips with length less than 8; Accumulated tips removed: 20602; time elapsed: 1.6616'
2021-10-12 17:04:55 - b'INFO assembly/sdbg_pruning.cpp : 160 - Removing tips with length less than 16; Accumulated tips removed: 41962; time elapsed: 3.0469'
2021-10-12 17:05:00 - b'INFO assembly/sdbg_pruning.cpp : 160 - Removing tips with length less than 32; Accumulated tips removed: 79160; time elapsed: 5.4550'
2021-10-12 17:05:09 - b'INFO assembly/sdbg_pruning.cpp : 160 - Removing tips with length less than 64; Accumulated tips removed: 133646; time elapsed: 8.6437'
2021-10-12 17:05:25 - b'INFO assembly/sdbg_pruning.cpp : 160 - Removing tips with length less than 128; Accumulated tips removed: 215554; time elapsed: 16.3859'
2021-10-12 17:05:49 - b'INFO assembly/sdbg_pruning.cpp : 169 - Removing tips with length less than 182; Accumulated tips removed: 243846; time elapsed: 23.4003'
2021-10-12 17:05:49 - b'INFO main_assemble.cpp : 158 - Tips removal done! Time elapsed(sec): 77.137'
2021-10-12 17:07:12 - b'INFO assembly/unitig_graph.cpp : 84 - Graph size without loops: 334900, palindrome: 3'
2021-10-12 17:07:13 - b'INFO main_assemble.cpp : 167 - unitig graph size: 335215, time for building: 84.255'
2021-10-12 17:07:13 - b'INFO assembly/contig_stat.h : 40 - Max: 219680, Min: 120, N50: 1036, number contigs: 335215, number isolated: 171307, number looped: 315, total size: 226649654,'
2021-10-12 17:07:13 - b'INFO main_assemble.cpp : 184 - Graph cleaning round 1'
2021-10-12 17:07:13 - b'INFO main_assemble.cpp : 201 - Number of bubbles removed: 4320, Time elapsed(sec): 0.449'
2021-10-12 17:07:14 - b'INFO main_assemble.cpp : 211 - Number of complex bubbles removed: 3432, Time elapsed(sec): 0.858741'
2021-10-12 17:07:15 - b'INFO main_assemble.cpp : 222 - Number unitigs disconnected: 14639, time: 0.295'
2021-10-12 17:07:15 - b'INFO main_assemble.cpp : 246 - Unitigs removed in excessive pruning: 1372, time: 0.218'
2021-10-12 17:07:15 - b'INFO main_assemble.cpp : 184 - Graph cleaning round 2'
2021-10-12 17:07:16 - b'INFO main_assemble.cpp : 192 - Tips removed: 1371, time: 1.490'
2021-10-12 17:07:17 - b'INFO main_assemble.cpp : 201 - Number of bubbles removed: 8, Time elapsed(sec): 0.301'
2021-10-12 17:07:17 - b'INFO main_assemble.cpp : 211 - Number of complex bubbles removed: 93, Time elapsed(sec): 0.383459'
2021-10-12 17:07:17 - b'INFO main_assemble.cpp : 222 - Number unitigs disconnected: 452, time: 0.263'
2021-10-12 17:07:18 - b'INFO main_assemble.cpp : 246 - Unitigs removed in excessive pruning: 254, time: 0.234'
2021-10-12 17:07:18 - b'INFO main_assemble.cpp : 184 - Graph cleaning round 3'
2021-10-12 17:07:19 - b'INFO main_assemble.cpp : 192 - Tips removed: 105, time: 1.458'
2021-10-12 17:07:19 - b'INFO main_assemble.cpp : 201 - Number of bubbles removed: 5, Time elapsed(sec): 0.272'
2021-10-12 17:07:20 - b'INFO main_assemble.cpp : 211 - Number of complex bubbles removed: 1, Time elapsed(sec): 0.392340'
2021-10-12 17:07:20 - b'INFO main_assemble.cpp : 222 - Number unitigs disconnected: 129, time: 0.256'
2021-10-12 17:07:20 - b'INFO main_assemble.cpp : 246 - Unitigs removed in excessive pruning: 94, time: 0.213'
2021-10-12 17:07:20 - b'INFO main_assemble.cpp : 184 - Graph cleaning round 4'
2021-10-12 17:07:22 - b'INFO main_assemble.cpp : 192 - Tips removed: 36, time: 1.437'
2021-10-12 17:07:22 - b'INFO main_assemble.cpp : 201 - Number of bubbles removed: 1, Time elapsed(sec): 0.279'
2021-10-12 17:07:22 - b'INFO main_assemble.cpp : 211 - Number of complex bubbles removed: 2, Time elapsed(sec): 0.363779'
2021-10-12 17:07:22 - b'INFO main_assemble.cpp : 222 - Number unitigs disconnected: 70, time: 0.266'
2021-10-12 17:07:23 - b'INFO main_assemble.cpp : 246 - Unitigs removed in excessive pruning: 37, time: 0.218'
2021-10-12 17:07:23 - b'INFO main_assemble.cpp : 184 - Graph cleaning round 5'
2021-10-12 17:07:24 - b'INFO main_assemble.cpp : 192 - Tips removed: 14, time: 1.404'
2021-10-12 17:07:24 - b'INFO main_assemble.cpp : 201 - Number of bubbles removed: 1, Time elapsed(sec): 0.269'
2021-10-12 17:07:25 - b'INFO main_assemble.cpp : 211 - Number of complex bubbles removed: 0, Time elapsed(sec): 0.341810'
2021-10-12 17:07:25 - b'INFO main_assemble.cpp : 222 - Number unitigs disconnected: 29, time: 0.295'
2021-10-12 17:07:25 - b'INFO main_assemble.cpp : 246 - Unitigs removed in excessive pruning: 12, time: 0.215'
2021-10-12 17:07:25 - b'INFO assembly/contig_stat.h : 40 - Max: 296942, Min: 120, N50: 1167, number contigs: 291331, number isolated: 174819, number looped: 323, total size: 219965631,'
2021-10-12 17:07:30 - b'INFO main_assemble.cpp : 289 - Number of local low depth unitigs removed: 5123, complex bubbles removed: 185, time: 4.452962'
2021-10-12 17:07:30 - b'INFO assembly/contig_stat.h : 40 - Max: 296942, Min: 120, N50: 1202, number contigs: 277964, number isolated: 175516, number looped: 323, total size: 217904971,'
2021-10-12 17:08:01 - b'INFO utils/utils.h : 152 - Real: 213.0491\tuser: 1019.8106\tsys: 1.9013\tmaxrss: 877040'
2021-10-12 17:08:01 - Merging to output final contigs
2021-10-12 17:08:01 - 241008 contigs, total 212125857 bp, min 200 bp, max 296942 bp, avg 880 bp, N50 1266 bp
2021-10-12 17:08:01 - ALL DONE. Time elapsed: 8335.118238 seconds
Weird that it just gets killed right away. Are you on a cluster? If so, are you booking enough memory?
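A bare "Killed" with no other output usually means the process received SIGKILL; on a cluster that is often the scheduler enforcing a memory limit. A minimal sketch for making that visible when re-running a command by hand (run_and_report is a hypothetical helper): bash reports a process killed by signal N as exit status 128 + N, so SIGKILL (9) shows up as 137.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and, if its exit status looks like a
# signal death (status > 128), report which signal it probably was.
run_and_report() {
  "$@"
  local status=$?   # $? is expanded before 'local' runs, so this is the command's status
  if [ "$status" -gt 128 ]; then
    echo "likely terminated by signal $((status - 128)) (exit status $status)" >&2
  fi
  return "$status"
}
```

Usage: prefix the diamond blastp command above with run_and_report; a message about signal 9 points at the process being killed from outside rather than diamond failing on its own.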
srun --time=12:00:00 --nodes=4 --cpus-per-task=8 --mem=32G --pty /usr/bin/bash
This is what I am booking to run this. I think the exact error message is: Died at /opt/conda/envs/SqueezeMeta-2020.11/bin/SqueezeMeta.pl line 1367.
[1 hours, 27 minutes, 53 seconds]: STEP4 -> HOMOLOGY SEARCHES: 04.rundiamond.pl
Setting block size for Diamond
AVAILABLE (free) RAM memory: 364.98 Gb
We will set Diamond block size to 8 (Gb RAM/5, Max 8). You can override this setting using the -b option when starting the project, or changing the $blocksize variable in SqueezeMeta_conf.pl
Running Diamond (Buchfink et al 2015, Nat Methods 12, 59-60) for taxa
Error running command: /opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/bin/diamond blastp -q /work/las-research/students/las_djorton/Hadza/results/03.Hadza.faa -p 12 -d /work/LAS/BioDatabase/squeezedb/db/nr.dmnd -e 0.001 --id 40 -f tab -b 8 --quiet -o /work/las-research/students/las_djorton/Hadza/intermediate/04.Hadza.nr.diamond at /opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/scripts/04.rundiamond.pl line 66.
Stopping in STEP4 -> 04.rundiamond.pl. Program finished abnormally
If you don't know what went wrong or want further advice, please look for similar issues in https://github.com/jtamames/SqueezeMeta/issues
Feel free to open a new issue if you don't find the answer there. Please add a brief description of the problem and upload the /work/las-research/students/las_djorton/Hadza/syslog file (zip it first)
Died at /opt/conda/envs/SqueezeMeta-2020.11/bin/SqueezeMeta.pl line 1367.
Ok, here's what I think is happening:
1) You book 32 Gb of memory.
2) SqueezeMeta, when running in the computing node, has no way of knowing that. It looks at the system and sees that there are 364.98 Gb free (the whole amount of free memory in the node at the time). It thus sets the Diamond block size to 8, which is expected to use 40 Gb of memory.
3) When the process exceeds 32 Gb of usage, it gets killed by Slurm.
So I think the answer might be setting -b manually. Can you add -b 1.5 to the initial call to SqueezeMeta? Also make sure to keep requesting those 32 Gb (or ideally more).
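The arithmetic above can be sketched as a small helper that applies the rule SqueezeMeta prints in its log (block size = Gb RAM / 5, capped at 8) to the memory actually booked, rather than to the node's free memory. The function name is hypothetical, and the rule is taken from the log message, not from the SqueezeMeta source.

```shell
#!/usr/bin/env bash
# Hypothetical helper: Diamond block size from booked memory, using the
# "Gb RAM/5, Max 8" rule shown in SqueezeMeta's STEP4 log.
blocksize_for_mem_gb() {
  local mem_gb="$1"
  # awk handles the fractional division; one decimal place, like the log.
  awk -v mem="$mem_gb" 'BEGIN { b = mem / 5; if (b > 8) b = 8; printf "%.1f\n", b }'
}
```

Inside an allocation made with --mem, Slurm normally exports SLURM_MEM_PER_NODE in megabytes, so one could call blocksize_for_mem_gb "$((SLURM_MEM_PER_NODE / 1024))" and pass the result to SqueezeMeta.pl with -b. With 32 Gb this yields 6.4, which by the same rule needs roughly the full 32 Gb and leaves no headroom; the -b 1.5 suggested above is far more conservative.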
[10 hours, 32 minutes, 4 seconds]: STEP18 -> 18.checkM_batch.pl PATH=/opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/bin:/opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/bin/pplacer:/opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/bin/hmmer:$PATH /opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/bin/checkm taxon_set phylum Elusimicrobia /work/las-research/students/las_djorton/Hadza/data/checkm_markers/Elusimicrobia.ms >> /work/las-research/students/las_djorton/Hadza/syslog 2>&1
[CheckM - taxon_set] Generate taxonomic-specific marker set.
It seems that the CheckM data folder has not been set yet or has been removed. Running: 'checkm data setRoot'.
You do not seem to have permission to edit the checkm config file located at /opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/lib/checkm/DATA_CONFIG
Please try again with updated privileges. Error was:
[Errno 30] Read-only file system: '/opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/lib/checkm/DATA_CONFIG'
Sorry, CheckM cannot run without a valid data folder.
Unexpected error: <class 'FileNotFoundError'>
Traceback (most recent call last):
File "/opt/conda/envs/SqueezeMeta-2020.11/SqueezeMeta/bin/checkm", line 718, in
The latest in this series of issues. Thank you for your continued support with this.
I moved the answer to the last question to a different issue, as it is no longer related to STEP4. However, I am curious about what was causing the problem in STEP4, as it seems that it worked ok for you in the end...
You were correct in your suspicion that I wasn't booking enough memory. Once again, thank you for your continued support.
Ok, glad it worked! Closing issue
Currently running into "STEP4 -> 04.rundiamond.pl. Program finished abnormally" during the testing phase. I am attempting to put SqueezeMeta into a Singularity container, where execution is done like this:
singularity exec --bind /scratch/ squeezemeta.img SqueezeMeta.pl -m coassembly -p Hadza -s /scratch/squeezedb/test/test.samples -f /scratch/squeezedb/test/raw/
I don't believe I can run configure_nodb_alt.pl due to how I am setting up SqueezeMeta, so I added the following to try to replicate its effects in the container:
# Replacing Squeezemeta's default db installation location
# Fetch classifier.tar.gz into lib/, unpack it under $SQUEEZEMETAPATH, then delete the archive
wget -U '' -P $SQUEEZEMETAPATH/lib http://silvani.cnb.csic.es/SqueezeMeta/classifier.tar.gz
tar xvzf $SQUEEZEMETAPATH/lib/classifier.tar.gz -C $SQUEEZEMETAPATH
rm $SQUEEZEMETAPATH/lib/classifier.tar.gz
# Link classifier.jar into SqueezeMeta's bin/ directory
cd $SQUEEZEMETAPATH/bin/
ln -s $SQUEEZEMETAPATH/classifier/classifier.jar . > /dev/null 2>&1
cd /
# Write SqueezeMeta_conf.pl with the default database path replaced by /scratch/squeezedb/db
sed 's/\/media\/disk5\/tamames\/SqueezeMeta\/db/\/scratch\/squeezedb\/db/g' $SQUEEZEMETAPATH/scripts/SqueezeMeta_conf_original.pl > $SQUEEZEMETAPATH/scripts/SqueezeMeta_conf.pl
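A variant of the sed step above that also verifies the substitution took effect, so a mismatch in the default path fails the build instead of silently leaving SqueezeMeta_conf.pl pointing at the old location. This is a sketch: set_db_path is a hypothetical name, and OLD_DB is the default path taken from the sed expression above.

```shell
#!/usr/bin/env bash
# Default database path, as matched by the sed expression in the build script.
OLD_DB=/media/disk5/tamames/SqueezeMeta/db

# Hypothetical helper: write conf_out from conf_in with OLD_DB replaced by new_db.
set_db_path() {
  local conf_in="$1" conf_out="$2" new_db="$3"
  # '#' as delimiter avoids escaping every '/'; new_db must not contain '#'.
  sed "s#${OLD_DB}#${new_db}#g" "$conf_in" > "$conf_out" || return 1
  # Fail if the old path survived or the new one never appeared.
  if grep -qF "$OLD_DB" "$conf_out" || ! grep -qF "$new_db" "$conf_out"; then
    echo "database path substitution failed in $conf_out" >&2
    return 1
  fi
}
```

Usage: set_db_path $SQUEEZEMETAPATH/scripts/SqueezeMeta_conf_original.pl $SQUEEZEMETAPATH/scripts/SqueezeMeta_conf.pl /scratch/squeezedb/db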
The reason for the /scratch binding is that the DB is stored somewhere other than the default path (I believe it was downloaded via /path/to/SqueezeMeta/utils/install_utils/preparing_databases/download_databases.pl /download/path). I have the DB preinstalled in /scratch/squeezedb/. When I run <installpath>/SqueezeMeta/utils/install_utils/test_install.pl, I get the error "SqueezeMeta_conf.pl says that databases are located in /scratch/squeezedb/db but we can't find nr.db there, or it is corrupted".