jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

sqm_annot.pl fails at splitting Diamond file #744

Closed gracegyho closed 8 months ago

gracegyho commented 11 months ago

I am currently running sqm_annot.pl on a database with 494 amino acid sequences. I first ran into problems running it on about 240k sequences, and after that run crashed I tried again with a specific subset.

I seem to be getting an error similar to #629 with sqm_annot.pl.

At first I tried to annotate all predicted proteins in a single metagenome (237394 sequences). It crashed with a message similar to the one below.

Then I subset the metagenome to 494 sequences of interest (basically, proteins which I detected from a different analysis) and ran it again.

Output

Output of `sqm_annot.pl -s samplefile_Day20200326_subset.txt -f Dummy_Dir/ -t 24 -b 16`:

```
SQM_annot v1.6.3, September 2023 - (c) J. Tamames, F. Puente-Sánchez CNB-CSIC, Madrid, SPAIN

This is part of the SqueezeMeta distribution (https://github.com/jtamames/SqueezeMeta)
Please cite: Tamames & Puente-Sanchez, Frontiers in Microbiology 10.3389 (2019). doi: https://doi.org/10.3389/fmicb.2018.03349

Now I will call SqueezeMeta to do my stuff. Please hold on.
***
SqueezeMeta v1.6.3, September 2023 - (c) J. Tamames, F. Puente-Sánchez CNB-CSIC, Madrid, SPAIN

Please cite: Tamames & Puente-Sanchez, Frontiers in Microbiology 9, 3349 (2019). doi: https://doi.org/10.3389/fmicb.2018.03349

Run started Fri Oct 27 11:01:06 2023 in sequential mode
1 metagenomes found: Day_20200326_mapped

--- SAMPLE Day_20200326_mapped ---
Now creating directories
Reading configuration from /scratch/gho/sqm_annot_Day-20200326_subset/Day_20200326_mapped/SqueezeMeta_conf.pl
Running trimmomatic (Bolger et al 2014, Bioinformatics 30(15):2114-20) for quality filtering
Parameters:
Directory structure and conf files created. Exiting
Working with Day_20200326_mapped
Working with taxonomy database in /bioinf/home/gho/databases/SqueezeMeta/db/nr.dmnd
taxa COGS Running Diamond (Buchfink et al 2015, Nat Methods 12, 59-60) for KEGG
Splitting Diamond file
Starting multithread LCA in 12 threads
DBD::SQLite::db prepare failed: Expression tree is too large (maximum depth 1000) at /home/gho/miniconda3/envs/SqueezeMeta/SqueezeMeta/scripts/06.lca.pl line 254.
Thread 1 terminated abnormally: DBD::SQLite::db prepare failed: Expression tree is too large (maximum depth 1000) at /home/gho/miniconda3/envs/SqueezeMeta/SqueezeMeta/scripts/06.lca.pl line 254.
Thread 2 terminated abnormally: Cannot open /scratch/gho/sqm_annot_Day-20200326_subset/Day_20200326_mapped/temp/diamond_lca.2.m8
Thread 3 terminated abnormally: Cannot open /scratch/gho/sqm_annot_Day-20200326_subset/Day_20200326_mapped/temp/diamond_lca.3.m8
Thread 4 terminated abnormally: Cannot open /scratch/gho/sqm_annot_Day-20200326_subset/Day_20200326_mapped/temp/diamond_lca.4.m8
Thread 5 terminated abnormally: Cannot open /scratch/gho/sqm_annot_Day-20200326_subset/Day_20200326_mapped/temp/diamond_lca.5.m8
Thread 6 terminated abnormally: Cannot open /scratch/gho/sqm_annot_Day-20200326_subset/Day_20200326_mapped/temp/diamond_lca.6.m8
Thread 7 terminated abnormally: Cannot open /scratch/gho/sqm_annot_Day-20200326_subset/Day_20200326_mapped/temp/diamond_lca.7.m8
Thread 8 terminated abnormally: Cannot open /scratch/gho/sqm_annot_Day-20200326_subset/Day_20200326_mapped/temp/diamond_lca.8.m8
Thread 9 terminated abnormally: Cannot open /scratch/gho/sqm_annot_Day-20200326_subset/Day_20200326_mapped/temp/diamond_lca.9.m8
Thread 10 terminated abnormally: Cannot open /scratch/gho/sqm_annot_Day-20200326_subset/Day_20200326_mapped/temp/diamond_lca.10.m8
Thread 11 terminated abnormally: Cannot open /scratch/gho/sqm_annot_Day-20200326_subset/Day_20200326_mapped/temp/diamond_lca.11.m8
Thread 12 terminated abnormally: Cannot open /scratch/gho/sqm_annot_Day-20200326_subset/Day_20200326_mapped/temp/diamond_lca.12.m8
Creating /scratch/gho/sqm_annot_Day-20200326_subset/Day_20200326_mapped/results/06.Day_20200326_mapped.fun3.tax.wranks file
Creating /scratch/gho/sqm_annot_Day-20200326_subset/Day_20200326_mapped/results/06.Day_20200326_mapped.fun3.tax.noidfilter.wranks file
Functional assignment for COGS KEGG
Taxonomic assignment stored in Day_20200326_mapped/results/06.Day_20200326_mapped.fun3.tax.wranks
COG functional assignment stored in Day_20200326_mapped/results/07.Day_20200326_mapped.fun3.cog
KEGG functional assignment stored in Day_20200326_mapped/results/07.Day_20200326_mapped.fun3.kegg
COG summary created in Day_20200326_mapped/results/COG.summary
KEGG summary created in Day_20200326_mapped/results/KEGG.summary
Have a nice day!
```
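For context on the first failure in the log: `Expression tree is too large (maximum depth 1000)` is SQLite's own error for a SQL expression nested deeper than its compile-time depth limit, which defaults to 1000. One common way to reach that limit is a `WHERE` clause built by chaining a very large number of `OR` terms. Whether 06.lca.pl builds such a query at line 254 is an assumption here; the log does not confirm it. The Python sketch below only illustrates the general mechanism and a usual workaround (batched `IN (...)` lookups). The table `taxid`, its columns, and both helper functions are hypothetical and are not taken from SqueezeMeta.

```python
import sqlite3


def build_or_chain(ids):
    """Build one long 'id = a OR id = b OR ...' query (hypothetical schema)."""
    terms = " OR ".join(f"id = {int(i)}" for i in ids)
    return f"SELECT id, name FROM taxid WHERE {terms}"


def fetch_in_batches(conn, ids, batch_size=500):
    """Look ids up in batches with IN (?, ?, ...) so no single query grows deep."""
    rows = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        query = f"SELECT id, name FROM taxid WHERE id IN ({placeholders})"
        rows.extend(conn.execute(query, batch).fetchall())
    return rows


# Toy in-memory table standing in for a taxonomy lookup (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxid (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO taxid VALUES (?, ?)",
    [(i, f"taxon_{i}") for i in range(5000)],
)

wanted = list(range(0, 5000, 2))  # 2500 ids, more than the default depth limit

# With SQLite's default limits the OR chain is expected to be rejected;
# builds compiled with other limits may accept it, so the error is only recorded.
try:
    conn.execute(build_or_chain(wanted)).fetchall()
    or_chain_error = None
except sqlite3.OperationalError as exc:
    or_chain_error = str(exc)

rows = fetch_in_batches(conn, wanted)
print("OR chain error:", or_chain_error)
print("rows via batched IN:", len(rows))
```

If the query really is the culprit, the error would depend on how many hits a single query accumulates rather than on available RAM, but that remains unverified in this thread.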

At this point I came across #629 and tried running it on a single thread with the option -t 1. I got the same errors, with threads crashing and so on. Furthermore, when I go into the project folder and check SqueezeMeta_conf.pl, $numthreads is set to 12 despite my specifying -t 1.

The config file of the run with -t 24 also says $numthreads=12.

sqm_annot.pl worked without problems on another faa file with 8214 sequences with parameters -t 24 -b 16. I checked the conf.pl file again here, and $numthreads is again =12 instead of 24.

Is there something wrong with the thread specification line in sqm_annot.pl?

jtamames commented 11 months ago

Hello,

You are right: sqm_annot.pl calls the main SqueezeMeta program but forgets to pass the specified number of threads, silly script! To fix this, edit the sqm_annot.pl script as follows.

Change line 71 to:
my $result = GetOptions ("t=i" => \($numthreads=12),

Change line 115 to:
my $command="perl $scriptdir/SqueezeMeta.pl -s $tempsample -f $aadir -m sequential $edb $blockoption -t $numthreads --nopfam -c 0 --empty";

Best, J
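The bug described above is a wrapper that parses `-t` but never forwards it to the program it launches, so the child falls back to its default of 12. As a rough Python analog (not the actual Perl code of sqm_annot.pl), the sketch below parses the option with a default of 12 and explicitly places it in the child command. The function name `build_command` is hypothetical, and the command only reuses the flags quoted in this thread, leaving out `$edb` and `$blockoption`.

```python
import argparse
import shlex


def build_command(argv):
    """Parse wrapper options and forward the thread count to the child command."""
    parser = argparse.ArgumentParser(description="toy wrapper, for illustration only")
    parser.add_argument("-s", dest="samples", required=True)
    parser.add_argument("-f", dest="aadir", required=True)
    # Default of 12 mirrors the default shown in the suggested GetOptions line.
    parser.add_argument("-t", dest="numthreads", type=int, default=12)
    args = parser.parse_args(argv)

    # The important part: -t is passed on instead of being silently dropped.
    return [
        "perl", "SqueezeMeta.pl",
        "-s", args.samples,
        "-f", args.aadir,
        "-m", "sequential",
        "-t", str(args.numthreads),
        "--nopfam", "-c", "0", "--empty",
    ]


cmd = build_command(["-s", "samplefile.txt", "-f", "Dummy_Dir/", "-t", "24"])
print(shlex.join(cmd))
```

Checking `$numthreads` in the generated SqueezeMeta_conf.pl, as done earlier in this thread, is a simple way to confirm that the value actually reached the main program.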

jtamames commented 11 months ago

Nevertheless, issue #629 happened because there was just one sequence to annotate. Multithreading here works by dividing the input into blocks of sequences, so it failed because a single sequence cannot be divided. That's why it worked with -t 1. In your case, it seems to be a matter of RAM instead.

Best, J
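To illustrate the block-splitting point, here is a minimal Python sketch of dividing a list of sequences among threads while never creating more blocks than there are sequences. It is not the splitting code SqueezeMeta uses; `split_into_blocks` is a hypothetical helper that shows one way to avoid the single-sequence failure mode described for #629.

```python
def split_into_blocks(items, n_threads):
    """Split items into at most n_threads contiguous blocks of near-equal size."""
    if not items:
        return []
    # Never ask for more blocks than there are items: one sequence gives one block.
    n_blocks = max(1, min(n_threads, len(items)))
    base, extra = divmod(len(items), n_blocks)
    blocks = []
    start = 0
    for i in range(n_blocks):
        # The first `extra` blocks take one additional item each.
        size = base + (1 if i < extra else 0)
        blocks.append(items[start:start + size])
        start += size
    return blocks


single = split_into_blocks(["seq_1"], 12)
subset = split_into_blocks([f"seq_{i}" for i in range(494)], 24)
print(len(single), len(subset), [len(b) for b in subset][:3])
```

With 494 sequences and 24 threads every block is non-empty, so under this scheme the subset run from this issue would not be expected to hit the single-sequence case.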

gracegyho commented 11 months ago

> Change line 115 to: my $command="perl $scriptdir/SqueezeMeta.pl -s $tempsample -f $aadir -m sequential $edb $blockoption -t $numthreads --nopfam -c 0 --empty"; Best, J

So for whatever reason this was on line 111, and I ran it with your recommended changes (the line numbers didn't match, though).

Running sqm_annot.pl -h, I get the error:

Global symbol "$edb" requires explicit package name (did you forget to declare "my $edb"?) at /home/gho/miniconda3/envs/SqueezeMeta/bin/sqm_annot_backup.pl line 111.
Execution of /home/gho/miniconda3/envs/SqueezeMeta/bin/sqm_annot_backup.pl aborted due to compilation errors.

So I removed $edb from this line and running sqm_annot.pl -h gives the help message. I'll run it with my data again and see if it works. Changed line 71 (which was also on a different line). So far the config file in the project directory says numthreads=24, so there's one issue (sorta) fixed :)

Thanks so far! I'm amazed by your and @fpusan's responsiveness!

EDIT: Same error but now with 24 threads terminating abnormally.

I have 1.9 TB of RAM (at least in this interactive slurm session, it seems I forgot to specify memory, oops). So it's not a matter of me running out, right?

fpusan commented 10 months ago

Hi again. Got caught in a crazy November and lost track of this. Are you still experiencing memory issues?

fpusan commented 8 months ago

Closing due to lack of activity, hope you managed to fix this, otherwise feel free to reopen!