Closed by lam-c 6 months ago
Hello! Ok, I found a bug in the script 01.run_all_assemblies.pl causing this error. Please edit the script /media/cy/micromamba/envs/squeeze_meta/SqueezeMeta/scripts/01.run_all_assemblies.pl . In line 70, where it reads:
if($extassemblies{$asamples}) {
change to:
if($extassemblies{$asamples}) {
$extassembly=$extassemblies{$asamples};
That should fix the error. Let me know if it doesn't. Best, J
Thank you for the quick response. Unfortunately, I got the same error and the same syslog after modifying the Perl script. I wonder whether I should add the 'noassembly' string to the samples file, or whether something goes wrong in the section that reads the samples (lines 40-52). It looks like the raw reads were not loaded either (the raw_fastq folder is empty).
Run started Fri Oct 27 10:20:09 2023 in sequential mode
SqueezeMeta v1.6.2, March 2023 - (c) J. Tamames, F. Puente-Sánchez CNB-CSIC, Madrid, SPAIN
Please cite: Tamames & Puente-Sanchez, Frontiers in Microbiology 10.3389 (2019). doi: https://doi.org/10.3389/fmicb.2018.03349
Run started for squeeze_meta, Fri Oct 27 10:20:09 2023
Project: S126
Map file: sample_file.txt
Fastq directory: squeeze_meta/qc_shortreads
Command: squeeze_meta/SqueezeMeta/scripts/SqueezeMeta.pl -m sequential -s sample_file.txt -f qc_shortreads --nobins --doublepass --euk -t 50
[0 seconds]: STEP0 -> SqueezeMeta.pl
COGS; KEGG; PFAM; EUKNOFILTER; DOUBLEPASS;
[0 seconds]: STEP1 -> 01.run_all_assemblies.pl (megahit)
Stopping in STEP1 -> 01.run_all_assemblies.pl. Program finished abnormally
_____________
System information:
ic #202212191242 SMP Mon Dec 19 13:25:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
_____________
Tree for the project:
[4.0K Oct 27 10:20] squeeze_meta/S126
├── [ 31 Oct 27 10:20] creator.txt
├── [4.0K Oct 27 10:20] data
│ ├── [ 304 Oct 27 10:20] 00.S126.samples
│ └── [4.0K Oct 27 10:20] raw_fastq
├── [4.0K Oct 27 10:20] ext_tables
├── [4.0K Oct 27 10:20] intermediate
│ └── [4.0K Oct 27 10:20] binners
├── [ 117 Oct 27 10:20] methods.txt
├── [3.1K Oct 27 10:20] parameters.pl
├── [ 37 Oct 27 10:20] progress
├── [4.0K Oct 27 10:20] results
├── [8.3K Oct 27 10:20] SqueezeMeta_conf.pl
├── [ 998 Oct 27 10:20] syslog
└── [4.0K Oct 27 10:20] temp
8 directories, 7 files
Ok, I can reproduce the bug and have found a solution. It only happens when you specify extassembly in both the pair1 and pair2 lines of the samples file. When that option is given only in pair1, it works fine. So a first workaround is to change the samples file, removing "extassembly" from all pair2 lines. But of course that is just a patch. To solve the issue properly, change line 47 in 01.run_all_assemblies.pl, from:
if($mapreq=~/extassembly\=(.*)/) { $extassemblies{$sample}=$1; } #-- Store external assemblies if specified in the samples file
to
if($mapreq=~/extassembly\=(.*)/) { $extassemblies{$sample}=$1; $datasamples{$sample}{$iden}{$file}=1;} #-- Store external assemblies if specified in the samples file
That should do it. Best, J
Sorry... maybe that fix can create other problems. Do this instead:
Change line 48, from:
elsif(($mode eq "sequential") && ($sample eq $projectname)) { $datasamples{$sample}{$iden}{$file}=1; }
to
if(($mode eq "sequential") && ($sample eq $projectname)) { $datasamples{$sample}{$iden}{$file}=1; }
It works! Many thanks for your timely help.
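To make the one-word change above (elsif to if) easier to follow, here is a minimal Python sketch of what lines 47-48 appear to do. It is only a model built from the two Perl lines quoted in this thread, not the actual script: the function name `register_reads`, the `chained` switch, and the file names in the example rows are all hypothetical, and sequential mode is assumed throughout.

```python
import re

def register_reads(rows, project, chained):
    """Model of lines 47-48 of 01.run_all_assemblies.pl (sequential mode assumed).

    rows    -- (sample, file, iden, mapreq) tuples, one per samples-file line
    chained -- True mimics the original if/elsif, False mimics the two
               independent ifs of the fix
    """
    extassemblies = {}
    datasamples = {}
    for sample, file, iden, mapreq in rows:
        match = re.search(r"extassembly=(.*)", mapreq)
        if match:
            # Line 47: remember the external assembly for this sample
            extassemblies[sample] = match.group(1)
        # Line 48: with elsif, this branch is skipped whenever line 47 matched
        skipped = chained and match is not None
        if not skipped and sample == project:
            datasamples.setdefault(sample, {}).setdefault(iden, {})[file] = 1
    return extassemblies, datasamples

# Hypothetical samples-file rows with extassembly on both pair1 and pair2
rows = [
    ("S126", "S126_R1.fastq.gz", "pair1", "extassembly=contigs.fasta"),
    ("S126", "S126_R2.fastq.gz", "pair2", "extassembly=contigs.fasta"),
]

_, before = register_reads(rows, "S126", chained=True)
_, after = register_reads(rows, "S126", chained=False)
print("elsif:", before)
print("if:   ", after)
```

In this model, the elsif version registers no read files when every line of the sample carries extassembly, which would be consistent with the empty raw_fastq folder reported above. With two independent ifs, the assembly is stored and both read files are registered.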
Hi there! I deployed the newest pipeline on a machine with better computational resources and ran the metagenome data in seqmerge mode (a fresh start from the QC'd reads). However, I got stuck at the assembly-merging step (the syslog and the merge-assemblies log are shown below).
I searched the previous issues and found a similar one, #565, which was posted last year. Is there any solution for it now? I'm not sure whether the machine can handle a coassembly. Do you have any suggestions?
It seems like you only have 3 samples, is that right? Seqmerge should be able to deal with that, but if it is still stuck you can try a coassembly. What resources do you have exactly?
Thank you for your patience. I have 79 samples in total, and I feed them into SqueezeMeta in 23 groups (each group contains 3-6 samples, in order to retrieve MAGs from each group), running in parallel.
I switched to co-assembly mode after requesting more computational resources (shown below) and limited the run to 15 groups (those with smaller fastq files) at the same time. Luckily the peak RAM was less than 2T (I'm not sure how to estimate the required RAM).
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 384
$ free -m
          total    used    free  shared  buff/cache  available
Mem:    6190948  273186  208143    4105     5709618    5885262
However, it's odd that 3 samples (~22 GB each for par(1|2).fastq.gz, in the folder output/data/raw_fastq) were not accepted in seqmerge mode (it did work on the test data downloaded along with the databases). Is there any fix we could try, to save time and computational resources in the future?
Tbf the individual assemblies are quite big, 5x bigger than those of the test dataset. I very much suspect that minimus2 scales quadratically, meaning that execution time and maybe memory usage would be around 25x higher. I suspect you are better off trying coassemblies. Otherwise you can use the sequential mode and combine the results for the different samples later.
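As a rough illustration of the estimate above: if the merging cost really does grow with the square of the assembly size (a suspicion in this thread, not a measured property of minimus2), the arithmetic looks like this. The function name `projected_factor` and the `exponent` parameter are hypothetical and only there to make the assumption explicit.

```python
def projected_factor(size_ratio: float, exponent: float = 2.0) -> float:
    """Cost multiplier for an input `size_ratio` times larger.

    exponent=2.0 encodes the assumed quadratic scaling; it is a guess,
    not a benchmark result.
    """
    return size_ratio ** exponent

# Assemblies about 5x larger than the test dataset
print(projected_factor(5))  # 25.0
```

If the true scaling is closer to linear or worse than quadratic, changing `exponent` shows how quickly the estimate moves, which is why trying a coassembly or the sequential mode may be the safer route.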
OK, I will take that advice. Thanks for your time ~
Closing due to lack of activity, feel free to reopen
Thank you for bringing us this amazing tool! I tried this pipeline on a multi-domain metagenome dataset (externally assembled), but I got stuck on the first step and couldn't figure out what was going wrong. I look forward to your reply and hope the information below is useful.
execution command:
$SqueezeMetaPath/scripts/SqueezeMeta.pl -m sequential -s sample_file.txt -f qc_shortreads --nobins --doublepass --euk -t 50
stdout:
The external assemblies were produced by megahit and renamed (descriptions removed from the headers).
sample_file.txt syslog.zip