jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

Error in Seq merge with Minimus2 #392

Closed addyblanch closed 2 years ago

addyblanch commented 2 years ago

Hi, I seem to be running into an issue with Minimus2. I'm trying to use the Seq merge method but keep getting this error:

Starting assembly merge
  MERGE 1, 01.waste_water.C88_DNA.fasta and 01.waste_water.C93_DNA.fasta (dist 1.00) -> merged_1.waste_water.fasta
  Running cd-hit-est
  Transforming to afg format
  Merging with minimus2
Error running command:    /home/srganalysis/Software/miniconda3/envs/SqueezeMeta/SqueezeMeta/bin/AMOS/minimus2_mod /media/srganalysis/Data/Adam/Tanya/waste_water/temp/mergedassemblies.waste_water -D OVERLAP=100 -D MINID=95 -D THREADS=72 > /dev/null 2>&1 at /home/srganalysis/Software/miniconda3/envs/SqueezeMeta/SqueezeMeta/scripts/01.merge_sequential.pl line 103.

I read through the other issues on GitHub and confirmed that the kmer is set to 12. I'm also not sure it's memory related: I monitored RAM usage and it never got close to the maximum (250 GB), and the run only ever used one thread. Looking at the temp directory, it seems it never completes the creation of the bnk.2 files or the arg.2 file.

!!! 2021-12-01 10:45:48  Doing step 15
!!! 2021-12-01 10:45:48  Running: /home/srganalysis/Software/miniconda3/envs/SqueezeMeta/SqueezeMeta/bin/AMOS/dumpreads /media/srganalysis/Data/Adam/Tanya/waste_water/temp/mergedassemblies.waste_water.bnk.2 -m 0 > /media/srganalysis/Data/Adam/Tanya/waste_water/temp/mergedassemblies.waste_water.qry.seq
FATAL: Could not open bank file, /media/srganalysis/Data/Adam/Tanya/waste_water/temp/mergedassemblies.waste_water.bnk.2/RED.ifo, No such file or directory
  there has been a fatal error, abort
iid: 0 eid: 
Objects seen: 0
Objects written: 0
!!! 2021-12-01 10:45:48  Command: /home/srganalysis/Software/miniconda3/envs/SqueezeMeta/SqueezeMeta/bin/AMOS/dumpreads /media/srganalysis/Data/Adam/Tanya/waste_water/temp/mergedassemblies.waste_water.bnk.2 -m 0 > /media/srganalysis/Data/Adam/Tanya/waste_water/temp/mergedassemblies.waste_water.qry.seq exited with status: 1
!!! END - Elapsed time: 0d 0h 1m 40s

My syslog file is a bit of a mess as I've been trying to troubleshoot, but I've copied the main bits over: syslog.txt. I'm on Ubuntu 20.04.03 LTS.

Adam

jtamames commented 2 years ago

Hello Adam! Indeed, there seems to be a problem with the second file to merge: it produces an empty .afg, and everything crashes because of that. Could you please paste some lines of the mergedassemblies.waste_water.afg file in the temp directory? Let's see if we can find out where the issue is. Best, Javier
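One way to check whether both assemblies actually made it into the .afg file is to count the read records per sample. The following is a minimal Python sketch, not part of SqueezeMeta. It assumes the AMOS message layout in which each read is a `{RED` block containing an `eid:` line, and it assumes contig names of the form `k141_593387_C93_DNA`, where everything after the second underscore is the sample name. The function name `count_reads_by_sample` is hypothetical, and the two records are abridged for illustration.

```python
from collections import Counter


def count_reads_by_sample(afg_lines):
    """Count {RED records per sample; the sample is the eid text after the second "_"."""
    counts = Counter()
    in_red = False
    for line in afg_lines:
        line = line.rstrip("\n")
        if line == "{RED":
            in_red = True
        elif line == "}":
            in_red = False
        elif in_red and line.startswith("eid:"):
            fields = line[len("eid:"):].split("_")
            # Assumption: fields 0 and 1 are the assembler prefix and contig number.
            counts["_".join(fields[2:])] += 1
    return counts


# Two abridged records (sequences shortened for illustration).
records = """{RED
iid:1304093
eid:k141_593387_C93_DNA
seq:
GATGGTGAAGACC
.
qlt:
NNNNNNNNNNNNN
.
}
{RED
iid:1304094
eid:k141_593388_C93_DNA
seq:
GGGCTGGCCACAG
.
qlt:
NNNNNNNNNNNNN
.
}"""

print(dict(count_reads_by_sample(records.splitlines())))  # {'C93_DNA': 2}
```

For a real file, the same function can be fed an open file handle, for example `with open("mergedassemblies.waste_water.afg") as fh: print(count_reads_by_sample(fh))`. If one of the two samples is missing from the result, that assembly never reached the merge.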

addyblanch commented 2 years ago

Hi Javier,

A quick head gives:

$ head mergedassemblies.waste_water.afg
{UNV
eid:afg
com:
generated by /home/srganalysis/Software/miniconda3/envs/SqueezeMeta/SqueezeMeta/bin/AMOS/toAmos
Wed Dec  1 12:51:28 2021
.
}
{LIB
iid:2608194
eid:unmated

and tail gives:

$ tail -100 mergedassemblies.waste_water.afg
{RED
iid:1304093
eid:k141_593387_C93_DNA
seq:
GATGGTGAAGACCATATCTGTATTGAACGTAAAGCCAAGACGGAACTTGGTCGTTCATTG
TGTTTGGATAACGGCGACTACCCATTCACTTCTCGTTACTTCGGTAAGTTCAATAGCTTG
GAAGGCTATTGGCTGTATGTAACGTGTCCTGATTTCAATCGTCGTGAACACTTGCGTTCT
GTATCTGGTGTAGCTCTGCAAGGTATTCGTCGTGACGCCATGAGTACCATGGTTAAGACT
AGCAACCTTCGTTCCATCATGATGCAGGCTATGTATGACCGCATCGTACAGAACGAGAGG
TTGCGTGAGCTGTTCATCGAGAACACTTTGCCTTTCGATATGTACCACATCGACTATGAA
TCAGGTCGACGCATGCGTGCAGCGCACATGATGCACATGACCTATGTGTGCTACCATATG
CTGTTCGAGTGCCTGTCTGAAGGCGGTGATGTCGAAAGTGCTATCCAGCATGCACGCAAC
GACAAAGAAGGCGACATGTATGCGAAATTGTTGCCTGCTTATCAAGTCGAATGGTTGAAG
AACCAAGAGCGCATCCGTGCTGAGAAAGCAGAGCGTAAG
.
qlt:
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
.
frg:3309900
clr:0,579
}
{RED
iid:1304094
eid:k141_593388_C93_DNA
seq:
GGGCTGGCCACAGCGTCTTTGCTGATGATGTGCATTGGGCTGACATCATGGATGTACCTG
CACCACCGCTGGCCTGAAATTGGCAGCCACGCTGGCGTAACACCCGGCGGCTGAGCCCCA
CCTTCACCCCTTCATGAAGAAGCAAGCAGACGCTGGTCAGCGTCTGCGCTCCAGGCCCTG
CTGGTGGTAGTAACGCTTCTTTTCCTCGTACATGGCCTGGTCGGCCTTGTGCAGGGCAGA
CTCCACTGCCTCACCGGATTGACAGCACGCGATCCCCATGGCCAATCCCAGCGGCTGCCC
TGAATAGAACTGGTTGTTCAACTCCACCAGCGACT
.
qlt:
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
.
frg:3794506
clr:0,335
}
{RED
iid:1304095
eid:k141_593389_C93_DNA
seq:
CCGCGCCTATCATGGTGGACGGCAACACGGTGCATGATTGTCGCCACGTTCATATTTACG
TGCTGCACGGGCGGCATGTCCACGTCGCTAACAACATCGTTTACGCCTCCGGCGGTCATA
CGCTATTTGACGACTTCCCCGGTGAGCCTATCGGCGGTAACGGCTACGTCATTCGCGACG
AGCGCAAAGACATCTACACGGCCTCGCAGTACATATCGTATTACAACAATCTGGCCGTCA
ACACTTCCAAGCTGATTTACCTGAGTGGCAACGACCGCAACTATATCGGCATGTACGTGG
GCTTTAATACGTTCGTCGGCGGGCCGAATACGCAAAAGGCCGGCGTTACCATATCGGGCG
CGGAAGGCCGCGGCATTAT
.
qlt:
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNN
.
frg:2944681
clr:0,379
}
{RED
iid:1304096
eid:k141_593390_C93_DNA
seq:
ATCGCGGCGAAGGGTTTGACCCGCGCACGATTTCGGAACGTGTAGAAACCGATTGGGGCG
CGCTGCCTGAACAGGTTCAGCTTATCACCATGACAACCGACGTGCAGGGCGGCGAATCGG
GGCAGTCGCGCCTCGAAACCATGTGGCTCGGCTGGGGTGAAGGTGCGGAAGGCTGGCTGC
TGGATTACGACGTGACGTATGGTGAACTCGACCAGCCCGAAATCTGGCAGCGGCATGACG
AACTGAGGGCGCGCACGTTTGAAACCTACGACGGGCGCAAGGTCGGCGCGACGGTTGCCT
TCATTGACCGCGGCTTCGAGGCTCAAAAAGTGCTGGCCCATACCATGCGCCGGGCGAAGT
ACAGGACGTTTGCAATCAAGGGTGTCGAGGGAACGCCGGC
.
qlt:
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
.
frg:3139553
clr:0,400
}

jtamames commented 2 years ago

I think I've got it. It's a contig naming issue similar to #277, but not exactly the same thing. Could you please paste lines 200-205 of your 01.merge_sequential.pl script? (You can find it in the scripts directory of SqueezeMeta.) Also, please paste the first lines of the syslog file so I can see the command you used to run SqueezeMeta. Best, J

addyblanch commented 2 years ago

Thanks for the quick replies, Javier! Just to confirm, I used Megahit, not SPAdes.

        if($m[0]=~/^Merged/) { $ts=$m[0]; }
        else {
            shift @m; shift @m;
            $ts=join("_",@m);
            $ts=$m[$#m];   #-- Last field of contig name contains the sample ID. Previous solution worked for megahit but not for SPAdes. This one can get trouble if sample names contains "_"
        }

and


SqueezeMeta v1.4.0, May 2021 - (c) J. Tamames, F. Puente-Sánchez CNB-CSIC, Madrid, SPAIN

Please cite: Tamames & Puente-Sanchez, Frontiers in Microbiology 10.3389 (2019). doi: https://doi.org/10.3389/fmicb.2018.03349

Run started Fri Nov 12 15:04:45 2021 in seqmerge mode
Command: /home/srganalysis/Software/miniconda3/envs/SqueezeMeta/bin/SqueezeMeta.pl -m seqmerge -s samples.txt -f Data/ -p waste_water -map bwa -t 48
Project: waste_water
Map file: samples.txt
Fastq directory: /media/srganalysis/Data/Adam/Tanya/Data
Options: threads=48; contiglen=200; assembler=megahit; mapper=bwa; COGS; KEGG; PFAM;
[0 seconds]: STEP0 -> SqueezeMeta.pl

A

jtamames commented 2 years ago

Yep, what I expected. It is a very silly thing, but the problem is in the name of your samples. You are using an underscore character "_", which we use to separate fields in the contig names. I think SqueezeMeta is smart enough to handle that, but we have a renaming step, needed to be able to use minimus2, that messes things up.

Long story short, I think it will work if you comment out line 205, the one reading $ts=$m[$#m]; in 01.merge_sequential.pl. Then just run the script using the project name, that is:

01.merge_sequential.pl waste_water

I think the rest of the pipeline will behave well, but for future runs it will be safer to avoid putting "_" in the sample names. Best, Javier
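To illustrate why the underscore matters, here is a minimal Python sketch that mimics the two assignments to `$ts` in the pasted Perl snippet. It is not the actual script: the function name `sample_from_contig` is hypothetical, and it assumes the contig name has already been split on "_" into the array `@m`, which the pasted lines do not show.

```python
def sample_from_contig(contig_name, use_last_field):
    """Mimic the sample-ID extraction from the pasted Perl snippet (illustrative only)."""
    fields = contig_name.split("_")  # assumption: @m holds the "_"-separated fields
    if fields[0].startswith("Merged"):
        return fields[0]
    rest = fields[2:]  # the two "shift @m" calls drop the prefix and contig number
    if use_last_field:
        return rest[-1]    # equivalent of $ts=$m[$#m]
    return "_".join(rest)  # equivalent of $ts=join("_",@m)


name = "k141_593387_C93_DNA"
print(sample_from_contig(name, use_last_field=True))   # DNA
print(sample_from_contig(name, use_last_field=False))  # C93_DNA
```

With the last-field rule, both C88_DNA and C93_DNA collapse to the same sample ID, "DNA". Commenting out that line leaves the joined value, which keeps the two samples distinct.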

addyblanch commented 2 years ago

Amazing, I'll give it a go and report back. Thank you.

addyblanch commented 2 years ago

All looks good! Running past the section where it died! Thanks Javier!

jtamames commented 2 years ago

Great! When this step is completed, restart the pipeline via restart.pl <project> -step 2

Best, J
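
For future runs, the advice about avoiding "_" in sample names can be checked before starting the pipeline. The following is a minimal Python sketch under stated assumptions: it assumes the samples file is tab-separated with the sample ID in the first column, the function name `samples_with_underscores` is hypothetical, and the file names and the ID `C95DNA` in the example are made up for illustration.

```python
def samples_with_underscores(samples_lines):
    """Return the sample IDs (first tab-separated column) that contain "_"."""
    flagged = []
    for line in samples_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sample_id = line.split("\t")[0]
        if "_" in sample_id and sample_id not in flagged:
            flagged.append(sample_id)
    return flagged


# Hypothetical samples file content; only the C88_DNA and C93_DNA IDs come from the thread.
example = [
    "C88_DNA\tC88_R1.fastq.gz\tpair1",
    "C88_DNA\tC88_R2.fastq.gz\tpair2",
    "C93_DNA\tC93_R1.fastq.gz\tpair1",
    "C95DNA\tC95_R1.fastq.gz\tpair1",
]

print(samples_with_underscores(example))  # ['C88_DNA', 'C93_DNA']
```

An empty list means no sample ID contains an underscore. For a real file, pass an open handle, for example `with open("samples.txt") as fh: print(samples_with_underscores(fh))`.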