jts / sga

de novo sequence assembler using string graphs
http://genome.cshlp.org/content/22/3/549

substring remaining after rmdup #162

Closed: bvalot closed this issue 4 years ago

bvalot commented 4 years ago

Hi,

I'm just discovering the SGA software, and I'm trying to assemble a bacterial genome from a 2x150 bp Illumina library. I ran the following commands:

    sga preprocess -p 1 -q 20 -m 50 DHS01_R1_100x.fastq.gz DHS01_R2_100x.fastq.gz > DHS01_trim_100x_preprocess.fastq
    sga index -t 12 DHS01_trim_100x_preprocess.fastq
    sga correct -t 12 -m 60 DHS01_trim_100x_preprocess.fastq
    sga rmdup -t 12 DHS01_trim_100x_preprocess.fastq
    sga overlap -t 12 -m 60 DHS01_trim_100x_preprocess.fastq

I then get the following error, even though I ran the rmdup step:

    [sga::overlap] starting parallel-mode overlap computation with 12 threads
    Error: substring read found during overlap computation. Please run sga rmdup before sga overlap

Did I miss something or make a mistake?

jts commented 4 years ago

Hi @bvalot,

The problem here is that you did not re-index your reads after error correction. See here for the suggested bacterial workflow: https://github.com/jts/sga/blob/master/src/examples/sga-ecoli-miseq.sh
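
For illustration, here is a rough dry-run sketch of the step order that re-indexing implies. It only prints the commands and does not run them. The helper name `sga_plan`, the intermediate file names, and the use of `-o` to name outputs explicitly are assumptions made for this sketch, as is the choice to feed each step the previous step's output. The linked example script remains the authoritative workflow.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the sga commands in order instead of executing them.
# sga_plan is a hypothetical helper; the *.ec.fastq and *.ec.rmdup.fa names
# are assumptions chosen here and set explicitly with -o.
sga_plan() {
    local prefix="$1" threads="$2" min_overlap="$3"
    local pp="${prefix}_preprocess.fastq"   # output of sga preprocess
    local ec="${prefix}.ec.fastq"           # error-corrected reads (assumed name)
    local dedup="${prefix}.ec.rmdup.fa"     # deduplicated reads (assumed name)

    # Index the preprocessed reads so that correction can use the index.
    echo "sga index -t ${threads} ${pp}"
    # Error-correct, writing the corrected reads to a new file.
    echo "sga correct -t ${threads} -m ${min_overlap} -o ${ec} ${pp}"
    # Re-index: the corrected reads are a new file and need their own index.
    echo "sga index -t ${threads} ${ec}"
    # Remove duplicates and substrings from the corrected reads.
    echo "sga rmdup -t ${threads} -o ${dedup} ${ec}"
    # Compute overlaps on the deduplicated reads, not on the original input.
    echo "sga overlap -t ${threads} -m ${min_overlap} ${dedup}"
}

sga_plan DHS01_trim_100x 12 60
```

The second `sga index` line is the step missing from the original command list. Once the printed plan looks right for your file names, the `echo` wrappers can be dropped to actually run the commands.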

Please note SGA is deprecated and you should use a different assembler like SPAdes instead.

Jared

bvalot commented 4 years ago

Thanks for your response.

Why is it deprecated? I have already used SPAdes, but I want to evaluate other tools for comparison.

jts commented 4 years ago

Other tools outperform it, and I don't have enough time to maintain it anymore.