rrwick / Unicycler

hybrid assembly pipeline for bacterial genomes
GNU General Public License v3.0

contig broken with unicycler and Spades #148

Open fetyj opened 6 years ago

fetyj commented 6 years ago

Hi everyone, I need a little help with my assembly. I usually use SPAdes, but since I found that Unicycler could improve my assemblies, I always run the two separately, and Unicycler gets the job done perfectly (short-read and hybrid). I have an issue with one particular genome: I know that this genome has some tandem repeats (we have VNTR typing for it). Using SPAdes alone, I can recover the region where the repeat is, but with Unicycler the region is missing. Is there any option I can use to solve this? Any suggestions are welcome. Thanks for your help,

Fety

Here are the log files: spades.log, unicycler.log

raw937 commented 6 years ago

It looks like the assembly finished. SPAdes is going to have trouble with a VNTR, as will any assembler for that matter. Does the regular SPAdes assembly have the repeat? You can merge the SPAdes and Unicycler assemblies with minimus2. Let me know if you need help with minimus2.

fetyj commented 5 years ago

Hi, thanks for the reply. This issue is very tricky: the assembly from SPAdes 3.11 has the repeat, but in version 3.13 the scaffolding step generates N's in the region. Could you help with minimus2? Which command is used for that purpose? Fety

raw937 commented 5 years ago

Here is a quick shell script:

#!/bin/bash
# Example shell script for minimus2

# Rename the file
mv contigs.fasta contigs.seq

# Convert to AMOS format
toAmos -s contigs.seq -o contigs.afg

# Merge with minimus2 (each parameter is passed with -D)
minimus2 contigs -D REFCOUNT=0 -D MINID=99.9 -D OVERLAP=200 -D MAXTRIM=1000 -D WIGGLE=15 -D CONSERR=0.01

There are lots of parameters (MINID etc.). You will have to play with them a bit.
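
The script above starts from a single contigs.fasta. To merge two assemblies, the inputs would first need to be combined into one file. Below is a minimal sketch of that preparation step, assuming (as I understand the AMOS documentation) that REFCOUNT should be the number of sequences in the first assembly. The function names and file names are hypothetical, and the toAmos/minimus2 calls are left commented out because they require AMOS to be installed.

```shell
#!/bin/bash
# Sketch only: helper names and file names below are hypothetical.

# Count FASTA records (header lines starting with '>') in a file.
count_fasta_records() {
    grep -c '^>' "$1" || true
}

# Concatenate the first and second assembly into <prefix>.seq and print
# the number of records in the first one (the value assumed for REFCOUNT).
# 'awk 1' is used instead of 'cat' so that a missing trailing newline in
# the first file cannot glue its last line to the next header.
prepare_minimus2_input() {
    local first="$1" second="$2" prefix="$3"
    awk 1 "$first" "$second" > "${prefix}.seq"
    count_fasta_records "$first"
}

# Example usage (requires AMOS on PATH; file names are placeholders):
# refcount=$(prepare_minimus2_input spades_contigs.fasta unicycler_contigs.fasta merged)
# toAmos -s merged.seq -o merged.afg
# minimus2 merged -D REFCOUNT="$refcount" -D MINID=99.9 -D OVERLAP=200
```

Which assembly goes first is a judgment call; the parameter values are simply the ones from the script above and would likely need tuning.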

fetyj commented 5 years ago

Thanks for the tips. I still have the issue: the output drops the N's, splitting the contig... I'm still waiting for a reply from the SPAdes team. Fety
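
To check where the N's fall relative to the VNTR region before and after merging, the runs of N in a FASTA file can be listed directly. This is a small sketch using only awk; the function name find_n_runs is hypothetical, and it is not part of SPAdes, Unicycler, or AMOS.

```shell
#!/bin/bash
# Sketch only: find_n_runs is a hypothetical helper.
# Prints one line per run of N: sequence name, 1-based start, run length
# (tab-separated). Multi-line sequences are joined before scanning.
find_n_runs() {
    awk '
        function report(    rest, off) {
            rest = seq; off = 0
            while (match(rest, /N+/)) {
                printf "%s\t%d\t%d\n", name, off + RSTART, RLENGTH
                off += RSTART + RLENGTH - 1
                rest = substr(rest, RSTART + RLENGTH)
            }
        }
        /^>/ { if (name != "") report(); name = substr($1, 2); seq = ""; next }
        { seq = seq toupper($0) }
        END { if (name != "") report() }
    ' "$1"
}

# Example usage (file name is a placeholder):
# find_n_runs scaffolds.fasta
```

Running it on the SPAdes scaffolds and on the minimus2 output would show whether a gap was removed or the contig was split at that position.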