voutcn / megahit

Ultra-fast and memory-efficient (meta-)genome assembler
http://www.ncbi.nlm.nih.gov/pubmed/25609793
GNU General Public License v3.0

Fragmented contigs #184

Open Sebastien-Raguideau opened 5 years ago

Sebastien-Raguideau commented 5 years ago

Hello, I am looking at the contig graph produced using contig2fastg. I saw sequential contigs with no branching. Maybe I am missing something, but if there is no branching, why are those contigs not merged into a single one? Do I need to post-process the assembly and look for those? Here is a Bandage close-up (attached image: example). Also, it seems that the coverage for both is 1; is that relevant? Best, Seb

voutcn commented 5 years ago

Interesting. Could you show me the two sequences and the k used?

Sebastien-Raguideau commented 5 years ago

Hello, I did a grep on the .gfa to extract the sequences and links; the result is in the attached file, Fragmented_contigs.txt. If you are interested, I have other examples. Also, what I said previously about the coverage being 1 was a mistake on my part; it is ~5. I used k=141. Best, Seb
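For readers who want to reproduce the extraction without grep, here is a minimal sketch in Python, assuming a GFA 1 file in which segments are tab-separated `S` lines (name, sequence) and links are `L` lines (from, orientation, to, orientation, overlap). The function name `read_gfa` and the toy records are hypothetical and are not taken from the attached file.

```python
def read_gfa(lines):
    """Collect segments and links from GFA 1 text lines (assumed tab-separated)."""
    segments = {}  # segment name -> sequence
    links = []     # (from_name, from_orient, to_name, to_orient, overlap)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            segments[fields[1]] = fields[2]
        elif fields[0] == "L" and len(fields) >= 6:
            links.append(tuple(fields[1:6]))
    return segments, links


# Toy records for illustration only; a real run would pass an open file handle.
example = [
    "H\tVN:Z:1.0",
    "S\t1\tACGTAC",
    "S\t2\tGTACGG",
    "L\t1\t+\t2\t+\t4M",
]
segments, links = read_gfa(example)
print(segments)
print(links)
```

Any iterable of lines works, so `read_gfa(open(path))` with a path of your choosing would do the same for a file on disk.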

voutcn commented 5 years ago

The two sequences do not share any common k-mers, so I wonder why they are connected. Did you use megahit_toolkit's contig2fastg to generate the graph?

Sebastien-Raguideau commented 5 years ago

Hi, They do share a k-mer of length 141, but one of them needs to be reverse complemented. I didn't realise this when I first submitted. It happens because I'm using the .gfa format, which stores each sequence in only one orientation, so the files are smaller. So yes, I usually use megahit_toolkit contig2fastg to generate the graph. I then use one of Bandage's functions to convert the fastg file to .gfa. After that I need a script for renaming, as the conversion, as well as megahit_toolkit contig2fastg, changes the names of the contigs. It keeps them in the same order though, so it is fine. Best, Seb
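The strand issue described here can be checked with a short script. The following is only a sketch in Python, not part of MEGAHIT or Bandage: it compares the k-mers of one sequence against both the second sequence and its reverse complement. The helper names and the toy sequences (with k=4 instead of 141) are made up for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(comp)[::-1]


def kmers(seq, k):
    """Return the set of all substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(a, b, k):
    """Return (same_strand, opposite_strand) sets of k-mers shared by a and b."""
    ka = kmers(a, k)
    same = ka & kmers(b, k)
    opposite = ka & kmers(reverse_complement(b), k)
    return same, opposite


# Toy example: the overlap only shows up once b is reverse complemented.
same, opposite = shared_kmers("AAACCGT", "CTCACGG", 4)
print(same)      # set()
print(opposite)  # {'CCGT'}
```

With the two contigs from the .gfa and k=141, an empty first set and a non-empty second set would match the situation described above, where the link is only visible across strands.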