ruanjue / wtdbg2

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly
GNU General Public License v3.0

find unused reads #110

LipengKang closed this issue 5 years ago

LipengKang commented 5 years ago

Dear Jue! I am looking for ways to overcome one drawback of wtdbg2: its assemblies sometimes cover less of the reference genome. Can I find all unused and clipped reads after assembly? In the dbg.clps file, is COL3 (keep_offset) the start position of the retained sequence, and COL4 (keep_length) its length? By the way, does a node mean a k-mer bin?

Sorry for asking so many simple questions. Thank you, Lipeng

ruanjue commented 5 years ago

Thanks in advance!

Format of dbg.clps
read_tag read_len clip_off clip_len
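
As a rough illustration of the four columns above, the following Python sketch reads dbg.clps records and lists reads that were clipped. It assumes the file is whitespace-separated and that clip_off and clip_len describe the retained region of the read, as the question suggests; neither point is confirmed in this thread. The names ClipRecord, parse_clps and clipped_reads are hypothetical helpers, not part of wtdbg2.

```python
from typing import Iterable, Iterator, List, NamedTuple


class ClipRecord(NamedTuple):
    read_tag: str
    read_len: int
    clip_off: int   # assumed: start of the retained region
    clip_len: int   # assumed: length of the retained region


def parse_clps(lines: Iterable[str]) -> Iterator[ClipRecord]:
    """Yield one record per line: read_tag read_len clip_off clip_len."""
    for line in lines:
        fields = line.split()
        # Skip blank, commented, or short lines.
        if len(fields) < 4 or line.startswith("#"):
            continue
        tag, read_len, clip_off, clip_len = fields[:4]
        yield ClipRecord(tag, int(read_len), int(clip_off), int(clip_len))


def clipped_reads(records: Iterable[ClipRecord]) -> List[ClipRecord]:
    """Return records whose retained length is shorter than the full read."""
    return [r for r in records if r.clip_len < r.read_len]
```

Typical use would be `clipped_reads(parse_clps(open("dbg.clps")))`, where the file name depends on the output prefix passed to wtdbg2.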

To find unused reads, you need to parse the file dbg.ctg.lay to see which parts of each read are used in contigs. The format is described in https://github.com/ruanjue/wtdbg2/blob/master/README-ori.md .
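
A minimal Python sketch of that parsing step might look like the following. It assumes that read segments in dbg.ctg.lay appear on lines starting with S, laid out as `S read_name strand offset length sequence`; please check this against README-ori.md before relying on it. The function names used_intervals and unused_reads are hypothetical, and the list of all read names has to come from your own input reads.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def used_intervals(lay_lines: Iterable[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Collect (start, end) intervals of each read that appear in contigs.

    Assumed line layout: S <read_name> <strand> <offset> <length> <sequence>
    """
    used: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in lay_lines:
        fields = line.split()
        # Contig headers (">") and edge lines ("E") are ignored here.
        if len(fields) < 5 or fields[0] != "S":
            continue
        name, offset, length = fields[1], int(fields[3]), int(fields[4])
        used[name].append((offset, offset + length))
    return used


def unused_reads(all_read_names: Iterable[str],
                 used: Dict[str, List[Tuple[int, int]]]) -> List[str]:
    """Return names of reads that never appear in any contig layout."""
    return [name for name in all_read_names if name not in used]
```

Reads that do appear can be checked for partially used regions by comparing their intervals with the read length from dbg.clps.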

Best, Jue

LipengKang commented 5 years ago

Thank you, I got it.