ruanjue / wtdbg2

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly
GNU General Public License v3.0
How does wtdbg2 deal with chimeric/siamaeric reads? #232

Closed A-J-F-Mackintosh closed 3 years ago

A-J-F-Mackintosh commented 3 years ago

Hi,

I am curious about how wtdbg2 handles PacBio CLR reads that are chimeric (two molecules joined together) or siamaeric (multiple subreads of the same molecule contained within a single read).

yacrd (https://github.com/natir/yacrd) uses all-v-all read comparisons to identify and split chimeric/siamaeric reads. In the paper describing yacrd, the authors write that 'wtdbg2 contains steps that have a similar effect as yacrd'.
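
To make the idea concrete, here is a minimal sketch of coverage-based chimera detection as I understand it: pile up the overlaps that other reads have with a given read, look for an internal stretch with too little coverage, and split there. The function names, the half-open interval convention, and the `min_cov` threshold are my own assumptions for illustration; this is not yacrd's or wtdbg2's actual code or defaults.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in read coordinates


def low_coverage_regions(read_len: int, overlaps: List[Interval],
                         min_cov: int = 1) -> List[Interval]:
    """Return maximal intervals of the read whose overlap coverage is < min_cov."""
    # Difference array: +1 where an overlap starts, -1 where it ends.
    diff = [0] * (read_len + 1)
    for start, end in overlaps:
        start, end = max(0, start), min(read_len, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    regions: List[Interval] = []
    cov = 0
    region_start = None
    for pos in range(read_len):
        cov += diff[pos]
        if cov < min_cov:
            if region_start is None:
                region_start = pos
        elif region_start is not None:
            regions.append((region_start, pos))
            region_start = None
    if region_start is not None:
        regions.append((region_start, read_len))
    return regions


def split_if_chimeric(read_len: int, overlaps: List[Interval],
                      min_cov: int = 1) -> Tuple[List[Interval], bool]:
    """Return the well-covered segments and whether the read looks chimeric.

    A read is called chimeric here (an assumption of this sketch) when a
    low-coverage region lies strictly inside it, touching neither end.
    """
    gaps = low_coverage_regions(read_len, overlaps, min_cov)
    is_chimeric = any(s > 0 and e < read_len for s, e in gaps)

    segments: List[Interval] = []
    prev = 0
    for s, e in gaps:
        if s > prev:
            segments.append((prev, s))
        prev = e
    if prev < read_len:
        segments.append((prev, read_len))
    return segments, is_chimeric


# Hypothetical 100 bp read: overlaps cover both halves but nothing spans 45-60.
segments, chimeric = split_if_chimeric(100, [(0, 45), (5, 40), (60, 100)])
print(segments, chimeric)
```

A low-coverage stretch at the very start or end of the read is only trimmed in this sketch, not treated as a chimeric junction.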

Are these steps outlined anywhere? I could not find them in the wtdbg2 paper, but perhaps setting edge-coverage to 3 is enough to mean that any chimeric edges in the graph are discarded?

Best,

Alex

ruanjue commented 3 years ago

Thanks for the information. Actually, wtdbg just reuses the read-clipping code from SMARTdenovo. I recently described it in the "Trimming" paragraph of Liu, Hailin, et al. “SMARTdenovo: A de Novo Assembler Using Long Noisy Reads.” Gigabyte, vol. 2021, 2021, pp. 1–9. Hope it helps.

A-J-F-Mackintosh commented 3 years ago

The Trimming section of the SMARTdenovo paper answered my questions.

Many thanks,

Alex