nanoporetech / modkit

A bioinformatics tool for working with modified bases
https://nanoporetech.com/

General range of --edge-filter for kb level sequencing #40

Closed Yijun-Tian closed 1 year ago

Yijun-Tian commented 1 year ago

Hi modkit maker, For the modkit summary module, do you recommend a rough number for the --edge-filter option when processing common genomic sequencing reads? Is there any literature discussing why or when this edge trimming is needed? Thank you,

ArtRand commented 1 year ago

Hello @Yijun-Tian,

If you're concerned that there are edge effects in your data, we recommend using modkit extract to home in on the details of that effect. In typical runs, however, 10bp is likely quite safe. This will remove any false canonical calls due to adapter ligation and most or all edge effects in the signal. For very long reads you may want to increase --edge-filter to as much as 30bp (which was used in megalodon previously).
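To make the idea concrete, here is a minimal Python sketch of what an edge filter does conceptually: calls within a fixed number of bases of either read end are dropped. This is an illustration only, not modkit's implementation. The record layout, the function name `apply_edge_filter`, and the exact boundary handling are assumptions made for the example.

```python
from typing import List, Tuple

# (position_in_read, modification_probability); a hypothetical record layout
Call = Tuple[int, float]


def apply_edge_filter(calls: List[Call], read_length: int, edge_filter: int) -> List[Call]:
    """Keep only calls at least `edge_filter` bases away from both read ends."""
    if edge_filter < 0:
        raise ValueError("edge_filter must be non-negative")
    # Reads too short to have an interior after trimming contribute nothing.
    if read_length <= 2 * edge_filter:
        return []
    first_kept = edge_filter                  # 0-based, inclusive
    last_kept = read_length - edge_filter - 1  # 0-based, inclusive
    return [(pos, prob) for pos, prob in calls if first_kept <= pos <= last_kept]


# Example: a 1000 bp read with a 10 bp edge filter.
calls = [(2, 0.91), (15, 0.08), (500, 0.97), (995, 0.12)]
kept = apply_edge_filter(calls, read_length=1000, edge_filter=10)
print(kept)
```

In this sketch the calls at positions 2 and 995 fall inside the trimmed edges and are discarded, while the two interior calls are kept. Reads shorter than twice the filter width are dropped entirely here; whether modkit treats short reads the same way is not stated in this thread.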

Yijun-Tian commented 1 year ago

Thanks for the clarification. Does this rule also apply to adaptive sampling reads? As adaptive sampling reads undergo frequent rejection or reversal, should I use a longer --edge-filter in this case?

ArtRand commented 1 year ago

@Yijun-Tian For the accepted reads during Adaptive Sampling, I would not expect that you'd need to treat them any differently than typical reads. For the rejected reads I'd recommend looking at the output of modkit extract to decide how best to handle them (if you're going to use them at all).
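One way to inspect per-read calls for an edge effect is to average the modification probability by distance from the nearest read end and see where the profile flattens out. The Python sketch below assumes the per-read table has already been parsed into `(position_in_read, read_length, probability)` tuples. That tuple layout and the function name `mean_prob_by_edge_distance` are hypothetical, and parsing the modkit extract output itself is left out because its column layout is not shown in this thread.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (position_in_read, read_length, modification_probability); hypothetical layout
Record = Tuple[int, int, float]


def mean_prob_by_edge_distance(records: Iterable[Record], max_distance: int = 50) -> Dict[int, float]:
    """Average modification probability per distance (in bases) from the nearest read end."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for pos, read_length, prob in records:
        # Distance to whichever end of the read is closer (0 = terminal base).
        dist = min(pos, read_length - 1 - pos)
        if 0 <= dist < max_distance:
            sums[dist] += prob
            counts[dist] += 1
    return {dist: sums[dist] / counts[dist] for dist in sorted(sums)}


# Toy input; real input would come from the parsed per-read table.
records = [(0, 100, 0.2), (99, 100, 0.4), (5, 100, 0.9), (50, 100, 0.95)]
profile = mean_prob_by_edge_distance(records)
for dist, mean_prob in profile.items():
    print(dist, round(mean_prob, 3))
```

If the averages near distance 0 differ clearly from those further into the read, the distance at which they level off is a reasonable starting point for --edge-filter. Running the same profile separately on accepted and rejected reads would show whether the two groups need different handling.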