Open rhpvorderman opened 6 months ago
I did some thinking and research. The best way to approach this is as follows:
3 is quite challenging, but by following the steps, cutadapt will already be useful for nanopore with chimeric reads at step 1, without requiring extra code.
Currently I am researching ONT possibilities with cutadapt, and it seems that the most basic functionality can be achieved. Unfortunately after the adapters have been adequately cut, sequali still finds adapter sequences.
These are most likely due to chimeric reads, where reads are joined by adapter sequences. These reads should be split. With the newest chemistry the amount of chimeric reads is estimated at 10% (previously around 2%). These chimeric reads are not always split by the sequence provider and historic data may also contain the 2% reads because splitting was not available back then.
Since cutadapt already has a decent alignment algorithm that can detect sequences anywhere in the read, it should be possible to write a routine that detects chimeric reads.
The hard part I guess will be the actual splitting, were one read becomes two or more reads and feed that back into the pipeline. I can imagine that consideration wasn't a thing when cutadapt was designed.