Handling of long reads - Githubissues

douweschulte commented 1 year ago

Long reads pose a problem: we are lucky to have them, but we are not yet equipped to handle them properly. Several fixes have been proposed:

Use local scoring if the read is longer than the template.
Place a large number of Xs so that a read of arbitrary length can be placed.
Score the overhang of a read past the end of the template as 0 instead of negative.

The last method seems likely to fix all of the problems, so try it out.
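As a rough illustration of the third option (scoring the overhang as 0), here is a minimal Python sketch of a placement score in which the part of a read that runs past the end of the template costs nothing. This is not the stitch implementation: the function name `place_read`, the scoring values, and the `free_overhang` switch are all assumptions made for the example.

```python
def place_read(template, read, match=1, mismatch=-1, gap=-2, free_overhang=True):
    """Score the best placement of `read` on `template` (hypothetical helper).

    The read may start at any template position for free. If `free_overhang`
    is True, read residues past the end of the template are scored as 0.
    """
    n, m = len(template), len(read)
    # h[i][j]: best score for read[:j] placed so that it ends at template[:i]
    h = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        h[0][j] = j * gap  # read residues before the template start are penalised
    # h[i][0] stays 0, so the read can start anywhere along the template
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = match if template[i - 1] == read[j - 1] else mismatch
            h[i][j] = max(h[i - 1][j - 1] + step,  # align two residues
                          h[i - 1][j] + gap,       # gap in the read
                          h[i][j - 1] + gap)       # gap in the template
    # Read fully consumed somewhere inside the template
    best = max(h[i][m] for i in range(n + 1))
    if free_overhang:
        # Template fully consumed; the remaining read suffix overhangs at no cost
        best = max(best, max(h[n][j] for j in range(m + 1)))
    return best


# The read matches the template tail (CGT) and then runs past its end.
print(place_read("ACGT", "CGTTTTT"))
print(place_read("ACGT", "CGTTTTT", free_overhang=False))
```

With the free overhang the long read keeps the score of its matching part, while without it the extra residues have to be paid for as gaps, which pushes the score down and can drop the read below a cutoff.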

douweschulte commented 1 year ago

A major problem is that with EnforceUnique any read can only match one segment, so either V or J, while the sequence could stretch over both. Without EnforceUnique the placed reads are a lot worse, so turning it off is not really a good solution. In theory EnforceUnique could take patches of reads into account: when enforcing uniqueness, it would check whether a single read can be placed in multiple places as long as the used parts of the sequence do not overlap (ignoring all parts matched to X for this?). This could fix the problem very neatly, but it could be a lot of work to get working reliably.
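The patch-based idea can be sketched in a few lines of Python. This is only one possible reading of it, not the code in stitch: the `Placement` type, the function name, the greedy highest-score-first order, and the example numbers are all assumptions. Each placement records which stretch of the read it actually uses (which could be taken to exclude parts matched to X), and a read may be kept at several templates as long as those stretches do not overlap.

```python
from typing import List, NamedTuple


class Placement(NamedTuple):
    """One candidate placement of a single read (hypothetical type)."""
    template: str
    read_start: int  # first used read position, inclusive
    read_end: int    # last used read position, exclusive
    score: int


def enforce_unique_localised(placements: List[Placement]) -> List[Placement]:
    """Keep the best placements of one read whose used read ranges are disjoint."""
    kept: List[Placement] = []
    # Greedy assumption: consider the highest scoring placements first
    for p in sorted(placements, key=lambda p: p.score, reverse=True):
        disjoint = all(p.read_end <= k.read_start or p.read_start >= k.read_end
                       for k in kept)
        if disjoint:
            kept.append(p)
    return kept


# Illustrative numbers only: one long read that spans a V and a J segment.
candidates = [
    Placement("IGHV-a", 0, 95, 180),
    Placement("IGHV-b", 10, 90, 150),  # reuses the same part of the read as IGHV-a
    Placement("IGHJ-a", 100, 115, 30),
]
for p in enforce_unique_localised(candidates):
    print(p.template, p.read_start, p.read_end)
```

In this example the read is kept at one V template and one J template, while the second V placement is dropped because it reuses the same part of the read.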

douweschulte commented 1 year ago

The localised EnforceUnique has been added and has been shown to work with the recombined MA example. It now places the same read uniquely at both IGHV and IGHJ. It does not, however, place it at IGHC, because the overlap is very small, so any reasonable TM cutoff score will prevent it from being placed.

snijderlab / stitch

Handling of long reads #215