snijderlab / stitch

Template-based assembly of proteomics short reads for de novo antibody sequencing and repertoire profiling
MIT License

Handling of long reads #215

Closed: douweschulte closed this issue 1 year ago

douweschulte commented 1 year ago

Long reads pose a problem: we are lucky to have them, but we are not yet equipped to handle them properly. To fix this, multiple methods are proposed:

douweschulte commented 1 year ago

A major problem is that when using Enforce Unique, any read can only match one segment (so either V or J), while the sequence could stretch over both. Without EnforceUnique the placed reads are a lot worse, so disabling it is not really a good solution. In theory, EnforceUnique could take patches of reads into account: when enforcing uniqueness, it would check whether a single read can be placed in multiple places, as long as the used parts of the read sequence do not overlap (ignoring all X-matched parts for this?). This could in theory fix the problem very neatly, but it could be a lot of work to get it to work reliably.
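The localised idea can be sketched as a greedy selection over the candidate placements of one read: take placements from highest to lowest score and keep each one only if the part of the read it uses does not overlap any placement already kept. This is a minimal illustration, not Stitch's implementation. The `Placement` structure, its fields, and the greedy-by-score order are all assumptions, and the read intervals are assumed to already exclude X-matched parts, as the comment above tentatively suggests.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical candidate alignment of one read onto a template segment."""
    template: str    # e.g. "IGHV" or "IGHJ"
    read_start: int  # first read position used (inclusive), X-matched ends assumed trimmed
    read_end: int    # last read position used (exclusive)
    score: float


def overlaps(a: Placement, b: Placement) -> bool:
    """True when the two placements use at least one common read position."""
    return a.read_start < b.read_end and b.read_start < a.read_end


def localised_enforce_unique(placements: list[Placement]) -> list[Placement]:
    """Greedily keep the best-scoring placements whose used read regions are disjoint."""
    kept: list[Placement] = []
    for p in sorted(placements, key=lambda p: p.score, reverse=True):
        if all(not overlaps(p, k) for k in kept):
            kept.append(p)
    return kept


# Illustrative candidates for one read spanning V and J (made-up numbers).
read_placements = [
    Placement("IGHV", 0, 30, 95.0),
    Placement("IGHJ", 28, 45, 60.0),  # shares read positions 28-29 with the V hit
    Placement("IGHJ", 30, 45, 55.0),  # disjoint from the V hit
    Placement("IGHV", 5, 25, 40.0),   # weaker second V hit, overlaps the first
]
kept = localised_enforce_unique(read_placements)
for p in kept:
    print(p.template, p.read_start, p.read_end)
```

With these inputs the read ends up placed once on IGHV (positions 0-30) and once on IGHJ (positions 30-45), which is the behaviour plain Enforce Unique cannot produce. A greedy pass is the simplest choice; an exact maximum-weight selection of non-overlapping intervals would be an alternative if greedy choices turn out to be poor.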

douweschulte commented 1 year ago

The localised Enforce Unique has been added and has been shown to work on the recombined MA example. It now places the same read uniquely at both IGHV and IGHJ. It does not, however, place it at IGHC, because the overlap there is very small, so any reasonable TM cutoff score prevents the placement.
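Why a small overlap fails the cutoff can be shown with a toy scoring model. The sketch assumes, purely for illustration, that a placement's score is the sum of a fixed per-position score over the overlapping part of the read. The constants, the overlap lengths, and the helper names `placement_score` and `placeable` are hypothetical and are not taken from Stitch or from the MA example.

```python
from __future__ import annotations

# Hypothetical scoring model: score grows linearly with the overlap length,
# so a fixed cutoff removes placements with only a short overlap.
SCORE_PER_POSITION = 4.0  # illustrative value, not from Stitch
CUTOFF = 40.0             # illustrative TM cutoff score


def placement_score(overlap_length: int) -> float:
    """Score of a placement under the linear toy model."""
    return overlap_length * SCORE_PER_POSITION


def placeable(overlaps: dict[str, int]) -> list[str]:
    """Return the segments whose placement score reaches the cutoff."""
    return [seg for seg, length in overlaps.items() if placement_score(length) >= CUTOFF]


# Illustrative overlap lengths for one recombined read (not measured data).
read_overlaps = {"IGHV": 30, "IGHJ": 15, "IGHC": 5}
print(placeable(read_overlaps))  # ['IGHV', 'IGHJ']
```

Under this model the 5-position overlap with IGHC scores 20, below the cutoff of 40. Lowering the cutoff far enough to admit it would also admit many spurious short placements.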