nanoporetech / pychopper

A tool to identify, orient, trim and rescue full length cDNA reads

Possible to include N to allow for barcodes/UMIs within PCR handles? #13

Closed callumparr closed 5 years ago

callumparr commented 5 years ago

Hi,

I was given a library generated with the following linkers (the tail includes a potential poly(A) stretch):

head CTACACTCGTCGGCAGCGTCNNNNNNNNNNNNNNNNNNNNNNNNNGTGGT ATCAACGCAGAGTAC tail aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa CGTAGNNNNNNNNNNNNNNNCCGAGCCCACGAGACATCTC

Because the unique sequences on either side of the N stretch are quite short, whichever one I provide in cDNA_barcodes.fas, it is difficult to detect full-length reads and orient them.

I wondered whether it would be possible to provide the whole linker, with a gap to allow for the Ns. I think this would increase the number of classified reads.
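A minimal sketch of the idea, and explicitly not pychopper's actual implementation: treat every N in the primer as a wildcard, so both short anchors on either side of the barcode/UMI count towards a single match. The function names (`best_hit`, `orient`) and the `max_mismatches` threshold are hypothetical, and the comparison is a plain ungapped one that ignores insertions and deletions, which a real tool for nanopore reads would have to tolerate.

```python
# Illustrative only: N-tolerant primer search (ungapped, mismatches only).
# All names and the threshold below are assumptions, not pychopper's API.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# Head linker from the post above; the N run is the barcode/UMI.
HEAD = "CTACACTCGTCGGCAGCGTCNNNNNNNNNNNNNNNNNNNNNNNNNGTGGTATCAACGCAGAGTAC"


def reverse_complement(seq):
    """Return the reverse complement, keeping N as N."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(primer, window):
    """Count mismatching positions; an N in the primer matches any base."""
    return sum(1 for p, b in zip(primer, window) if p != "N" and p != b)


def best_hit(primer, read):
    """Slide the primer along the read; return (mismatches, start) or None."""
    primer, read = primer.upper(), read.upper()
    best = None
    for start in range(len(read) - len(primer) + 1):
        score = mismatches(primer, read[start:start + len(primer)])
        if best is None or score < best[0]:
            best = (score, start)
    return best


def orient(primer, read, max_mismatches=2):
    """Return '+' or '-' for the strand carrying the primer, else None."""
    candidates = []
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        hit = best_hit(primer, seq)
        if hit is not None and hit[0] <= max_mismatches:
            candidates.append((hit[0], strand))
    return min(candidates)[1] if candidates else None


# Build a toy read: fill the Ns with arbitrary bases and append an insert.
toy_read = "".join(
    "ACGT"[i % 4] if base == "N" else base for i, base in enumerate(HEAD)
) + "TTGACCGATTGCA"

print(orient(HEAD, toy_read))
print(orient(HEAD, reverse_complement(toy_read)))
```

With the N positions masked out, the 20 bases on each side of the gap are scored together as 40 informative positions, rather than as two separate short primers that are each easy to miss or to match by chance.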

bsipos commented 5 years ago

Currently Ns are not supported, but if you are willing to wait a bit I can add this feature next week.

Botond

callumparr commented 5 years ago

Sure, that would be great.

Thanks for the great tools.

bsipos commented 5 years ago

I have made some changes to pychopper, and you can now have Ns in your primers. Install the package from the GitHub master branch and let me know if it makes a difference for you.

Botond

callumparr commented 5 years ago

Thank you, the fraction of classified reads jumped from 70% to 75%.

bsipos commented 5 years ago

Well, 70% is already good. But I am glad to see improvement.