nanoporetech / pychopper

A tool to identify, orient, trim and rescue full length cDNA reads

5' adapter #11

Closed HegedusB closed 5 years ago

HegedusB commented 5 years ago

Is it possible to add additional adapters to the adapter list (cdna_barcodes.fas)? More precisely, I would like to use the Lexogen TeloPrime kit cap adapter to identify the 5’ end of the sequence. I tried adding it to the list, but the program crashed with the new adapter list. Grateful for any help!

bsipos commented 5 years ago

Simply replace the sequence in the FASTA entry "cDNA|1" with the TeloPrime cap adapter sequence.

HegedusB commented 5 years ago

Thanks for the advice. The program now runs, but it does not do what I expected: it filters out reads that appear to be correct forward reads. Did I make a mistake?

bsipos commented 5 years ago

I do not know much about TeloPrime and your particular library construction protocol, but based on the information in the kit manual the primers should be:

>cDNA|1
TGGATTGATATGTAATACGACTCACTATAG
>cDNA|2
AAAAAAAAAAAAAAAAAACGCCTGAGA

A possible issue with this is that the RT primer has only 9 non-homopolymeric bases. If you added/ligated some extra sequence unique to this primer, then you could add it to the entry. And of course there can be false negatives, so some of the legitimate forward reads (as determined by alignment, for example) might be discarded. They should not be in the majority, though.

Let me know how it goes! Botond
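
The remark about the RT primer can be checked with a short Python sketch. It is not part of pychopper; the helper names `split_leading_run` and `longest_run` are hypothetical and chosen only for illustration. It takes the two primer sequences quoted above, separates the leading homopolymer run from the rest of each primer, and reports how many bases remain to distinguish the primer.

```python
from itertools import groupby

# Primer sequences exactly as quoted in the reply above.
PRIMERS = {
    "cDNA|1": "TGGATTGATATGTAATACGACTCACTATAG",
    "cDNA|2": "AAAAAAAAAAAAAAAAAACGCCTGAGA",
}


def split_leading_run(seq):
    """Split seq into its leading homopolymer run and the remainder."""
    if not seq:
        return "", ""
    remainder = seq.lstrip(seq[0])
    head = seq[: len(seq) - len(remainder)]
    return head, remainder


def longest_run(seq):
    """Return (base, length) of the longest homopolymer run in seq."""
    runs = [(base, len(list(group))) for base, group in groupby(seq)]
    return max(runs, key=lambda run: run[1])


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        head, tail = split_leading_run(seq)
        base, length = longest_run(seq)
        print(f"{name}: length={len(seq)}, leading run={len(head)} x {seq[0]}, "
              f"remaining bases={len(tail)}, longest run={length} x {base}")
```

For cDNA|2, the remainder after the poly-A stretch is the short tail that the reply refers to, which is why adding any extra primer-specific sequence to the entry would give the classifier more to match against.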

HegedusB commented 5 years ago

Thanks a lot! It seems to be working now. I made a big mistake with the cDNA|2 primer: I had left it unchanged. Can you give me some advice on the -g argument of “cdna_classifier.py”? I saw that a lot of transcripts were not recognized as full length because there were some mismatches in the adapter sequence. Should I optimize this argument or leave it unchanged?

bsipos commented 5 years ago

First, you can enable the "heuristic mode" using -x. You could also try decreasing the stringency by passing a lower value to -l. Beware that decreasing -l will increase the number of false positives.