Closed MustafaElshani closed 1 year ago
Hi @MustafaElshani,
Short answer: it does not make a big difference and LAPA works either way.
Pychopper trims poly(A)-tails at the end of reads but those tails are usually removed during the base-calling. If your reads have poly(A)-tails at the 3' end, you may slightly increase the precision. You can check if poly(A)-tail is present in your reads:
import matplotlib.pyplot as plt
from lapa.count import PolyaTailCounter
ttc = PolyaTailCounter(bam_file, min_tail_len=1)
ttc.plot_tail_len_dist()
If your reads have poly(A)-tails, you can use poly(A)-tails in LAPA:
lapa --alignment {bams} --fasta {fasta} --annotation {gtf} --chrom_sizes {chrom_sizes} \
--counting_method tail --min_tail_len 20 --output_dir {output}
--counting_method tail --min_tail_len 20
this arguments only counts reads with 20bp poly(A)-tails.
See supplementary material for more information: https://www.biorxiv.org/content/10.1101/2022.11.08.515683v1.supplementary-material
Thanks @MuhammedHasan
I have not trimmed the tails at basecalling as I had --trim_strategy none
this is precisely to measure poly(A) tail length at signal levels, this still ongoing.
As for the pychopper I thought that got rid of non-full length transcript based on primers and adapter sequences at each end, I lost ~10% of reads which failed the criteria. I don't think it gets rid of poly(A) tails, I could be wrong!
I'm using kit PCS111 which attaches the primer at the end of the PolyA tails so hopping tails will be there. minimap2 is the next challange as it does alot of soft-clipping which may get rid of the poly(A), I have mapped to genome and transcriptome I will check with your script if it finds tails.
Will post here and let you know Mustafa
Pychopper sorted reads work very well with LAPA and there are plenty of A in the polyA tail
Hi @MuhammedHasan
Would you recommend using pychopper upstream for ONT reads. Just as an additional QC step or do you think it wouldn't make a difference because of the way LAPA looks for PolyA signal?
Mustafa