mortazavilab / lapa

Alternative polyadenylation detection from diverse data sources such as 3'-seq, long-read and short-reads.
https://www.biorxiv.org/content/10.1101/2022.11.08.515683v1
23 stars 13 forks source link

Pychopper #17

Closed MustafaElshani closed 1 year ago

MustafaElshani commented 1 year ago

Hi @MuhammedHasan

Would you recommend using pychopper upstream for ONT reads. Just as an additional QC step or do you think it wouldn't make a difference because of the way LAPA looks for PolyA signal?

Mustafa

MuhammedHasan commented 1 year ago

Hi @MustafaElshani,

Short answer: it does not make a big difference and LAPA works either way.

Pychopper trims poly(A)-tails at the end of reads but those tails are usually removed during the base-calling. If your reads have poly(A)-tails at the 3' end, you may slightly increase the precision. You can check if poly(A)-tail is present in your reads:

import matplotlib.pyplot as plt
from lapa.count import PolyaTailCounter

ttc = PolyaTailCounter(bam_file, min_tail_len=1)
ttc.plot_tail_len_dist()

If your reads have poly(A)-tails, you can use poly(A)-tails in LAPA:

lapa --alignment {bams} --fasta {fasta} --annotation {gtf} --chrom_sizes {chrom_sizes} \
    --counting_method tail --min_tail_len 20 --output_dir {output}

--counting_method tail --min_tail_len 20 this arguments only counts reads with 20bp poly(A)-tails.

See supplementary material for more information: https://www.biorxiv.org/content/10.1101/2022.11.08.515683v1.supplementary-material

MustafaElshani commented 1 year ago

Thanks @MuhammedHasan

I have not trimmed the tails at basecalling as I had --trim_strategy none this is precisely to measure poly(A) tail length at signal levels, this still ongoing.

As for the pychopper I thought that got rid of non-full length transcript based on primers and adapter sequences at each end, I lost ~10% of reads which failed the criteria. I don't think it gets rid of poly(A) tails, I could be wrong!

I'm using kit PCS111 which attaches the primer at the end of the PolyA tails so hopping tails will be there. minimap2 is the next challange as it does alot of soft-clipping which may get rid of the poly(A), I have mapped to genome and transcriptome I will check with your script if it finds tails.

Will post here and let you know Mustafa

MustafaElshani commented 1 year ago

Pychopper sorted reads work very well with LAPA and there are plenty of A in the polyA tail