t-neumann / slamdunk

Streamlining SLAM-seq analysis with ultra-high sensitivity
GNU Affero General Public License v3.0

3'UTR length #121

Open Choopanian-Peyman opened 1 year ago

Choopanian-Peyman commented 1 year ago

Dear Tobias,

When I checked the count file, I saw that the length of a UTR reported by slamseq differs from its length reported on UCSC. For example, compare A1BG in my count file and on UCSC:

My count file:

| | Chromosome | Start | End | Name | Length | Strand | ConversionRate |
|---|---|---|---|---|---|---|---|
| 1 | 19 | 58346849 | 58347021 | A1BG | 172 | - | 0.038930397 |

UCSC (http://genome.cse.ucsc.edu/cgi-bin/hgGene?org=Human&hgg_chrom=none&hgg_type=knownGene&hgg_gene=uc002qsd.5):

| Region | Fold Energy | Bases | Energy/Base |
|---|---|---|---|
| 3' UTR | -638.20 | 1839 | -0.347 |

I would appreciate any advice.

Best,
Peyman

t-neumann commented 1 year ago

Hi - 3' end sequencing only amplifies the last ~250bp of each transcript, so our annotation goes through a couple of processing steps to tailor it to 3' end sequencing. What the count file reports as 3' UTRs are actually the resulting counting windows.

Choopanian-Peyman commented 1 year ago

Thanks for your reply,

But there are UTRs in my count file that are longer than 250 bp. Some of them are even as long as what UCSC reports.

Example:

| Chromosome | Start | End | Name | Length | Strand | ConversionRate |
|---|---|---|---|---|---|---|
| 1 | 70008 | 71585 | OR4F5 | 1577 | + | 0.0 |
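
For reference, the Length column in both rows quoted in this thread equals End minus Start, which a short sketch can confirm. The row values are copied from the count-file excerpts here, the 250 bp figure is only the approximate number from the reply above, and the helper name `window_length` is hypothetical:

```python
# Rows copied from the count-file excerpts in this thread:
# (chromosome, start, end, name, reported_length)
rows = [
    ("19", 58346849, 58347021, "A1BG", 172),
    ("1", 70008, 71585, "OR4F5", 1577),
]

APPROX_3P_WINDOW = 250  # approximate figure from the reply above, not an exact cutoff


def window_length(start: int, end: int) -> int:
    """Length of a window, assuming Length is simply End - Start."""
    return end - start


for chrom, start, end, name, reported in rows:
    length = window_length(start, end)
    flag = "longer than ~250 bp" if length > APPROX_3P_WINDOW else "within ~250 bp"
    print(f"{name}: computed={length}, reported={reported}, {flag}")
```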

Best

Choopanian-Peyman commented 1 year ago

Hi Tobias,

Is there an annotation file that includes the lengths of the 3' UTRs? Apparently I cannot use the Length column in the count file, since it lists several lengths under the same UTR name.

t-neumann commented 1 year ago

Hi @Choopanian-Peyman - which annotation file do you use? It could be that multiple counting windows are reported (one for each isoform), which is why you have several entries per gene.
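
One way to check for this is to group the count file by gene name and list the windows found for each. The following is a minimal sketch, assuming a tab-separated count file with the column names shown earlier in this thread and assuming that any lines starting with `#` are comments to skip. The function name `windows_per_gene` and the sample rows are hypothetical, not actual slamdunk output:

```python
import csv
import io
from collections import defaultdict


def windows_per_gene(handle):
    """Map each Name to the list of (Start, End, Length) windows found for it."""
    # Skip comment lines (assumed to start with '#') before the header row.
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    windows = defaultdict(list)
    for row in reader:
        windows[row["Name"]].append(
            (int(row["Start"]), int(row["End"]), int(row["Length"]))
        )
    return dict(windows)


# Hypothetical example: GENE_X has two counting windows, GENE_Y has one.
sample = io.StringIO(
    "# hypothetical comment line\n"
    "Chromosome\tStart\tEnd\tName\tLength\tStrand\n"
    "1\t100\t350\tGENE_X\t250\t+\n"
    "1\t900\t1150\tGENE_X\t250\t+\n"
    "2\t500\t672\tGENE_Y\t172\t-\n"
)

# Report only the genes that appear with more than one window.
for gene, wins in windows_per_gene(sample).items():
    if len(wins) > 1:
        print(f"{gene}: {len(wins)} windows, lengths {[w[2] for w in wins]}")
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` sample. Genes that show up with more than one window are the ones with several entries per gene.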