WangJingwen21 closed this issue 1 year ago
As you all are (to my knowledge) the first to develop PRO-seq + TimeLapse-seq, it's challenging for me to make detailed recommendations on how to process and analyze your unique data. What follows is mostly generic advice drawn from the literature on analyzing PRO-seq data.
To determine the pausing zone, there seem to be two strategies that people have used:
The Guertin lab also has some ideas on their website about how to call TSSs with a tool they developed, described here.
No matter how you define the pause site, bam2bakR can be used to count mutations in your sequencing reads. For example, you could provide a GTF file that has information about the location of pause sites and gene bodies for each gene. The tricky thing is that bam2bakR is currently hard-coded to expect the GTF type column to have "exon" and "transcript". In the cB file, the GF column is the associated gene_id for all reads that mapped to any part of a transcript, the EF column is the gene_id for all reads that mapped to any part of an exon, and the XF column is the gene_id for all reads that mapped exclusively to exonic regions. You could thus modify your pause-site and gene-body GTF such that the annotated pause sites are labeled as type "transcript" and the gene bodies are labeled as type "exon".
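As a rough illustration of the relabeling described above, here is a minimal Python sketch that writes one "transcript" record for a pause site and one "exon" record for a gene body. The helper names (`gtf_line`, `pause_and_body_records`), the gene ID, the coordinates, and the "custom" source label are all hypothetical placeholders, and reusing the gene_id as the transcript_id is an assumption; you would substitute whatever pause-site and gene-body coordinates your own analysis defines, and check the output against what bam2bakR expects.

```python
def gtf_line(chrom, feature, start, end, strand, gene_id, source="custom"):
    """Format one GTF record (9 tab-separated columns, 1-based inclusive coordinates)."""
    # Assumption: the gene_id is reused as the transcript_id for simplicity.
    attributes = f'gene_id "{gene_id}"; transcript_id "{gene_id}";'
    fields = [chrom, source, feature, str(start), str(end), ".", strand, ".", attributes]
    return "\t".join(fields)


def pause_and_body_records(gene):
    """Return two GTF records for one gene: pause site as 'transcript', gene body as 'exon'."""
    pause_start, pause_end = gene["pause"]
    body_start, body_end = gene["body"]
    return [
        gtf_line(gene["chrom"], "transcript", pause_start, pause_end,
                 gene["strand"], gene["gene_id"]),
        gtf_line(gene["chrom"], "exon", body_start, body_end,
                 gene["strand"], gene["gene_id"]),
    ]


# Hypothetical example gene; the coordinates are made up for illustration only.
example_gene = {
    "gene_id": "GENE_A",
    "chrom": "chr1",
    "strand": "+",
    "pause": (1000, 1150),   # user-defined pause region
    "body": (1151, 5000),    # user-defined gene body
}

records = pause_and_body_records(example_gene)
for record in records:
    print(record)
```

To build a full annotation, you could loop over all genes and write the returned lines to a single `.gtf` file.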
If you did this, the cB file from bam2bakR would thus have:
Thank you so much for your advice! I'll talk to my team.
Dear Isaac, I'm sorry to bother you again. I don't know much about transcription pausing. We have now applied PRO-seq + TimeLapse-seq to the DLD1 cell line, and we want to calculate dynamic parameters as you did in your paper. We understand that the TSScall methods used in your pipeline are not suitable for PRO-seq data, so we should filter our BAM files to keep TSS-specific reads before running bam2bakR. We want to define the Pol II pausing zone for each gene, but we have no clue how to set up the process. Could you give us some advice?