shaorray / TU_filter

A local TU annotation tool for multi-sample comparison
GNU General Public License v3.0

Inquiry about TU_filter and eRNA annotation #3

Open CW20221031 opened 1 year ago

CW20221031 commented 1 year ago

Hi Dr. Shao,

I'm running a TT-seq analysis to see how acute degradation of a TF affects the transcription of mRNAs and eRNAs. I find your transcription unit annotation script, TU_filter, powerful and easy to use. Thank you so much for writing such an amazing script!

I'm wondering what the third page, "GenoSTAN - ChIP-seq bam files", is for. Can I use it to calculate elongation velocities by uploading Pol II ChIP-seq bam files, or to define eRNAs if I provide H3K4me1, H3K4me3, and H3K27ac ChIP-seq bam files?

In addition, I found it very hard to annotate eRNAs correctly. I used TU_filter to generate and annotate TUs from STAR-mapped bam files, using mm9 GENCODE.vM1.gtf as the reference transcriptome. I then took all ncRNA TUs with exonOV=FALSE (converted to bed files) and ran intersectBed against enhancer regions we published before (mainly non-promoter H3K27ac peaks) to obtain eRNAs. But this way, too many 'eRNAs' turn out to be lincRNAs, downstream sense RNAs, or upstream sense/antisense RNAs. Do you have any suggestions on how to define eRNAs properly, including intronic eRNAs?
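
The filtering and intersection steps described above can be sketched in plain Python. This is only an illustration of the logic, not TU_filter's or bedtools' actual code: the `TU` and `Region` records, the `tu_type` labels, and the `exon_ov` field (standing in for the exonOV flag) are assumptions made for the example, and coordinates are assumed to be BED-style half-open intervals.

```python
from typing import List, NamedTuple


class TU(NamedTuple):
    chrom: str
    start: int     # 0-based, inclusive (BED convention)
    end: int       # exclusive
    strand: str
    tu_type: str   # hypothetical label, e.g. "ncRNA" or "protein_coding"
    exon_ov: bool  # stands in for the exonOV flag mentioned above


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def overlaps(tu: TU, region: Region) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return (tu.chrom == region.chrom
            and tu.start < region.end
            and region.start < tu.end)


def candidate_ernas(tus: List[TU], enhancers: List[Region]) -> List[TU]:
    """Keep ncRNA TUs without exon overlap that intersect any enhancer."""
    return [
        tu for tu in tus
        if tu.tu_type == "ncRNA"
        and not tu.exon_ov
        and any(overlaps(tu, enh) for enh in enhancers)
    ]


# Toy data, invented for illustration only.
tus = [
    TU("chr1", 1000, 3000, "+", "ncRNA", False),    # overlaps an enhancer
    TU("chr1", 5000, 9000, "-", "ncRNA", True),     # exon overlap, dropped
    TU("chr1", 20000, 22000, "+", "ncRNA", False),  # no enhancer, dropped
    TU("chr2", 1000, 3000, "+", "protein_coding", False),
]
enhancers = [Region("chr1", 2500, 3500), Region("chr1", 8000, 8500)]

hits = candidate_ernas(tus, enhancers)
```

Note that this intersection ignores strand and TU class entirely, which is why lincRNAs and upstream/downstream RNAs pass through it whenever they touch an H3K27ac peak.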

In the original Science paper in 2016, it seems that they first excluded all other RNA types (including lincRNA, uaRNA, conRNA, asRNA, etc.) and then used the remaining ncRNAs originating from enhancer regions as eRNAs. But maybe that is too stringent, since intronic eRNAs could be asRNAs, and upstream/downstream eRNAs could be annotated as uaRNA/usRNA/daRNA/dsRNA, etc.

In addition, eRNAs are reported to be divergent, but I didn't see this in my samples with 40M+ high-quality mapped reads. Do you see it very often in your samples? If yes, can we use this strategy to define eRNAs? (For intronic eRNAs, though, I guess we would have to include all antisense intronic RNAs that overlap intronic enhancers.)
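
One way to check for divergent transcription is to look for a minus-strand TU and a plus-strand TU whose start sites lie close together and point away from each other. The sketch below assumes a simple `TU` record and an arbitrary `max_gap` of 1000 bp; both are choices made for illustration and are not taken from TU_filter.

```python
from typing import List, NamedTuple, Tuple


class TU(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def tss(tu: TU) -> int:
    """Transcription start site: left edge on '+', right edge on '-'."""
    return tu.start if tu.strand == "+" else tu.end


def divergent_pairs(tus: List[TU], max_gap: int = 1000) -> List[Tuple[TU, TU]]:
    """Pair each minus-strand TU with plus-strand TUs that start within
    max_gap bp downstream of its TSS, so the two are transcribed away
    from each other. max_gap is an assumed, tunable cutoff."""
    minus = [t for t in tus if t.strand == "-"]
    plus = [t for t in tus if t.strand == "+"]
    pairs = []
    for m in minus:
        for p in plus:
            if m.chrom != p.chrom:
                continue
            gap = tss(p) - tss(m)
            if 0 <= gap <= max_gap:
                pairs.append((m, p))
    return pairs


# Toy data, invented for illustration only.
tus = [
    TU("chr1", 1000, 2000, "-"),    # TSS at 2000
    TU("chr1", 2300, 4000, "+"),    # TSS at 2300, divergent with the TU above
    TU("chr1", 10000, 12000, "+"),  # TSS at 10000
    TU("chr1", 12500, 14000, "-"),  # TSS at 14000, convergent with the TU above
    TU("chr2", 1000, 2000, "-"),    # no partner on chr2
]

pairs = divergent_pairs(tus)
```

The quadratic loop is fine for a sketch; for genome-wide TU sets, sorting by TSS or using an interval tree would be the more practical choice.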

Apologies for the long message, but I look forward to hearing your thoughts and any suggestions!

Thank you!

Best regards, Charles

shaorray commented 1 year ago

Hi Charles,

Thank you for the kind words. This tool was made for comparing different nascent RNA-seq methods.

The GenoSTAN state annotation was hidden because the HMM initializes its states randomly, which raises the concern that it is less reproducible than direct peak calling.

eRNA is a fascinating topic. In my opinion, any pair of transcription units (TUs) has a neighboring effect in cis that decreases exponentially with the distance between them, whereas enhancers might overcome this distance limit or produce eRNA that acts in trans. Regulatory eRNAs and enhancers therefore depend on gene-specific circumstances.

Divergence is an innate feature of promoters; it appears in ~20% of my intergenic RNAs (mESC SL cells). In silico annotation ultimately requires in situ validation. I'm afraid that bidirectional TUs alone might be insufficient to establish regulatory potential, but they can indicate evolutionary age.

A more comprehensive annotation might consider chromatin accessibility, TF binding, histone modifications, topologically associating domains, and transcriptional regulatory networks to assist enhancer activity prediction in a gene- and cell-type-specific manner.

Therefore, enhancer transcription is a good proxy for functional genome mining. Maybe you can find more specific criteria for enhancer activity by using TF ablation in the chromatin context of interest.

Best regards, Rui

CW20221031 commented 1 year ago

Hi Rui,

Thank you for your kind suggestions and comments! Yes, I agree with you that it's better to use chromatin accessibility, TF binding, histone modification, chromatin architectures, and transcription networks to annotate enhancers.

For now, I used non-promoter H3K27ac peaks to annotate enhancers, and my goal is to see whether I can use TT-seq to detect the enhancer activity change under the acute depletion of a specific TF, since the decrease/increase of enhancer activity may also be reflected by the decrease/increase of eRNA levels.
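
The comparison described above can be sketched as a per-eRNA log2 fold change between control and TF-depleted samples. The TU names, the counts, the pseudocount of 1, and the threshold of 1 (a two-fold change) are all assumptions made for illustration; counts are assumed to be already normalized for library size, and a real analysis would need replicates and a statistical test.

```python
import math
from typing import Dict, Tuple


def log2_fold_change(control: float, depleted: float,
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of depleted over control; the pseudocount avoids
    division by zero for TUs with no reads in one condition."""
    return math.log2((depleted + pseudocount) / (control + pseudocount))


def classify(lfc: float, threshold: float = 1.0) -> str:
    """Label a fold change using an assumed two-fold cutoff."""
    if lfc <= -threshold:
        return "down"
    if lfc >= threshold:
        return "up"
    return "unchanged"


# Hypothetical normalized counts: name -> (control, TF-depleted).
counts: Dict[str, Tuple[float, float]] = {
    "eRNA_a": (99.0, 24.0),
    "eRNA_b": (10.0, 10.0),
    "eRNA_c": (0.0, 3.0),
}

changes = {
    name: classify(log2_fold_change(ctrl, dep))
    for name, (ctrl, dep) in counts.items()
}
```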

But I've run into a problem: can all nascent RNAs that overlap annotated enhancers be defined as eRNAs? I found that many nascent transcription units (TUs) overlapping non-promoter H3K27ac peaks are upstream antisense/sense RNAs, downstream sense/antisense RNAs, intronic antisense RNAs, or lincRNAs/antisense of lincRNAs. So I'm confused about what the essence of an eRNA is and how to define eRNAs properly. For example, if a lncRNA is already annotated in RefSeq and is transcribed from a region with enhancer features, can we define this RNA as an eRNA?

Looking forward to any suggestions and opinions you may have!

Thank you!

Best regards, Charles

shaorray commented 1 year ago

Hi Charles,

Your question can be addressed more clearly using the TF binding sites of interest, as opposed to the vague definition of enhancers based on H3K27ac.

While my runs have shown some non-coding TUs without H3K27ac signal or even ATAC signal, in your situation, the TUs that are adjacent to or embedded in the gene and have H3K27ac are as valid as the regular distal or adjacent eRNAs (or cis-regulatory elements) upstream of the gene promoter (as in textbooks).

Ultimately, the key question is whether a regulatory relationship, ideally a causal relationship, can be established between the uncertain ncRNAs and the gene of interest. TF binding sites combined with ATAC have the potential to differentiate an eRNA in your knock-out assay.
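
As a rough sketch of this idea, candidate TUs could be kept only when a window around their start site overlaps both a binding site of the TF of interest and an ATAC peak. The record types, the 500 bp `flank`, and the toy coordinates are assumptions for illustration; passing this filter only nominates a TU and does not establish a regulatory relationship.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


class TU(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str
    name: str


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def tss_window(tu: TU, flank: int) -> Region:
    """Window of +/- flank bp around the TU start site."""
    pos = tu.start if tu.strand == "+" else tu.end
    return Region(tu.chrom, max(0, pos - flank), pos + flank)


def supported_tus(tus: List[TU], tf_sites: List[Region],
                  atac_peaks: List[Region], flank: int = 500) -> List[TU]:
    """Keep TUs whose TSS window overlaps a TF site and an ATAC peak.
    flank is an assumed cutoff, not a recommended value."""
    kept = []
    for tu in tus:
        window = tss_window(tu, flank)
        has_tf = any(overlaps(window, s) for s in tf_sites)
        has_atac = any(overlaps(window, p) for p in atac_peaks)
        if has_tf and has_atac:
            kept.append(tu)
    return kept


# Toy data, invented for illustration only.
tus = [
    TU("chr1", 10000, 12000, "+", "tu_a"),  # TF site and ATAC peak near TSS
    TU("chr1", 50000, 52000, "-", "tu_b"),  # ATAC peak only
]
tf_sites = [Region("chr1", 9800, 9900)]
atac_peaks = [Region("chr1", 9700, 10200), Region("chr1", 51900, 52300)]

kept_names = [tu.name for tu in supported_tus(tus, tf_sites, atac_peaks)]
```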

Best regards, Rui