shaorray / TU_filter

A local TU annotation tool for multi-sample comparison
GNU General Public License v3.0

some questions #2

Open BenxiaHu opened 1 year ago

BenxiaHu commented 1 year ago

Hello Rui, I still have several questions about the codes:

1: # strandFlipped from function "TU_prep"; if any experiment prepared first-in-pair read on reverse strand, then "strandFlipped" is TRUE (strandFlipped <- FALSE). Would you like to explain this?

2: mergeFeatures <<- input$merge_features # "protein_coding" or additional multi-exon transcript type, e.g. lincRNA. Based on your point, what option do I select? "protein_coding"?

3: mergeMethod <<- input$mergeMethod # by "exon" or "tu". Do you think "tu" is much better than "exon"? it seems that "tu" would generate much longer transcripts than "exon".

Best,

shaorray commented 1 year ago

Hi Benxia,

I think you can start a run directly after initializing the input parameters from line 574.

The functions were wrapped from the original R code, and they behave pretty much the same if you set mergeMethod to "tu".

Best, Rui

BenxiaHu commented 1 year ago

thanks a lot.

BenxiaHu commented 1 year ago

Hello Rui, I still have several questions about the codes: 1: # strandFlipped from function "TU_prep"; if any experiment prepared first-in-pair read on reverse strand, then "strandFlipped" is TRUE (strandFlipped <- FALSE). Would you like to explain this?

2: mergeFeatures <<- input$merge_features # "protein_coding" or additional multi-exon transcript type, e.g. lincRNA. Based on your point, what option do I select? "protein_coding"?

3: mergeMethod <<- input$mergeMethod # by "exon" or "tu". Do you think "tu" is much better than "exon"? it seems that "tu" would generate much longer transcripts than "exon".

4: TSS_len <<- input$L_TSS # default 1000
Promoter_len <<- input$L_Promoter # default 1000
TTS_len <<- input$L_TTS # default 1000
What is the difference between TSS_len and Promoter_len?

5: It seems this code can only analyze nascent RNA-seq from human, not other species.

Best,

shaorray commented 1 year ago

Hi Benxia,

  1. "strandFlipped" is TRUE (strandFlipped <- FALSE). Would you like to explain this?

Different library preparation methods (home-made or commercial kits, e.g. TruSeq) have inconsistent strandedness. Here, for TT-seq with the Ovation kit, the first reads are on the forward strand. In some experiments they may be on the reverse strand instead, which causes trouble for TU annotation unless the strandedness of each bam file is carefully specified. I set this boolean, "strandFlipped", by comparing TU and gene strands, so that bam files from different sources can be handled.
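A minimal sketch of how such a strand comparison could work, not the actual TU_filter code: the function name `infer_strand_flipped`, the `threshold` argument, and the toy strand vectors are all hypothetical. It assumes each TU has already been paired with an overlapping annotated gene, and calls the library flipped when most TU strands disagree with their gene strands.

```r
# Hypothetical helper (not part of TU_filter): decide whether first-in-pair
# reads are on the reverse strand by comparing TU strands with gene strands.
infer_strand_flipped <- function(tu_strand, gene_strand, threshold = 0.5) {
  stopifnot(length(tu_strand) == length(gene_strand))
  # Only pairs where both strands are known can be compared.
  keep <- tu_strand %in% c("+", "-") & gene_strand %in% c("+", "-")
  if (!any(keep)) return(NA)
  # Fraction of TUs whose strand matches the overlapping gene.
  agreement <- mean(tu_strand[keep] == gene_strand[keep])
  # Mostly disagreeing strands suggest a flipped library.
  agreement < threshold
}

# Toy input: four of five TUs are antisense to their genes.
tu_strand     <- c("-", "-", "+", "-", "+")
gene_strand   <- c("+", "+", "-", "+", "+")
strandFlipped <- infer_strand_flipped(tu_strand, gene_strand)
```

With real data the TU-to-gene pairing would come from an overlap step that ignores strand, which is left out here.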

  2. what option do I select? "protein_coding"?

"protein_coding" (mRNA) is a multi-exon transcript type, and so is "lincRNA". I would recommend using both "protein_coding" and "lincRNA".

  3. Do you think "tu" is much better than "exon"?

It depends on the purpose: whether the aim is to study transcription or the RNA itself. If it is to capture the entire transcription cycle, then the "tu" method is better (as in the first TT-seq paper). If the aim is to find non-coding RNA species around a coding region, then the "exon" method will be more suitable. I added an extra step that clips the downstream sense RNA (in the termination window) off the coding region, since it has a different synthesis rate and confounds gene-level synthesis estimation. Downstream antisense ncRNA is then named according to gene regions (by exon) rather than by the entire TU.

  4. what is the difference TSS_len and Promoter_len?

TSS_len covers the region downstream of the TSS, (0, TSS_len]. Promoter_len covers the region upstream, (-Promoter_len, 0].
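The two windows can be sketched in base R as follows. This is an illustration under assumptions, not the app's own code: the function `tss_windows` and its column names are hypothetical, coordinates are treated as 1-based inclusive, and "upstream" and "downstream" are taken relative to the gene's strand.

```r
# Hypothetical helper: convert the offset intervals (0, TSS_len] and
# (-Promoter_len, 0] around a TSS into genomic start/end coordinates.
tss_windows <- function(tss, strand, TSS_len = 1000, Promoter_len = 1000) {
  stopifnot(length(tss) == length(strand), all(strand %in% c("+", "-")))
  plus <- strand == "+"
  data.frame(
    # Downstream window: offsets 1..TSS_len in the direction of transcription.
    tss_start      = ifelse(plus, tss + 1, tss - TSS_len),
    tss_end        = ifelse(plus, tss + TSS_len, tss - 1),
    # Upstream window: offsets -(Promoter_len - 1)..0, so it includes the TSS.
    promoter_start = ifelse(plus, tss - Promoter_len + 1, tss),
    promoter_end   = ifelse(plus, tss, tss + Promoter_len - 1)
  )
}

# One plus-strand and one minus-strand gene with the same TSS coordinate.
w <- tss_windows(c(5000, 5000), c("+", "-"))
```

For the minus-strand gene the downstream window lies at lower coordinates and the promoter window at higher ones, so the same pair of lengths yields mirrored intervals.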

  5. not other species

It can actually take any gene annotation file that matches the bam files, but I haven't fully tested all versions (Ensembl, GENCODE, etc.). Please let me know if there is any issue.

I agree that developing an R package is a good option, though in my opinion a graphical interface is also nice for reproducible research. This local shiny app is computationally intensive, but the main issue is the downloader's limited permissions across platforms (it sometimes does not work on Ubuntu).

Best,

Rui

BenxiaHu commented 1 year ago

Thanks. The shiny app cannot handle large bam files (>40 GB). Is it possible to take a bigwig file as input instead, since it is much smaller than a bam file?