wyguo / ThreeDRNAseq

A pipeline for differential expression and differential alternative splicing analysis
https://github.com/wyguo/ThreeDRNAseq
GNU General Public License v3.0

[Help] Easiest Way to Pull Out Both Condition Expression and PSI for Custom Figures? #25

Closed by danphillips28 3 years ago

danphillips28 commented 3 years ago

Dear 3D-RNA-Seq Developers,

Thanks for making this tool, I love it!

I am exactly the kind of person this tool was designed for - someone with little background in statistics and little comfort working in R. Whilst I use 3D as the backbone of my results, I am now also starting to think of how I can pull out data to use for downstream analyses etc. In particular, I recently found an interactive visualisation tool (Clustergrammer) which I am interested in trying out. Therefore, I am wondering: What is the easiest way to pull out the data used by 3D to re-visualise condition expression and PSI results?

For the DEGs I've found this, which I believe is the normalised counts used by edgeR for each replicate: intermediate_data[["genes_dge"]][["counts"]]. Would these be appropriate to use for visualisation (after averaging by condition)? I.e. would I be visualising the same data that produced my final results? Is there a file for counts per condition somewhere that I haven't found? I'm only finding condition count comparisons.

Similarly, for DAS genes, I found this, which looks to be the dPSI between each of my chosen contrast groups: intermediate_data[["deltaPS"]]. However, ideally what I need is the PSI of each isoform in each condition (or in each replicate, similar to the SUPPA2 diffSplice input). This is because I would like to visualise (as a heatmap) the PSI changes themselves, rather than the expression of their genes. This is something I have seen in many DAS publications and I would like to try myself.

Your suggestions and guidance on this will be greatly appreciated!

Thanks again, Daniel

wyguo commented 3 years ago

Dear Daniel, Many thanks for using the 3D RNA-seq App. In the output data folder, you will find three objects: txi_trans.RData, txi_genes.RData and intermediate_data.RData. Once you load these objects into R, you can get:

  1. Transcript and gene level TPM

    • Transcript TPM: txi_trans$abundance
    • Gene TPM: txi_genes$abundance
  2. Raw read counts

    • Transcript read counts: txi_trans$counts or intermediate_data$trans_dge$counts (these are raw counts, not normalised)
    • Gene read counts: txi_genes$counts or intermediate_data$genes_dge$counts (these are raw counts, not normalised)
  3. Normalised log2-CPM

    • Transcript level log2-CPM: intermediate_data$trans_3D_stat$voom.object$E
    • Gene level log2-CPM: intermediate_data$genes_3D_stat$voom.object$E
  4. Transcript level PS: intermediate_data$PS

Please let me know if you have further questions. Best, Wenbin
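For readers who want per-condition values for a heatmap, the objects listed above can be averaged across replicates. The following is only a minimal sketch, not part of the 3D RNA-seq package: the helper average_by_condition and the toy matrix are hypothetical, and it assumes you can supply one condition label per sample column from your own sample sheet. Note also that averaging log2-CPM values gives a mean on the log scale, and that averaged PS values may not exactly match any per-condition values the app computes internally.

```r
# Hypothetical helper (not from ThreeDRNAseq): average the replicate columns
# of a feature-by-sample matrix within each condition.
#   mat       - numeric matrix, rows = genes/transcripts, columns = samples
#   condition - one label per column of `mat` (assumed to come from your sample sheet)
average_by_condition <- function(mat, condition) {
  stopifnot(is.matrix(mat), length(condition) == ncol(mat))
  condition <- factor(condition, levels = unique(condition))
  # For each condition, take the row means over its replicate columns.
  avg <- vapply(levels(condition), function(cond) {
    rowMeans(mat[, condition == cond, drop = FALSE])
  }, numeric(nrow(mat)))
  # Force a matrix with clean dimnames, even when `mat` has a single row.
  matrix(avg, nrow = nrow(mat),
         dimnames = list(rownames(mat), levels(condition)))
}

# Toy data standing in for a log2-CPM or PS matrix (values are made up).
toy <- matrix(c(1, 3, 10, 20,
                2, 4, 30, 50),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("geneA", "geneB"),
                              c("ctrl_rep1", "ctrl_rep2",
                                "treat_rep1", "treat_rep2")))
toy_condition <- c("ctrl", "ctrl", "treat", "treat")
toy_avg <- average_by_condition(toy, toy_condition)
print(toy_avg)

# With the real output (file and object names as given in the reply above),
# something along these lines should work; `my_condition` is hypothetical
# and must match the column order of each matrix:
# load("intermediate_data.RData")
# expr <- as.matrix(intermediate_data$genes_3D_stat$voom.object$E)
# ps   <- as.matrix(intermediate_data$PS)
# expr_by_condition <- average_by_condition(expr, my_condition)
# ps_by_condition   <- average_by_condition(ps, my_condition)
```

The resulting matrix has one column per condition and can be written out with write.csv() for use in an external visualisation tool.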

danphillips28 commented 3 years ago

Just what I needed, thank you! Dan