umccr / RNAsum

Pipeline for generating RNAseq-based cancer patient reports
https://umccr.github.io/RNAsum/

Tasks to address when adding option to use “full” TCGA reference cohort #164

Open JMarzec opened 2 weeks ago

JMarzec commented 2 weeks ago

Matters to address when adding the option to use the “full” TCGA patient reference cohort:

  1. Use static plots instead of interactive ones (in particular those with per-sample data points in the "Input data summary" and "Expression profiles" sections) to reduce both the run time and the size of the final report
  2. Switch off saving the expression data (expression matrices) and summary tables, since they are computationally intensive to produce, generate big files, and are used only by the RNA data portal
  3. Look at the Addendum run time to check which time-consuming code chunks can be skipped to reduce the run time
  4. Create a separate "RNAsum.data" repo with expression matrix files that include the “full” TCGA patient reference cohort
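Items 1 and 2 amount to deriving two report switches from the chosen reference cohort. The sketch below only illustrates that mapping in Python; RNAsum itself is written in R, and `ReportSettings`, `settings_for_cohort`, and the field names are hypothetical, not existing RNAsum parameters. It also assumes that the "partial" option keeps the current behaviour (interactive plots, tables saved).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportSettings:
    """Hypothetical report switches; these are not RNAsum parameter names."""

    interactive_plots: bool  # interactive per-sample plots vs. static ones
    save_tables: bool        # write expression matrices and summary tables


def settings_for_cohort(cohort: str) -> ReportSettings:
    """Return the report switches for a reference cohort option."""
    if cohort == "full":
        # Items 1 and 2: static plots, no expression matrices or summary tables.
        return ReportSettings(interactive_plots=False, save_tables=False)
    if cohort == "partial":
        # Assumption: the partial cohort keeps the current behaviour.
        return ReportSettings(interactive_plots=True, save_tables=True)
    raise ValueError(f"unknown reference cohort option: {cohort!r}")


print(settings_for_cohort("full"))
```

Keeping the mapping in one place would mean a single option drives both the plotting mode and the table export, rather than two flags that can drift apart.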

JMarzec commented 2 days ago

I ran RNAsum using the “full” and “partial” TCGA patient reference cohort options for the following samples:

  - BRCA: SBJ04426, SBJ04187, SBJ04296
  - PANCAN: SBJ01649, SBJ04469, SBJ02061, SBJ02091, SBJ04376, SBJ04408

Attached are summary plots illustrating the following:

Based on the "RNAsum processing time by chunk" chart, the following R code chunks are the most computationally demanding (the note in parentheses indicates whether the chunk can be skipped when using the "full" TCGA reference option):

  1. (can be skipped) data_transformation_plot
  2. (keep) glance_expr_plot_immune_genes
  3. (keep) pca
  4. (keep) glance_expr_plot_cancer_genes
  5. (can be skipped) data_transformation_display
  6. (keep) glance_expr_plot_hrd_genes
  7. (keep) top_hits_fusions
  8. (keep) unnamed-chunk-1
  9. (keep) rle

I'd also skip the "data_normalisation_plot", "scree_combined_data_display" and "rle_display" chunks, since their output is not readable given the number of included samples.
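The skip/keep decisions above can be collected into a single lookup. This is a minimal Python sketch of the selection logic only: the chunk names come from this comment, while `SKIP_WITH_FULL_COHORT` and `chunks_to_run` are hypothetical helpers. In the R Markdown report itself the same decision would more likely be expressed through each chunk's knitr `eval` option.

```python
# Chunks proposed for skipping when the "full" reference cohort is used.
# Names are taken from the comment above; the helper is hypothetical.
SKIP_WITH_FULL_COHORT = {
    "data_transformation_plot",
    "data_transformation_display",
    "data_normalisation_plot",
    "scree_combined_data_display",
    "rle_display",
}


def chunks_to_run(chunks, cohort):
    """Return the chunks to evaluate for a cohort option, preserving order."""
    if cohort != "full":
        return list(chunks)
    return [name for name in chunks if name not in SKIP_WITH_FULL_COHORT]


# The most demanding chunks, in the order listed above.
demanding = [
    "data_transformation_plot",
    "glance_expr_plot_immune_genes",
    "pca",
    "glance_expr_plot_cancer_genes",
    "data_transformation_display",
    "glance_expr_plot_hrd_genes",
    "top_hits_fusions",
    "unnamed-chunk-1",
    "rle",
]

print(chunks_to_run(demanding, "full"))
```

With the "full" option, two of the nine most demanding chunks drop out; the other three skipped chunks are not in that list and are skipped for readability rather than run time.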

[Figure: RNAsum processing time by sample]
[Figure: RNAsum report size by sample]
[Figure: RNAsum processing time by chunk]