umccr / RNAsum

Pipeline for generating RNAseq-based cancer patient reports
https://umccr.github.io/RNAsum/

Update description for z-score plots #157

Open skanwal opened 1 month ago

skanwal commented 1 month ago

Expand the legend to clarify that the plots use median values, because box plots describe medians. This is as opposed to the tables, which use the mean.

The mean will differ from the median for genes that have low expression across the cohort.
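To illustrate why the two summaries can disagree, here is a minimal sketch with made-up numbers (not RNAsum output). It assumes a gene that is barely expressed in most samples and strongly expressed in a couple of them, so the mean is pulled up while the median stays near zero.

```python
from statistics import mean, median

# Hypothetical expression values for one gene across a cohort:
# most samples show almost no expression, two samples show a lot.
cohort_expression = [0.0, 0.0, 0.1, 0.1, 0.2, 0.2, 0.3, 8.0, 12.0]

gene_mean = mean(cohort_expression)      # pulled up by the two high samples
gene_median = median(cohort_expression)  # middle value of the sorted list

print(f"mean   = {gene_mean:.2f}")
print(f"median = {gene_median:.2f}")
```

A legend that says "median" next to a table that reports means (or the other way round) would therefore be misleading for exactly these genes.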

JMarzec commented 1 month ago

I'm afraid it's more complex than that. Both the plots and the tables present median values of Z-scores calculated for individual groups/patient. The key functions to look at are:

  1. exprGroupsStats_geneWise.R ( https://github.com/umccr/RNAsum/blob/main/R/exprGroupsStats_geneWise.R ):

    • that function returns two objects: (1) group_stats.list, which includes stats for individual genes calculated FOR EACH group, and (2) gene_stats.list, which seems to include stats for individual genes but calculated ACROSS all samples
  2. exprTable.R ( https://github.com/umccr/RNAsum/blob/main/R/exprTable.R ):

    • it uses the group_stats.list object from the exprGroupsStats_geneWise.R() function
  3. cdfPlot.R ( https://github.com/umccr/RNAsum/blob/main/R/cdfPlot.R ):

    • it uses both the group_stats.list and gene_stats.list objects from the exprGroupsStats_geneWise.R() function

I feel that the plots need to provide values in the context of the entire cohort, while the table provides stats (median values) for the group/patient.
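The distinction between per-group stats and stats across all samples can be sketched as follows. This is not the RNAsum implementation: the sample names, values and variable names are hypothetical, and it assumes Z-scores are computed per gene across all samples, which the thread does not confirm.

```python
from statistics import mean, median, pstdev

# Hypothetical expression values for a single gene; not RNAsum data.
expression = {
    "ref_1": 2.0, "ref_2": 4.0, "ref_3": 6.0, "ref_4": 8.0,
    "patient": 10.0,
}
groups = {
    "ref_1": "cohort", "ref_2": "cohort", "ref_3": "cohort", "ref_4": "cohort",
    "patient": "patient",
}

# Assumption: the Z-score is taken across all samples for this gene.
values = list(expression.values())
mu = mean(values)
sigma = pstdev(values)
z = {sample: (value - mu) / sigma for sample, value in expression.items()}

# Rough analogue of group_stats.list: stats FOR EACH group.
group_median_z = {
    group: median(z[s] for s in groups if groups[s] == group)
    for group in set(groups.values())
}

# Rough analogue of gene_stats.list: stats ACROSS all samples.
overall_median_z = median(z.values())

print(group_median_z)
print(overall_median_z)
```

In this toy case the per-group medians differ from the overall median, which is why a legend needs to say which of the two a plot or table is showing.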

JMarzec commented 1 month ago

Re the table legend, it could mention that the values refer to the MEDIAN Z-score (or percentile) in the reference cohort and patient. For example, for the BRCA case in the Z-score tab it could look like this (changes/additions in italic font):

In the BRCA (TCGA), Patient and Diff columns, the RED colour range indicates relatively high expression (median Z-score) values and the BLUE colour range indicates relatively low expression (median Z-score) values in the individual sample group. The BLANK cells with missing values indicate genes with no/low expression. The Diff (Patient vs BRCA (TCGA)) column illustrates the difference between median Z-scores in the patient sample and the reference cancer cohort for each mutated gene...
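As a rough sketch of what the Diff column described in this legend would hold: the gene names, values and the `diff_median_z` helper below are hypothetical, and the direction (patient minus cohort) is an assumption read from "Patient vs BRCA (TCGA)".

```python
from typing import Optional

def diff_median_z(patient_z: Optional[float],
                  cohort_z: Optional[float]) -> Optional[float]:
    """Difference between median Z-scores; None mimics a blank cell."""
    if patient_z is None or cohort_z is None:
        return None  # gene with no/low expression: leave the cell blank
    return patient_z - cohort_z  # assumed direction: patient minus cohort

# Hypothetical (patient median Z, cohort median Z) pairs per gene.
table = {
    "GENE_A": (1.8, 0.2),
    "GENE_B": (-1.1, 0.4),
    "GENE_C": (None, None),  # no/low expression
}

diffs = {gene: diff_median_z(p, c) for gene, (p, c) in table.items()}
print(diffs)
```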