stephenslab / dsc-log-fold-change

dsc to compare approaches to estimating/testing log-fold-change from counts
https://stephenslab.github.io/dsc-log-fold-change/

Document pipeline variables and module input/output at the top of `benchmark.dsc` #7

Closed jhsiao999 closed 5 years ago

jhsiao999 commented 5 years ago
gaow commented 5 years ago

@jhsiao999 would you elaborate? I ask because we do have this option as an "unsupported feature". For example:

module: R()
  a: ${a}

...

DSC:
   global:
      a: 1

Then ${a} will be 1, and this can be changed on the command line:

dsc ... --a 2

Then ${a} is 2.

Is this what you are looking for? I think there were debates about using global parameters, and I was told it is not a good idea, so this remains a hidden feature. But your request seems to be backing me up.
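
For readers following along, the behaviour described above (a default taken from the `global` block, optionally overridden on the command line, then substituted for `${a}`) can be mimicked in a few lines of Python. This is only a sketch of the semantics as described in this comment, not how dsc implements it; `resolve_globals` and `fill_module_params` are hypothetical names introduced for illustration.

```python
from string import Template


def resolve_globals(defaults, overrides):
    """Merge command-line overrides on top of the defaults from the global block."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def fill_module_params(params, global_values):
    """Replace ${name} placeholders in each module parameter with its global value."""
    return {
        key: Template(value).substitute(global_values)
        for key, value in params.items()
    }


# Hypothetical stand-ins for the module parameter `a: ${a}` and `global: a: 1`.
module_params = {"a": "${a}"}
global_defaults = {"a": "1"}

# No override: the default from the global block is used.
default_run = fill_module_params(module_params, resolve_globals(global_defaults, {}))

# Equivalent in spirit to passing `--a 2` on the command line.
override_run = fill_module_params(module_params, resolve_globals(global_defaults, {"a": "2"}))

print(default_run)
print(override_run)
```

Values are kept as strings here for simplicity; the sketch says nothing about how dsc itself types or parses them.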

jhsiao999 commented 5 years ago

@gaow Sorry for the confusion. This was a reminder for myself that I should make note of pipeline variables in my dsc script (see below).

And a comment about global parameters: I also decided not to use them because I didn't understand what they would add to my dsc script.

# pipeline variables  --------------------------------------------------
# $Y1: `ngene` by `nsamp/2` matrix of counts for samples in group 1
# $Y2: `ngene` by `nsamp/2` matrix of counts for samples in group 0
# $beta: an `ngene` vector of simulated true values of beta (using the `poisthin` function)
# $log_fold_change_est: an `ngene` vector of estimated values of beta
# $s_hat: an `ngene` vector of estimated standard errors
# $pval: an `ngene` vector of p-values
# $df: an `ngene` vector of degrees of freedom
# $type_one_error: an `ngene` vector of degrees of freedom
# $pval_adj: an `ngene` vector of adjusted p-values; currently we use `qvalue` from the `qvalue` package
# $fdr_est: an `ngene` vector of estimated values for false discovery rate (depends on the `fdr_thres` level)
# $auc_est: an `ngene` vector of estimated values for area under the curve (using the `pROC` package)

# module groups --------------------------------------------------------
# data:
#   input: "data/pbmc_counts.rds"
#   output:  $Y1, $Y2, $beta
# method:
#   input: $Y1, $Y2
#   output:  $log_fold_change_est, $s_hat, $pval, $df
# score:
#   input: $pval, $pval_adj, $beta
#   output: $type_one_error, $pval_adj, $fdr_est, $auc_est
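
One way to keep a header like this honest as the benchmark grows is to check that every documented input is produced by an earlier module group. The Python sketch below does that for the three groups listed above. It is not a dsc feature: `MODULE_GROUPS` and `missing_upstream` are hypothetical names, the `data -> method -> score` order is an assumption, and the file input `data/pbmc_counts.rds` is left out because it is not a pipeline variable.

```python
# Documented inputs/outputs, transcribed from the comment block above.
# Assumption: groups run in the order data -> method -> score.
MODULE_GROUPS = [
    ("data", {"inputs": [], "outputs": ["Y1", "Y2", "beta"]}),
    ("method", {
        "inputs": ["Y1", "Y2"],
        "outputs": ["log_fold_change_est", "s_hat", "pval", "df"],
    }),
    ("score", {
        "inputs": ["pval", "pval_adj", "beta"],
        "outputs": ["type_one_error", "pval_adj", "fdr_est", "auc_est"],
    }),
]


def missing_upstream(groups):
    """Return (group, variable) pairs whose input is not produced by an earlier group."""
    available = set()
    missing = []
    for name, io in groups:
        for var in io["inputs"]:
            if var not in available:
                missing.append((name, var))
        # Outputs only become available to the groups that come after this one.
        available.update(io["outputs"])
    return missing


report = missing_upstream(MODULE_GROUPS)
print(report)
```

With the header as written, the only pair reported is `("score", "pval_adj")`, because `$pval_adj` is listed as both an input and an output of the `score` group. That may simply mean one score module produces it for another, which this group-level check cannot see.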

gaow commented 5 years ago

Ahh, too bad you are not backing up my global variable implementation -- some might consider it an anti-pattern to use global variables. But I am tolerant of it as long as it is not abused ...

jhsiao999 commented 5 years ago

I still need to get more familiar with the dsc syntax... But so far, I like that the basic dsc syntax (just input and output, no global variables) is pretty easy for me to read. With global variables, I would need to keep track of an additional layer of complexity in my script...

jhsiao999 commented 5 years ago

I've added this info to `benchmark.dsc`. Will continue to do so as the project progresses.