stephenslab / dsc-log-fold-change

dsc to compare approaches to estimating/testing log-fold-change from counts
https://stephenslab.github.io/dsc-log-fold-change/

syntax for logical pipeline variable #11

Closed jhsiao999 closed 5 years ago

jhsiao999 commented 5 years ago

@gaow

In dsc-log-fold-change/dsc/benchmark.dsc, I'd like data_poisthin to have a logical argument shuffle_sample. This DSC module calls the poisthin function in the module folder. The current syntax only gives me one file. I expect two files: one when shuffle_sample=TRUE and one when shuffle_sample=FALSE. How do I do this? Thanks!

data_poisthin: R(counts = readRDS(dataFile)) + \
       dataSimulate.R + \
       R(set.seed(seed=seed); out = poisthin(mat=t(counts), nsamp=nsamp, ngene=ngene, gselect=gselect, shuffle_sample=shuffle_sample, signal_dist=signal_dist, prop_null = prop_null)) + \
       R(groupInd = out$X[,2]; Y1 = t(out$Y[groupInd==1,]); Y2 = t(out$Y[groupInd==0,]))
  dataFile: "data/pbmc_counts.rds"
  seed: R{2:101}
  nsamp: 90
  ngene: 1000
  prop_null: .5, .9, 1
  shuffle_sample: T, F
  gselect: "random"
  signal_dist: "bignormal"
  $Y1: Y1
  $Y2: Y2
  $beta: out$beta

gaow commented 5 years ago

@jhsiao999 DSC does not have the concept of an explicit file as module output, so I'm not sure exactly what you want in terms of output. But we do have the file() operator for when you want explicit files to be kept. For example,

test: R(z = x; saveRDS(z+1, y))
  x: 1
  $y: file(rds)
  $z: z

DSC:
  run: test

$ dsc test.dsc
INFO: DSC script exported to test.html
INFO: Constructing DSC from test.dsc ...
INFO: Building execution graph & running DSC ...
[###] 3 steps processed (3 jobs completed)
INFO: Building DSC database ...
INFO: DSC complete!
INFO: Elapsed time 4.298 seconds.

You'll get 3 files:

├── test
│   ├── test_1.rds
│   ├── test_1.yml
│   └── test_1.y.rds

test_1.rds is the usual implicit DSC module output, which contains z. test_1.y.rds, which contains y, is the additional file you asked for. test_1.yml holds meta-info about these additional files; you can ask for more of them.

Notice that I asked for the file extension rds, so I use saveRDS explicitly in the script. If I wanted a text file instead, I'd use $y: file(txt) and write(z+1, y) in the DSC script.

Will that help with what you want to do? I'm not saying this is the best design, but it is the current design. It could be improved in the future to get rid of the yml file.

Is there a reason you labeled this ticket as a bug? Was something not working as promised? If so, it would be nice to have an error message and a minimal reproducible example. As you can tell from recent tickets I commented on (including this one), it is straightforward to create such minimal working examples.

jhsiao999 commented 5 years ago

@gaow: Sorry about the confusion! I did not want a file. I just removed the bug label. This issue was about the use of dsc syntax.

The problem was that I had used the dsc syntax incorrectly.

Below is a minimal reproducible example.

data_poisthin: R()
  shuffle_sample: T, F
  $Y1: 1

DSC:
  define:
    data: data_poisthin
  run:
    data

I expected to have data_poisthin output under both shuffle_sample=T and shuffle_sample=F. When I originally submitted the issue, I only saw output under shuffle_sample=T.

dsc test_logical_input_2.dsc --truncate
dsc-query test_logical_input_2 --target data_poisthin.shuffle data_poisthin.shuffle_sample -f -o test_logical_input_2.csv

DSC,data_poisthin.output.file,data_poisthin.shuffle_sample
1,data_poisthin/data_poisthin_2,T

I realized I had used the truncate option when running dsc, so it makes sense that dsc only generated output under one setting. After removing the truncate option, dsc generates results under both settings.
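To make the difference concrete, here is a rough Python sketch of the counting involved. It is not DSC's implementation: it assumes a module is run once per combination of its parameter values, and that --truncate keeps a single value per parameter (the first one here, which matches the shuffle_sample=T row in the query output above). The helper names expand and truncate are hypothetical.

```python
from itertools import product


def expand(params):
    """Return one dict per combination of parameter values (the full grid)."""
    names = list(params)
    return [
        dict(zip(names, combo))
        for combo in product(*(params[name] for name in names))
    ]


def truncate(params):
    """Keep a single value per parameter (assumed: the first), then expand."""
    return expand({name: values[:1] for name, values in params.items()})


# The minimal example from this thread: one logical parameter, two values.
minimal = {"shuffle_sample": [True, False]}

# The multi-valued parameters of data_poisthin in benchmark.dsc.
full = {
    "seed": list(range(2, 102)),  # R{2:101}
    "prop_null": [0.5, 0.9, 1],
    "shuffle_sample": [True, False],
}

print(len(expand(minimal)), len(truncate(minimal)))  # 2 1
print(len(expand(full)), len(truncate(full)))        # 600 1
```

Under these assumptions the minimal example yields two module instances without truncation and one with it, which is consistent with the single row in the CSV above.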

Thanks for the quick reply, and sorry about all the confusion!

gaow commented 5 years ago

Oh okay, that makes sense now. I was wondering why you'd want to keep the F result in a separate file. Yes, --truncate is meant for testing your implementation. It's good that you can get away without having to use a file, but in case you need one in the future, you can see how it is done in my comment above. I'll close this ticket for now.