stephenslab / dsc

Repo for Dynamic Statistical Comparisons project
https://stephenslab.github.io/dsc-wiki
MIT License

DSC separate configuration file for environments etc #117

Open gaow opened 6 years ago

gaow commented 6 years ago

It has been proposed that we add a --config switch pointing to a file that defines the system environment settings needed to run DSC. The point is to separate the benchmark definition from the execution environment.

Currently there are two places to configure DSC: the DSC section and @CONF for each module. Configurations specified in these places are mostly "portable" because they specify which software packages to use, where (as relative paths) other script files and executables are located, etc. Additionally, there is a configuration file for clusters that defines how DSC submits jobs to a cluster system and the specifications for each module. See here for an example.

Here I propose that we generalize the cluster configuration and allow for module-specific as well as global configuration. But we'll have to be careful about what we make configurable. I list potential configurable items below; we should add to this list and evaluate it before implementing. Note that all of these settings can be either global or module-specific.

  1. Working directory (default to current)
  2. Global R interpreter and Python interpreter (default to Rscript and python3 in system PATH)
  3. Additional system PATH to append (default to empty)
  4. Module-specific script interpreter (defaults to R, Python, or shell, depending on the type of module). Here we could also allow python2, julia, even matlab, but these interpreters will not get the "seamless" support we provide for the default R and Python. Rather, their scripts will be executed as shell commands.
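
To make the "global or module-specific" idea concrete, here is a minimal Python sketch of how such settings might be resolved: built-in defaults first, then global values, then per-module overrides. This is not DSC's implementation. The key names (`work_dir`, `r_interpreter`, `python_interpreter`, `extra_path`, `interpreter`), the `global`/`modules` layout, and the example paths are all hypothetical, chosen only to mirror the four items above.

```python
# Hypothetical defaults mirroring the four proposed items.
DEFAULTS = {
    "work_dir": ".",                  # item 1: current directory
    "r_interpreter": "Rscript",       # item 2: found on the system PATH
    "python_interpreter": "python3",  # item 2: found on the system PATH
    "extra_path": [],                 # item 3: nothing appended
    "interpreter": None,              # item 4: None means "infer from module type"
}


def resolve_settings(config, module):
    """Merge defaults, global settings, and module-specific overrides.

    Later layers win: module-specific values override global values,
    which in turn override the built-in defaults.
    """
    settings = dict(DEFAULTS)
    settings.update(config.get("global", {}))
    settings.update(config.get("modules", {}).get(module, {}))
    return settings


# Example configuration (paths and module names are illustrative only).
config = {
    "global": {"r_interpreter": "/opt/R/bin/Rscript"},
    "modules": {"align": {"extra_path": ["/opt/star/bin"]}},
}

print(resolve_settings(config, "align"))
```

A module without its own entry simply inherits the global values and defaults, so the benchmark definition itself never needs to mention any of these paths.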

pcarbo commented 6 years ago

@gaow It seems to me that 2 is most essential.

Do we need 3? (Can't the user just set $PATH before running dsc? Likewise for other environment variables.)

When would 1 be necessary, or useful?

gaow commented 6 years ago

> Do we need 3?

At module level we do. This is like a more general version of running source activate for a particular module.
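
As a rough illustration of what a module-level PATH setting could do, the Python sketch below builds a per-module environment and leaves the caller's environment untouched. The function name `module_env` is hypothetical and this is not how DSC does it. Entries are appended, as item 3 describes, whereas an activation script typically prepends them.

```python
import os


def module_env(extra_path, base_env=None):
    """Return a copy of the environment with extra_path appended to PATH.

    base_env defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    if extra_path:
        parts = [env["PATH"]] if env.get("PATH") else []
        env["PATH"] = os.pathsep.join(parts + list(extra_path))
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run`, so each module's script sees its own PATH while other modules and the parent process are unaffected.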

> When would 1 be necessary, or useful?

I guess not ... I proposed that because I had in mind that some software has the irritating behavior of writing output, logs, etc. to the current working directory. An immediate one I can think of is STAR, the RNA-seq aligner. But I agree it is a corner case and there might be other ways around it.
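
One possible way around it, sketched in Python under the assumption that the tool is launched as a subprocess: run the command with a scratch directory as its working directory, so stray logs land there. The helper name `run_in_dir` is hypothetical, and the small inline script only stands in for a tool that writes to its current directory.

```python
import os
import subprocess
import sys
import tempfile


def run_in_dir(cmd, work_dir):
    """Run cmd with work_dir as its current working directory."""
    return subprocess.run(cmd, cwd=work_dir, check=True,
                          capture_output=True, text=True)


with tempfile.TemporaryDirectory() as scratch:
    # Stand-in for a tool that writes a log file to its current directory.
    cmd = [sys.executable, "-c", "open('tool.log', 'w').write('done')"]
    run_in_dir(cmd, scratch)
    print(os.listdir(scratch))
```

Because `cwd` applies only to the child process, the caller's own working directory does not change.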

pcarbo commented 6 years ago

> I guess not ... I proposed that because I had in mind that some software has the irritating behavior of writing output, logs, etc. to the current working directory.

@gaow I agree, that can be annoying, but I think the solution should be within the software, and doesn't need to be directly addressed by DSC.