stephenslab / dsc

Repo for Dynamic Statistical Comparisons project
https://stephenslab.github.io/dsc-wiki
MIT License

A `dsc-setup` command #208

Open gaow opened 4 years ago

gaow commented 4 years ago

To continue our in-person discussion about a dsc-setup command: it should provide a one-line command to set up a GitHub-friendly template for DSC. The template should include the basic suggested script structure / hierarchy for DSC benchmarks and, optionally, templates to query and explore results (using dscrutils::dscquery, potentially with a workflowr structure or dscrutils::shiny_plot in mind).

Implementation-wise, I suggest it be a command-line tool that people use by typing dsc-setup in a terminal, but written in R: this makes it easier for the lab to maintain and change, and we can potentially borrow code from workflowr, which already initializes projects.

For starters, this ticket discusses what we want to achieve. My current DSC organization is:

- scripts
   - module1.R
   - module2.R
   - whatever-lumped-scripts.R
- modules
   - module1.dsc
   - module2.dsc
   - whatever-lumped-modules.dsc
benchmark1.dsc
benchmark2.dsc
...

where each benchmark*.dsc file contains only the DSC section.

We can invoke it as dsc-setup name, which will:

  1. create a GitHub repository called name
  2. prepare .gitignore and .gitattributes files for it
  3. set up the structure above, with a README.md explaining what each folder does
  4. set up a master DSC file, name.dsc, containing only the DSC section:
#!/usr/bin/env dsc
%import modules/*.dsc

DSC:
   output: "name"

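The thread proposes writing dsc-setup in R. Purely to illustrate what the scaffolding amounts to, here is a minimal POSIX shell sketch of steps 1, 3 and 4 for the layout above. The function name dsc_setup, the default name mybenchmark and the README wording are hypothetical. It only runs git init locally (if git is installed); it does not create the remote GitHub repository or the step 2 files.

```shell
#!/bin/sh
# Hypothetical sketch of `dsc-setup name`; the thread proposes an R implementation.
# Covers step 1 (local git repo only), step 3 and step 4.
set -eu

dsc_setup() {
  name="$1"
  if [ -e "$name" ]; then
    echo "dsc-setup: '$name' already exists" >&2
    return 1
  fi

  # Step 3: suggested hierarchy for DSC benchmarks.
  mkdir -p "$name/scripts" "$name/modules"

  # Step 1 (local part only): initialize a git repository if git is installed.
  if command -v git >/dev/null 2>&1; then
    git init -q "$name"
  fi

  # Step 3: README.md explaining what each folder does (wording is hypothetical).
  cat > "$name/README.md" <<EOF
# $name

- scripts/: R (or other) code implementing each module
- modules/: DSC module definitions (*.dsc) that call the scripts
- $name.dsc: the master DSC file, containing only the DSC section
EOF

  # Step 4: master DSC file with the DSC section only, as in the template above.
  cat > "$name/$name.dsc" <<EOF
#!/usr/bin/env dsc
%import modules/*.dsc

DSC:
   output: "$name"
EOF
}

dsc_setup "${1:-mybenchmark}"
```

Saved as, say, dsc-setup.sh (file name hypothetical), running sh dsc-setup.sh finemap would create finemap/ with scripts/, modules/, README.md and finemap.dsc. Writing the files with here-documents keeps the generated text identical to the template, which makes it easy to compare against whatever the R version produces.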
I don't think it is necessary (or should be encouraged) to add comments to a DSC script like this, because the exported HTML file for the DSC script will now contain that information (#209). That is the file reported in the first line of output when you run DSC:

$ ./finemap.dsc --debug
INFO: DSC script exported to finemap_output.html
...
pcarbo commented 4 years ago

My suggestions:

$ tree mybenchmark
mybenchmark
├── analysis
│   └── summarize_results.R
└── dsc
    ├── modules
    │   ├── abs_err.R
    │   ├── mean.R
    │   ├── median.R
    │   ├── normal.R
    │   ├── sq_err.R
    │   └── t.R
    └── mybenchmark.dsc
gaow commented 4 years ago

@pcarbo we can set up a GitHub repository to hold the aforementioned template, with an example that actually does something. I can put in a version if you create such a repo under the stephenslab GitHub account.

gaow commented 4 years ago

Or maybe we can do it inside the dscrutils package, e.g. in its inst folder? See it here. From my experience using DSC:

  1. I don't think it is a bad idea to add a .gitattributes file.
  2. I recommend using %include so that one can have multiple "main" DSC files, such as template.dsc, for various specific benchmarks.
  3. .gitignore should at least include our default output directory, to prevent novice users from adding the output to GitHub.
  4. I recommend running chmod +x on the main DSC file so users can execute it directly as ./template.dsc.
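Points 3 and 4 are small enough to sketch directly. The POSIX shell snippet below assumes a benchmark named template whose master file sets output: "template", and assumes DSC then writes its results to template/ relative to the project root. The project folder and the minimal template.dsc it creates are hypothetical. It adds that output directory to .gitignore only once and marks template.dsc executable. The thread does not say what .gitattributes should contain, so the sketch leaves that file out.

```shell
#!/bin/sh
# Hypothetical sketch of points 3 and 4; not part of DSC itself.
# Assumes template.dsc sets `output: "template"`, so the results folder
# to ignore is template/ (relative to the project root).
set -eu

name="${1:-template}"
project="$name"               # hypothetical project folder
dsc_file="$project/$name.dsc"
mkdir -p "$project"

# Create a minimal master DSC file only if one does not exist yet.
if [ ! -f "$dsc_file" ]; then
  printf '#!/usr/bin/env dsc\n%%import modules/*.dsc\n\nDSC:\n   output: "%s"\n' \
    "$name" > "$dsc_file"
fi

# Point 3: keep the DSC output directory out of git, appending it only once.
touch "$project/.gitignore"
if ! grep -qxF "$name/" "$project/.gitignore"; then
  echo "$name/" >> "$project/.gitignore"
fi

# Point 4: make the file executable so users can run ./template.dsc;
# this relies on the #!/usr/bin/env dsc line and on dsc being on PATH.
chmod +x "$dsc_file"
```

Checking .gitignore before appending means the setup can be rerun without duplicating the entry.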
pcarbo commented 4 years ago

> Or maybe we can do it inside the dscrutils package, e.g. in its inst folder? See it here.

That's a great start, thanks.

> 1. I don't think it is a bad idea to add a .gitattributes file.

I'm more okay with it here; it was annoying when it was being generated every time I ran dsc.

> I recommend using %include so that one can have multiple "main" DSC files, such as template.dsc, for various specific benchmarks.

I would say this goes against the principle of keeping the DSC simple. For that matter, I think having the DSC specified in a single file is one of the things that makes DSC attractive.

I'm fine with 3 and 4.