viash-io / viash

script + metadata = standalone component
https://viash.io
GNU General Public License v3.0

Allow script to specify functionality and platform as headers #10

Closed rcannood closed 4 years ago

rcannood commented 4 years ago

By placing the functionality.yaml and the platform_*.yaml in the header of a script, using viash becomes much more succinct.

viash run r-estimate.R -- -o output.pdf    # run command
viash run r-estimate.R -P docker           # run command with one of the platforms defined in the header
viash run r-estimate.R -p platform.yaml    # override platform altogether
r-estimate.R

```r
#' functionality:
#'   name: r-estimate
#'   description: |
#'
#'     Estimate the R value based on the vignette:
#'     https://cran.r-project.org/web/packages/EpiEstim/vignettes/demo.html
#'     As input, the following are used:
#'     https://epistat.sciensano.be/Data/COVID19BE_HOSP.csv
#'     https://assets.researchsquare.com/files/rs-18805/v3/dataset.xlsx
#'
#'   arguments:
#'   - name: "--output"
#'     alternatives: ["-o"]
#'     type: file
#'     description: The path to the output plot file.
#'     default: output.png
#'     required: true
#'     direction: output
#' platforms:
#' - type: native
#' - type: docker
#'   image: rocker/tidyverse
#'   r:
#'     cran:
#'     - optparse
#'     - EpiEstim
#'     - openxlsx
#'     - lubridate
#'     - patchwork
#'   workdir: /app
#' - type: nextflow

library(tidyverse)
library(EpiEstim)
library(openxlsx)
library(lubridate)
library(patchwork)

# collect incidence data
covid <- read_csv("https://epistat.sciensano.be/Data/COVID19BE_HOSP.csv")
incidence <- covid %>%
  group_by(DATE) %>%
  summarise_if(is.numeric, sum) %>%
  select(dates = DATE, I = NEW_IN)

# collect infector/infectee data
infections <- openxlsx::read.xlsx("https://assets.researchsquare.com/files/rs-18805/v3/dataset.xlsx") %>%
  mutate_at(vars(Infector.date.lwr, Infector.date.upr, Infectee.date), lubridate::mdy)

# interval-censored serial interval data:
# each line represents a transmission event,
# EL/ER show the lower/upper bound of the symptoms onset date in the infector
# SL/SR show the same for the secondary case
# type has entries 0 corresponding to doubly interval-censored data
# (see Reich et al. Statist. Med. 2009).
si_data <- infections %>%
  transmute(
    EL = 0L,
    ER = 1L,
    SL = difftime(Infectee.date, Infector.date.upr, units = "days") %>% as.integer,
    SR = difftime(Infectee.date, Infector.date.lwr, units = "days") %>% as.integer,
    type = 0L
  )

# Estimating R and the serial interval using data on pairs infector/infected
res <- estimate_R(
  incidence,
  method = "si_from_data",
  si_data = si_data,
  config = make_config()
)

# make nicer plots than the ones proposed by EpiEstim
plots <- map(c("incid", "R", "SI"), function(what) {
  g <- plot(res, what = what) + theme_bw()
  if (what %in% c("incid", "R")) {
    g <- g + scale_x_date(breaks = "1 week") + theme(axis.text.x = element_text(angle = 35, hjust = 1))
  }
  g
})

summary(res$R)
print(res$R)

g <- wrap_plots(plots, ncol = 1)

ggsave(par$output, g, height = 8, width = 8)
```
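To make the header idea concrete, here is a minimal sketch (in Python, not viash's actual Scala code) of how the leading `#'` lines of a script could be separated from the code below them. The function name `split_header` and the rule "the header is the leading run of lines starting with `#'`" are assumptions for illustration; a real implementation might also need to skip a shebang line or handle other comment prefixes.

```python
from typing import Tuple


def split_header(script_text: str, prefix: str = "#'") -> Tuple[str, str]:
    """Split a script into (yaml_header, code_body).

    Hypothetical helper: treats the leading run of lines that start with
    `prefix` as the YAML header and everything after it as the script body.
    """
    lines = script_text.splitlines()
    header = []
    body_start = len(lines)  # default: the whole file is header (or empty)
    for idx, line in enumerate(lines):
        if not line.startswith(prefix):
            body_start = idx
            break
        rest = line[len(prefix):]
        # Drop at most one space after the prefix so YAML indentation survives.
        header.append(rest[1:] if rest.startswith(" ") else rest)
    return "\n".join(header), "\n".join(lines[body_start:])


example_script = (
    "#' functionality:\n"
    "#'   name: r-estimate\n"
    "#' platforms:\n"
    "#' - type: native\n"
    "library(tidyverse)\n"
)
header_yaml, code_body = split_header(example_script)
```

The resulting `header_yaml` string could then be handed to whatever YAML parser the tool already uses for functionality.yaml and platform_*.yaml.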

Backwards compatibility with previous versions of viash could be allowed, for existing components:

viash run -f fun.yaml -p pl.yaml -- -o output.pdf

However, they could easily be merged together with a script.
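As a rough illustration of such a merge, the sketch below builds a `#'` header from the text of an existing functionality file and one or more platform files. It is plain text manipulation in Python, not part of viash; the helper name `to_header` is hypothetical, and it assumes each file is a YAML mapping without its own top-level `functionality:` or `platforms:` key.

```python
import textwrap
from typing import List


def to_header(functionality_yaml: str, platform_yamls: List[str], prefix: str = "#'") -> str:
    """Build a commented YAML header from separate spec files (hypothetical helper)."""
    parts = [
        "functionality:",
        # Nest the functionality mapping under its key.
        textwrap.indent(functionality_yaml.strip("\n"), "  "),
        "platforms:",
    ]
    for platform_yaml in platform_yamls:
        block = textwrap.indent(platform_yaml.strip("\n"), "  ")
        # Turn the first line of each mapping into a list item.
        parts.append("- " + block[2:])
    yaml_text = "\n".join(parts)
    return "\n".join((prefix + " " + line).rstrip() for line in yaml_text.splitlines())


fun_yaml = 'name: r-estimate\narguments:\n- name: "--output"\n'
platforms = ["type: native\n", "type: docker\nimage: rocker/tidyverse\n"]
merged_header = to_header(fun_yaml, platforms)
```

Prepending `merged_header` to the original script would give a single file in the shape of the r-estimate.R example above.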

rcannood commented 4 years ago

The question is, how soon would we like to support this?

tverbeiren commented 4 years ago

There are two things that need to happen from a high-level point of view:

  1. Make sure viash can deal with a YAML spec with functionality and platform(s) specs together
  2. Allow for parsing this YAML spec in the header of a script

My suggestion: Do (1) now as it opens up some additional functionality from the CLI. (2) I would postpone to the next release.

rcannood commented 4 years ago

Yesterday I was already making some progress to this end (Main.scala#L141-L182), but I got sidetracked by a more pressing issue. For now, I started creating a viash run2 command which would support the above. Once everything is confirmed to be working, I can remove the old code in favour of the new code.

For part (1) I am currently planning on misusing Meta to be able to read out a single yaml containing both Functionality and Platforms, or do you see any caveats with this? I can always make a new class for this. For part (2)... If we implement part (1), part (2) is peanuts -- the code already exists :)

tverbeiren commented 4 years ago

> For part (1) I am currently planning on misusing Meta to be able to read out a single yaml containing both Functionality and Platforms, or do you see any caveats with this?

Go ahead!

Just one remark: If you think about a joined spec, that in fact is a subset of Meta in the case of a YAML spec like this:

functionality:
  ...
platform:
  ...

Or worse, in the case of a platforms array:

functionality:
  ...
platforms:
- platform1:
  ...
- platform2:
  ...

Meta is then the layer that adds viash run info etc.
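The layering described here can be sketched as follows. This is a Python illustration, not viash's Scala classes: `JoinedSpec`, the `run_info` field, and the selection rules in `select_platform` (an explicit override wins, then a named platform type, then the first listed platform, falling back to native) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class JoinedSpec:
    """Functionality plus zero or more platforms, as read from one YAML document."""
    functionality: Dict[str, Any]
    platforms: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class Meta:
    """Hypothetical outer layer: the joined spec plus run information."""
    spec: JoinedSpec
    run_info: Dict[str, Any] = field(default_factory=dict)


def from_parsed_yaml(doc: Dict[str, Any]) -> JoinedSpec:
    """Accept either a single `platform:` mapping or a `platforms:` array."""
    platforms = list(doc.get("platforms", []))
    if "platform" in doc:
        platforms.append(doc["platform"])
    return JoinedSpec(functionality=doc["functionality"], platforms=platforms)


def select_platform(
    spec: JoinedSpec,
    platform_type: Optional[str] = None,       # e.g. the value given to -P
    override: Optional[Dict[str, Any]] = None,  # e.g. the parsed -p platform.yaml
) -> Dict[str, Any]:
    if override is not None:
        return override
    if platform_type is not None:
        for platform in spec.platforms:
            if platform.get("type") == platform_type:
                return platform
        raise KeyError(f"no platform of type {platform_type!r} in spec")
    # Assumed default: first listed platform, else native.
    return spec.platforms[0] if spec.platforms else {"type": "native"}


doc = {
    "functionality": {"name": "r-estimate"},
    "platforms": [{"type": "native"}, {"type": "docker", "image": "rocker/tidyverse"}],
}
meta = Meta(spec=from_parsed_yaml(doc), run_info={"command": "run"})
docker = select_platform(meta.spec, platform_type="docker")
```

Keeping the joined spec as its own type means both the single-`platform` and the `platforms`-array forms normalise to the same structure before the outer layer adds anything.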

rcannood commented 4 years ago

:+1: implemented as of 76a9d5d8e1c4cb7728fc85dc60b7beb196fd1f3a and 836c51b1c60e1573473736ae1ab1b807ec50ffcd