paciorek / future-kubernetes

Instructions for setting up and using a Kubernetes cluster for running R in parallel using the future package.

Additional notes about using AWS ECR and extending with secrets/private code #8

Closed: 1beb closed this issue 3 years ago

1beb commented 3 years ago

Hello @paciorek,

Would you welcome a PR that adds sections on:

a) Using AWS ECR
b) Extending the Dockerfile to deal with secrets and private GitHub repositories

As this repository is largely an "article" of sorts, perhaps you would prefer these sections written as standalone pieces in an issue, so that you can integrate them as you please? Some of the other elements of the article may be better presented in a question-and-answer format (like an FAQ). Perhaps these would fit that mold as well.

paciorek commented 3 years ago

Yes, I'd love to get some additional content. However, given the current README is already long and I'd like to retain the current flow of content, it probably makes sense to have additional content as one or more additional Markdown files that I can link to from the README.

Both of your suggested additions sound good. How about you do a PR with the new content, and then we can work out how to organize things?

1beb commented 3 years ago

Note to self: also discuss how to design the process to take full advantage of future topologies.

Example:

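library(dplyr)
library(future)
library(furrr)

# Single-level alternative:
# plan(cluster, manual = TRUE)

# Nested topology: the outer level sends tasks to cluster workers that are
# started manually; the inner level uses multisession workers on each of them.
plan(list(
  tweak(cluster, manual = TRUE),
  multisession
))

# Outer level: record which worker and process received task i,
# then hand off to node_fun() for the inner level.
scheduler_fun <- function(i) {
  df <- data.frame(
    task = i,
    thost = Sys.info()[["nodename"]],
    tpid = Sys.getpid()
  )
  node_fun(df)
}

# Inner level: fan out over the local sessions and record where each piece ran.
node_fun <- function(dft) {
  future_map_dfr(1:7, function(x) {
    dfb <- data.frame(
      res = mean(rnorm(10^x)),
      lhost = Sys.info()[["nodename"]],
      lpid = Sys.getpid(),
      lcores = parallelly::availableCores()
    )
    dplyr::bind_cols(dft, dfb)
  }, .options = furrr_options(seed = TRUE))  # declare random number use
}

r <- future_map(1:20, scheduler_fun)
bind_rows(r)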
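
For trying the nested-plan idea without a cluster, here is a minimal local sketch. It assumes only that the future package is installed; it swaps the cluster level for two local multisession workers and makes the inner level sequential, which is not the setup in the example above. The names `outer_f`, `inner_f`, and `pids` are illustrative.

```r
library(future)

# Assumption: a local stand-in for the cluster. The outer level uses two
# background R sessions; the inner level is sequential.
plan(list(
  tweak(multisession, workers = 2),
  sequential
))

outer_f <- future({
  # Inside an outer worker, a new future is resolved by the next plan level.
  inner_f <- future(Sys.getpid())
  c(outer_pid = Sys.getpid(), inner_pid = value(inner_f))
})

pids <- value(outer_f)
print(pids)

# Reset to the default plan; this also shuts down the background sessions.
plan(sequential)
```

Because the inner level is sequential here, the inner future runs in the same process as the outer one, and that process differs from the main R session. Replacing `sequential` with `multisession` should give each outer worker its own pool of inner sessions, as in the example above.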

1beb commented 3 years ago

See #9