tidyverse / multidplyr

A dplyr backend that partitions a data frame over multiple processes
https://multidplyr.tidyverse.org

progress bar #58

Closed: donaldRwilliams closed this issue 5 years ago

donaldRwilliams commented 7 years ago

Hi, I am thinking of using this package for some computationally intensive Bayesian simulations, since it appears I can "loop" through rows of conditions very nicely. However, I would like to have a progress bar. Is this possible? Here is a basic t-test simulation that gives an idea of what I plan to use this package for (though with far more complex models).

I am also wondering how random-number streams would be handled here, and how to set an RNG stream for the cluster.

library(dplyr)
library(multidplyr)

# conditions to simulate
n1 <- c(10, 20, 30)
dat <- data.frame(n1, n2 = rev(n1), sd = c(20, 10, 5))
d <- expand.grid(dat)

# define function to be applied to each row nsims times (could be anything: MSE, etc.)
func <- function(nsims, x1, x2, sd) {
  replicate(nsims, t.test(rnorm(x1, 0, sd), rnorm(x2, 0, 1),
                          var.equal = TRUE)$p.value)
}

# create cluster
cluster <- create_cluster(16)

# register function to cluster
cluster_assign_value(cluster, "func", func)

# one group per condition; each group's rejection rate is computed on a worker
results <- d %>%
  partition(n1, n2, sd, cluster = cluster) %>%
  do(t1 = mean(func(5000, x1 = .$n1, x2 = .$n2, sd = .$sd) < 0.05)) %>%
  collect()

# unlist the t1 list-column into a numeric column
results$t1 <- unlist(results$t1)

hadley commented 5 years ago

There's no way to have a progress bar, sorry.
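For readers with the same two questions, here is a hedged sketch that steps outside multidplyr and uses base R's parallel package instead (a swapped-in approach, not multidplyr's API). `clusterSetRNGStream()` gives each worker its own L'Ecuyer-CMRG random-number stream, and running the condition grid in chunks lets `txtProgressBar()` advance after each chunk returns. The helper names `run_row()` and `run_grid()`, and the defaults (2 workers, chunks of 4 rows, seed 2017), are hypothetical choices for illustration.

```r
library(parallel)

# Condition grid from the original post: 27 combinations of n1, n2 and sd
n1 <- c(10, 20, 30)
d <- expand.grid(data.frame(n1, n2 = rev(n1), sd = c(20, 10, 5)))

# Hypothetical worker helper: proportion of p < 0.05 for a pooled-variance
# t-test on row i of the grid (same model as `func` above; both means are 0)
run_row <- function(i, grid, nsims) {
  p <- replicate(nsims, t.test(rnorm(grid$n1[i], 0, grid$sd[i]),
                               rnorm(grid$n2[i], 0, 1),
                               var.equal = TRUE)$p.value)
  mean(p < 0.05)
}

# Hypothetical driver: runs the grid in chunks on a PSOCK cluster and
# advances a text progress bar after each chunk finishes
run_grid <- function(grid, nsims, workers = 2, chunk_size = 4, seed = 2017) {
  cl <- makeCluster(workers)
  on.exit(stopCluster(cl), add = TRUE)

  # Give each worker its own L'Ecuyer-CMRG stream, derived from `seed`
  clusterSetRNGStream(cl, iseed = seed)

  rows <- seq_len(nrow(grid))
  chunks <- split(rows, ceiling(rows / chunk_size))
  pb <- txtProgressBar(min = 0, max = length(rows), style = 3)
  on.exit(close(pb), add = TRUE)

  reject_rate <- numeric(length(rows))
  done <- 0
  for (chunk in chunks) {
    # Rows within a chunk run in parallel; the bar moves once the chunk returns
    reject_rate[chunk] <- unlist(parLapply(cl, chunk, run_row,
                                           grid = grid, nsims = nsims))
    done <- done + length(chunk)
    setTxtProgressBar(pb, done)
  }
  cbind(grid, reject_rate = reject_rate)
}

# Example (full-size run): res <- run_grid(d, nsims = 5000, workers = 16)
```

Results should be reproducible for a given seed, worker count, and chunk size, because `parLapply()` splits each chunk across workers in a fixed order and each worker draws from its own stream. Changing the number of workers changes which stream a row draws from, so the numbers will differ. Smaller chunks give a finer-grained progress bar at the cost of more round trips to the workers.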