tidyverse / multidplyr

A dplyr backend that partitions a data frame over multiple processes
https://multidplyr.tidyverse.org

Clusters are not closed automatically #84

eliferden closed this issue 5 years ago

eliferden commented 5 years ago

I tested parallel processing on the built-in airquality dataset, but the worker processes are not shut down automatically after creating the cluster with new_cluster(). Deleting data_par and calling stopCluster() do not clean up the cluster either.

I also tried triggering a full garbage collection. That did not work either.

library(tidyverse)  # also attaches dplyr
library(multidplyr)
library(parallel)

# multidplyr
start <- proc.time()
core_num <- detectCores()
cluster <- new_cluster(core_num)
data_par <- airquality %>%
  group_by(Month) %>%
  partition(cluster) %>%
  summarize(cnt = n()) %>%
  collect()
data_par
time_elapsed_parallel <- proc.time() - start
time_elapsed_parallel  # elapsed time for the parallel run
stopCluster(cluster)  # attempted cleanup; the workers keep running

hadley commented 5 years ago

You'll have to rm(cluster) before it'll get cleaned up
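
A minimal sketch of that suggestion, reusing the reporter's pipeline. The worker count of 2 and the name monthly_counts are illustrative choices, not from the original report. The explicit gc() call is an assumption on my part: the comment above only mentions rm(cluster), and the workers are presumably shut down once the cluster object is actually garbage collected, which R may otherwise defer.

```r
library(dplyr)
library(multidplyr)

# Two workers keep the example light (illustrative value).
cluster <- new_cluster(2)

monthly_counts <- airquality %>%
  group_by(Month) %>%
  partition(cluster) %>%
  summarise(cnt = n()) %>%
  collect()

# Drop the only remaining reference to the cluster object ...
rm(cluster)
# ... and request a garbage collection so cleanup is not deferred
# (assumption: the workers are released when the object is collected).
invisible(gc())
```

Note that this can only work if nothing else still refers to the cluster. Here the partitioned data frame is an unnamed intermediate, so after collect() the variable cluster is the last reference; a partitioned data frame saved to a variable would likely keep the workers alive until it is removed too.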