tidyverse / multidplyr

A dplyr backend that partitions a data frame over multiple processes
https://multidplyr.tidyverse.org

Error in if (left == 0) break : argument is of length zero #47

Closed cwaldock1 closed 5 years ago

cwaldock1 commented 7 years ago

I'm running a number of models from the qgam package (https://github.com/mfasiolo/qgam) and ran into this error. I can't provide a reproducible example, but here is some example code. The error seems to occur at random.

Thought you'd like to know, and it would be good to know whether it has to do with how I set up the clusters.

# Detect the number of available cores
parallel::detectCores() # 4 cores

# Create groups
group <- rep(1:4, 
             length.out = nrow(TestData %>% group_by(Group1, Group2) %>% nest()))
TestData2 <- bind_cols(tibble(group), TestData %>% group_by(Group1, Group2) %>% nest())
TestData2 <- TestData2 %>% unnest(data)

# Create clusters
cluster <- create_cluster(cores = 4)

# Create partition dataframe
TestDataParty <- TestData2 %>% partition(group, cluster = cluster)

# Load library for within each cluster
cluster_library(TestDataParty, "qgam")

# Run models in parallel 
TestDataPartyOutput <- TestDataParty %>% group_by(Group1, Group2) %>% 
  do(Model1 = tryCatch(qgam(y ~ s(x, k = 3), qu = 0.95, data = .), error = function(e) NA),
     Model2 = tryCatch(qgam(y ~ s(x, k = 4), qu = 0.95, data = .), error = function(e) NA),
     Model3 = tryCatch(qgam(y ~ s(x, k = 5), qu = 0.95, data = .), error = function(e) NA),
     Model4 = tryCatch(qgam(y ~ s(x, k = 7), qu = 0.95, data = .), error = function(e) NA))

hadley commented 7 years ago

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you!

If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down the page.

cwaldock1 commented 7 years ago

I have only run into the error once, and thought it might be a bug somewhere in multidplyr. The example below is at a similar scale to the data I was running at the time.

Cheers

## -- ## 
library(devtools)
install_github("mfasiolo/qgam")
library(qgam)
library(dplyr)
library(tidyr)  # provides nest() and unnest()
library(multidplyr)
library(purrr)

# Create tibble
# Groups
TestData <- as_tibble(data.frame(Group1 = sort(as.factor(rep(1:3000, length.out = 6000))), 
                                 Group2 = as.factor(rep(1:2, times = 3000))))

# Variables
TestData <- TestData %>% 
  group_by(Group1, Group2) %>% 
  do(x = seq_len(1000),
     y = rnorm(1000)) %>% 
  unnest(x,y)

# Optional subset to smaller scale for speed. 
#TestData <- TestData[1:10000,]

# Detect the number of available cores
parallel::detectCores() # 4 cores

# Create groups
group <- rep(1:4, 
             length.out = nrow(TestData %>% group_by(Group1, Group2) %>% nest()))
TestData2 <- bind_cols(tibble(group), TestData %>% group_by(Group1, Group2) %>% nest())
TestData2 <- TestData2 %>% unnest(data)

# Create clusters
cluster <- create_cluster(cores = 4)

# Create partition dataframe
TestDataParty <- TestData2 %>% partition(group, cluster = cluster)

# Load library for within each cluster
cluster_library(TestDataParty, "qgam")

# Run models in parallel 
TestDataPartyOutput <- TestDataParty %>% group_by(Group1, Group2) %>% 
  do(Model1 = tryCatch(qgam(y ~ s(x, k = 3), qu = 0.95, data = .), error = function(e) NA),
     Model2 = tryCatch(qgam(y ~ s(x, k = 4), qu = 0.95, data = .), error = function(e) NA),
     Model3 = tryCatch(qgam(y ~ s(x, k = 5), qu = 0.95, data = .), error = function(e) NA),
     Model4 = tryCatch(qgam(y ~ s(x, k = 7), qu = 0.95, data = .), error = function(e) NA))
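
Because each `tryCatch()` above replaces a failed fit with `NA`, the reason for any failure is lost, which makes an intermittent error harder to trace. A minimal base R sketch of keeping the message alongside the result follows. The helper `fit_or_message()` and the stand-in `flaky_fit()` are hypothetical names used only for illustration; they are not part of qgam or multidplyr.

```r
# Hypothetical helper (not part of qgam or multidplyr): call a fitting
# function and keep the error message instead of discarding it.
fit_or_message <- function(fit_fun, ...) {
  tryCatch(
    list(model = fit_fun(...), error = NA_character_),
    error = function(e) list(model = NULL, error = conditionMessage(e))
  )
}

# Stand-in for a model fit that fails on some inputs.
flaky_fit <- function(x) {
  if (length(x) < 3) stop("too few observations")
  lm(y ~ x, data = data.frame(x = x, y = x^2))
}

ok  <- fit_or_message(flaky_fit, 1:10)  # fit succeeds
bad <- fit_or_message(flaky_fit, 1:2)   # fit fails, message is kept

ok$error   # NA, because nothing went wrong
bad$error  # "too few observations"
```

The same wrapper could in principle be used in place of the bare `tryCatch(..., error = function(e) NA)` calls, so that failed groups can be inspected afterwards.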

mschilli87 commented 6 years ago

I ran into the same problem and 'resolved' it by re-initializing my cluster. I did not change any code; I just re-ran the cluster setup (including library loading and data copying).
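
The workaround described here (tear down the cluster, then repeat library loading and data copying) can be sketched with the base `parallel` package. This is an illustration under assumptions, not the multidplyr API used elsewhere in this thread: `restart_cluster()` is a hypothetical helper, and `scale_factor` is a made-up object standing in for the copied data.

```r
library(parallel)

# Hypothetical helper: stop an old cluster (if any), start a fresh one,
# then load packages and copy objects to the new workers.
restart_cluster <- function(old = NULL, workers = 2, packages = character(),
                            objects = character(), envir = parent.frame()) {
  force(envir)  # fix the calling environment before doing anything else
  if (!is.null(old)) {
    # The old workers may already be gone, so ignore errors here.
    try(stopCluster(old), silent = TRUE)
  }
  cl <- makeCluster(workers)
  for (pkg in packages) {
    clusterCall(cl, library, pkg, character.only = TRUE)
  }
  if (length(objects) > 0) {
    clusterExport(cl, objects, envir = envir)
  }
  cl
}

scale_factor <- 10  # made-up object standing in for the data to copy

cl <- restart_cluster(workers = 2, packages = "stats", objects = "scale_factor")

# Re-initialize: same setup, fresh worker processes.
cl <- restart_cluster(old = cl, workers = 2, packages = "stats",
                      objects = "scale_factor")

# Each fresh worker has its own copy of the exported object.
copies <- unlist(clusterEvalQ(cl, scale_factor))

stopCluster(cl)
```

Wrapping the setup in one function keeps the restart identical to the first start, which matches the report that no code was changed between runs.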

hadley commented 5 years ago

Seems unlikely to be a bug in multidplyr, but if you're still interested, and can create a simple reprex, please file a new issue.
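
For readers unfamiliar with the message in the issue title: R raises "argument is of length zero" whenever `if` receives a zero-length condition, for example when the compared value is `NULL`. The base R sketch below reproduces only the message. Where `left` became empty in the reported run is not established in this thread, and `is_finished()` is a hypothetical helper showing one defensive check.

```r
# A zero-length condition makes `if` fail with "argument is of length zero".
left <- NULL  # e.g. a value that was never set

msg <- tryCatch(
  {
    if (left == 0) "done" else "more work"
  },
  error = function(e) conditionMessage(e)
)
msg

# Hypothetical defensive check: turn a missing value into an explicit error.
is_finished <- function(left) {
  if (length(left) != 1 || is.na(left)) {
    stop("`left` must be a single non-missing number")
  }
  left == 0
}

is_finished(0)
```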