tidyverse / multidplyr

A dplyr backend that partitions a data frame over multiple processes
https://multidplyr.tidyverse.org

pmap_dfr - Error: Element 5 is not a vector (environment) #60

Closed: kartiksubbarao closed this issue 6 years ago

kartiksubbarao commented 6 years ago

I'm trying to use multidplyr_0.0.0.9000 with dplyr_0.7.4.9000 and pmap_dfr from purrr_0.2.4.9000. The following code (without using multidplyr) works fine:

library(dplyr)  # as_tibble(), tribble()
library(purrr)  # pmap_dfr()
grid1 = as_tibble(expand.grid(m1 = 1:10, m2 = 20:30))
retstuff = function(m1, m2) { return(tribble(~m3, ~m4, m1+1, m2+2)) }
pmap_dfr(grid1, retstuff)  # one row per grid row

When I try to partition the grid with multidplyr:

library(multidplyr)  # partition()
grid2 = partition(grid1, m1)
pmap_dfr(grid2, retstuff)

I get the error "Error: Element 5 is not a vector (environment)" from pmap_dfr().

I also get the warning from #57 when calling partition(): "group_indices_.grouped_df ignores extra arguments". I'm not sure whether that's related.

kartiksubbarao commented 6 years ago

Got the answer: https://stackoverflow.com/a/47066154/3151579. What I didn't realize is that only dplyr verbs can be called directly on a partitioned data frame, so I needed to wrap the pmap_dfr() call in dplyr::do(). I had implicitly assumed that anything from the tidyverse would "just work" :-) It would be nice if a clearer error were reported in this case, but I'll go ahead and close this issue.
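A local sketch of the do() wrapping described in the comment above. To keep it runnable with only dplyr and purrr installed, it uses an ordinary grouped tibble in place of the partitioned data frame; this is an assumption for illustration, not the exact code from the linked answer. Inside do(), `.` is each group's rows as a plain data frame, which is the kind of input pmap_dfr() expects. On a real multidplyr cluster, each worker would presumably also need purrr loaded and retstuff defined. That setup is not shown here because its API differs between multidplyr versions.

```r
library(dplyr)
library(purrr)

grid1 <- as_tibble(expand.grid(m1 = 1:10, m2 = 20:30))
retstuff <- function(m1, m2) tribble(~m3, ~m4, m1 + 1, m2 + 2)

# Direct call on an unpartitioned tibble: one output row per grid row
direct <- pmap_dfr(grid1, retstuff)

# do() evaluates pmap_dfr() once per group, with `.` bound to that group's
# rows as an ordinary data frame; the grouping column m1 is added back to
# each group's output
wrapped <- grid1 %>%
  group_by(m1) %>%
  do(pmap_dfr(., retstuff)) %>%
  ungroup()
```

Both calls produce the same m3/m4 rows. The wrapped version also carries the grouping column m1 alongside them.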