Make partition work with invoke_rows #51

stanstrup closed this issue 5 years ago

stanstrup commented 7 years ago

It seems invoke_rows doesn't accept a party_df object. That would be useful...

cluster <- c(detectCores(), length(unique(mtcars$carb))/2) %>% min %>% create_cluster()
mtcars %>% partition(carb, cluster=cluster) %>% invoke_rows(.f = sum)

--> Error: .d must be a data frame
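
For anyone landing here later, a rough sketch of why the call fails. This assumes (I have not checked the current sources) that a party_df is a plain S3 list holding a reference to the cluster rather than a data frame, and that invoke_rows runs an is.data.frame() check on .d. Both fake_party_df and needs_data_frame below are hypothetical stand-ins, not real multidplyr or purrr objects.

```r
# Hypothetical stand-in for a party_df: an S3 list, NOT a data frame.
fake_party_df <- structure(
  list(name = "shard_name", groups = "carb"),
  class = "party_df"
)

# Hypothetical stand-in for the argument check that produces the error.
needs_data_frame <- function(.d) {
  if (!is.data.frame(.d)) {
    stop(".d must be a data frame", call. = FALSE)
  }
  invisible(.d)
}

# A real data frame passes the check silently.
needs_data_frame(mtcars)

# The party_df stand-in fails it; capture the message instead of aborting.
msg <- tryCatch(
  needs_data_frame(fake_party_df),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If that assumption holds, any function that insists on a local data frame will reject a party_df until the data is collected or the call is run on the workers.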

jepusto commented 7 years ago

Wrapping in do() makes the above example work:

cars_serial <- 
  mtcars %>% 
  invoke_rows(.f = sum)

cars_parallel <- 
  mtcars %>% 
  partition(carb, cluster=cluster) %>% 
  do(invoke_rows(.f = sum, .d = .)) %>%
  collect()

setdiff(cars_serial, cars_parallel) %>% nrow()
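
The same split / apply on workers / recombine idea can be sketched without multidplyr, using only the parallel package that ships with R. This is a different technique from the do() workaround above, not a drop-in replacement: row_totals is a hypothetical helper, and the two-worker cluster size is an arbitrary choice.

```r
library(parallel)

# Hypothetical helper: one row sum per car for a chunk of mtcars.
row_totals <- function(shard) {
  data.frame(
    car = rownames(shard),
    total = unname(rowSums(shard)),
    stringsAsFactors = FALSE
  )
}

# Serial reference result.
serial <- row_totals(mtcars)

# Parallel version: split by carb, process each shard on a worker, recombine.
cl <- makeCluster(2)
shards <- split(mtcars, mtcars$carb)
parallel_res <- do.call(rbind, parLapply(cl, shards, row_totals))
stopCluster(cl)

# Row order differs after the split, so sort both before comparing.
serial <- serial[order(serial$car), ]
parallel_res <- parallel_res[order(parallel_res$car), ]
rownames(serial) <- NULL
rownames(parallel_res) <- NULL

same <- isTRUE(all.equal(serial, parallel_res))
print(same)
```

As with the setdiff() check above, the point is only that the per-group results match the serial ones once row order is ignored.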

stanstrup commented 7 years ago

The workaround now gives me:

Warning message:
group_indices_.grouped_df ignores extra arguments 

I don't understand what is going wrong here...

Ax3man commented 7 years ago

Most likely because you have updated dplyr to the latest dev version, but multidplyr isn't up to date.
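
A quick way to check for that kind of mismatch is to print the installed versions side by side. This is a minimal sketch; installed_version is a hypothetical helper, and it returns NA for a package that is not installed instead of throwing an error.

```r
# Hypothetical helper: installed version of a package as a string,
# or NA if the package cannot be found in the library paths.
installed_version <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

versions <- vapply(c("dplyr", "multidplyr"), installed_version, character(1))
print(versions)
```

If dplyr shows a development version while multidplyr is older, reinstalling multidplyr from its current source is the first thing to try.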

derekpowell commented 6 years ago

Sorry to resurrect this issue, I'm getting the same group_indices_.grouped_df ignores extra arguments warning. As far as I can tell it's not creating any real issues, but I'm concerned I'm missing something. So, I'm just wondering, should I be worried?

Here's a minimal example:


df <- data.frame(A=c(1,2,3,4,5,6),
                 B=c(4,5,5,6,8,4),
                 group=c(1,1,1,2,2,2))

cluster <- create_cluster(2)
byGroup <- partition(df, group, cluster=cluster)

The resulting byGroup is a party_df that looks correct to me:

> byGroup
Source: party_df [6 x 3]
Groups: group
Shards: 2 [3--3 rows]

# S3: party_df
      A     B group
  <dbl> <dbl> <dbl>
1     1     4     1
2     2     5     1
3     3     5     1
4     4     6     2
5     5     8     2
6     6     4     2

hadley commented 5 years ago

This will eventually be fixed by an implementation of group_map()/group_modify(); I don't currently have plans to add support for purrr/purrrlyr.
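
For reference, this is roughly what the group_modify() approach looks like on an ordinary (non-partitioned) grouped data frame. It is a sketch that assumes a dplyr version that provides group_modify() (0.8.1 or later, as far as I know) and says nothing about how or whether multidplyr supports it. Note that the grouping column carb is not part of .x, so it is excluded from the row sums.

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  # For each carb group, return one row total per car.
  # .x is the group's data without the grouping column.
  row_totals <- mtcars %>%
    group_by(carb) %>%
    group_modify(~ data.frame(total = rowSums(.x))) %>%
    ungroup()

  print(row_totals)
}
```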