tidyverse / multidplyr

A dplyr backend that partitions a data frame over multiple processes
https://multidplyr.tidyverse.org

Partition() doesn't play nicely with data.table #13

Closed AndrewsOR closed 8 years ago

AndrewsOR commented 8 years ago

partition() fails with a misleading error message when a data.table is used as input.

My data.table is too large to reproduce here, but perhaps it is enough to try:

library(dplyr)
library(multidplyr)

cl <- create_cluster(3)
set_default_cluster(cl)

iris %>% partition(Species) # works

library(data.table)
data.table(iris) %>% partition(Species) # fails: "Error: length(values) == length(cluster) is not TRUE"

hadley commented 8 years ago

There's currently no support for data.table.
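
A possible workaround, sketched here as an assumption and not verified against the multidplyr version in this thread: since the plain iris data frame partitions fine, dropping the data.table class before calling partition() may avoid the error. The names dt and df are illustrative only.

```r
library(data.table)

# A data.table carries both the "data.table" and "data.frame" classes.
dt <- data.table(iris)

# Copy into a plain data frame, which drops the data.table class.
# (setDF(dt) would convert in place instead of copying.)
df <- as.data.frame(dt)

class(df)  # "data.frame"

# Assumption: with the data.table class gone, this should behave like the
# working `iris %>% partition(Species)` call from the report above.
# df %>% partition(Species)
```

Copying a very large table with as.data.frame() costs memory. setDF() avoids the copy but changes the original object.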

phineas-pta commented 2 years ago

Is this possible now with dtplyr? I tried, but it's 2x slower when combining dtplyr and multidplyr.