tidyverse / multidplyr

A dplyr backend that partitions a data frame over multiple processes
https://multidplyr.tidyverse.org

possible solution #45 #48

Closed CorradoLanera closed 5 years ago

CorradoLanera commented 7 years ago

The proposed function works recursively and should be able to handle explicit indices for the nodes (perhaps useful in the future). Only simple balancing was expected.

I've set up some testthat tests that failed with the previous formula/procedure and pass with the proposed one.

CorradoLanera commented 7 years ago

hadley commented 7 years ago

I wonder if a simpler approach would be to simply iterate over n, at each step assigning the next group to the cluster with the fewest groups so far.
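
A minimal sketch of that greedy idea, written in Python for brevity. It is not multidplyr's code: the function name `assign_groups` and its parameters `n_groups` and `n_clusters` are hypothetical, and it assumes every group counts equally toward a cluster's load.

```python
def assign_groups(n_groups, n_clusters):
    """Return a list mapping each group index to a cluster index.

    Hypothetical helper: each group goes to the cluster that currently
    holds the fewest groups; ties go to the lowest cluster index.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")

    counts = [0] * n_clusters  # number of groups held by each cluster
    assignment = []
    for _ in range(n_groups):
        # min() returns the first minimal element, so ties break low.
        target = min(range(n_clusters), key=counts.__getitem__)
        assignment.append(target)
        counts[target] += 1
    return assignment


print(assign_groups(7, 3))  # [0, 1, 2, 0, 1, 2, 0]
```

When the number of groups is the only measure of load, this reduces to round-robin assignment, so cluster sizes never differ by more than one group.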

hadley commented 5 years ago

I'm going to use a completely new strategy - thanks for trying!