possible solution #45

CorradoLanera commented 7 years ago

The proposed function ~~works recursively and it should be~~ is able to handle explicit indices for the nodes (maybe useful in the future). Simple balancing was expected.

I've set up some testthat tests that failed with the previous formula/procedure and pass with the proposed one.

CorradoLanera commented 7 years ago

no recursive function
a more balanced strategy to assign groups to clusters
profvis(partition(flights, flight)): the slowest operations are in partition_, and they are:
- [ ] nrow [640ms], caused by lazyLoadDBfetch (~ number of rows) --- I don't know how to improve this (if it is possible at all)
- [ ] grouping_part [860ms], mainly caused by the resizing of two vectors in the main list --- I'll think about how to improve this one. Anyway, given the number of groups (g) and the number of clusters (k), the complexity is:
  - O(1) for k >= g,
  - O(gk^2 - k^3) for k < g; for k << g (the usual case) it is linear in g. That said, IMO, and if you agree with the way the groups are balanced among the cores, the main improvement (if necessary) would come from eliminating that resizing.
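
For illustration only, the resizing cost mentioned above can be sketched in base R. This is not the code from this pull request: `grow` and `prealloc` are hypothetical names, and the sketch only assumes that the two vectors are extended one element at a time inside a loop.

```r
# Hypothetical illustration, not the code from this pull request.
# Appending with c() builds a new, longer vector on every iteration.
grow <- function(n) {
  out <- integer(0)
  for (i in seq_len(n)) out <- c(out, i)
  out
}

# Allocating the full length once and filling by index avoids the resizing.
prealloc <- function(n) {
  out <- integer(n)
  for (i in seq_len(n)) out[i] <- i
  out
}

# Both versions produce the same result; only the allocation pattern differs.
identical(grow(1000L), prealloc(1000L))
```

If the final length is known up front (for example, the number of groups), the preallocated form is usually the safer default.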

hadley commented 7 years ago

I wonder if a simpler approach would be to simply iterate over n, at each step assigning to the cluster with the fewest current groups.
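
A minimal sketch of that greedy idea in base R, assuming n is the number of groups and every group counts equally. The helper name `assign_groups` and its arguments are hypothetical and not part of multidplyr.

```r
# Hypothetical helper (not part of multidplyr): walk over the groups and
# give each one to the cluster that currently holds the fewest groups.
assign_groups <- function(n_groups, n_clusters) {
  stopifnot(n_groups >= 0, n_clusters >= 1)
  assignment <- integer(n_groups)   # cluster id chosen for each group
  load <- integer(n_clusters)       # number of groups held by each cluster
  for (i in seq_len(n_groups)) {
    target <- which.min(load)       # first cluster with the minimum load
    assignment[i] <- target
    load[target] <- load[target] + 1L
  }
  assignment
}

assign_groups(7, 3)
# → 1 2 3 1 2 3 1
```

With equally weighted groups this reduces to round-robin assignment; tracking rows per cluster in `load` instead of group counts would be one way to balance by size, again as an assumption rather than what the package does.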

hadley commented 5 years ago

I'm going to use a completely new strategy - thanks for trying!

tidyverse / multidplyr

possible solution #45 #48