`partition()` seems not to handle an unbalanced number of groups per core #45

CorradoLanera closed 5 years ago

CorradoLanera commented 7 years ago
# Partitioning 9 df rows, grouped into 7 groups, across 7 cores
# Windows Server 2012 R2 (64-bit, R 3.2.2, RStudio 1.0.136)
# `partition()` distributes the data over only 6 cores, putting more
# than one group on some core and leaving the last one empty.

# 4 CPU / 8 core / Windows Server 2012 R2
# Note: I was not able to reproduce a similar issue on my
# 2 CPU / 4 core MacBook Pro

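library(dplyr)
library(purrr)
library(magrittr)   # for %<>%
library(multidplyr)

# one row per data frame to be modelled; row i holds the first i rows of mtcars
df <- data_frame(
    df_to_be_modelled = map(seq_len(9),
                            ~ mtcars[seq_len(.), ])
)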

# Suppose the data are very unbalanced, so that modelling a couple of
# the first data frames takes about as long as modelling one of the
# last ones: you would like to group them so that each core works for
# roughly the same amount of time (and all "max - 1" cores are used).

cluster <- create_cluster() # n - 1 =  7 by default
#> Initialising 7 core cluster.

df %<>% mutate(group = c(1L, 2L, 2L, 1L, 3L, 4L, 5L, 6L, 7L))

df_cl <- df %>% partition(group)
#> Source: party_df [9 x 2]
#> Groups: group
#> Shards: 6 [1--2 rows]
#> # S3: party_df
#>       df_to_be_modelled group
#>                  <list> <int>
#> 1 <data.frame [8 × 11]>     6
#> 2 <data.frame [2 × 11]>     2
#> 3 <data.frame [3 × 11]>     2
#> 4 <data.frame [5 × 11]>     3
#> 5 <data.frame [6 × 11]>     4
#> 6 <data.frame [7 × 11]>     5
#> 7 <data.frame [9 × 11]>     7
#> 8 <data.frame [1 × 11]>     1
#> 9 <data.frame [4 × 11]>     1

#> [[1]]
#> [1] "ukwoanoyti"
#> [[2]]
#> [1] "ukwoanoyti"
#> [[3]]
#> [1] "ukwoanoyti"
#> [[4]]
#> [1] "ukwoanoyti"
#> [[5]]
#> [1] "ukwoanoyti"
#> [[6]]
#> [1] "ukwoanoyti"
#> [[7]]
#> character(0)

# The first cluster has two different groups;
# the last one has no groups, i.e. it has no data.
# Note: the two observations of group 1 are both on the same
# node (i.e. cluster 4), as are the two of group 2 (i.e. cluster 6).
# Cluster 1 is the only one with two different groups.

actual_name <- cluster_ls(cluster)[[1]]
# cluster_eval(cluster, purrr::safely(print)(<name stored in `actual_name`>))
# Sorry, I don't know how to do this in a simple, automatic way.
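
For reference, the distribution the reprex expects (every group on its own core, no core left empty) can be sketched in base R. This is only an illustration under stated assumptions, not multidplyr's implementation: `assign_groups()` is a hypothetical helper, and using the number of rows of each data frame as the workload is an assumption made for the example.

```r
# Hypothetical helper (not part of multidplyr): assign whole groups to
# workers, heaviest group first, always onto the least-loaded worker.
assign_groups <- function(group, weight, n_workers) {
  # total work per group (here: the summed weights of its rows)
  load_by_group <- sort(sapply(split(weight, group), sum), decreasing = TRUE)
  worker_load <- numeric(n_workers)
  worker_of_group <- integer(length(load_by_group))
  names(worker_of_group) <- names(load_by_group)
  for (g in names(load_by_group)) {
    w <- which.min(worker_load)            # least-loaded worker so far
    worker_of_group[[g]] <- w
    worker_load[w] <- worker_load[w] + load_by_group[[g]]
  }
  # map every row to the worker that owns its group
  unname(worker_of_group[as.character(group)])
}

group  <- c(1L, 2L, 2L, 1L, 3L, 4L, 5L, 6L, 7L)  # same grouping as above
weight <- seq_len(9)   # assumed workload: rows in each data frame
worker <- assign_groups(group, weight, n_workers = 7)

# rows per worker; with 7 groups and 7 workers no worker stays empty
table(worker)
```

Keeping each group whole on a single worker mirrors what `partition(group)` is meant to guarantee; the greedy "heaviest first" rule is just one simple way to even out the load when there are more groups than workers.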

hadley commented 7 years ago

Can you please use the reprex package to generate your reprex? It will fix your formatting issues.

CorradoLanera commented 7 years ago

Done. Is it all OK now? I didn't know about that package. (Note: I was not able to automate the last expression, but I think the results should still be clear.)

CorradoLanera commented 7 years ago

It's not really fixed: I'm still working on it.

hadley commented 5 years ago

I've completely rewritten the algorithm and I'll have a fix pushed shortly.