This adds the associative keyword to mv and cp. When it is set, combine is applied recursively (pairwise) whenever multiple values need to be combined. This is the same approach used by reducevalues, and it enables more parallelism.
I chose a different default value than reducevalues to avoid breakage, but unless someone objects I'll change the default to be consistent with reducevalues in a subsequent breaking release.
@jpsamaroo: I think this is a universal way to reduce in parallel, but if you have time I'd appreciate a check that it isn't just accidentally depending on some scheduler implementation detail.
Example:
julia> using FileTrees, Distributed
julia> addprocs(10; exeflags=["--project", "--threads=1"], lazy=false);
julia> @everywhere using FileTrees, Distributed
julia> @everywhere function myvcat(x, y)
           @info "combine lengths $(length(x)) and $(length(y))"
           sleep(1) # fake slowness
           vcat(x, y)
end
julia> tt = mapvalues(identity, maketree("root" => ["next" => [(name=string(x), value=1:10) for x in 'a':'k']]); lazy=true);
julia> ttm = mv(tt, r"next/[a-z]$", s"next"; combine=myvcat, associative=false);
julia> @time exec(ttm);
From worker 8: [ Info: combine lengths 10 and 10
From worker 8: [ Info: combine lengths 20 and 10
From worker 8: [ Info: combine lengths 30 and 10
From worker 8: [ Info: combine lengths 40 and 10
From worker 8: [ Info: combine lengths 50 and 10
From worker 8: [ Info: combine lengths 60 and 10
From worker 8: [ Info: combine lengths 70 and 10
From worker 8: [ Info: combine lengths 80 and 10
From worker 8: [ Info: combine lengths 90 and 10
From worker 8: [ Info: combine lengths 100 and 10
11.606803 seconds (27.74 k allocations: 1.300 MiB)
julia> ttm_assoc = mv(tt, r"next/[a-z]$", s"next"; combine=myvcat, associative=true);
julia> @time exec(ttm_assoc);
From worker 2: [ Info: combine lengths 10 and 10
From worker 4: [ Info: combine lengths 10 and 10
From worker 6: [ Info: combine lengths 10 and 10
From worker 2: [ Info: combine lengths 10 and 10
From worker 7: [ Info: combine lengths 10 and 20
From worker 4: [ Info: combine lengths 10 and 20
From worker 2: [ Info: combine lengths 10 and 20
From worker 4: [ Info: combine lengths 20 and 30
From worker 2: [ Info: combine lengths 30 and 30
From worker 2: [ Info: combine lengths 50 and 60
4.793218 seconds (43.04 k allocations: 2.158 MiB, 0.90% compilation time)
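To illustrate why the two runs above differ, here is a minimal standalone sketch in plain Julia (no FileTrees or Dagger). It is not the code in this PR: fold_depth and tree_depth are hypothetical helpers, and the halving split is an assumption that need not match the order in which the scheduler actually pairs values. It only compares the length of the longest chain of dependent combine calls for a left fold and for a pairwise reduction.

```julia
# Left fold: every combine needs the previous result,
# so the dependency chain is n - 1 calls long.
function fold_depth(combine, xs)
    acc = xs[1]
    depth = 0
    for x in xs[2:end]
        acc = combine(acc, x)
        depth += 1
    end
    return (acc, depth)
end

# Pairwise reduction: reduce each half independently, then combine.
# The halves do not depend on each other, so they could run in parallel
# and the chain length is 1 + the longer of the two halves.
function tree_depth(combine, xs)
    if length(xs) == 1
        return (xs[1], 0)
    end
    mid = div(length(xs), 2)
    left, dl = tree_depth(combine, xs[1:mid])
    right, dr = tree_depth(combine, xs[mid+1:end])
    return (combine(left, right), 1 + max(dl, dr))
end

# Same shape as the example: 11 values of length 10.
chunks = [collect(1:10) for _ in 1:11]

seq, seq_depth = fold_depth(vcat, chunks)
par, par_depth = tree_depth(vcat, chunks)

println("same result: ", seq == par)
println("fold depth:  ", seq_depth)
println("tree depth:  ", par_depth)
```

With 11 values the fold has a chain of 10 dependent calls and the pairwise version a chain of 4, which is in line with the roughly 11.6 s and 4.8 s above given the 1 s sleep per combine. The two results are only guaranteed to agree when combine is associative, as vcat is.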
Fixes #60