shashi / FileTrees.jl

Parallel computing with a tree of files metaphor
http://shashi.biz/FileTrees.jl

Add associative keyword for mv and cp #64

Closed DrChainsaw closed 2 years ago

DrChainsaw commented 2 years ago

Fixes #60

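This adds the associative keyword for mv and cp. When it is set to true and multiple values need to be combined into one, combine is applied recursively to pairs of values instead of folding them in one at a time. This is the same as what is done for reducevalues, and it enables more parallelism because independent pairs can be combined concurrently.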
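To illustrate the difference between the two evaluation orders, here is a minimal sketch in plain Julia that does not use FileTrees at all. The names sequential_combine, pairwise_combine and pairwise_depth are hypothetical helpers made up for this illustration, and the halving strategy is an assumption, not necessarily how the PR splits the work.

```julia
# Sequential left fold: every call to `f` depends on the previous result,
# so n values need n - 1 dependent steps.
sequential_combine(f, xs) = foldl(f, xs)

# Pairwise (tree) reduction: the two halves are independent of each other,
# so a scheduler is free to evaluate them in parallel.
function pairwise_combine(f, xs)
    isempty(xs) && throw(ArgumentError("need at least one value to combine"))
    n = length(xs)
    n == 1 && return xs[1]
    mid = n ÷ 2
    left = pairwise_combine(f, xs[1:mid])
    right = pairwise_combine(f, xs[mid+1:end])
    return f(left, right)
end

# Longest chain of dependent `f` calls in the pairwise scheme for n values.
pairwise_depth(n) = n == 1 ? 0 : 1 + max(pairwise_depth(n ÷ 2), pairwise_depth(n - n ÷ 2))

# Eleven chunks of ten elements each, mirroring the example in this PR.
chunks = [collect(1:10) for _ in 1:11]
seq = sequential_combine(vcat, chunks)
par = pairwise_combine(vcat, chunks)

println(seq == par)          # both orders give the same vector for vcat
println(pairwise_depth(11))  # dependent steps on the critical path
```

With eleven values, the sequential fold has ten dependent steps while the pairwise scheme has a critical path of four. If each combine takes about a second, that is roughly in line with the timings in the example below.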

I chose a different default value compared to reducevalues to avoid breakage, but unless someone objects I'll change the default to be consistent with reducevalues in a subsequent breaking release.

@jpsamaroo : I think this is a universal way to reduce in parallel, but if you have time I'd appreciate a check that it's not just accidentally depending on some scheduler implementation detail.
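As a side note on why the keyword is named associative: regrouping the calls is only safe when combine gives the same result regardless of grouping. The sketch below is plain Julia with a hypothetical tree_reduce helper (not part of FileTrees) and compares the two groupings for an associative and a non-associative operator.

```julia
# Hypothetical helper: reduce by recursively combining the two halves.
function tree_reduce(f, xs)
    length(xs) == 1 && return first(xs)
    mid = length(xs) ÷ 2
    return f(tree_reduce(f, xs[1:mid]), tree_reduce(f, xs[mid+1:end]))
end

xs = [1, 2, 3, 4]

# + is associative: ((1 + 2) + 3) + 4 equals (1 + 2) + (3 + 4).
assoc_same = foldl(+, xs) == tree_reduce(+, xs)

# - is not associative: ((1 - 2) - 3) - 4 differs from (1 - 2) - (3 - 4).
nonassoc_same = foldl(-, xs) == tree_reduce(-, xs)

println(assoc_same)
println(nonassoc_same)
```

A combine function like vcat is associative, so both groupings yield the same concatenated result, which is what makes the recursive variant a safe drop-in there.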

Example:

julia> using FileTrees, Distributed

julia> addprocs(10; exeflags=["--project", "--threads=1"], lazy=false);

julia> @everywhere using FileTrees, Distributed

julia> @everywhere function myvcat(x, y)
                   @info "combine lengths $(length(x)) and $(length(y))"
                   sleep(1) # fake slowness
                   vcat(x,y)
               end

julia> tt = mapvalues(identity, maketree("root" => ["next" => [(name=string(x), value=1:10) for x in 'a':'k']]); lazy=true);

julia> ttm = mv(tt, r"next/[a-z]$", s"next"; combine=myvcat, associative=false);

julia> @time exec(ttm);
      From worker 8:    [ Info: combine lengths 10 and 10
      From worker 8:    [ Info: combine lengths 20 and 10
      From worker 8:    [ Info: combine lengths 30 and 10
      From worker 8:    [ Info: combine lengths 40 and 10
      From worker 8:    [ Info: combine lengths 50 and 10
      From worker 8:    [ Info: combine lengths 60 and 10
      From worker 8:    [ Info: combine lengths 70 and 10
      From worker 8:    [ Info: combine lengths 80 and 10
      From worker 8:    [ Info: combine lengths 90 and 10
      From worker 8:    [ Info: combine lengths 100 and 10
 11.606803 seconds (27.74 k allocations: 1.300 MiB)

julia> ttm_assoc = mv(tt, r"next/[a-z]$", s"next"; combine=myvcat, associative=true);

julia> @time exec(ttm_assoc);
      From worker 2:    [ Info: combine lengths 10 and 10
      From worker 4:    [ Info: combine lengths 10 and 10
      From worker 6:    [ Info: combine lengths 10 and 10
      From worker 2:    [ Info: combine lengths 10 and 10
      From worker 7:    [ Info: combine lengths 10 and 20
      From worker 4:    [ Info: combine lengths 10 and 20
      From worker 2:    [ Info: combine lengths 10 and 20
      From worker 4:    [ Info: combine lengths 20 and 30
      From worker 2:    [ Info: combine lengths 30 and 30
      From worker 2:    [ Info: combine lengths 50 and 60
  4.793218 seconds (43.04 k allocations: 2.158 MiB, 0.90% compilation time)