shashi / FileTrees.jl

Parallel computing with a tree of files metaphor
http://shashi.biz/FileTrees.jl

mv does not seem to parallelize combine #60

Closed DrChainsaw closed 2 years ago

DrChainsaw commented 2 years ago

I think this boils down to merge, but mv makes for a simpler MWE. From a glance at the code, the combine calls look like they should run in parallel, but maybe there is a good reason why they don't.

julia> using Distributed, FileTrees

julia> addprocs(10; exeflags="--project");

julia> tt = mapvalues(identity, maketree("root" => ["next" => [(name=string(x), value=1:10) for x in 'a':'k']]); lazy=true)
root/
└─ next/
   ├─ a (Thunk(#71, (1:10,)))
   ├─ b (Thunk(#71, (1:10,)))
   ├─ c (Thunk(#71, (1:10,)))
   ├─ d (Thunk(#71, (1:10,)))
   ├─ e (Thunk(#71, (1:10,)))
   ├─ f (Thunk(#71, (1:10,)))
   ├─ g (Thunk(#71, (1:10,)))
   ├─ h (Thunk(#71, (1:10,)))
   ├─ i (Thunk(#71, (1:10,)))
   ├─ j (Thunk(#71, (1:10,)))
   └─ k (Thunk(#71, (1:10,)))

julia> @everywhere function myvcat(x, y)
           @info "on $(myid())"
           vcat(x, y)
       end

julia> ttm = mv(tt, r"next/[a-z]$", s"next"; combine=myvcat)
root/
└─ next (Thunk(myvcat, (Thunk(myvcat, ...), Thunk(#71, ...))))

julia> exec(ttm)
[ Info: on 13
[ Info: on 13
[ Info: on 13
[ Info: on 13
[ Info: on 13
[ Info: on 13
[ Info: on 13
[ Info: on 13
[ Info: on 13
[ Info: on 13
root/
└─ next (110-element Vector{Int64})

julia> ttm = mapsubtrees(ttt -> reducevalues(myvcat, ttt), tt, r"[a-z]")
root/
└─ next/ (Thunk(myvcat, (Thunk(myvcat, ...), Thunk(myvcat, ...))))

julia> exec(ttm)
[ Info: on 13
[ Info: on 19
[ Info: on 16
[ Info: on 13
[ Info: on 14
[ Info: on 19
[ Info: on 14
[ Info: on 13
[ Info: on 19
[ Info: on 13
root/
└─ next/ (110-element Vector{Int64})

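With mv, every combine call runs on the same worker (13), while the mapsubtrees/reducevalues version spreads the calls over several workers. I believe that getting the latter behaviour is one of the use cases of mv, right?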

DrChainsaw commented 2 years ago

I suppose merge would need something similar to assocreduce, right? It would probably need the same flag as reducevalues too?
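To illustrate why the shape of the reduction matters here, the following is a minimal sketch in plain Julia. It does not use FileTrees, and it is not the library's assocreduce. The names pairwise_reduce, Node and combine_depth are hypothetical and exist only for this example. The sketch assumes the combine function is associative, and it compares the dependency depth of a left fold (the nested shape suggested by the mv thunk above) with that of a pairwise reduction.

```julia
# Hypothetical helper, NOT FileTrees' implementation: reduce pairwise so that
# the two halves are independent and could be scheduled on different workers.
# Assumes f is associative, so regrouping the calls does not change the result.
function pairwise_reduce(f, xs::AbstractVector)
    isempty(xs) && throw(ArgumentError("cannot reduce an empty collection"))
    length(xs) == 1 && return xs[1]
    mid = length(xs) ÷ 2
    left = pairwise_reduce(f, xs[1:mid])
    right = pairwise_reduce(f, xs[mid+1:end])
    return f(left, right)
end

# Hypothetical stand-in for a lazy value: it only records how many combine
# calls must finish, one after another, before this value is available.
struct Node
    depth::Int
end

combine_depth(a::Node, b::Node) = Node(max(a.depth, b.depth) + 1)

leaves = [Node(0) for _ in 'a':'k']    # 11 leaves, as in the MWE

# Left fold: each call depends on the previous one, so nothing can overlap.
chain_depth = foldl(combine_depth, leaves).depth

# Pairwise reduction: calls at the same level are independent of each other.
tree_depth = pairwise_reduce(combine_depth, leaves).depth

# For an associative function such as vcat, both groupings give the same value.
values = [collect(1:10) for _ in 'a':'k']
same_result = pairwise_reduce(vcat, values) == foldl(vcat, values)

println((chain_depth, tree_depth, same_result))
```

With 11 leaves, the left fold forms a chain of 10 dependent calls, while the pairwise version has a depth of 4. A scheduler can only run calls in parallel when they do not depend on each other, which would be consistent with all the mv combine calls landing on a single worker.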