shashi / FileTrees.jl

Parallel computing with a tree of files metaphor
http://shashi.biz/FileTrees.jl

Doesn't seem to work #62

Open xiaodaigh opened 2 years ago

xiaodaigh commented 2 years ago
using Distributed
addprocs()

using CSV, DataFrames, FileTrees

CSV.write("c:/tmp/meh/a.csv", DataFrame(a = 1:3))
CSV.write("c:/tmp/meh/b.csv", DataFrame(a = 1:3))

a = FileTrees.FileTree("c:/tmp/meh", lazy=true)

b = FileTrees.load(a, lazy = true) do file
    1
end;

ok = mapvalues(b) do y
    y + 1
end;

ok2 = reducevalues(+, ok)

exec(ok2)

I expected 4 to be returned, but it fails with:

ERROR: LoadError: On worker 2:
KeyError: key Dagger [d58978e5-989f-55fb-8d15-ea34adc7bf54] not found

I am on Julia 1.7.2 and FileTrees 0.3.4

jpsamaroo commented 2 years ago

Try @everywhere using FileTrees before doing anything with FileTrees to ensure that FileTrees and Dagger are properly scoped in Main.
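For reference, here is a rough sketch of how the script from the original post might look with that change applied. The temporary directory from mktempdir() is a stand-in I picked for c:/tmp/meh, and I dropped the lazy keyword on the FileTree constructor since I am not sure it is needed; everything else follows the original post.

```julia
using Distributed
addprocs()

# Load FileTrees (and with it Dagger) on every process, not only the main one.
@everywhere using FileTrees

using CSV, DataFrames

# Assumption: a scratch directory instead of the hard-coded c:/tmp/meh.
dir = mktempdir()
CSV.write(joinpath(dir, "a.csv"), DataFrame(a = 1:3))
CSV.write(joinpath(dir, "b.csv"), DataFrame(a = 1:3))

tree = FileTree(dir)

# Lazily "load" each file as the constant 1, as in the original post.
loaded = FileTrees.load(tree; lazy = true) do file
    1
end

incremented = mapvalues(y -> y + 1, loaded)
total = reducevalues(+, incremented)

result = exec(total)
```

With two files, each mapped to 1 + 1, result should be the 4 the original poster was expecting.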

DrChainsaw commented 2 years ago

Longer explanation: When you call addprocs from Distributed, you spin up new Julia processes which are pretty much independent of the process you call addprocs from. As such, they don't know anything about which modules you have loaded or which variables you have declared. This independence is also what allows Distributed to run on multiple machines connected over a network, e.g. a compute cluster. The @everywhere macro just means "run this command on all processes".
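A small way to see this for yourself, using only the Distributed standard library. The function name greet is made up for the illustration:

```julia
using Distributed
addprocs(2)

# Defined only on the main process (id 1).
greet() = "hello from process $(myid())"

w = first(workers())

# Ask the worker whether its own Main module has a binding called `greet`.
known_before = remotecall_fetch(isdefined, w, Main, :greet)

# Now run the same definition on all processes.
@everywhere greet() = "hello from process $(myid())"

known_after = remotecall_fetch(isdefined, w, Main, :greet)

println("before @everywhere: ", known_before)
println("after @everywhere:  ", known_after)
```

The same reasoning applies to using statements: a package loaded with a plain using is only loaded in the process that ran it.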

Note that if you are not running from the default environment (e.g. you have started Julia with --project or have run Pkg.activate()), you also need to ensure that the added processes are running in the same environment, or else you will get an error similar to the one above. The most failsafe way to do this is to run addprocs(...; exeflags="--project").
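A sketch of what that could look like. Instead of a bare --project, this variant passes the path of the currently active project explicitly via Base.active_project(), which is my own choice here and not something the comment above prescribes. Base.active_project is not exported, so treat its use as an assumption.

```julia
using Distributed

# Path to the Project.toml that the main process is currently using.
project = Base.active_project()

# Start two workers in that same environment.
addprocs(2; exeflags = "--project=$(project)")

# Ask each worker which project it ended up with.
projects = [remotecall_fetch(Base.active_project, w) for w in workers()]

for (w, p) in zip(workers(), projects)
    println("worker ", w, " uses ", p)
end
```

If the printed paths differ from the one on the main process, the workers will not see the same package versions, which can lead to errors like the KeyError above.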

AFAIK, Dagger also makes seamless use of threads, which don't require jumping through the above hoops to get parallelism because they run inside the same process. There are some subtleties w.r.t. memory allocation which in some cases make running multiple threads slower than running multiple processes, so it can be useful to try both.
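To illustrate the point about threads living in the same process, here is a minimal sketch with plain Base.Threads (not Dagger's own scheduler). The helper threaded_sum is made up for this example, and Julia needs to be started with more than one thread (e.g. julia -t 4) to get actual parallelism.

```julia
using Base.Threads

# Hypothetical helper: split `xs` into chunks and sum `f` over each chunk
# in its own task. Assumes `xs` is non-empty.
function threaded_sum(f, xs; ntasks = nthreads())
    chunks = Iterators.partition(xs, cld(length(xs), ntasks))
    tasks = [Threads.@spawn sum(f, chunk) for chunk in chunks]
    return sum(fetch, tasks)
end

# No @everywhere needed: all tasks see the functions defined in this process.
square(x) = x^2
total = threaded_sum(square, 1:1000)

println("threads: ", nthreads(), ", total: ", total)
```

Note that square is defined once and is visible to every task, in contrast to the worker processes discussed above.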