shashi / FileTrees.jl

Parallel computing with a tree of files metaphor
http://shashi.biz/FileTrees.jl

Large lazy trees with heterogeneous values cause inference overflow in Dagger #72

Open DrChainsaw opened 1 year ago

DrChainsaw commented 1 year ago

MWE:

largetree = maketree("root" => [string(i) for i in 1:600]);

# All values of same type seem to be ok
FileTrees.load(largetree; lazy=true) do file
       "AAA"
end |> exec;

FileTrees.load(largetree; lazy=true) do file
       isodd(parse(Int, name(file))) ? "AAA" : 13
end |> exec;
# Internal error: stack overflow in type inference of  (::Dagger.var"#85#86"{FileTrees.var"#165#169", Tuple{Dagger.Chunk{Array{String, 1}, etc...
# This might be caused by recursion over very long tuples or argument lists.

# Small(ish) trees always seem to be ok
smalltree = maketree("root" => [string(i) for i in 1:300]);

FileTrees.load(smalltree; lazy=true) do file
       isodd(parse(Int, name(file))) ? "AAA" : 13
end |> exec;

I suppose it is this line, which happens to get a nice signature when everything has the same type.

A possible solution is to use assocreduce (or at least the same strategy) there. I will experiment with this in a PR, but other suggestions are welcome in the meantime.
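For readers unfamiliar with the strategy: the idea behind assocreduce is to reduce pairwise instead of in a single call over all values. Below is a minimal standalone sketch (not FileTrees' actual code) using plain vectors in place of lazy thunks, to show why this keeps every call signature small and fixed-arity even for 600 heterogeneous values.

```julia
# Pairwise (associative) reduction: every call to `f` takes exactly two
# arguments, so no single signature grows with the number of values.
function assocreduce(f, xs::AbstractVector)
    length(xs) == 1 && return xs[1]
    mid = length(xs) ÷ 2
    return f(assocreduce(f, xs[1:mid]), assocreduce(f, xs[mid+1:end]))
end

# Plain vectors stand in for lazy thunks here; with Dagger each `f`
# call would instead be a small delayed node in the task graph.
vals = Any[isodd(i) ? "AAA" : 13 for i in 1:600]
collected = assocreduce(vcat, map(x -> Any[x], vals))
```

The slicing into subvectors is also where the intermediate-array overhead mentioned later in this thread comes from.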

jpsamaroo commented 1 year ago

Yeah, the issue is probably that delayed(f) constructs ~~a closure~~ an anonymous function (which is then immediately called with the passed arguments). I suspect that Dagger.@spawn also has this issue right now, but it's more easily resolved due to how it interacts with Dagger's scheduler (everything can just be passed around in a Vector{Any}, never fully interacting with inference until the task executes).
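A tiny sketch of the contrast being described here: a heterogeneous argument tuple has a type that grows with every element (which is what inference chokes on for long argument lists), while a Vector{Any} has one constant type no matter what or how much it holds.

```julia
# The tuple's type records every element type in order...
hetero = ("AAA", 13, 'x', 1.0)    # Tuple{String, Int64, Char, Float64}

# ...while a Vector{Any} boxes its contents, so its type never changes,
# which is why the scheduler can shuttle arguments around cheaply.
boxed = Any["AAA", 13, 'x', 1.0]  # Vector{Any}, whatever the length
```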

DrChainsaw commented 1 year ago

Thanks for chiming in @jpsamaroo. I don't follow you about the closure; it doesn't close over any variables, does it? I thought it was the humongous tuple (xs...) of all the values in the tree that causes inference trouble, which is somehow avoided when all values are of the same type. I have had my eyes on it for a while, but it has never turned out to be problematic until now.

I kinda wish there was a way to tell Dagger to just schedule this set of Thunks for me without having to reduce them to a single Thunk. I tried to recursively transform the Thunks to EagerThunks with `toeager(x::Thunk) = Dagger.spawn(x.f, toeager.(x.inputs)...)`, but it reaches a bit too much into the internals to feel comfortable. It also had about twice the latency of the current solution (which might not be a problem, since I guess people don't use FileTrees for work that only takes a few milliseconds).

jpsamaroo commented 1 year ago

Sorry, I misspoke, I meant to say "anonymous function". Either way, yeah, we hit inference somewhere and then it blows up due to inference somehow not having a good way to deal with heterogeneous signatures.

Transforming a Thunk to an EagerThunk is definitely a bad idea (we do this internally in a controlled manner to keep the GC happy); a better solution is switching FileTrees over to Dagger.@spawn wholesale.

Anyway, whether we use delayed or Dagger.@spawn, we'll still have this issue when we hit inference. There are two solutions I can think of:

DrChainsaw commented 1 year ago

Thanks, I got a bit worried that there was more going on there which I had missed.

I guess the assocreduce option looks like the best one then. I tried it and it was a bit slower than the current mechanism, probably because it creates a lot of intermediate arrays.

As for Dagger.@spawn: given this, it seems like more of an API question, i.e. do we just reexport spawn/@spawn (maybe as our own delegating function), or add it as a keyword?

One place where the reexport option becomes attractive is map, which passes Files that in turn contain references to the whole FileTree. This means that if we spawn on the user-provided function internally, we will be passing every value in the tree to every worker. If the user instead calls spawn directly, they can spawn on just the value-computing function, which hopefully does not need the whole tree. We have basically the same problem with map for lazy values today.
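To make the data-movement concern concrete, here is a sketch with hypothetical toy types (ToyTree and ToyFile are made up for illustration, not FileTrees' real File/FileTree): a file handle holding a back-reference to its tree drags the whole tree along wherever it is serialized, whereas passing only the value does not.

```julia
struct ToyTree
    values::Vector{Any}
end

struct ToyFile
    parent::ToyTree  # back-reference: serializing a ToyFile ships the tree
    value::Any
end

tree = ToyTree(Any["AAA", 13])
file = ToyFile(tree, tree.values[1])

# A function handed the file can reach every value in the tree, so any
# worker running it must receive the whole tree...
via_file(f::ToyFile) = length(f.parent.values)

# ...whereas one handed only the value needs just that value shipped.
via_value(v) = v
```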

Then again, perhaps one can just redesign the map implementation, but I can't think of how to do that in a user-friendly way.