DrChainsaw opened 1 year ago
Yeah, the issue is probably that `delayed(f)` constructs ~~a closure~~ an anonymous function (which is then immediately called with the passed arguments). I suspect that `Dagger.@spawn` also has this issue right now, but it's more easily resolved due to how it interacts with Dagger's scheduler (everything can just be passed around in a `Vector{Any}`, never fully interacting with inference until the task executes).
Thanks for chiming in @jpsamaroo. I don't follow you about the closure: it doesn't close over any variables? I thought it was that humongous tuple (`xs...`) of all the values in the tree that causes inference trouble, which is somehow avoided when all values are of the same type. I have had my eyes on it for a while, but it has never turned out to be problematic until now.
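For illustration, the kind of call being discussed might look like this (a hypothetical sketch, not code from FileTrees; `delayed(f)(args...)` is Dagger's lazy-call API):

```julia
using Dagger: delayed

# Reducing a whole tree into one delayed call gives that call one
# argument slot per value in the tree. With homogeneous values the
# signature is a simple NTuple; with mixed value types it becomes a
# long heterogeneous tuple, which is what stresses inference.
homogeneous = delayed(+)(ntuple(_ -> 1, 32)...)   # all-Int signature
mixed       = delayed(+)(1, 2.0, 3//1, UInt8(4))  # heterogeneous signature
```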
I kinda wish there was a way to tell Dagger to just schedule this set of `Thunk`s for me without having to reduce them to a single `Thunk`. I tried to recursively transform the `Thunk`s to `EagerThunk`s with `toeager(x::Thunk) = Dagger.spawn(x.f, toeager.(x.inputs)...)`, but it reaches a bit too much into the internals to feel comfortable. It also had about twice the latency of the current solution (which might not be a problem, since I guess people don't use FileTrees for things that only take a few milliseconds).
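A self-contained version of that recursive conversion might look like the following (hedged: `Thunk.f` and `Thunk.inputs` are Dagger internals, which is exactly the discomfort mentioned above):

```julia
using Dagger
using Dagger: Thunk

# Recursively re-spawn a lazy Thunk graph as eager tasks.
# Inputs that are themselves Thunks are converted first; any other
# leaf value is passed through unchanged.
toeager(x::Thunk) = Dagger.spawn(x.f, toeager.(x.inputs)...)
toeager(x) = x
```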
Sorry, I misspoke; I meant to say "anonymous function". Either way, yeah, we hit inference somewhere and then it blows up because inference somehow doesn't have a good way to deal with heterogeneous signatures.
Transforming a `Thunk` to an `EagerThunk` is definitely a bad idea (we do this internally in a controlled manner to keep the GC happy); a better solution is switching FileTrees over to `Dagger.@spawn` wholesale.
Anyway, whether we use `delayed` or `Dagger.@spawn`, we'll still have this issue when we hit inference. There are two solutions I can think of: pass the `Thunk`s around as a `Vector{Thunk}` (Dagger will unwrap `Thunk`s automatically), and then add a built-in reduction operation that works on blocks of a collection. This is a medium-term goal for me, because it's both important but also complicated.

Thanks, I got a bit worried that there was more going on there which I had missed.
I guess then the assocreduce option looks like the best one. I tried it and it was a bit slower than the current mechanism, probably because it creates a lot of intermediate arrays.
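The strategy behind `assocreduce` can be sketched as a pairwise tree reduction: no call ever sees more than two arguments, so signatures stay small even when the value types are mixed (a hypothetical sketch of the strategy, not FileTrees' actual implementation):

```julia
# Reduce a non-empty xs with f two elements at a time, halving the
# collection at each level. Each level allocates a new, smaller array,
# which matches the observation above about intermediate arrays.
function pairwise_reduce(f, xs::AbstractVector)
    length(xs) == 1 && return xs[1]
    next = [i == length(xs) ? xs[i] : f(xs[i], xs[i + 1])
            for i in 1:2:length(xs)]
    return pairwise_reduce(f, next)
end
```

With `f = (a, b) -> delayed(op)(a, b)` this builds a balanced tree of two-argument `Thunk`s instead of one N-argument call.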
As for `Dagger.@spawn`: given this, it seems like more of an API question, i.e. do we just reexport `spawn`/`@spawn` (maybe as a delegating function of our own) or add it as a keyword?
One place where the reexport option becomes attractive is `map`, which passes the `File`s, which in turn contain references to the whole `FileTree`. This means that if we `spawn` on the user-provided function internally, we will be passing every value in the tree to every worker. If the user instead calls `spawn` directly, they can spawn on just the value-computing function, which hopefully does not need the whole tree. We have basically the same problem with `map` for lazy values today.
Then again, perhaps one could just redesign the `map` implementation, but I can't think of how to do that in a user-friendly way.
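The contrast between the two options can be sketched like this (all names here are invented for illustration; neither function is FileTrees API, and `value` is an assumed field):

```julia
using Dagger

# Assumed stand-in for a FileTrees File: holds its value plus a
# reference back to the whole tree.
struct FakeFile
    value
    tree
end

# Option 1: the library spawns on the user's function. The file (and,
# through its tree reference, every other value) becomes a task
# argument shipped to the worker.
library_spawn(f, file::FakeFile) = Dagger.spawn(f, file)

# Option 2: the user spawns on the value-computing function only, so
# each task receives just the value it needs.
user_spawn(compute, file::FakeFile) = Dagger.spawn(compute, file.value)
```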
MWE:
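(The original MWE does not appear to have survived in this copy of the thread; the following is a hypothetical reconstruction of the failure mode being described, not the original code.)

```julia
using Dagger: delayed

# Many values of mixed types, reduced into a single delayed call whose
# signature has one slot per value. Making the element types uniform
# (e.g. all Int) gives the call a homogeneous signature, which is the
# case inference handles cheaply.
vals = Any[isodd(i) ? i : Float64(i) for i in 1:100]
t = delayed((xs...) -> sum(xs))(vals...)
```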
I suppose it is this line which happens to get a nice signature when everything has the same type. A possible solution is to use `assocreduce` (or at least the same strategy) here. I will experiment with this in a PR, but other suggestions are welcome in the meantime.