Use assocreduce in mapcompute instead of splatting

Fixes #72

There is a fair bit of overhead here, more than 100% more when the tree is really large. One argument why this could be acceptable is that people might not use the lazy mode unless they have heavy computations in which case the overhead should not be significant. I don't see assocreduce popping up in the profile viewer either, so maybe it is just more work for Dagger to move the partial results around.

I suppose one could have a more coarse grained splitting of the thunks array than going all the way down to a single element (e.g. doing the current splatting on up to 100-ish elements at the time). Not sure if it is worth the effort and extra moving parts though.

Benchmarks

4 Threads, no Distributed:

julia> smalltree = maketree("root" => [string(i) for i in 1:3]);

# Current main:
julia> @benchmark exec(FileTrees.load($Returns(1), $smalltree; lazy=true))
BenchmarkTools.Trial: 9588 samples with 1 evaluation.
 Range (min … max):  378.400 μs … 27.309 ms  ┊ GC (min … max): 0.00% … 64.64%
 Time  (median):     404.700 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   518.627 μs ±  1.306 ms  ┊ GC (mean ± σ):  9.67% ±  3.79%
 Memory estimate: 158.53 KiB, allocs estimate: 2728.

# With PR:
julia> @benchmark exec(FileTrees.load($Returns(1), $smalltree; lazy=true))
BenchmarkTools.Trial: 6720 samples with 1 evaluation.
 Range (min … max):  488.200 μs … 44.203 ms  ┊ GC (min … max): 0.00% … 74.48%
 Time  (median):     532.850 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   738.081 μs ±  1.742 ms  ┊ GC (mean ± σ):  9.43% ±  4.04%
 Memory estimate: 191.28 KiB, allocs estimate: 3320.

julia> mediumtree = maketree("root" => [string(i) for i in 1:1000]);

# Current main:
julia> @benchmark exec(FileTrees.load($Returns(1), $mediumtree; lazy=true))
BenchmarkTools.Trial: 75 samples with 1 evaluation.
 Range (min … max):  49.478 ms … 85.537 ms  ┊ GC (min … max):  0.00% … 21.06%
 Time  (median):     69.424 ms              ┊ GC (median):    21.73%
 Time  (mean ± σ):   66.818 ms ± 10.082 ms  ┊ GC (mean ± σ):  17.39% ± 10.23%
 Memory estimate: 41.53 MiB, allocs estimate: 572385.

# With PR:
julia> @benchmark exec(FileTrees.load($Returns(1), $mediumtree; lazy=true))
BenchmarkTools.Trial: 34 samples with 1 evaluation.
 Range (min … max):  136.017 ms … 174.173 ms  ┊ GC (min … max): 10.26% … 18.41%
 Time  (median):     143.190 ms               ┊ GC (median):    11.37%
 Time  (mean ± σ):   147.357 ms ±   9.639 ms  ┊ GC (mean ± σ):  13.47% ±  3.75%
 Memory estimate: 72.54 MiB, allocs estimate: 1133724.

julia> largetree = maketree("root" => [string(i) for i in 1:10000]);

# Current main:
julia> @benchmark exec(FileTrees.load($Returns(1), $largetree; lazy=true))
BenchmarkTools.Trial: 6 samples with 1 evaluation.
 Range (min … max):  806.612 ms …    1.073 s  ┊ GC (min … max): 22.84% … 35.78%
 Time  (median):     969.528 ms               ┊ GC (median):    31.32%
 Time  (mean ± σ):   960.597 ms ± 107.355 ms  ┊ GC (mean ± σ):  31.01% ±  6.69%
 Memory estimate: 489.02 MiB, allocs estimate: 6003015.

# With PR:
julia> @benchmark exec(FileTrees.load($Returns(1), $largetree; lazy=true))
BenchmarkTools.Trial: 3 samples with 1 evaluation.
 Range (min … max):  2.060 s …    2.708 s  ┊ GC (min … max): 25.88% … 31.11%
 Time  (median):     2.257 s               ┊ GC (median):    32.34%
 Time  (mean ± σ):   2.342 s ± 332.086 ms  ┊ GC (mean ± σ):  29.97% ±  3.43%
 Memory estimate: 928.02 MiB, allocs estimate: 11738336.

Distributed with 4 workers (single thread per worker):

julia> addprocs(4; exeflags=["--project=.", "-t 1"], lazy=false);

# Current main:
julia> @benchmark exec(FileTrees.load($Returns(1), $smalltree; lazy=true))
BenchmarkTools.Trial: 1323 samples with 1 evaluation.
 Range (min … max):  2.479 ms … 48.137 ms  ┊ GC (min … max): 0.00% … 27.85%
 Time  (median):     3.161 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.767 ms ±  3.109 ms  ┊ GC (mean ± σ):  2.54% ±  4.24%
 Memory estimate: 239.81 KiB, allocs estimate: 4244.

# With PR
julia> @benchmark exec(FileTrees.load($Returns(1), $smalltree; lazy=true))
BenchmarkTools.Trial: 803 samples with 1 evaluation.
 Range (min … max):  4.098 ms … 41.777 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.297 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.220 ms ±  3.890 ms  ┊ GC (mean ± σ):  2.03% ± 4.23%
 Memory estimate: 294.66 KiB, allocs estimate: 5232.

# Current main:
julia> @benchmark exec(FileTrees.load($Returns(1), $mediumtree; lazy=true))
BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min … max):  3.096 s …   3.139 s  ┊ GC (min … max): 1.29% … 1.51%
 Time  (median):     3.117 s              ┊ GC (median):    1.40%
 Time  (mean ± σ):   3.117 s ± 30.027 ms  ┊ GC (mean ± σ):  1.40% ± 0.16%
 Memory estimate: 182.52 MiB, allocs estimate: 6762341.

# With PR:
julia> @benchmark exec(FileTrees.load($Returns(1), $mediumtree; lazy=true))
BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min … max):  3.978 s …    4.406 s  ┊ GC (min … max): 1.67% … 2.08%
 Time  (median):     4.192 s               ┊ GC (median):    1.88%
 Time  (mean ± σ):   4.192 s ± 302.113 ms  ┊ GC (mean ± σ):  1.88% ± 0.29%
 Memory estimate: 236.52 MiB, allocs estimate: 7776384.

# Current main:
julia> @benchmark exec(FileTrees.load($Returns(1), $largetree; lazy=true))
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 406.002 s (1.05% GC) to evaluate,
 with a memory estimate of 14.73 GiB, over 607986258 allocations.

With PR:
julia> @benchmark exec(FileTrees.load($Returns(1), $largetree; lazy=true))
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 401.316 s (1.57% GC) to evaluate,
 with a memory estimate of 15.45 GiB, over 619338178 allocations.

shashi / FileTrees.jl

Use assocreduce in mapcompute instead of splatting #73