rafaqz closed this 1 year ago
Merging #471 (f315588) into main (efa31ed) will increase coverage by 0.25%. The diff coverage is 95.56%.
@@ Coverage Diff @@
## main #471 +/- ##
==========================================
+ Coverage 89.73% 89.99% +0.25%
==========================================
Files 38 39 +1
Lines 2660 2928 +268
==========================================
+ Hits 2387 2635 +248
- Misses 273 293 +20
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/DimensionalData.jl | 100.00% <ø> (ø) | |
| src/Dimensions/dimension.jl | 88.52% <ø> (ø) | |
| src/utils.jl | 85.71% <83.33%> (-9.12%) | :arrow_down: |
| src/Dimensions/primitives.jl | 93.30% <88.88%> (-0.37%) | :arrow_down: |
| src/stack/methods.jl | 96.15% <93.75%> (+0.69%) | :arrow_up: |
| src/stack/stack.jl | 91.66% <94.44%> (-1.39%) | :arrow_down: |
| src/array/show.jl | 95.09% <95.31%> (-1.25%) | :arrow_down: |
| src/Dimensions/show.jl | 95.77% <100.00%> (+0.06%) | :arrow_up: |
| src/LookupArrays/lookup_arrays.jl | 82.77% <100.00%> (+0.09%) | :arrow_up: |
| src/LookupArrays/show.jl | 84.72% <100.00%> (+0.66%) | :arrow_up: |
| ... and 2 more | | |

... and 1 file with indirect coverage changes
This ended up being quite rambling. Compile time of `show` was a big factor in REPL performance, so I fixed that too, to some extent. There's a proper SnoopPrecompile.jl block now that seems to help quite a bit with `show`. More work could be done with SnoopCompile to get SnoopPrecompile to precompile more common use cases.
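For context, a SnoopPrecompile block looks roughly like the following. This is an illustrative sketch, not the exact block added in this PR: the workload inside `@precompile_all_calls` is executed during package precompilation, so the methods it hits are compiled and cached ahead of the first REPL use.

```julia
using SnoopPrecompile

@precompile_setup begin
    # Setup code here is run but not precompiled
    A = rand(2, 3)
    @precompile_all_calls begin
        # Everything called here gets precompiled into the package image
        da = DimArray(A, (X([:a, :b]), Y(1.0:3.0)))
        sprint(show, MIME"text/plain"(), da)  # exercise the `show` paths
    end
end
```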
Other than that, the `DimStack` type could be simplified, but that would be a breaking change. If all stacks share the same dims we could skip `layerdims` (it could be `nothing`) and just return `dims`. `combinedims` and `rebuild_from_arrays` could be further optimised for compilation time; they're currently the worst offenders, with most operations on a stack that return another stack passing through them. We may also be able to drop `combinedims` when all dims are shared between layers.
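The "skip `layerdims`" idea could be sketched like this. The struct and accessor are hypothetical, not the actual DimensionalData.jl internals: when every layer shares the stack's dims, `layerdims` is `nothing` and lookup falls through to the shared `dims` field.

```julia
# Hypothetical simplified stack (illustrative only)
struct SimpleStack{L,D,LD}
    data::L        # NamedTuple of parent arrays
    dims::D        # dimensions shared by the stack
    layerdims::LD  # per-layer dims, or `nothing` when all layers share `dims`
end

# When layerdims is `nothing`, every layer just returns the shared dims,
# so no per-layer dimension bookkeeping needs to be compiled.
layerdims(s::SimpleStack, key::Symbol) =
    s.layerdims === nothing ? s.dims : s.layerdims[key]
```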
@rafaqz I'm traveling right now but will try to review this weekend.
No rush
I also seem to have messed up the formatting a bit while optimising `show`. The idea with the `show` optimisation is that in practice it was making everything else feel slower - often `show` (e.g. the `printstyled` calls) was taking longer than the methods that led to it. Getting it down to 0.1 seconds would be best, so you basically don't notice any lag, but it's hard to get there.
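One simple way to check the first-`show` latency being discussed here (run in a fresh Julia session so nothing is compiled yet):

```julia
using DimensionalData

da = DimArray(rand(100, 100), (X(1:100), Y(1:100)))

# First call pays compile time + run time; this is the lag felt in the REPL
@time sprint(show, MIME"text/plain"(), da)
# Second call is run time only, so the difference is the compile cost
@time sprint(show, MIME"text/plain"(), da)
```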
Indeed, I'm seeing `show` be much snappier. In particular, the first `display(::DimArray)` for typical arrays used to take 0.52s but now takes 0.17s. `display(::AbstractDimStack)` went from 1.27s to 0.96s; do you think this could be sped up further?

With these changes we don't see much of a reduction in TTFX for I/O and common analyses. After performing the analysis once on one `AbstractDimStack`, the compile time for a different `AbstractDimStack` is substantially reduced, so precompilation could help us a lot here. But I wonder if the implementations are doing things that substantially increase compile time. How have you been going about identifying the compile-time bottlenecks in DimensionalData?
I've been using `@time` and SnoopCompile's `@snoopi_deep` on the first run of various methods and visualising the output with ProfileView.jl, plus Cthulhu's `@descend` to understand what is being compiled. It's a little clunky as a workflow, as you need to keep restarting Julia, but it works.

You're right, we could precompile more things to make the first runs better too. Feel free to PR more, but we can keep working on this over time. It's also a good exercise to understand the tradeoffs, to guide future design decisions.

I can post my TTFX benchmark script here when I get back to Copenhagen (tonight or tomorrow), if you want something to get started.
```julia
using SnoopCompile
using ProfileView
@time using DimensionalData
# trees = invalidation_trees(invalidations)
# using Cthulhu

# Define dimarrays
A = [1.0 2.0 3.0; 4.0 5.0 6.0]
x, y, z = X([:a, :b]), Y(10.0:10.0:30.0; metadata=Dict()), Z()
dimz = x, y
da1 = permutedims(DimArray(A, (x, y); name=:one));
da2 = DimArray(Float16.(2A), (x, y); name=:two)
da3 = DimArray(Int.(3A), (x, y); name=:three);
da4 = DimArray(cat(4A, 5A, 6A, 7A; dims=3), (x, y, z); name=:exteradim);

# Snoop display
tinf = @snoopi_deep @time display(da1)
tinf = @snoopi_deep @time display(da2)
tinf = @snoopi_deep @time display(da3)
tinf = @snoopi_deep @time display(da4)

# View flamegraph of some snoop
fg = flamegraph(tinf)
ProfileView.view(fg)

# Some mixed-type DimArrays to generate random stacks from
das = (da1, da2, da3, da4)

# Define random stacks
many_das1 = Tuple(rand(das, 14)); @time s1 = DimStack(many_das1)
many_das2 = Tuple(rand(das, 14)); @time s2 = DimStack(many_das2)

# Merging stacks
tinf = @snoopi_deep @time merge(s1, s2)
s = s1
tinf = @snoopi_deep @time display(s)

# Some functions over stacks
tinf = @time @snoopi_deep map(sum, s)
tinf = @snoopi_deep @time map(A -> A[X=1], s)
tinf = @snoopi_deep @time prod(s)
fg = flamegraph(tinf)
ProfileView.view(fg)
```
This PR reduces compile time of `DimStack` and functions relying on `map` by avoiding intermediate `NamedTuple`s of `DimArray`. For whatever reason these are much slower to compile than `Tuple`s of `DimArray`.

There is also some work to use `Vector` rather than `Tuple` for some methods, so that compilation only has to happen once and `map` is cheaper to compile.

Also some `@nospecialize` to reduce compilation (it helps even when the method is not called directly, oddly enough).

@sethaxen this has a lot less recompilation for stacks with different numbers of layers - for `map` and `DimStack`. It would be good to know if it helps with your specific workloads.

Now that precompiling "sticks" a bit, we could add some precompile statements to remove even the first compilation for some methods.