rafaqz / DimensionalData.jl

Named dimensions and indexing for Julia arrays and other data
https://rafaqz.github.io/DimensionalData.jl/stable/
MIT License

Generalize eachslice #462

Closed: sethaxen closed this 1 year ago

sethaxen commented 1 year ago

This PR:

Note that #418 would also add `eachslice` support for v1.9. The implementation in this PR avoids `invoke`. @ParadaCarleton, you've probably thought about this more than I have. Any reason not to use this approach?

codecov-commenter commented 1 year ago

Codecov Report

Merging #462 (533f68b) into main (11be8da) will increase coverage by 0.17%. The diff coverage is 100.00%.

```
@@            Coverage Diff             @@
##             main     #462      +/-   ##
==========================================
+ Coverage   89.46%   89.64%   +0.17%
==========================================
  Files          37       38       +1
  Lines        2601     2636      +35
==========================================
+ Hits         2327     2363      +36
+ Misses        274      273       -1
```

| Impacted Files | Coverage Δ |
|---|---|
| src/array/methods.jl | 96.51% <100.00%> (+0.38%) :arrow_up: |
| src/stack/methods.jl | 95.45% <100.00%> (+0.45%) :arrow_up: |
| src/DimensionalData.jl | 100.00% <0.00%> (ø) |
| src/LookupArrays/LookupArrays.jl | 100.00% <0.00%> (ø) |
| src/Dimensions/Dimensions.jl | 100.00% <0.00%> (ø) |
| src/Dimensions/dimension.jl | 88.52% <0.00%> (+0.19%) :arrow_up: |
| src/LookupArrays/metadata.jl | 80.85% <0.00%> (+0.41%) :arrow_up: |
| src/name.jl | 92.30% <0.00%> (+0.64%) :arrow_up: |
| src/array/array.jl | 95.96% <0.00%> (+0.80%) :arrow_up: |
| src/LookupArrays/lookup_traits.jl | 95.12% <0.00%> (+1.00%) :arrow_up: |


ParadaCarleton commented 1 year ago

I don't see any problems with this; `invoke` just happened to be the first way I thought of doing this. But if a lot of this is copy-pasted from Base, avoiding code duplication and maintenance problems by using `@invoke` strikes me as the better solution.
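For reference, a minimal sketch of what the `@invoke` route under discussion might look like (hypothetical function name, not the PR's actual implementation; assumes Julia ≥ 1.9's generic `eachslice(A; dims)` and DimensionalData's `dimnum`):

```julia
using DimensionalData

# Hypothetical sketch: translate dimension selectors (types, Symbols, or
# dim objects) to integer positions, then force dispatch to Base's
# generic AbstractArray method instead of any DimArray-specific one.
function eachslice_invoke(A::AbstractDimArray; dims)
    dnums = dimnum(A, dims)  # e.g. (Z, X) -> (3, 1) for the array above
    return Base.@invoke eachslice(A::AbstractArray; dims=dnums)
end
```

The trade-off the thread settles on is that this reuses Base's code but also inherits Base's plain `Base.OneTo` axes, so dimension lookups are not preserved on the slices.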

sethaxen commented 1 year ago

I'll do some benchmarking to see how fast @invoke is compared to using dimensions as this PR does.

sethaxen commented 1 year ago

Quick benchmark:

```julia
julia> using Revise, Random, BenchmarkTools

julia> using DimensionalData

julia> Random.seed!(42);

julia> da = DimArray(randn(5, 2, 10), (X(1:5), Y(1:2), Z(1:10)));

julia> dims = (Z, X);

julia> dims2 = (:Z, :X);

julia> dims3 = Dimensions.dims(da, (Z, X));

julia> foo(x, dims) = eachslice(x; dims=dims);
```

Using `@invoke`:

```julia
julia> @btime foo($da, $dims);
  4.735 μs (55 allocations: 2.89 KiB)

julia> @btime foo($da, $dims2);
  2.819 μs (37 allocations: 2.48 KiB)

julia> @btime foo($da, $dims3);
  2.876 ns (0 allocations: 0 bytes)
```

This PR:

```julia
julia> @btime foo($da, $dims);
  3.120 μs (34 allocations: 2.00 KiB)

julia> @btime foo($da, $dims2);
  2.040 μs (23 allocations: 1.84 KiB)

julia> @btime foo($da, $dims3);
  2.875 ns (0 allocations: 0 bytes)
```

So this PR's approach is slightly faster for `UnionAll` and `Symbol` dimensions.

More importantly, when `drop=false`, the version in Base inserts a `Base.OneTo` for the singleton axes, which drops the dimensional information, whereas in this PR we construct a singleton dim as for a reduction. So if we used `@invoke`, we would then need to construct `ax` in the `drop=false` branch ourselves anyway. Thus this approach is more straightforward than `@invoke`.
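To illustrate the Base behavior described above with a plain array (Julia ≥ 1.9): with `drop=false` the outer `Slices` container keeps the parent array's dimensionality, but each sliced-over position collapses to an anonymous `Base.OneTo(1)` axis, so any dimension lookup attached there would have nowhere to live.

```julia
# Plain Base arrays only; no DimensionalData involved.
A = reshape(collect(1:24), 2, 3, 4)

# Keep dims 1 and 3; with drop=false the outer container stays 3-d,
# but dimension 2 collapses to a bare singleton axis.
s = eachslice(A; dims=(1, 3), drop=false)

size(s)         # (2, 1, 4)
axes(s, 2)      # Base.OneTo(1): no trace of the original axis
size(first(s))  # (3,): each slice runs over the dropped dimension
```

A `DimArray` would instead want that singleton position to carry a length-1 dim lookup, which is what the PR constructs.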

ParadaCarleton commented 1 year ago

Ok, that makes sense.