invenia / AxisSets.jl

Consistent operations over a collection of KeyedArrays
MIT License
5 stars 0 forks source link

Loosen component dict type constraints #42

Closed rofinn closed 3 years ago

rofinn commented 3 years ago

Also, restrict components to KeyedArray rather than XArray. Reasons for the change.

  1. Support empty KeyedDataset() constructor which requires loosened key/value type constraints to be useful.
  2. We want to support variable types and depths for the keys. If you construct a KeyedDataset with only keys (:train, :input, :load) and (:predict, :input, :load) then adding (:train, :output) would error.
  3. The XArray type (Union) was rather verbose and made setting up the correct dispatch harder. Since we don't have a use-case where we want NamedDimsArray(KeyedArray(...)) I don't think it's worth supporting.

Closes #39

codecov[bot] commented 3 years ago

Codecov Report

Merging #42 (8e215cb) into main (2863637) will increase coverage by 0.02%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #42      +/-   ##
==========================================
+ Coverage   98.86%   98.88%   +0.02%     
==========================================
  Files           7        7              
  Lines         265      270       +5     
==========================================
+ Hits          262      267       +5     
  Misses          3        3              
Impacted Files Coverage Δ
src/AxisSets.jl 75.00% <ø> (ø)
src/functions.jl 100.00% <ø> (ø)
src/indexing.jl 100.00% <ø> (ø)
src/dataset.jl 100.00% <100.00%> (ø)
src/flatten.jl 97.91% <100.00%> (ø)
src/impute.jl 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 2863637...8e215cb. Read the comment docs.

rofinn commented 3 years ago

Based on a couple our current use-cases this shouldn't have that big of an impact on performance because we're operating on relative few large components, so the type stability of the component arrays will have a larger impact on performance than the key tuples.

Julia 1.6 on macos

@btime KeyedDataset(
           flatten([
               :train => [
                   :temp => KeyedArray([1.0 1.1; missing 2.2; 3.0 3.3]; time=1:3, id=[:a, :b]),
                   :load => KeyedArray([7.0 7.7; 8.0 missing; 9.0 9.9]; time=1:3, loc=[:x, :y]),
               ],
               :predict => [
                   :temp => KeyedArray([1.0 missing; 2.0 2.2; 3.0 3.3]; time=1:3, id=[:a, :b]),
                   :load => KeyedArray([7.0 7.7; 8.1 missing; 9.0 9.9]; time=1:3, loc=[:x, :y]),
               ]
           ])...,
           constraints = Pattern[(:__, :time), (:__, :loc), (:train, :__, :id)]
       );

Before: 397.795 μs (787 allocations: 37.68 KiB) After: 397.598 μs (864 allocations: 41.12 KiB)

@btime Impute.filter($constrained_ds; dims=:id)

Before: 112.706 μs (883 allocations: 57.52 KiB) After: 72.201 μs (794 allocations: 42.45 KiB)