invenia / AxisSets.jl

Consistent operations over a collection of KeyedArrays
MIT License
5 stars 0 forks source link

Removing a dim does not violate the constraint on that dim #64

Open bencottier opened 3 years ago

bencottier commented 3 years ago

I thought this would throw an error.

MWE:

julia> ds = KeyedDataset(
           :train => KeyedArray(zeros(5, 2); target=1:5, id=[:a, :b]),
           :predict => KeyedArray(zeros(3, 2); target=6:8, id=[:a, :b]),
           constraints=Pattern[(:_, :id)]
       )
KeyedDataset with:
  2 components
    (:train,) => 5x2 KeyedArray{Float64} with dimension target, id[1]
    (:predict,) => 3x2 KeyedArray{Float64} with dimension target, id[1]
  1 constraints
    [1] (:_, :id) ∈ 2-element Vector{Symbol}

julia> map(A -> A(id=:a), ds, (:predict, :_))
KeyedDataset with:
  2 components
    (:train,) => 5x2 KeyedArray{Float64} with dimension target, id[1]
    (:predict,) => 3 KeyedArray{Float64} with dimension target
  1 constraints
    [1] (:_, :id) ∈ 2-element Vector{Symbol}

Whereas it throws an error if the dim preserved:

julia> map(A -> A(id=AxisKeys.Interval(:a, :a)), ds, (:predict, :_))
ERROR: KeyAlignmentError: Misaligned dimension keys on constraint Pattern((:_, :id))
  Tuple[(:predict, :id)] ∈ 1-element view(::Vector{Symbol}, [1]) with eltype Symbol
  Tuple[(:train, :id)] ∈ 2-element Vector{Symbol}

Stacktrace:
 [1] validate(ds::KeyedDataset, constraint::Pattern{Tuple{Symbol, Symbol}}, paths::Set{Tuple})
   @ AxisSets ~/.julia/packages/AxisSets/YTn0q/src/dataset.jl:293
 [2] validate(ds::KeyedDataset)
   @ AxisSets ~/.julia/packages/AxisSets/YTn0q/src/dataset.jl:267
 [3] map!(f::var"#5#6", dest::KeyedDataset, src::KeyedDataset)
   @ AxisSets ~/.julia/packages/AxisSets/YTn0q/src/functions.jl:52
 [4] map(f::Function, ds::KeyedDataset, keys::Tuple{Symbol, Symbol})
   @ AxisSets ~/.julia/packages/AxisSets/YTn0q/src/functions.jl:41
 [5] top-level scope
   @ REPL[9]:1
rofinn commented 3 years ago

Yeah, that's intentional cause it's pretty common to wanna drop dimensions and not have it error. We could have map impose that all shared dims remain... and require users to reconstruct the KeyedDataset from scratch if they want to change that, but that seems like it might get annoying.

bencottier commented 3 years ago

it's pretty common to wanna drop dimensions and not have it error. We could have map impose that all shared dims remain...

Dropping dimensions is common but I thought it should error because the :train and :predict components (in my example) become inconsistent. I don't think we should impose that all shared dims remain. Rather the same dim must be dropped across components, if there is a constraint on that dim between components.

So it would force the user to do map(A -> A(id=:a), ds, (:_, :id)) rather than map(A -> A(id=:a), ds, (:predict, :_)).

But maybe there is a good reason to only drop a dim on one component, which would make that annoying?