rafaqz / DimensionalData.jl

Named dimensions and indexing for julia arrays and other data
https://rafaqz.github.io/DimensionalData.jl/stable/
MIT License
271 stars 38 forks source link

Breaking: fix cat, again #515

Closed rafaqz closed 1 year ago

rafaqz commented 1 year ago

Ok, properly this time.

Breaking change for a number of minor reasons.

Constructed dimensions passed to dims will now replace the current dimensiion. Symbols or types will try to use the existing dimension values, or error if they don't make sense.

Passing constructed dimensions that are the wrong size will now break, where before it just used the existing dimensions. As with the last PR, buggy, incorrect lookups are no longer allowed, and will error result in a warning and cat on the parent array being returned.

sethaxen commented 1 year ago

Hm, I think this solution is still problematic, because it is completely fine for a third-party function to vcat, hcat, or cat 2 arrays together, and that code won't care about our dimensions, but because we care about dimensions, we raise an error and forbid our users from applying such a function to these inputs. I think to work around this I would need to special-case DimArrays in a number of functions that currently work for all array types.

From that perspective, falling back to the parent array type and raising a warning instead of an error is more useful. I wonder if this type-instability is really a major issue, since that fallback should create a typeunion of size 2, which is usually okay for performance.

An alternative would be to take this approach Int(s) are provided for dims, since generic Julia code will only provide those. But this causes an inconsistency between Symbol and Int dims for the same dimensions, which is not ideal.

rafaqz commented 1 year ago

Thanks, thats good feedback. I agree the warning and non-dimensional return value are probably the best outcome here.

Part of my reason to try for type stability was #500, but probably its fine compared to the cost of concatenation in most cases. And yes we should get a small union optimisation if its written properly.

rafaqz commented 1 year ago

@sethaxen, another question I would like to bounce off you:

If the dimension lookup values don't make sense - say they contain duplicate values or have the wrong order - should we try to keep the dimension wrappers and drop to NoLookup ? Or just drop everything and return cat on the parent arrays?

I'm trying to "get this right" once and for all, but its proving to be hard to do.

codecov-commenter commented 1 year ago

Codecov Report

Merging #515 (52ccfe6) into main (b9b4940) will increase coverage by 0.04%. Report is 25 commits behind head on main. The diff coverage is 73.60%.

@@            Coverage Diff             @@
##             main     #515      +/-   ##
==========================================
+ Coverage   89.38%   89.43%   +0.04%     
==========================================
  Files          39       39              
  Lines        2874     3369     +495     
==========================================
+ Hits         2569     3013     +444     
- Misses        305      356      +51     
Files Changed Coverage Δ
src/Dimensions/primitives.jl 87.06% <67.07%> (-6.83%) :arrow_down:
src/array/methods.jl 87.50% <77.87%> (-7.31%) :arrow_down:
src/Dimensions/indexing.jl 93.93% <100.00%> (ø)
src/precompile.jl 100.00% <100.00%> (ø)

... and 9 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more