Open rafaqz opened 1 year ago
Thinking about this we should probably start by adding a warning for these things.
Thinking about this we should probably start by adding a warning for these things.
Agreed with this! RE options 1-4, I'm still thinking through whether these would cause any problems for my use cases. I agree option 5 seems too extreme.
I am currently working through the xarray 45 minute tutorial and realized, that in DimensionalData we would happily broadcast two arrays together which happen to have the same size in the dimensions with totally different dimension values. Could we make it so, that we could specify the behaviour globally in some dictionary so that people could change the default behaviour without having to specify it on every broadcast.
I think we could have a global Dict from which we would get the keyword arguments for the comparedims function so that we could change that from the user side. Similar to the setup of the YAXDefaults in YAXArrays https://github.com/JuliaDataCubes/YAXArrays.jl/blob/b4bf44866fbd9c5fe1a78f42d3ff0b454a4102a2/src/YAXArrays.jl#L12C1-L20C2
Yeah this has been a thought for a while. Currently we check the name and order but not the values. And sometimes you don't want to check the values (e.g. with NoLookup
or with slight floating point differences).
So the global setting makes sense, it would be hard to incorporate into broadcast syntax.
It may be faster to have a Ref{NamedTuple{names,Tuple{Bool,Bool,etc}}}
instead of a Dict so that access is faster, as we will always have the same set of options.
(We could also allow "error levels" like allow, warn, and error for when problems are found)
@sethaxen and @felixcremer you may have opinions on this, as like
cat
this may break some downstream code.I'm hitting annoying issues broadcasting rasters, some with the Y dimension reversed, some without (reversed Y is very common in spatial data). I shouldn't be able to broadcast them together when the order is reversed, but I can, because only the dimension type is checked with
comparedims
during broadcasts. When there is a lot of data handling code it can be hard to track down where this happens.Checking order before broadcasting would reduce the potential for these bugs. But as broadcast can't have options passed to it easily, we need to make sure any changes to it are not debilitating for anyone.
So I am proposing:
ForwardOrdered
and the other isReverseOrdered
. This doesn't seem controversial to me, and is a compile time comparison. It would be achieved with anorder=true
keyword tocomparedims
.Other possibilities that also make sense:
Ordered
andUnordered
- this could be assumed with theorder
keyword, so we need to decide clearly either way.Categorical
andSampled
. Probably this never makes sense to do.Regular
andIrregular
. Probably this also never makes sense to do, and catches different valued lookups without needing a runtime check.A stronger check that I'm hesitant to do:
Locus
. This one has a runtime cost, and potentially more serious problem where floating point error in ranges converted to vectors causes false==
results in comparisons. That would be pretty annoying, so I'm hesitant to do this. Possiblyisapprox
would be more appropriate, but again there is no way for users to set the allowed tolerance. WithRegular
lookups we could also just check that the start/stop values match to reduce the run-time cost, with Irregular we would need to check all values to get the same guarantee.Things that should work but need better return type rules
Currently broadcast return types are determined by whichever is found first. Instead we need clear rules.
Points
andIntervals
- should always returnPoints
? We can assume the interval value is at least a good approximation for the point, but we do not know if the point value is a good approximation for the whole interval.NoLookup
andSampled
orCategorical
should always returnSampled
orCategorical
?To be clear everything should always be able to broadcast with
NoLookup
if the dimension itself matches.Also if users want to break any of these rules, they can broadcast over
parent(dimarray)
for the second/third etc array and get the current behavior.