Open SimonHeybrock opened 1 month ago
I think including data on the upper edge would make sense, especially since you can use `max()` to make bin edges.
> a `concat` followed by a `bin` or `hist` can drop data!
Not sure I followed everything. Initially, I thought data may be included twice, instead of being dropped...
The problem is that `bin` assumes that outer coords do not lie. It may thus assume that data is in the correct bin already, and not move it to another bin. Or it may consider it as "outside" the (current) bin and thus drop it. Or it could move it to the correct bin. See the warning here: https://scipp.github.io/generated/functions/scipp.bin.html.
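To make the "outer coords do not lie" assumption concrete, here is a toy model in plain Python. It is not scipp's implementation: `rebin_trusting_edges` and the `(lo, hi, events)` tuples are hypothetical names made up for this sketch. It only illustrates the "consider it outside and drop it" outcome for an event that sits exactly on `B` after two separately binned ranges were concatenated.

```python
from bisect import bisect_right


def rebin_trusting_edges(bins, new_edges):
    """Toy model (hypothetical): trust each input bin's outer edges [lo, hi)."""
    out = [[] for _ in range(len(new_edges) - 1)]
    for lo, hi, events in bins:
        for x in events:
            # The outer coordinate claims every event lies in [lo, hi).
            # An event that violates this is treated as "outside" and dropped.
            if not (lo <= x < hi):
                continue
            # Otherwise place the event in the right-open output bin containing it.
            out[bisect_right(new_edges, x) - 1].append(x)
    return out


A, B, C = 0.0, 1.0, 2.0
# Assume the first range was binned with an *inclusive* upper edge,
# so the event at exactly B ended up in the bin labelled (A, B).
concatenated = [(A, B, [0.2, 1.0]), (B, C, [1.7])]

out = rebin_trusting_edges(concatenated, [0.0, 0.5, 1.0, 1.5, 2.0])
kept = sum(len(b) for b in out)
print(out, kept)  # the event at 1.0 is missing from the result
```

In this model the event at `B` is lost because the first bin's edge "lies" about its contents under the right-open convention.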
A long time ago we decided that the bin edges provided to algorithms such as `bin` and `hist` should exclude the right bin edge. The argument was that this makes all bins consistent and, first and foremost, ensures that we can self-consistently `concat` the results of binning data in ranges `A, B` with `B, C`.

In practice however it is surprising behavior to users that one cannot use the `max` of the data as the `stop` in a `linspace` to create bin edges. Furthermore, this behavior is different from, say, `numpy.histogram`.

I am thus wondering if the gained self-consistency may be outweighed by the downsides? The number of times users will run into the consistency problem is likely much lower than the number of times they run into the downsides.
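The surprise with `max` as the `stop` can be reproduced with NumPy alone. The sketch below compares `numpy.histogram`, whose last bin includes its right edge, with an emulation of a right-open last bin built from `numpy.digitize`. The emulation is only a stand-in for the behavior described above, not scipp code, and the sample values are made up.

```python
import numpy as np

data = np.array([0.0, 1.0, 2.5, 4.0])
# Four bins whose last edge is exactly data.max().
edges = np.linspace(data.min(), data.max(), num=5)

# numpy.histogram: the last bin is closed, so the maximum is counted.
closed_counts, _ = np.histogram(data, bins=edges)

# Right-open emulation: np.digitize returns len(edges) for values
# >= edges[-1]; treat those (and values below edges[0]) as out of range.
idx = np.digitize(data, edges)
in_range = (idx >= 1) & (idx <= len(edges) - 1)
open_counts = np.bincount(idx[in_range] - 1, minlength=len(edges) - 1)

print(closed_counts.sum())  # all 4 values counted
print(open_counts.sum())    # 3: the value equal to data.max() is excluded
```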
It should be noted that without ensuring the aforementioned consistency, a `concat` followed by a `bin` or `hist` can drop data! This is because the algorithm assumes that the existing bin edges do not "lie", but after `concat` some data can be on the wrong side of `B`.
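The consistency argument can also be sketched with NumPy alone. Below, `count_half_open` and `count_closed` are hypothetical helpers written for this illustration. With right-open ranges, `A, B` and `B, C` partition the data; with an inclusive upper edge, a value exactly on `B` belongs to both ranges, so the edge between the two results no longer describes its contents consistently.

```python
import numpy as np

A, B, C = 0.0, 1.0, 2.0
data = np.array([0.2, 1.0, 1.7])  # one value sits exactly on B


def count_half_open(values, lo, hi):
    """Count values in the right-open range [lo, hi)."""
    return int(np.count_nonzero((values >= lo) & (values < hi)))


def count_closed(values, lo, hi):
    """Count values in [lo, hi] using numpy.histogram's closed last bin."""
    counts, _ = np.histogram(values, bins=[lo, hi])
    return int(counts[0])


half_open_total = count_half_open(data, A, B) + count_half_open(data, B, C)
closed_total = count_closed(data, A, B) + count_closed(data, B, C)

print(half_open_total)  # 3: every value counted exactly once
print(closed_total)     # 4: the value at B is counted in both ranges
```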