Closed frauzufall closed 4 years ago
Wow, that's an embarrassing bug!
@gselzer Please look into the best way of addressing this, in both the current imagej-ops and the new scijava-ops port. Thank you!
I'm on it for imagej-ops
. DefaultMedian
calls DefaultQuantile
with quantile 0.5
, which (as I understand it) accomplishes a slightly different task: it returns the number in the array with the nearest rank. I'm thinking of rewriting DefaultMedian
with this algorithm that guarantees O(n) time.
This bug might have been introduced by the "merger" of the Median
, Percentile
and Quantile
ops in #318 (https://github.com/imagej/imagej-ops/commit/cd382ca303132d10af1bf97105b86b3795318a10).
@rimadoma wrote:
accomplishes a slightly different task: it returns the number in the array with the nearest rank.
According to the Quantile definition, it still should return the average of two values for even-sized distributions. Is this a different case for Percentile?
I'm thinking of rewriting DefaultMedian with this algorithm that guarantees O(n) time.
It seems that's how it was before #318. Do you think quantile
could benefit from the same fast algorithm, selecting the k-th smallest element for any quantile?
According to the Quantile definition, it still should return the average of two values for even-sized distributions.
My interpretation of the current implementation is that it returns the nearest rank. It's always return array.get(k)
.
This bug might have been introduced by the "merger" of the Median, Percentile and Quantile ops in #318
I see return array.get(k)
instead of return (array.get(k) + return array.get(k - 1)) / 2.0
for even n arrays in the old implementation too.
Do you think quantile could benefit from the same fast algorithm, selecting the k-th smallest element for any quantile?
I had a quick look at the paper the algorithm I linked is based on, and it seems to generalisable for quantiles. However, I'll need help there. I could get a PR started for the median implementation, and then ask for help?
Update: I've created a branch with the median implementation
So I figured that DefaultQuantile
and the code I linked both use quick select to find the median/quantile. The only difference is in how they pick the pivot value. The one I linked has O(n), worst case, but since the O(n^2) worst case for a random pivot is rare, the former doesn't offer much real benefit. Hence, I'm just gonna add a little bug fix instead of reimplementing.
Thanks a lot @rimadoma! And I'm glad you didn't rewrite—I should hope the time complexity of rank select is good enough.
Seems fixed, thanks @rimadoma!
I wanted to migrate code from Python to ImageJ2 and noticed the difference in calculating the median. In Ops, it does not return the mean of the two middle values for an even number of values.
.. prints
1.0
whereas python would return0.5
.The test for median would probably not catch this since the unsigned byte values in the middle of a 100x100px image could both be
128
.