NaN comparison fails on certain image sizes

planetmarshall commented 4 years ago

System information (version)

OpenCV => 4.1.1
Operating System / Platform => Ubuntu 18.04
Compiler => Clang 9

Detailed description

Certain sizes of images fail when comparing for NaNs

Steps to reproduce

#include <opencv2/core.hpp>

int count_NaNs_in_image(int rows, int cols) {
    cv::Mat1f test(rows, cols);
    test.setTo(std::numeric_limits<float>::quiet_NaN());
    cv::Mat1b is_nan = test != test;
    return cv::countNonZero(is_nan);
}

int main() {
  assert(16 == count_NaNs_in_image(4, 4)); // succeeds
  assert(44 == count_NaNs_in_image(4, 11)); // fails with 12 NaNs found instead of 44?

  return 0;
}

Workaround

Invert the comparison, ie

int count_not_not_nans_in_image(int rows, int cols) {
    cv::Mat1f test(rows, cols);
    test.setTo(std::numeric_limits<float>::quiet_NaN());
    cv::Mat1b is_not_nan = test == test;
    return cv::countNonZero(255 - is_not_nan);
}

planetmarshall commented 4 years ago

Reproduced in current master (e037800e46da0c5ae45664703d29932e898d8871) with gcc 7.4

ganesh-k13 commented 4 years ago

@alalek , @planetmarshall Correct me if I am wrong:

[nan, nan, nan, nan;        [nan, nan, nan, nan; 
 nan, nan, nan, nan;   !=    nan, nan, nan, nan;
 nan, nan, nan, nan;         nan, nan, nan, nan;
 nan, nan, nan, nan]         nan, nan, nan, nan]

result should be:

[0, 0, 0, 0;
 0, 0, 0, 0;
 0, 0, 0, 0;
 0, 0, 0, 0]

so cv::countNonZero should give 0 right, not 16, first one 16 == count_NaNs_in_image(4, 4) itself is wrong.

alalek commented 4 years ago

There is problem related to optimization. OpenCV computer vision functions usually don't expect NaN or other special floating-points values. For example, equality on "regular" FP32 values can be optimized through integers of the same size: float => int (due to sizeof(float) == sizeof(int)):

NAN == NaN is false
but 0x7fc00000 == 0x7fc00000 is true

On some platforms (like mobile) this trick provides some benefits in speed and/or power efficiency.

In Computer Vision special floating point values should be normalized first through patchNaNs() call.

planetmarshall commented 4 years ago

@ganesh-k13 NaN == NaN = false (and NaN != NaN == true) - this is required by the IEEE 754 standard for floating point numbers.

planetmarshall commented 4 years ago

@alalek If that is the case then it should be documented somewhere, and there should also be some performance metrics demonstrating that it is a worthwhile optimization ("some benefits in speed/or power efficiency" should be quantified) - since it deviates from the behaviour defined by the IEEE-754 standard for floating point numbers. I think that there is potential for surprising behaviour here.

https://stackoverflow.com/questions/41759247/filter-opencv-mat-for-nan-values# https://answers.opencv.org/question/2221/create-a-mask-for-nan-cells/

ganesh-k13 commented 4 years ago

@alalek, would it make sense to check if Mat has NaN values and compare separately? Or would it be an unnecessary check that leads to performance drops as you mentioned 'computer vision functions usually don't expect NaN'

eudoxos commented 4 years ago

There is problem related to optimization. OpenCV computer vision functions usually don't expect NaN or other special floating-points values. For example, equality on "regular" FP32 values can be optimized through integers of the same size: float => int (due to sizeof(float) == sizeof(int)):

This behavior might have optimization intention behind it, but is definitely a bug. The standard mandates that floats conform to IEEE-754 which includes special values; and I'd expect matrix operators reflect that.

There is already a better way to enable possibly unsafe math optimizations: compiler flags such as -ffast-math, or preprocessor macros for agressive (and generally incorrect) optimizations. Their use here would be more than appropriate here.

While I agree this might have optimized out a few microsecs of computation, my lifetime was pessimized by several days, as this change went completely undocumented and broke stuff.

In Computer Vision special floating point values should be normalized first through patchNaNs() call. That might be your opinion, but again, it should be documented if there is OpenCV-wide consensus on that.

I'd at least propose to complement patchNaNs with findNaNs (returning binary mask). Referenced post at SO actually discusses how to do that (and shows a few ugly solutions).

opencv / opencv