Open QuLogic opened 6 years ago
Since swt has some debug infrastructure, I toggled OUTPUT_INTERMEDIATE_IMGS
flag. On x86_64 (which passes), the output is:
SWT> 358084 rays found
SWT> 6410 possible letters found
SWT> Filtering 1: 6402 letters found
SWT> Filtering 2: 6349 letters found
SWT> 8898 valid pairs found
SWT> 2670 chains after merges
SWT> 9350 letters rendered
but on 32-bit, the output is:
SWT> 358084 rays found
SWT> 6410 possible letters found
SWT> Filtering 1: 6402 letters found
SWT> Filtering 2: 6349 letters found
SWT> 8898 valid pairs found
SWT> 2670 chains after merges
SWT> 9352 letters rendered
The swt_0003_gaussian
, swt_0004_sobel_intensity
, and swt_0005_sobel_direction
differ, though the first two are barely visible. The diff on the last one is:
But then the remaining intermediate images somehow match.
On ppc64le, the output for test_swt2
is:
SWT> 346931 rays found
SWT> 8777 possible letters found
SWT> Filtering 1: 8761 letters found
SWT> Filtering 2: 8677 letters found
SWT> 9489 valid pairs found
SWT> 2991 chains after merges
SWT> 9916 letters rendered
which is quite a bit different from either x86 attempt. This one fails starting from the swt_0002_canny
file, but I guess that makes sense since test_canny
also fails.
but then barely differs after the Gaussian:
or sobel intensity:
but is quite different after sobel direction:
and swt contains many extra rays:
and after the median too:
The rest are basically a combination of the canny diff and the extra rays.
So it seems like the problems on ppc64le stem all the way back to the Gaussian convolution. For example, one point in an area that's all 253 becomes 252.9999999999999716 after the x convolution, then 252.9999999999999432 after the y convolution. Then going through the Sobel filter, some results are 0.0000000000000568 and some are -0.0000000000001705 (instead of 0) because those previous values were not quite right. Then the direction image produces 3pi/4 instead of 0 because of the slightly negative x value input to atan2 (which is where those large swaths of differences in that image arise).
I don't really understand why this only affects ppc64le that way, especially because there's also a ppc64 (big-endian) which works fine.
It seems to be related to the use (or not) of SSE instructions: https://stackoverflow.com/questions/14749929/c-float-operations-have-different-results-on-i386-and-arm-but-why It did a quick test: Enabling explicitly SSE2 instructions makes the results consistent between i386 and amd64.
I assume those instructions do not exist at all on ppc64, so we will always have slight differences compared to i386/amd64. Therefore I'm not sure if it makes sense to run such strict tests on this kind of platform .. :/
Fix for i386 : ed348171aaa9c4829bd9cd2de92c1fd9fa5cc90a
BTW, thanks for your investigation. Without it, it would have taken me a while to realize that the difference between i386 and amd64 came from floating point numbers.
PPC64 does not have SSE2; it has AltiVec. Still doesn't really explain why little-endian fails but not big-endian.
Unfortunately, Fedora supports i686 for 32-bit systems, which means no SSE2 support; I suppose I'll have to skip that test.
As noted in #2, tests fail on alternate arches. Though #10 has fixed some of them, there are still a few problems which I don't really know how to fix as I'm not really familiar with the algorithms. Below are the results and diffs that might be clearer to someone with more idea of what the algorithms are doing to figure out what's wrong.
On i686,
test_swt2
fails: with this diff (might need to zoom in to see):On ppc64le,
test_swt
fails: with this diff:On ppc64le,
test_swt2
fails: with this diff:On ppc64le,
test_canny
fails: with this diff: