Closed malaterre closed 9 years ago
Steps from debian/jessie i386:
$ image_to_j2k -i lossless_issue.pgm -o lossless_issue.j2k $ j2k_to_image -i lossless_issue.j2k -o lossless_issue.opj.pgm $ vim -b lossless_issue.opj.pgm -> remove comment $ vbindiff lossless_issue.pgm lossless_issue.opj.pgm
I can reproduce the exact same behavior using openjpeg 2.1.0 (as packaged in debian).
It seems the compression step is not working as expected:
$ crc32 *
49d5c244 lossless_issue.pgm
c611132c lossless_issue.sid32.j2k
a913d7c3 lossless_issue.sid32.kdu.pgm
a913d7c3 lossless_issue.sid32.pgm
9922c6b1 lossless_issue.sid.j2k
49d5c244 lossless_issue.sid.kdu.pgm
49d5c244 lossless_issue.sid.pgm
The good news is that if I run:
$ valgrind opj_compress -i lossless_issue.pgm -o lossless_issue.sid32.j2k
Everything works as expected. So valgrind is doing some kind of initialisation (rounding?) step that is missing in normal code executition.
If I use now clang-3.5 from my sid 32bits chroot everything works nicely. So the issue is not within the libc and such, but rather in the code generated by gcc.
gcc-4.8, gcc-4.9 and gcc-5.2 all have the same symptoms
If I recompile openjpeg with -fsanitize=undefined
I get:
/home/mathieu/tmp/opj-bug/openjpeg/src/lib/openjp2/t1.c:1517:28: runtime error: left shift of negative value -128
ref:
https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/t1.c#L1517
EVen if I fix the code using:
tiledp[tileIndex] *= (1 << T1_NMSEDEC_FRACBITS);
it still fails
Now if I compile the whole code using gcc and then do:
(cd /home/mathieu/tmp/opj-bug/openjpeg/bin/src/lib/openjp2 && /usr/bin/clang -Dopenjp2_EXPORTS -O0 -g -fPIC -I/home/mathieu/tmp/opj-bug/openjpeg/bin/src/lib/openjp2 -o CMakeFiles/openjp2.dir/tcd.c.o -c /home/mathieu/tmp/opj-bug/openjpeg/src/lib/openjp2/tcd.c)
the code works as expected
It appears that the issue is within this function opj_tcd_makelayer
:
https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/tcd.c#L218
To fix the symptoms one could use excess precision using: -msse2 -mfpmath=sse
It is still not clear why such minor change leads to a lossy compressed stream, this must only be the tip of the iceberg.
So the patch is simply, replace:
if (dd / dr >= thresh)
with:
OPJ_FLOAT64 div;
div = dd / dr;
if (div >= thresh)
Then compile the code with -ffloat-store
to remove excess precision.
The gcc documentation states that -fexcess-precision=standard
should be enough but not in this case. We are still required to use -ffloat-store
which feels like a hack.
Confirmed by upstream gcc dev here. We really need to understand why the code is so sensible to excess precision.
@malaterre,
I had a look at the piece of code that's mentioned here. In this specific case, for one of the computed value (dd/dr "==" threshold), the comparison is true with -ffloat-store & false without it.
This can be shown easily with this piece of code (with a bit of context):
if (!dr) {
if (dd != 0)
n = passno + 1;
continue;
}
if ((dd / dr) < 0.45) {
printf("%.32f - %.32f\n", (dd / dr), thresh);
}
if ((dd / dr) >= thresh) {
n = passno + 1;
if ((dd / dr) < 0.45) {
printf("True compare\n");
}
}
}
layer->numpasses = n - cblk->numpassesinlayers;
We should not rely on "==" comparison for floats.
I'll add this specific image to the test suite if needed
Just in case my series of comments were not clear:
-ffloat-store
is not the correct solution. I'd rather see -fexcess-precision=standard
(implied using -std=c99
)-fexcess-precision=standard
only have effect on cast and assigment so code need to be change (either store the result in a variable or add an explicit cast with a comment)-ffast-math
, this option is just retar*d. So I should have made my commit 6d7f5ccc much clearer and removed completely. If some linux distro decide to use it, then it's there right to shoot themself in the foot. But OpenJPEG should not make it a default option when compiling in Release, that's just wrong.And finally as discussed with Antonin, I fail to understand this piece of code. So removing the excess precision using compiler specific option is just plain wrong. One would need to understand why this piece of code is so damn sensible to excess precision.
Thanks for adding the dataset to the test suite !
Should I re-open this issue for opj 1.5 branch ?
The issue can only be affected to one milestone... I think a new one shall be created or maybe just list this one it in #574 .
Note that even if you remove the excess precision, your code might still be affected by double rounding (which occurs with a probability of 1/2048 on random data). So, you really need to understand the code and how it can be affected by floating-point rounding errors. The bug was closed by doing "thresh - (dd / dr) < DBL_EPSILON" but one question is whether DBL_EPSILON is sufficient, i.e. what is the maximum error on thresh - (dd / dr) that you can expect?
Salut Vincent ! Indeed I had seen this at least once through a funky example. Do you believe that simply increasing the precision for DBL_EPSILON will make the issue go away ?
Depending on the context, a bound like DBL_EPSILON or something larger may or may not be correct. First, the code should be commented so that the reader can know what it tries to do, for instance some mathematical specification. About the floating-point implementation: The variable dr is an integer, so that I suppose that it is exact. The variables dd and thresh are floating-point numbers. There are two questions about them: What is the range of thresh and dd / dr? What is the floating-point error bound on the variables dd and thresh? Note that the current code may be regarded as suspicious, because the error bound DBL_EPSILON is an absolute one, while a relative one may be needed. For instance, if thresh and dd / dr can be large (or small) compared to 1, then the bound DBL_EPSILON does not make much sense. Hence my question about the range.
I am having an issue with OPJ 1.5 and the following file:
http://gdcm.sourceforge.net/thingies/lossless_issue.pgm
It looks as if I cannot do a round-trip lossless compression / decompression, within a 32bits schroot. Everything is fine within a 64bits schroot system (debian jessie if that matter).