Closed mscastanho closed 2 years ago
That function has get32 and getnn calls which are reading locations written by the hardware, i.e., outside the scope of the compiler. I am wondering if a 'volatile' keyword is missing in our or gcc header files?
@abalib You're right! Adding the following fixes the issue, even without my changes.
diff --git a/lib/nx_zlib.h b/lib/nx_zlib.h
index 8ace385..cb5a02d 100644
--- a/lib/nx_zlib.h
+++ b/lib/nx_zlib.h
@@ -277,8 +277,8 @@ typedef struct nx_stream_s {
* \details Amount of target data bytes an accelerator has written in
* processing this CRB.
*/
- uint32_t tpbc;
- uint32_t tebc;
+ volatile uint32_t tpbc;
+ volatile uint32_t tebc;
/* nx commands */
int flush;
There are probably other fields that would benefit from this as well.
I updated this PR to use @abalib's suggestion of using volatile. All tests are passing, and #86 is fixed. I was worried about the impact on performance, so I ran some benchmarks with some files from oct/. I couldn't get a reserved machine for this test, so there was some noise from other processes, but overall looks like the performance impact is negligible.
I can't reproduce the issue anymore with latest develop branch.
After more investigation we came to the conclusion that this is actually a valgrind issue with the isel
instruction. Using different compiler versions, optimization levels, disabling inlining and marking vars as volatile worked because they nded up changing the generated code to avoid using isel
. That explains why the issue only happens with valgrind, and not running it directly on the host. I'm still working on a proper reproducer to report it to valgrind so it gets fixed there.
So I'm closing this PR as I believe libnxz does not need to be changed.
~~test_multithread_stress has been failing when run under valgrind if libnxz is compiled with GCC 10 and optimization levels higher than -O1:~~
~~test_multithread_stress: nx_deflate.c:990: nx_compress_block_update_offsets: Assertion `overflow <= s->len_out/2' failed.~~
~~This issue is not present on more recent versions of GCC, and is highly sensitive to changes in the code, which makes it very hard to debug, so we don't have a root cause so far. Fortunately, disabling inlining for nx_compress_block_update_offsets is enough to avoid the issue for now without much impact.~~
~~Tested-by: Tulio Magno Quites Machado Filho tuliom@linux.ibm.com Signed-off-by: Matheus Castanho msc@linux.ibm.com~~
EDIT: I updated this PR to use @abalib's suggestion of using volatile.
We use macros like getnn, get[p]32 and get[p]64 to read values from addresses written by the NX engine. Since changes to these addresses are outside of the compiler's scope, it shouldn't be totally free to apply optimizations on the variables used to read from these addresses. So mark any such variables as volatile to avoid problems.
We have recently faced one of such issues: test_multithread_stress has been failing when run under valgrind if libnxz is compiled with GCC 10 and optimization levels higher than -O1:
test_multithread_stress: nx_deflate.c:990: nx_compress_block_update_offsets: Assertion `overflow <= s->len_out/2' failed.
It is fixed by this commit.
Fixes #86