Closed mscastanho closed 2 years ago
I believe the PR is now ready for review. Putting zlib in a different link namespace fixed issue #187, so I included it here. I also added tests for the reset functions to verify NX state is always being reset.
Note to reviewers: commit Make *Reset functions always restore NX state
seems big, but the changes to nx_inflate.c and nx_deflate.c are pretty
much the same.
Turns out test_stress.mix
and some other tests are failing when libnxz is linked against older glibcs. It works fine with glibc 2.34 and later, but fails with glibc 2.32 (AT 14) and 2.31 (Debian 11). Bisect points to the commit "fixing" #187.
And I wonder if we need the map or it would work well just keeping the last entry.
@rzinsly I'm afraid we need the map for two reasons:
Turns out test_stress.mix and some other tests are failing when libnxz is linked against older glibcs. It works fine with glibc 2.34 and later, but fails with glibc 2.32 (AT 14) and 2.31 (Debian 11). Bisect points to the commit "fixing" https://github.com/libnxz/power-gzip/issues/187.
@rzinsly @tuliom I investigated further and this is what I found out: test_stress.sw
and any other test that calls sw_deflateInit2_
in a multithreaded scenario is hitting a segfault on glibc's __get_nprocs
. I bisected glibc and the commit that fixes it is:
commit b5eeca8cfd9d0fd92b5633a88901d9ff27f2b496
Author: Andreas Schwab <schwab@linux-m68k.org>
Date: Tue Dec 8 19:17:41 2020 +0100
Fix parsing of /sys/devices/system/cpu/online (bug 25859)
The file contains comma-separated ranges, not spaces.
even though none of the hosts I tested the code on have commas and/or spaces in that file. This commit was included in glibc 2.33. The only distro with an older glibc that still passes the tests is the one from SLES15. Using Advance Toolchain 15 is enough to fix the issue as well. See below:
SLES15 (glibc 2.31): OK
RHEL8 (glibc 2.28): FAIL
RHEL9 (glibc 2.34): OK
Ubuntu 22.04 (glibc 2.35): OK
Ubuntu 20.04 (glibc 2.31): FAIL
Debian 10 (glibc 2.28): FAIL
Debian 11 (glibc 2.31): FAIL
Advance Toolchain 14.0 (glibc 2.32): FAIL
Advance Toolchain 15.0 (glibc 2.34): OK
If I run the same test in containers (tested with RHEL8 and Debian 11) it works 🤷🏼 .
Also note this only happens when glibc is in a separate link namespace (dlmopen(LM_ID_NEWLM,...
).
So I don't think we can ignore this issue, but at the same time I couldn't find a solution for this, or an alternative to solve #187. Ideas?
I have just updated this PR with a slightly modified implementation.
In this new implementation, I give a different interpretation on how deflateResetKeep()
and inflateResetKeep()
should behave.
In his original proposal, @mscastanho interpreted that both these function should behave as their counterparts deflateReset()
and inflateReset()
.
While I agree with his implementation and I believe that's the most logical too, it is not practical to move forward with it because of bug #187 .
In this new interpretation deflateResetKeep()
and inflateResetKeep()
act as simpler reset functions, i.e. they reset only the stream being used without modifying the other one or recovering the NX stream after switching to software.
This is important in order to avoid that zlib's deflate()
/inflate()
ends up calling libnxz's deflateResetKeep()
/inflateResetKeep()
which would recover the NX stream while zlib still expects its own stream there.
Note: I noticed a segmentation fault on RHEL 7, but I've been able to reproduce this issue with code already in develop
.
The
*Reset_skeleton
functions are not needed anymore but I'm fine with keeping them. LGTM
Let me remove them. I'm also planning to improve documentation. I think this will be important in order to explain the current decisions in this proposal.
Done!
@tuliom The changes look good, I just didn't understand the reason for this one:
diff --git a/test/test_reset.c b/test/test_reset.c
index 4625f03..57069d0 100644
--- a/test/test_reset.c
+++ b/test/test_reset.c
@@ -25,16 +25,18 @@
static void validate_stream(z_stream *strm, struct internal_state *sw_state,
nx_streamp hw_state) {
+#ifndef KEEP
/* NX state should always be loaded */
assert(has_nx_state(strm));
- /* Pointers to internal structures shouldn't change */
- nx_streamp s = (nx_streamp) strm->state;
- assert(s == hw_state);
- assert(s->sw_stream == sw_state);
-
/* Should be ready for fallback */
- assert(s->switchable == 1);
+ assert(hw_state->switchable == 1);
+#endif
+
+ if (hw_state->sw_stream) {
+ /* Pointers to internal structures shouldn't change */
+ assert(hw_state->sw_stream == sw_state);
+ }
}
int main()
The way it's currently written we're not checking that the currently loaded NX state is switchable, only the initial one (from before deflateReset), and we're also not checking that it didn't change.
@mscastanho You're right. I shouldn't have changed the code path for deflateReset()
and inflateReset()
. Just for *ResetKeep()
.
While fixing this, I figured out the code setting hw_state
and sw_state
could be set closer to deflateInit()
and inflateInit()
.
I also modified the code that zeroes strm
in order to zero all the structure, instead of a few fields.
These are the new differences:
$ $ git diff 7ac8ea2 -- test/test_reset.c
diff --git a/test/test_reset.c b/test/test_reset.c
index 4625f03..e659696 100644
--- a/test/test_reset.c
+++ b/test/test_reset.c
@@ -25,6 +25,12 @@
static void validate_stream(z_stream *strm, struct internal_state *sw_state,
nx_streamp hw_state) {
+#ifdef KEEP
+ if (hw_state->sw_stream) {
+ /* Pointers to internal structures shouldn't change */
+ assert(hw_state->sw_stream == sw_state);
+ }
+#else
/* NX state should always be loaded */
assert(has_nx_state(strm));
@@ -35,6 +41,7 @@ static void validate_stream(z_stream *strm, struct internal_state *sw_state,
/* Should be ready for fallback */
assert(s->switchable == 1);
+#endif
}
int main()
@@ -55,9 +62,9 @@ int main()
return TEST_ERROR;
}
- strm.zalloc = Z_NULL;
- strm.zfree = Z_NULL;
- strm.opaque = (voidpf)0;
+ /* Ensure there is no garbage in the structure, which could cause
+ segmentation fault. */
+ memset(&strm, 0, sizeof(z_stream));
/* Reseting an uninitialized stream must fail */
zcheck(CALL_DEFLATE_RESET(&strm), Z_STREAM_ERROR);
@@ -77,11 +84,12 @@ int main()
return TEST_SKIP;
}
+ nx_streamp hw_state = (nx_streamp) strm.state;
+ struct internal_state *sw_state = hw_state->sw_stream;
+
/* Reseting between init and deflate should be OK */
zcheck(CALL_DEFLATE_RESET(&strm), Z_OK);
- nx_streamp hw_state = (nx_streamp) strm.state;
- struct internal_state *sw_state = hw_state->sw_stream;
strm.next_in = src;
strm.next_out = compr;
strm.avail_in = rounds[i] == SW ? 100 : src_len;
@@ -102,7 +110,10 @@ int main()
return TEST_SKIP;
}
- /* Reseting between init and deflate should be OK */
+ hw_state = (nx_streamp) strm.state;
+ sw_state = hw_state->sw_stream;
+
+ /* Reseting between init and inflate should be OK */
zcheck(CALL_INFLATE_RESET(&strm), Z_OK);
strm.next_in = compr;
@tuliom @rzinsly Turns out I can't review my own PR 😂 The changes LGTM. So if it's OK with you, I think we can merge this.
Please read commit messages for more details.
This is marked as draft because I haven't found a solution for #187 yet. This PR is affected by that issue in two situations:
sw_deflateInit2_
, then zlib callsdeflateReset
, which ends up calling libnxz'sdeflateReset
.deflateReset
libnxz callssw_deflateReset
, which callsdeflateResetKeep
, which ends up using libnxz'sdeflateResetKeep
.The first one is easier to workaround because there won't be an entry in
stream_map
, so we now we have to call zlib'sdeflateReset
. I couldn't find a way to detect calls from zlib for case 2 yet.