Closed tuliom closed 2 years ago
On libnxz we return Z_ERRNO
in three points, all of them in nx_inflate.c
.
https://github.com/libnxz/power-gzip/blob/develop/lib/nx_inflate.c#L1280-L1291
if (ticks_total > (timeout_pgfaults * nx_get_freq())) {
/* TODO what to do when page faults are too many?
* Kernel MM would have killed the process. */
prt_err("Cannot make progress; too many page");
prt_err(" faults cc= %d\n", cc);
rc = Z_ERRNO;
goto err5;
}
In this next one we could change to Z_DATA_ERROR
.
https://github.com/libnxz/power-gzip/blob/develop/lib/nx_inflate.c#L1316
if (!csb_ce_termination(nx_ce) &&
csb_ce_partial_completion(nx_ce)) {
/* check CPB for more information
spbc and tpbc are valid */
sfbt = getnn(cmdp->cpb, out_sfbt); /* Table 6-4 */
subc = getnn(cmdp->cpb, out_subc); /* Table 6-4 */
spbc = get32(cmdp->cpb, out_spbc_decomp);
tpbc = get32(cmdp->crb.csb, tpbc);
ASSERT(target_sz >= tpbc);
goto ok_cc3; /* not an error */
}
else {
/* History length error when CE(1)=1 CE(0)=0.
We have a bug */
rc = Z_ERRNO;
prt_err("history length error cc= %d\n", cc);
goto err5;
}
https://github.com/libnxz/power-gzip/blob/develop/lib/nx_inflate.c#L1363
default:
prt_err("error: cc = %u, cc = 0x%x\n", cc, cc);
char* csb = (char*) (&cmdp->crb.csb);
for(int i = 0; i < 4; i++) /* dump first 32 bits of csb */
prt_err("CSB: 0x %02x %02x %02x %02x\n", csb[0], csb[1], csb[2], csb[3]);
rc = Z_ERRNO;
goto err5;
Neither of these cases seems to be related with an errno
error from POSIX. However there is no return code for a generic error. Also we cannot follow zlib.h
here as it will never return Z_ERRNO
on a inflate.
I can see two actions that we could do to help here.
Z_ERRNO
and why errno
is not set.
Do you have any suggestions?I am at a loss here. Can we set errno to EIO perhaps?
IMHO returning an imprecise code is worst than not at all.
Do you mean EIO is an imprecise code? The nearest thing that I can see in /usr/include/asm-generic/errno-base.h is EIO and ENOMEM. ENOMEM looks to be fitting the first scenario you have above.
I agree the second case should return Z_DATA_ERROR
instead of Z_ERRNO
. And we could probably return Z_MEM_ERROR
in the first one.
For the third one, Z_ERRNO
with EIO
looks like the closest thing we've got. But @lucmaga has a good point that per zlib and the standard inflate
does not return Z_ERRNO
. Since one use case for libnxz is to replace zlib, I'm afraid we can break existing apps if we keep returning that value or even introduce a new one.
Is the third situation recoverable? i.e. is there a way we can handle that error inside inflate
without having to return?
Do you mean EIO is an imprecise code?
@abalib I did, but having a second thought it's indeed the closest thing. Returning EIO plus adding an entry on the documentation with the explanation will be clear enough.
I'm afraid we can break existing apps if we keep returning that value or even introduce a new one.
I don't have a strong opinion here. I need to understand better this section of the code before everything. Thinking this is a issue that will occur too rarely we could keep the Z_ERRNO. I suspect that most codes using inflate uses a switch case with a default action.
OK. Here is another way of thinking. The first case is when there are too many page faults. It is an administrative problem. If the user does not have enough memory then he should add more memory. Therefore, make timeout_pgfaults infinite and be done with it, meaning the code will spin and never return. Haren also said something to that effect a while back. We're not going to pretend to be OOM killers.
Second and third ones are like assertions I believe --bug in the software or hardware. They should never happen. Z_DATA_ERROR may be good enough
Any thoughts?
OK. Here is another way of thinking. The first case is when there are too many page faults. It is an administrative problem. If the user does not have enough memory then he should add more memory. Therefore, make timeout_pgfaults infinite and be done with it, meaning the code will spin and never return. Haren also said something to that effect a while back. We're not going to pretend to be OOM killers.
Let me embellish this a bit. Do not change timeout period. Just eliminate the goto err5 statement. The net result is the same as my first quoted statement but the condition is logged and we can see that it's happening
I will try implement your suggestion @abalib. Thanks for the help!
In zlib, errno is expected to be set when Z_ERRNO is returned. But that is not happening for all scenarios in inflate.