libnxz / power-gzip

POWER NX zlib compliant library
23 stars 18 forks source link

Set errno when returning Z_ERRNO #52

Closed tuliom closed 2 years ago

tuliom commented 3 years ago

In zlib, errno is expected to be set when Z_ERRNO is returned. But that is not happening for all scenarios in inflate.

lucmaga commented 2 years ago

On libnxz we return Z_ERRNO in three points, all of them in nx_inflate.c. https://github.com/libnxz/power-gzip/blob/develop/lib/nx_inflate.c#L1280-L1291

                       if (ticks_total > (timeout_pgfaults * nx_get_freq())) {
                           /* TODO what to do when page faults are too many?
                            * Kernel MM would have killed the process. */
                                prt_err("Cannot make progress; too many page");
                                prt_err(" faults cc= %d\n", cc);
                                rc = Z_ERRNO;
                                goto err5;
                        }

In this next one we could change to Z_DATA_ERROR. https://github.com/libnxz/power-gzip/blob/develop/lib/nx_inflate.c#L1316

                if (!csb_ce_termination(nx_ce) &&
                    csb_ce_partial_completion(nx_ce)) {
                        /* check CPB for more information
                           spbc and tpbc are valid */
                        sfbt = getnn(cmdp->cpb, out_sfbt); /* Table 6-4 */
                        subc = getnn(cmdp->cpb, out_subc); /* Table 6-4 */
                        spbc = get32(cmdp->cpb, out_spbc_decomp);
                        tpbc = get32(cmdp->crb.csb, tpbc);
                        ASSERT(target_sz >= tpbc);
                        goto ok_cc3; /* not an error */
                }
                else {
                        /* History length error when CE(1)=1 CE(0)=0.
                           We have a bug */
                        rc = Z_ERRNO;
                        prt_err("history length error cc= %d\n", cc);
                        goto err5;
                }

https://github.com/libnxz/power-gzip/blob/develop/lib/nx_inflate.c#L1363

        default:
                prt_err("error: cc = %u, cc = 0x%x\n", cc, cc);
                char* csb = (char*) (&cmdp->crb.csb);
                for(int i = 0; i < 4; i++) /* dump first 32 bits of csb */
                        prt_err("CSB: 0x %02x %02x %02x %02x\n", csb[0], csb[1], csb[2], csb[3]);
                rc = Z_ERRNO;
                goto err5;

Neither of these cases seems to be related with an errno error from POSIX. However there is no return code for a generic error. Also we cannot follow zlib.h here as it will never return Z_ERRNO on a inflate. I can see two actions that we could do to help here.

  1. Add a new error code to indicate hardware error.
  2. Add a note on the documentation explaining where we return Z_ERRNO and why errno is not set. Do you have any suggestions?
abalib commented 2 years ago

I am at a loss here. Can we set errno to EIO perhaps?

lucmaga commented 2 years ago

IMHO returning an imprecise code is worst than not at all.

abalib commented 2 years ago

Do you mean EIO is an imprecise code? The nearest thing that I can see in /usr/include/asm-generic/errno-base.h is EIO and ENOMEM. ENOMEM looks to be fitting the first scenario you have above.

mscastanho commented 2 years ago

I agree the second case should return Z_DATA_ERROR instead of Z_ERRNO. And we could probably return Z_MEM_ERROR in the first one.

For the third one, Z_ERRNO with EIO looks like the closest thing we've got. But @lucmaga has a good point that per zlib and the standard inflate does not return Z_ERRNO. Since one use case for libnxz is to replace zlib, I'm afraid we can break existing apps if we keep returning that value or even introduce a new one.

Is the third situation recoverable? i.e. is there a way we can handle that error inside inflate without having to return?

lucmaga commented 2 years ago

Do you mean EIO is an imprecise code?

@abalib I did, but having a second thought it's indeed the closest thing. Returning EIO plus adding an entry on the documentation with the explanation will be clear enough.

I'm afraid we can break existing apps if we keep returning that value or even introduce a new one.

I don't have a strong opinion here. I need to understand better this section of the code before everything. Thinking this is a issue that will occur too rarely we could keep the Z_ERRNO. I suspect that most codes using inflate uses a switch case with a default action.

abalib commented 2 years ago

OK. Here is another way of thinking. The first case is when there are too many page faults. It is an administrative problem. If the user does not have enough memory then he should add more memory. Therefore, make timeout_pgfaults infinite and be done with it, meaning the code will spin and never return. Haren also said something to that effect a while back. We're not going to pretend to be OOM killers.

Second and third ones are like assertions I believe --bug in the software or hardware. They should never happen. Z_DATA_ERROR may be good enough

Any thoughts?

abalib commented 2 years ago

OK. Here is another way of thinking. The first case is when there are too many page faults. It is an administrative problem. If the user does not have enough memory then he should add more memory. Therefore, make timeout_pgfaults infinite and be done with it, meaning the code will spin and never return. Haren also said something to that effect a while back. We're not going to pretend to be OOM killers.

Let me embellish this a bit. Do not change timeout period. Just eliminate the goto err5 statement. The net result is the same as my first quoted statement but the condition is logged and we can see that it's happening

lucmaga commented 2 years ago

I will try implement your suggestion @abalib. Thanks for the help!