phokz / mod-auth-external

External Authentication Module for Apache HTTP Server - mod_authnz_external

mod-auth-external 3.3.3 results in pwauth signal 15 #50

Closed pbiering closed 2 years ago

pbiering commented 3 years ago

Version 3.3.3 has been available in EPEL8 for a few days, but since updating from 3.3.2 it no longer behaves properly.

Found now in log:

External authenticator died on signal 15, referer:

Found a hint: https://bz.apache.org/bugzilla/show_bug.cgi?id=59237 but HTTP/2 is not enabled on server.

working:

problem:

I've protected an /mrtg directory, so each image request fires "pwauth" (also visible in the audit log). It looks like some of these processes are receiving SIGTERM (signal 15)... but this did not happen with 3.3.2.

Any hints on where to look further?

Has anyone seen a similar issue?

pbiering commented 3 years ago

Just an update: I have now configured the "socache" extension to avoid heavy load when requesting the MRTG page containing all the pictures (the signal 15 errors now seem to have disappeared)... but IMHO the issue should still be analyzed; potentially a race condition happened.

pbiering commented 3 years ago

I investigated further by applying the deltas from 3.3.2 to 3.3.3, and it looks like the code related to the additional environment variable "ENV_METHOD" is causing the issue. But I still have no indication of the underlying reason so far :-(

bimimicah commented 3 years ago

Sorry, this ticket slipped through the cracks and I only noticed it now.

This ticket seems very similar to #28, which seems more likely to be an issue with pwauth than mod_authnz_external. It is interesting though, that you say you are not using HTTP/2 and that 3.3.2 did not have this problem. I diff'ed the source code from the EPEL SRPMs for 3.3.2 and 3.3.3 and the only things that would even slightly affect this problem are, as you said, the extra environment variable and also a fix for the call to apr_procattr_child_errfn_set(). However these are both unlikely to be causing your issue, especially since the previous ticket was reported on 3.3.2 specifically.

So we can troubleshoot a bit more, could you try going back to 3.3.2 and seeing if that really fixes the problem? Is it possible that you are experiencing increased load on your server or some other issue that would trigger a pwauth bug? Could you please also try building from the master branch? We recommend waiting for releases to use in production, but this would be a useful test because we overhauled the way environment variables were stored, and I would be interested to see if there is any difference (there shouldn't be).

bimimicah commented 3 years ago

Another thing to test would be to replace the call to pwauth with something else with a similar load profile, and see if you have the same results. I think it is fairly unlikely this is a mod_authnz_external bug, though: "External authenticator died on signal 15" means that something killed the subprocess (in this case, pwauth). There is no code in mod_authnz_external to send any signals, especially not a kill signal. I would lean more toward an Apache Portable Runtime (APR) or pwauth bug. But if you're willing to try a few things, we'll see what we find out :)
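To illustrate what that log message reflects, here is a minimal POSIX sketch. It is not the module's code (which goes through APR); the function name `kill_child_and_report` and the sleeping stand-in child are assumptions made purely for illustration. The parent forks a child, something sends the child a signal, and the parent decodes the wait status the same way a "died on signal N" message would be derived.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that only sleeps, send it `sig`,
 * and report how it ended.
 * Returns the terminating signal number, 0 if the child exited on its
 * own, or -1 on error. */
int kill_child_and_report(int sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: stand-in for an external authenticator such as pwauth. */
        sleep(30);
        _exit(0);
    }

    /* Parent: play the part of "something" that kills the subprocess. */
    if (kill(pid, sig) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    if (WIFSIGNALED(status)) {
        /* The wait status records which signal terminated the child. */
        printf("External authenticator died on signal %d\n", WTERMSIG(status));
        return WTERMSIG(status);
    }
    return 0;
}
```

Calling `kill_child_and_report(SIGTERM)` on Linux reports signal 15. The point of the sketch is that the parent only observes the signal in the wait status; it says nothing about who sent it.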

bimimicah commented 3 years ago

Another thing to note is the 3.3.2 SRPM you linked to is actually for Fedora, not RHEL/CentOS 8. But again, I don't think this is the problem, especially since they seem to be using our code without modifications for both RPMs.

pbiering commented 3 years ago

@bimimicah : thank you for all the hints... meanwhile I've tested some more, and it looks like it becomes stable once I backported

https://github.com/phokz/mod-auth-external/commit/876e7f2c83dc6104e29f1210c0d697e5493decb7#diff-ddb11d5a7c17eb0d5dfc218e140d1eb06233b207b8ef572ef2ea0681d30fcd0b

to make it applicable to the vanilla 3.3.3 release. This would be a second piece of evidence that the extended environment (3.3.2 -> 3.3.3) with the old method is somehow not reliable and that the method change was a good idea.

I will keep you updated after testing the latest source code next.

BTW: the 3.3.2 Fedora SRPM was rebuilt on an EL8 system before installation.

bimimicah commented 3 years ago

> thank you for all the hints... meanwhile I've tested some more, and it looks like it becomes stable once I backported
>
> 876e7f2#diff-ddb11d5a7c17eb0d5dfc218e140d1eb06233b207b8ef572ef2ea0681d30fcd0b
>
> to make it applicable to the vanilla 3.3.3 release. This would be a second piece of evidence that the extended environment (3.3.2 -> 3.3.3) with the old method is somehow not reliable and that the method change was a good idea.

This is very unexpected but interesting! OK, it looks like, from this commit (which was contributed as a patch in ticket #19), that when REQUEST_METHOD was added, the environment variable buffer was not incremented. I must have overlooked that when I merged it. This shouldn't be an issue though, because there are 13 slots in the array and we use exactly 13 (including the NULL item at the end). Even less so on Linux/Mac, where we only use 12 slots and skip the SystemRoot variable, which is Windows-specific...

We were going to cut a new release soon anyway, I guess this makes it a bit more urgent since it's fixed in master. I'm still not sure why it would need extra unused buffers though... Since this only seems to affect pwauth, maybe pwauth is doing something it shouldn't with the environment it has been provided? (I don't use pwauth, so I'm not very familiar with its workings)
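The slot counting discussed above (N variables plus one terminating NULL must fit in the array) can be sketched as follows. This is an illustration only, not the module's actual code: `ENV_SLOTS` and `fill_env` are hypothetical names, and the explicit bounds check is an assumption about how one might guard such a fixed-size array.

```c
#include <stddef.h>

#define ENV_SLOTS 13  /* hypothetical: mirrors the child_env[13] size from the thread */

/* Hypothetical helper: copy `n_vars` "NAME=value" strings into `env`
 * and append the terminating NULL that exec-style APIs expect.
 * Returns the number of slots used (including the NULL), or -1 if the
 * entries plus the terminator would not fit in `slots`. */
int fill_env(const char *env[], size_t slots,
             const char *const vars[], size_t n_vars)
{
    /* One extra slot is always needed for the NULL terminator. */
    if (n_vars + 1 > slots)
        return -1;

    size_t i = 0;
    for (size_t k = 0; k < n_vars; k++)
        env[i++] = vars[k];
    env[i++] = NULL;

    return (int)i;
}
```

With 13 slots, 12 variables plus the NULL fit exactly, while a 13th variable would leave no room for the terminator. A check of this kind makes an off-by-one visible as an error return instead of a silent write past the end of the array.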

pbiering commented 3 years ago

Status update

The 3.3.2 -> 3.3.3 diff even contains the child_env extension that was missed in the mentioned commit (see below), but 3.3.3 still has the issue (reproducible).

I have now created an RPM using the latest "master"; all good so far.

IMHO, from the current point of view it would be a waste of time to drill down further into whether it's a real issue in "pwauth" (which would potentially require some code extensions for debugging internal flows) or even a compiler optimization issue.

I'm happy that the next release will be available soon; meanwhile I'll use my intermediate version.

@@ -96,6 +98,7 @@
 #define ENV_HOST       "HOST"          /* Remote Host */
 #define ENV_HTTP_HOST  "HTTP_HOST"     /* Local Host */
 #define ENV_CONTEXT    "CONTEXT"       /* Arbitrary Data from Config */
+#define ENV_METHOD     "METHOD"        /* Request method (eg. GET, HEAD, POST, OPTIONS, etc.) */
 /* Undefine this if you do not want cookies passed to the script */
 #define ENV_COOKIE     "COOKIE"
@@ -400,12 +412,14 @@
     apr_procattr_t *procattr;
     apr_proc_t proc;
     apr_status_t rc= APR_SUCCESS;
-    char *child_env[12];
+    char *child_env[13];
     char *child_arg[MAX_ARG+2];
     const char *t;
     int i, status= -4;
     apr_exit_why_e why= APR_PROC_EXIT;
+#ifndef _WINDOWS
     apr_sigfunc_t *sigchld;
+#endif

     /* Set various flags based on the execution method */
@@ -449,6 +463,9 @@
        if (r->uri)
            child_env[i++]= apr_pstrcat(p, ENV_URI"=", r->uri, NULL);

+       if (r->method)
+               child_env[i++] = apr_pstrcat(p, ENV_METHOD"=", r->method, NULL);
+
        if ((host= apr_table_get(r->headers_in, "Host")) != NULL)
            child_env[i++]= apr_pstrcat(p, ENV_HTTP_HOST"=", host, NULL);

pbiering commented 2 years ago

Hmm, it looks like unfortunately not everything is good after all. I seldom use it currently, but the problem still appears on the initial request and then on subsequent ones.

[Sun Jan 16 09:13:19.123176 2022] [authnz_external:error] [pid 2394628:tid 140390149625600] (10)No child processes: [client ***] Could not get status from child process, referer: https://SERVER/mrtg/index-month.html
[Sun Jan 16 09:13:19.163517 2022] [authnz_external:error] [pid 2394628:tid 140390149625600] [client ***] Authorization of user USER to access /mrtg/PICTURE.png failed. User not in Required group. Last result code: 1, referer: https://SERVER/mrtg/index-month.html

pbiering commented 2 years ago

> Hmm, it looks like unfortunately not everything is good after all. I seldom use it currently, but the problem still appears on the initial request and then on subsequent ones.
>
> [Sun Jan 16 09:13:19.123176 2022] [authnz_external:error] [pid 2394628:tid 140390149625600] (10)No child processes: [client ***] Could not get status from child process, referer: https://SERVER/mrtg/index-month.html
> [Sun Jan 16 09:13:19.163517 2022] [authnz_external:error] [pid 2394628:tid 140390149625600] [client ***] Authorization of user USER to access /mrtg/PICTURE.png failed. User not in Required group. Last result code: 1, referer: https://SERVER/mrtg/index-month.html

This is a different issue, now filed as https://github.com/phokz/mod-auth-external/issues/52

The originally reported issue has been gone since November; I no longer find any related log lines!