Closed: arminabf closed this issue 2 years ago.
There has been a known issue with restart/reload behaviour and watchdog tasks since the dawn of time: the order in which the watchdog tasks and the server shut down is not coordinated well enough, so OpenSSL's shutdown may trip over its own feet.
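The shutdown-ordering problem described above can be illustrated with a minimal pthreads sketch. This is not httpd or mod_md code; `shared_ctx`, `watchdog_start` and `server_shutdown` are hypothetical names. The point is the order: tell the watchdog to stop, wait until its thread has really exited, and only then tear down the state it uses (in the real server that would include OpenSSL's global state).

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical state shared between the server and a watchdog task.
 * It stands in for resources such as OpenSSL's globals. */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int stop;                    /* set once shutdown begins */
} shared_ctx;

/* Watchdog loop: sleeps until told to stop. A real watchdog would use a
 * timed wait and do periodic work (e.g. certificate renewal checks). */
static void *watchdog_run(void *arg)
{
    shared_ctx *ctx = arg;
    pthread_mutex_lock(&ctx->mu);
    while (!ctx->stop)           /* re-check: wakeups may be spurious */
        pthread_cond_wait(&ctx->cv, &ctx->mu);
    pthread_mutex_unlock(&ctx->mu);
    return NULL;
}

int watchdog_start(shared_ctx *ctx, pthread_t *tid)
{
    assert(ctx != NULL && tid != NULL);
    pthread_mutex_init(&ctx->mu, NULL);
    pthread_cond_init(&ctx->cv, NULL);
    ctx->stop = 0;
    int rc = pthread_create(tid, NULL, watchdog_run, ctx);
    if (rc != 0) {               /* no thread: undo the init */
        pthread_cond_destroy(&ctx->cv);
        pthread_mutex_destroy(&ctx->mu);
    }
    return rc;
}

/* Coordinated shutdown, in this order:
 *   1. tell the watchdog to stop,
 *   2. wait until its thread has actually exited,
 *   3. only then destroy the state it was using.
 * Doing step 3 before step 2 is the race that lets a late task touch
 * state that has already been freed or reset. */
int server_shutdown(shared_ctx *ctx, pthread_t tid)
{
    assert(ctx != NULL);
    pthread_mutex_lock(&ctx->mu);
    ctx->stop = 1;
    pthread_cond_signal(&ctx->cv);
    pthread_mutex_unlock(&ctx->mu);

    int rc = pthread_join(tid, NULL);   /* watchdog is gone after this */
    pthread_cond_destroy(&ctx->cv);
    pthread_mutex_destroy(&ctx->mu);
    return rc;
}
```

Because `stop` is checked under the mutex before every wait, a signal sent before the watchdog first blocks is not lost, so `pthread_join` cannot hang.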
One of the stack traces you show is from a child process exiting. Just to confirm that we are investigating the right thing: can you confirm that crashes also happen without a child exiting?
When I opened this issue, I thought there were various stack traces that ended in a segfault. Looking at it more closely, I now see there are only the two cases I mentioned in my first post. Segfaults that occur while children exit are in the minority. So yes, there are lots of crashes without a child exiting.
The first stack trace shows:

```
#2 0xf3cce598 in CRYPTO_THREAD_write_lock (lock=0x0) at crypto/threads_pthread.c:78
```
This means the lock pointer is NULL, which, as far as I can tell, is not supposed to happen. OpenSSL has a RUN_ONCE initialization that allocates the locks used by the RAND functions.
So either that initialization did not work properly, or someone overwrote memory they were not supposed to touch. Can you build the mod_md from 2.4.53 into your 2.4.54 test server and check whether the problems disappear? That would lay the blame solely at my door...
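The run-once lock initialization mentioned above can be sketched with plain POSIX threads. This is an analogy, not OpenSSL's code: `pthread_once` plays the role of OpenSSL's run-once helper, and `rand_lock`, `rand_lock_get`, `rand_lock_cleanup` and `with_rand_lock` are hypothetical names. It shows the two ways a caller can end up holding a NULL lock pointer: the one-time initialization failed, or the lock was released (for example during shutdown) while another thread still wanted it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_once_t rand_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t *rand_lock = NULL;   /* hypothetical global lock */

/* Runs exactly once, however many threads call rand_lock_get(). */
static void rand_lock_init(void)
{
    pthread_mutex_t *m = malloc(sizeof *m);
    if (m != NULL && pthread_mutex_init(m, NULL) == 0)
        rand_lock = m;
    else
        free(m);                 /* init failed: rand_lock stays NULL */
}

/* Returns the lock, or NULL if init failed or cleanup already ran. */
pthread_mutex_t *rand_lock_get(void)
{
    pthread_once(&rand_once, rand_lock_init);
    return rand_lock;
}

/* Teardown. The once-control cannot be re-armed, so afterwards
 * rand_lock_get() keeps returning NULL: a thread still running at this
 * point sees lock == NULL, like the frame in the backtrace. */
void rand_lock_cleanup(void)
{
    if (rand_lock != NULL) {
        pthread_mutex_destroy(rand_lock);
        free(rand_lock);
        rand_lock = NULL;
    }
}

/* Guarded critical section: 0 on success, -1 if there is no lock. */
int with_rand_lock(int *counter)
{
    assert(counter != NULL);
    pthread_mutex_t *m = rand_lock_get();
    if (m == NULL)
        return -1;               /* refuse instead of locking NULL */
    pthread_mutex_lock(m);
    (*counter)++;
    pthread_mutex_unlock(m);
    return 0;
}
```

The NULL check only turns a crash into an error; the real fix is still to make sure nothing uses the lock after cleanup.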
Hmm, looking at the differences between 2.4.53 and 2.4.54, there was a change in JSON reference counting. Can you try the following patch?
```diff
Index: modules/md/md_json.c
===================================================================
--- modules/md/md_json.c (Revision 1903191)
+++ modules/md/md_json.c (Arbeitskopie)
@@ -195,7 +195,6 @@
j = jselect_parent(&key, 1, json, ap);
if (!j || !json_is_object(j)) {
- json_decref(val);
return APR_EINVAL;
}
@@ -206,7 +205,6 @@
}
if (!json_is_array(aj)) {
- json_decref(val);
return APR_EINVAL;
}
@@ -660,8 +658,7 @@
static int object_set(void *data, const char *key, const char *val)
{
json_t *j = data, *nj = json_string(val);
- json_object_set(j, key, nj);
- json_decref(nj);
+ json_object_set_new(j, key, nj);
return 1;
}
```
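The patch above is about reference ownership. In Jansson, `json_object_set()` takes its own reference to the value, so the caller still owns its reference and must release it, while `json_object_set_new()` steals the caller's reference. The toy model below is not Jansson; the `obj`/`slot` types and their functions are hypothetical. It shows why "set, then decref" ends in the same state as "set_new", and why an extra decref on an error path over-releases the value if the caller also drops its reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted value (NOT Jansson's json_t). */
typedef struct {
    int refcount;
} obj;

obj *obj_new(void)               /* caller owns one reference */
{
    obj *o = malloc(sizeof *o);
    if (o != NULL)
        o->refcount = 1;
    return o;
}

void obj_incref(obj *o) { if (o != NULL) o->refcount++; }

/* Drops one reference; returns 1 if the object was freed. */
int obj_decref(obj *o)
{
    if (o == NULL)
        return 0;
    assert(o->refcount > 0);     /* below zero would be an over-release */
    if (--o->refcount == 0) {
        free(o);
        return 1;
    }
    return 0;
}

/* One-member "container" standing in for a JSON object key. */
typedef struct {
    obj *value;
} slot;

/* Borrowing setter (like json_object_set): takes its own reference.
 * Incref before releasing the old value so v == s->value is safe. */
void slot_set(slot *s, obj *v)
{
    obj_incref(v);
    obj_decref(s->value);
    s->value = v;
}

/* Stealing setter (like json_object_set_new): takes over the caller's
 * reference, so the caller must not decref v afterwards. */
void slot_set_new(slot *s, obj *v)
{
    obj_decref(s->value);
    s->value = v;
}
```

In this model both patterns leave the container holding exactly one reference, so the `object_set` hunk is a simplification rather than a behaviour change; the hunks that do change behaviour are the removed `json_decref(val)` calls on the error paths.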
I tested this: the segfaults also occur with mod_md from 2.4.53.
Unfortunately, the patch does not solve the problem.
Well, that means the latest changes are not causing the issues on your system.
Have you switched or updated other components? Maybe from OpenSSL 1.1 to 3.0? I'm trying to narrow things down here...
OK, I scanned the code and the report again. Can you produce a log at `LogLevel md:trace4` of what happens right before the crash? You can mail it to me at stefan at eissing.org
Since the bug also happens with the 2.4.43 version, this points to a mistake in my handling of libcurl.
Oops, I made a mistake when building httpd 2.4.54 with mod_md from 2.4.53. I rebuilt it and have to revise my statement: with mod_md from 2.4.53, the problem indeed does not occur.
I'll send you the log file by mail next...
I'm using 2.4.54, but only with `http-01` and not `tls-alpn-01` challenges. I haven't seen this problem yet. Is it likely limited to `tls-alpn-01` challenges?
@arminabf thank you for the regression testing!
@whereisaaron I am investigating the issue with @arminabf and, so far, he is the only one seeing it. I proposed a fix in the current master that restores the old behaviour for handling curl instances. Maybe it is related to the curl version he uses. We'll see.
@arminabf confirmed that the latest patch fixes the issue; I am now releasing it as v2.4.19.
For regression testing we use a proprietary HTTP service that can simulate a certificate authority (CA) for mod_md. Since updating httpd to 2.4.54, we have been seeing segmentation faults with various backtraces.
Following are two examples:
In our tests we use these settings:
Tracing the request/response flow between mod_md and the CA, we see the following HTTP traffic for the first two requests:
Interestingly, if our HTTP service closes the connection after each response, the segmentation fault does not occur. Note, however, that the segfaults themselves only started with 2.4.54.
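The two modes of the test service differ only in HTTP/1.1 connection persistence. The sketch below is hypothetical (the service is proprietary, and `build_response_head` is invented for illustration): with `Connection: close` the server announces that it will close the socket, so the client must open a new connection for its next request; without it, HTTP/1.1 connections are persistent by default and the client may reuse the connection (and, on the mod_md side, presumably its curl state).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reason phrases for a few statuses a simulated CA might send. */
static const char *reason(int status)
{
    switch (status) {
    case 200: return "OK";
    case 201: return "Created";
    case 400: return "Bad Request";
    default:  return "Unknown";
    }
}

/* Hypothetical helper: formats an HTTP/1.1 response head into buf.
 * close_after != 0 adds "Connection: close" (see RFC 9112 on connection
 * management); otherwise the connection stays persistent by default.
 * Returns the number of bytes written, or -1 if buf is too small. */
int build_response_head(char *buf, size_t len, int status,
                        size_t content_length, int close_after)
{
    assert(buf != NULL);
    int n = snprintf(buf, len,
                     "HTTP/1.1 %d %s\r\n"
                     "Content-Type: application/json\r\n"
                     "Content-Length: %zu\r\n"
                     "%s"
                     "\r\n",
                     status, reason(status), content_length,
                     close_after ? "Connection: close\r\n" : "");
    if (n < 0 || (size_t)n >= len)
        return -1;               /* truncated or encoding error */
    return n;
}
```

A quick check of which mode a response uses is `strstr(buf, "Connection: close") != NULL`.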