j4cobgarby closed this 4 months ago
Looking a bit further into the source code, I've been able to fix this issue with the following patch:
diff --git a/neco.c b/neco.c
index 322265c..51d8110 100644
--- a/neco.c
+++ b/neco.c
@@ -2739,10 +2739,10 @@ static void *worker_entry(void *arg) {
             ts.tv_sec += 1;
             pthread_cond_timedwait(&thread->cond, &thread->mu, &ts);
             if (thread->len == 0) {
-                thread->th = 0;
                 if (!thread->end) {
                     pthread_detach(thread->th);
                 }
+                thread->th = 0;
                 thread->end = false;
                 break;
             }
(i.e., moving the thread->th = 0; assignment until after the call to pthread_detach)
Let me know whether this is a "correct" fix or whether the previous order was intended. If the fix is right, I'd be happy to submit a PR :)
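For illustration, the ordering the patch restores can be sketched in isolation. This is only a sketch under assumptions: struct worker, worker_start, and worker_release are hypothetical names, not neco's actual types or functions, and the started flag is an addition of mine so that the code never has to treat 0 as a special pthread_t value.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a worker record; not neco's real struct. */
struct worker {
    pthread_t th;   /* handle filled in by pthread_create */
    bool started;   /* true while th refers to a joinable thread */
};

/* Thread body that does nothing and exits immediately. */
void *worker_noop(void *arg) {
    (void)arg;
    return NULL;
}

/* Start the thread; returns 0 on success or an error number. */
int worker_start(struct worker *w) {
    assert(w != NULL);
    int rc = pthread_create(&w->th, NULL, worker_noop, NULL);
    w->started = (rc == 0);
    return rc;
}

/* Detach first, while the handle is still valid, and only then clear it. */
int worker_release(struct worker *w) {
    assert(w != NULL);
    int rc = 0;
    if (w->started) {
        rc = pthread_detach(w->th);
    }
    memset(&w->th, 0, sizeof w->th); /* clearing happens after the detach */
    w->started = false;
    return rc;
}
```

Calling worker_release a second time is harmless in this sketch, because the flag (rather than the cleared handle) decides whether pthread_detach is called at all.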
Your code looks like it will fix an error related to pthread_detach being passed a zero value.
But I'm unable to reproduce an actual SEGV crash on my side.
What operating system and architecture are you running?
I'm running openSUSE Tumbleweed on Linux 6.8.1. In terms of architecture, it's a Dell laptop with an Intel i7. It is a strange issue, because clearly the code works fine for you and I don't understand how. Maybe certain implementations of pthread_detach gracefully handle invalid threads?
I wonder if the difference is due to how our systems are giving IDs to threads. If I run the following code:
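#include <pthread.h>
#include <stdio.h>

void *test(void *_) {
    return NULL;
}

int main(void) {
    pthread_t th;
    pthread_create(&th, NULL, &test, NULL);
    /* pthread_t is opaque; the cast assumes it is an integer type, as on Linux with glibc. */
    printf("Created thread: %lu\n", (unsigned long)th);
    pthread_detach(th);
    return 0;
}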
I see "Created thread: 140478894835392", but depending on what OS you're on, the first thread could potentially be given the id 0? This would explain why you're not getting a segfault, however I believe in that case it's still incorrect to explicitly set the thread to 0 before detaching it.
I agree that it must be related to a difference in how systems handle passing a zero to pthread_detach. I would expect it to return an ESRCH error and gracefully continue operating. But still, you identified a valid issue with the thread being set to zero before detaching. I think that will probably fix the issue.
I would accept a PR if you are willing.
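As a side note on the expected ESRCH: pthread functions report failure through their return value rather than through errno, so a caller has to check the result of pthread_detach directly. Below is a minimal sketch of that check, assuming hypothetical helper names (detach_and_report, spawn_detached) that are not part of neco. It only ever passes a handle obtained from pthread_create; as far as I can tell, current POSIX leaves the behavior undefined when the handle does not refer to a joinable thread, so the return code cannot be relied on to catch a zeroed handle.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Thread body that does nothing and exits immediately. */
void *noop_entry(void *arg) {
    (void)arg;
    return NULL;
}

/* Hypothetical helper: detach th and log the error number on failure. */
int detach_and_report(pthread_t th, const char *label) {
    assert(label != NULL);
    int rc = pthread_detach(th); /* returns 0 or an error number */
    if (rc != 0) {
        fprintf(stderr, "%s: pthread_detach failed: %s\n", label, strerror(rc));
    }
    return rc;
}

/* Create a thread and detach it right away; returns 0 on success. */
int spawn_detached(void) {
    pthread_t th;
    int rc = pthread_create(&th, NULL, noop_entry, NULL);
    if (rc != 0) {
        return rc; /* th is not valid here, so it must not be detached */
    }
    return detach_and_report(th, "spawn_detached");
}
```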
That's a good point, the man page agrees with you that it should return an ESRCH error... However, even with something as simple as
#include <pthread.h>

int main() {
    pthread_detach(0);
}
I get a segfault - maybe this is a bug specifically in whatever version of libc I have.
I'll submit a PR in a minute :)
Great thanks! All is merged.
thanks :)
To reproduce
1) Download neco.{c,h} and the example HTTP server code from the README, which is:
2) gcc neco.c main.c && ./a.out
3) It runs fine, and serves the HTML once, but segfaults immediately after.
Debugging
I had a brief look at the trace in GDB:
Presumably this thread->th == 0 is the cause of the problem, but I'm not familiar enough with the codebase to look into this any further.
Probably worth noting that other examples that I tried so far worked fine.
(Never mind, looks like I'm getting the same issue with the echo server in examples/; it segfaults before I even run the client, in fact.)