karlseguin / http.zig

An HTTP/1.1 server for zig
MIT License

Fixing locking issue where Pool::acquire causes a race condition #7

Closed · marijnfs closed 10 months ago

marijnfs commented 1 year ago

I ran into crashes when quickly reloading the same page; it seems there is a race condition in acquire(). The items are assigned before the lock is acquired, and in some cases the lock is released early. To avoid this, I acquire the lock at the start of the function and defer the unlock, roughly as sketched below. This avoids the crashes in my application.
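
A minimal sketch of the change described here, not the exact patch: it assumes a generic pool whose mutex, items, available, initFn, and initState fields match the snippet quoted later in the thread, and Self/T stand in for the pool's actual types.

```zig
// Sketch only: acquire() with the lock taken up front and released via
// defer, so every path (including the initFn call) runs under the mutex.
pub fn acquire(self: *Self) !*T {
    self.mutex.lock();
    defer self.mutex.unlock();

    const available = self.available;
    if (available == 0) {
        // Pool exhausted: create a fresh item. Holding the lock here
        // also serializes initFn across threads.
        return try self.initFn(self.initState);
    }
    self.available = available - 1;
    return self.items[self.available];
}
```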

karlseguin commented 1 year ago

I'm happy to merge this in the name of caution, but I am skeptical about it. items is on the heap and is itself never changed. I don't understand why it would be a problem for one thread to get the reference to items while another is writing to an index of it.

Let me know if the issue comes back. I tried to put the server under various loads, and haven't been able to reproduce it.

By any chance, did you try a variation that moves the lock to the start of the function but keeps the early release? It would be nice to have the lock released when initFn is called...

```zig
self.mutex.lock(); // moved this up
const items = self.items;
const available = self.available;
if (available == 0) {
    self.mutex.unlock(); // keep this here
    return try self.initFn(self.initState);
}
defer self.mutex.unlock(); // keep this here
const new_available = available - 1;
self.available = new_available;
return items[new_available];
```
karlseguin commented 1 year ago

When you first initialise the server, are you passing a GeneralPurposeAllocator or something else?

I'm wondering if the race condition is in initFn (which is the initReqRes function in listener.zig). With your version, this call is now serialized. I could see that being important if the allocator isn't thread safe (GPA is thread safe, which is why I'm wondering if you're using a different one).
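
For context on that distinction, a standalone sketch (nothing here is from http.zig): std.heap.GeneralPurposeAllocator defaults to thread safe in multi-threaded builds, while std.heap.ArenaAllocator does no locking of its own, so handing an arena to concurrent initFn calls would be a data race.

```zig
const std = @import("std");

// Sketch, not from this thread: contrasting a thread-safe allocator with
// one that is not. Sharing `gpa_alloc` across threads is fine; sharing
// `arena_alloc` without external locking is not.
pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){}; // thread safe by default
    defer _ = gpa.deinit();
    const gpa_alloc = gpa.allocator();

    var arena = std.heap.ArenaAllocator.init(gpa_alloc); // NOT thread safe
    defer arena.deinit();
    const arena_alloc = arena.allocator();

    // Allocate from each to show the shared Allocator interface.
    const a = try gpa_alloc.alloc(u8, 16);
    defer gpa_alloc.free(a);
    const b = try arena_alloc.alloc(u8, 16);
    _ = b; // freed in bulk by arena.deinit()
}
```

If a non-thread-safe allocator were in play, the version in this PR would mask the race simply by serializing the initFn call, which would fit the behaviour described above.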

marijnfs commented 1 year ago

Hmm, fair assessment. I'm also running into SIGPIPE issues, so it's kinda hard to pinpoint the exact cause. I'll try to find some way to reproduce it minimally.

marijnfs commented 1 year ago

With more testing, I can't easily reproduce this issue anymore, so for now I would say let's ignore it.