olliNiinivaara / GuildenStern

Modular multithreading HTTP/1.1 + WebSocket server framework
MIT License
80 stars, 7 forks

epoll error #7

Closed: ghost closed this issue 1 year ago

ghost commented 2 years ago

Hello, I am getting these errors:

ERROR selector.select: /root/.choosenim/toolchains/nim-1.6.6/lib/pure/ioselects/ioselectors_epoll.nim(392, 15) `not (pkey.ident == InvalidIdent)`
Traceback (most recent call last)
/root/.nimble/pkgs/cligen-1.5.24/cligen.nim(793) cli
/root/.nimble/pkgs/cligen-1.5.24/cligen.nim(766) multiSubs
/root/.nimble/pkgs/cligen-1.5.24/cligen.nim(793) multi
/root/.nimble/pkgs/cligen-1.5.24/cligen.nim(766) dispatchstart
/site/src/nim/server.nim(351) start
/root/.nimble/pkgs/guildenstern-5.1.0/guildenstern/dispatcher.nim(173) serve
/root/.nimble/pkgs/guildenstern-5.1.0/guildenstern/dispatcher.nim(69) eventLoop
/root/.choosenim/toolchains/nim-1.6.6/lib/pure/ioselects/ioselectors_epoll.nim(392) selectInto

olliNiinivaara commented 2 years ago

Maybe you are trying to compile this on Windows (or another non-Linux OS)? In that case, WSL might help: https://docs.microsoft.com/en-us/windows/wsl/install

ghost commented 2 years ago

The error happens at runtime in a Docker container; the program compiles fine. Does the GuildenStern server conflict with std/threadpool? I am using that at the same time...

olliNiinivaara commented 2 years ago

Looking at https://github.com/nim-lang/Nim/blob/6f290fa3863824ec935d381bfefc850acd75196f/lib/pure/ioselects/ioselectors_epoll.nim#L376, the problem happens in low-level OS code and therefore should not be caused by a conflict with std/threadpool. It might, however, be caused by Docker, in which case I cannot help further (please test whether the error persists without Docker).

Basically, epoll_wait seems to return an invalid event. Could it be that you are listening on the same port (or timer event) in another thread (or process/program), so that the other thread has already consumed the event? Try to avoid this race condition. If that is not possible, the EPOLLONESHOT and EPOLLEXCLUSIVE flags might help us out.
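To make the suspected race concrete, here is a minimal sketch in Python using the standard `select.epoll` wrapper (not Nim's `std/selectors` and not GuildenStern code). Two epoll instances stand in for two threads watching the same non-blocking listening socket; the helper names `make_listener`, `register_exclusive` and `accept_ready` are hypothetical. It assumes Linux; `EPOLLEXCLUSIVE` needs kernel 4.5 or later and only reduces the number of instances woken per event, it does not guarantee exactly one. The defensive part is treating a failed `accept` as "someone else got it first" rather than as an error.

```python
import select
import socket


def make_listener():
    """Non-blocking TCP listener on an ephemeral loopback port (hypothetical helper)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))
    srv.listen()
    srv.setblocking(False)
    return srv


def register_exclusive(ep, sock):
    # EPOLLEXCLUSIVE (Linux >= 4.5) reduces, but does not eliminate, the number
    # of epoll instances woken for one event; fall back to 0 if it is missing.
    ep.register(sock.fileno(), select.EPOLLIN | getattr(select, "EPOLLEXCLUSIVE", 0))


def accept_ready(srv, ep, timeout=1.0):
    """Accept one connection if epoll reports the listener ready.

    Returns None when there was no event, or when another waiter already
    accepted the connection (the race discussed above).
    """
    for _fd, _mask in ep.poll(timeout):
        try:
            conn, _addr = srv.accept()
            return conn
        except BlockingIOError:
            return None  # stale event: the connection was consumed elsewhere
    return None


# Two epoll instances stand in for two threads sharing one listening socket.
srv = make_listener()
ep1, ep2 = select.epoll(), select.epoll()
register_exclusive(ep1, srv)
register_exclusive(ep2, srv)

client = socket.create_connection(srv.getsockname())
first = accept_ready(srv, ep1)                # accepts the single connection
second = accept_ready(srv, ep2, timeout=0.1)  # nothing left to accept
```

Note that `EPOLLEXCLUSIVE` cannot be combined with `EPOLLONESHOT`; the one-shot alternative instead disables the descriptor after each event and requires re-arming it with `EPOLL_CTL_MOD` (`ep.modify` in Python) once the event has been handled.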

It is also possible that the invalid event is spurious and nothing to worry about, in which case the error could simply be ignored (because in reality the event has already been processed). If you are sure you are not missing any "real" events, a simple temporary fix is to modify the source by removing the log line https://github.com/olliNiinivaara/GuildenStern/blob/0c2a2ae9a9b86121eb971828850c0ae37c03babe/src/guildenstern/dispatcher.nim#L71
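The temporary fix amounts to treating an event for a descriptor with no registered entry as noise. The Nim assertion `pkey.ident == InvalidIdent` is roughly analogous to such a lookup coming back empty. Below is a minimal Python sketch of that "skip and count" policy; the `Dispatcher` class and its methods are hypothetical and are not GuildenStern's dispatcher. Two socket pairs are used: one has a handler, the other is deliberately left in the epoll set after its handler is dropped.

```python
import select
import socket


class Dispatcher:
    """Hypothetical fd -> handler table on top of one epoll instance."""

    def __init__(self):
        self.ep = select.epoll()
        self.handlers = {}
        self.spurious = 0  # events for fds with no registered handler

    def register(self, sock, handler):
        self.handlers[sock.fileno()] = handler
        self.ep.register(sock.fileno(), select.EPOLLIN)

    def forget(self, sock):
        # Drops only the handler and leaves the fd in the epoll set, mimicking
        # the table and the kernel's interest list drifting apart.
        del self.handlers[sock.fileno()]

    def poll_once(self, timeout=0.5):
        handled = 0
        for fd, _mask in self.ep.poll(timeout):
            handler = self.handlers.get(fd)
            if handler is None:
                self.spurious += 1  # skip the unknown fd instead of asserting
                continue
            handler(fd)
            handled += 1
        return handled


received = []
a, b = socket.socketpair()
c, d = socket.socketpair()

disp = Dispatcher()
disp.register(a, lambda fd: received.append(a.recv(16)))
disp.register(c, lambda fd: None)
disp.forget(c)  # c stays in epoll but no longer has a handler

b.sendall(b"ping")
d.sendall(b"pong")
handled = disp.poll_once()
```

Skipping is only a stopgap: with level-triggered epoll, `c` stays readable, so it is reported (and counted as spurious) on every later call until it is also removed with `disp.ep.unregister(c.fileno())`. That is why ignoring the error is safe only if no "real" events are being lost.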

Sorry, I cannot dig into this much further right now due to the "holiday season". Please do keep reporting any further progress (or setbacks...). Pull requests or other ideas for a fix are also highly appreciated.

ghost commented 2 years ago

> Basically epoll_wait seems to return an invalid event. Could it be that you are listening to same port (or timer event) in other thread (or process/program), and therefore the other thread has already consumed the event? Try to avoid this race condition. If not possible, there seems to be some EPOLLONESHOT and EPOLLEXCLUSIVE flags that might help us out.

There are two processes (so two GuildenStern servers) running at the same time, but on different ports and in different containers. The errors seem to happen in only one process at a time.

olliNiinivaara commented 2 years ago

Two GuildenStern servers running at the same time is a very good working hypothesis for the cause; at least, that is something I have not really tested at all...

If you could run only one and note whether the problem disappears, that would corroborate the hypothesis.

If you are using timers in both, that is most probably the root cause (and should be fixed). In principle, at least, listening on different ports should not cause a race condition (and finding a fix for such a phenomenon might be harder).

olliNiinivaara commented 1 year ago

Closing due to inactivity. Probably the Docker network was misconfigured.