@lynlevenick thanks for raising this issue. My assumption was that if the container goes unhealthy Docker would restart it, but it appears that this is not actually the case. What do you think should happen here?
If you kill PID 1, the container will behave according to its restart policy. However, since the current behavior is to kill all running servers when the application starts, that could cause a chain of events like: the container restarts, every managed server is stopped on startup, and any players who were connected get dropped.
There’s a possibility of restarting the lazymc instance that the message is coming from, but I’m not sure if this error (or other recoverable errors) can occur while players are live on the server. If it can, I believe that could drop players as well.
It might be better to just ignore this particular error? If you do that, though, there's always the possibility that another error gets added later or that ignoring this one turns out to be insufficient. If I were doing it, I would watch the lazymc process itself, on the assumption that it will exit if an unrecoverable error occurs, and restart it as needed; but I haven't fully read through the lazymc source code.
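For what it's worth, here is a minimal sketch of that supervision idea, assuming lazymc runs as a child process. The binary name and argument are placeholders, not how lazymc-docker-proxy actually launches it:

```rust
use std::process::Command;
use std::thread;
use std::time::Duration;

fn main() {
    loop {
        // Placeholder invocation; the real project configures lazymc differently.
        let status = Command::new("lazymc").arg("start").status();

        match status {
            // lazymc exited cleanly, e.g. on shutdown: stop supervising.
            Ok(s) if s.success() => break,
            // lazymc exited with an error (presumed unrecoverable): restart it.
            Ok(s) => eprintln!("lazymc exited with {s}, restarting"),
            // The process could not be spawned at all.
            Err(e) => eprintln!("failed to spawn lazymc: {e}, retrying"),
        }

        // Back off briefly so a crash loop does not spin.
        thread::sleep(Duration::from_secs(5));
    }
}
```

The back-off is just to keep a crash loop from spinning; real restart logic would probably want a retry cap as well.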
@lynlevenick I've got a PR to make it not park the thread here: https://github.com/joesturge/lazymc-docker-proxy/pull/112 please review. Try it once it is released and let me know what logs occur afterwards.
Sounds good, if you publish an image I can get you some logging.
@lynlevenick v2.5.2 is live with this change, let me know what happens
Apologies for the delay - I have logs!
TRACE a::lazymc::monitor > Fetching status for 172.18.0.4:25566 ...
TRACE b::lazymc::monitor > Fetching status for 172.18.0.3:25565 ...
TRACE a::mio::poll > registering event source with poller: token=Token(1761607681), interests=READABLE | WRITABLE
TRACE b::mio::poll > registering event source with poller: token=Token(1761607681), interests=READABLE | WRITABLE
ERROR b::lazymc > Closing connection, error occurred
DEBUG lazymc-docker-proxy::health > Setting health status to: UNHEALTHY
ERROR lazymc-docker-proxy::health > Application is unhealthy.
TRACE b::mio::poll > deregistering event source from poller
TRACE a::mio::poll > deregistering event source from poller
TRACE a::lazymc::monitor > Fetching status for 172.18.0.4:25566 ...
TRACE a::mio::poll > registering event source with poller: token=Token(1778384897), interests=READABLE | WRITABLE
TRACE b::mio::poll > deregistering event source from poller
TRACE b::lazymc::monitor > Fetching status for 172.18.0.3:25565 ...
TRACE b::mio::poll > registering event source with poller: token=Token(1778384897), interests=READABLE | WRITABLE
TRACE b::mio::poll > registering event source with poller: token=Token(318767109), interests=READABLE | WRITABLE
TRACE b::mio::poll > deregistering event source from poller
TRACE b::lazymc::monitor > Fetching status for 172.18.0.3:25565 ...
It seems like this error is recoverable - the lazymc instance continues working after logging that error. My players were able to join and have the server start as desired.
That's good. Perhaps I should just stop it from setting unhealthy altogether when any error occurs. You're right that we don't want to get into adding lots of special cases and run the risk of missing something; removing the check avoids that risk.
I'm going to close this issue now. If you have any other problems, please open a new issue. Cheers!
I'm getting logs that read "Closing connection, error occurred", which I believe to be coming from lazymc's status server. It emits that error whenever there's an error reading the initial packet from the client (code here). I believe this is a completely recoverable error in lazymc, and that failing to serve a status request can be safely ignored. Unfortunately, lazymc-docker-proxy dutifully notes that an error has occurred and immediately parks the logging thread, which prevents any further logs from being emitted. It may or may not also be the cause of the connection failures my users are seeing; I have little visibility into that because I don't have logs.
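To make the failure mode concrete, here is a rough illustration of the pattern described above. This is not lazymc-docker-proxy's actual code; mark_unhealthy and the stdin stand-in are made up for the sketch:

```rust
use std::io::{BufRead, BufReader, Read};
use std::thread;

// Hypothetical stand-in for the proxy's health handling.
fn mark_unhealthy() {
    eprintln!("Setting health status to: UNHEALTHY");
}

// Forward log lines from a child process; the first ERROR line freezes forwarding.
fn forward_logs<R: Read>(output: R) {
    for line in BufReader::new(output).lines().flatten() {
        println!("{line}");
        if line.contains("ERROR") {
            mark_unhealthy();
            // Parking here is what stops all further log output,
            // even though lazymc itself keeps running after the error.
            thread::park();
        }
    }
}

fn main() {
    // Reading from stdin stands in for the piped lazymc output.
    forward_logs(std::io::stdin());
}
```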
I believe you could reproduce this by opening a connection to lazymc-docker-proxy’s lazymc server and closing it immediately or leaving it open without sending any packets until timeout, though I haven’t had the chance to personally check this.
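If it helps, a tiny sketch of the first variant, assuming the lazymc listener is reachable on 127.0.0.1:25565 (adjust the address for your setup):

```rust
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    // Connect to the lazymc listener and drop the connection immediately,
    // without sending a handshake packet. This should exercise the
    // "Closing connection, error occurred" path on the status server.
    let stream = TcpStream::connect("127.0.0.1:25565")?;
    drop(stream);
    Ok(())
}
```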