Closed nagisa closed 7 months ago
Please run the github workflows before merging – I was not able to run the tests locally due to tox getting into an infinite loop on my system for some reason.
I'm also okay with removing the last commit as it is only a minor optimization that's not really necessary to achieve the fix in this PR.
r? @pvanloo
r? @pvanloo
Hi, we'll look to review this soon. Things have been busy here at the office 😄
Thanks for the review. I have applied all of the suggestions.
Thanks for the review. I have applied all of the suggestions.
I've started the tox gh action check but after 30minutes they were still running. I'm assuming the same issue you had locally. I'll check this again tomorrow.
I've started the tox gh action check but after 30minutes they were still running. I'm assuming the same issue you had locally. I'll check this again tomorrow.
In my case it was tox itself hanging, but I was able to run pytest locally and was able to reproduce the issue that GitHub hung on as well. I have pushed two commits to fix the issues pointed out by pytest.
The problem was that in the network.py stop()
function we had this;
try:
self.__bind_socket.shutdown(socket.SHUT_RDWR)
self.__bind_socket.close()
self.__bind_socket = None
except AttributeError:
pass
and then
self.__server_thread.join()
but it was possible that the __bind_socket
gets set to a bound socket in between these two blocks, which would lead to the server thread listening on a properly functioning socket without any intent to shut down, while the stop()
was thinking it did everything to terminate the sockets and was waiting for the server thread to shut down.
I've started the tox gh action check but after 30minutes they were still running. I'm assuming the same issue you had locally. I'll check this again tomorrow.
In my case it was tox itself hanging, but I was able to run pytest locally and was able to reproduce the issue that GitHub hung on as well. I have pushed two commits to fix the issues pointed out by pytest.
The problem was that in the network.py
stop()
function we had this;try: self.__bind_socket.shutdown(socket.SHUT_RDWR) self.__bind_socket.close() self.__bind_socket = None except AttributeError: pass
and then
self.__server_thread.join()
but it was possible that the
__bind_socket
gets set to a bound socket in between these two blocks, which would lead to the server thread listening on a properly functioning socket without any intent to shut down, while thestop()
was thinking it did everything to terminate the sockets and was waiting for the server thread to shut down.
tox has run now, some errors in flake8 and mypy
Sorry about that! I have fixed both flake8’s and mypy’s concerns in the last two commits.
Just in case this fell through the cracks, is there any chance you could give this another look?
We'll be looking to get this merged this week.
Hi @nagisa, thanks for your patience. I've finalized the PR with some additional logic to catch errors. Thanks again!
It is possible that binding to a socket will fail at the program startup, for example because DHCP has not fetched the IP address for the interface, or the interface is down (because WiFi).
This changes the
Network
class to implement retries every 5 seconds if the initial bind has failed. Based on my experiments once the first bind succeeds, the socket will remain properly bound and working even if the interface loses the IP address or disappears entirely, so that is nice, but unknown how reliable this behaviour is. In particular my intuition was screming to me thataccept
call would raise an error which we would use to setself.__bind_socket = None
, so that we'd know we need to rebind later.While here I had to make some changes to the structure, for example:
self.__running: bool
withself.__stop: threading.Event
, as that was required to have an interruptible sleepin the__get_bound_socket
function.accept
loop, as there isn't a guarantee that the network was listening on127.0.0.1
in the first place!Fixes #13