velbus / python-velbustcp

Python application that bridges a Velbus installation with TCP
MIT License
2 stars 4 forks source link

network: automatically retry binding the listener socket #14

Closed nagisa closed 7 months ago

nagisa commented 1 year ago

It is possible that binding to a socket will fail at the program startup, for example because DHCP has not fetched the IP address for the interface, or the interface is down (because WiFi).

This changes the Network class to implement retries every 5 seconds if the initial bind has failed. Based on my experiments once the first bind succeeds, the socket will remain properly bound and working even if the interface loses the IP address or disappears entirely, so that is nice, but unknown how reliable this behaviour is. In particular my intuition was screming to me that accept call would raise an error which we would use to set self.__bind_socket = None, so that we'd know we need to rebind later.

While here I had to make some changes to the structure, for example:

Fixes #13

nagisa commented 1 year ago

Please run the github workflows before merging – I was not able to run the tests locally due to tox getting into an infinite loop on my system for some reason.

nagisa commented 1 year ago

I'm also okay with removing the last commit as it is only a minor optimization that's not really necessary to achieve the fix in this PR.

nagisa commented 11 months ago

r? @pvanloo

dukeofphilberg commented 10 months ago

r? @pvanloo

Hi, we'll look to review this soon. Things have been busy here at the office 😄

nagisa commented 10 months ago

Thanks for the review. I have applied all of the suggestions.

dukeofphilberg commented 10 months ago

Thanks for the review. I have applied all of the suggestions.

I've started the tox gh action check but after 30minutes they were still running. I'm assuming the same issue you had locally. I'll check this again tomorrow.

nagisa commented 10 months ago

I've started the tox gh action check but after 30minutes they were still running. I'm assuming the same issue you had locally. I'll check this again tomorrow.

In my case it was tox itself hanging, but I was able to run pytest locally and was able to reproduce the issue that GitHub hung on as well. I have pushed two commits to fix the issues pointed out by pytest.


The problem was that in the network.py stop() function we had this;

try:
    self.__bind_socket.shutdown(socket.SHUT_RDWR)
    self.__bind_socket.close()
    self.__bind_socket = None
except AttributeError:
    pass

and then

self.__server_thread.join()

but it was possible that the __bind_socket gets set to a bound socket in between these two blocks, which would lead to the server thread listening on a properly functioning socket without any intent to shut down, while the stop() was thinking it did everything to terminate the sockets and was waiting for the server thread to shut down.

dukeofphilberg commented 10 months ago

I've started the tox gh action check but after 30minutes they were still running. I'm assuming the same issue you had locally. I'll check this again tomorrow.

In my case it was tox itself hanging, but I was able to run pytest locally and was able to reproduce the issue that GitHub hung on as well. I have pushed two commits to fix the issues pointed out by pytest.

The problem was that in the network.py stop() function we had this;

try:
    self.__bind_socket.shutdown(socket.SHUT_RDWR)
    self.__bind_socket.close()
    self.__bind_socket = None
except AttributeError:
    pass

and then

self.__server_thread.join()

but it was possible that the __bind_socket gets set to a bound socket in between these two blocks, which would lead to the server thread listening on a properly functioning socket without any intent to shut down, while the stop() was thinking it did everything to terminate the sockets and was waiting for the server thread to shut down.

tox has run now, some errors in flake8 and mypy

nagisa commented 10 months ago

Sorry about that! I have fixed both flake8’s and mypy’s concerns in the last two commits.

nagisa commented 8 months ago

Just in case this fell through the cracks, is there any chance you could give this another look?

dukeofphilberg commented 7 months ago

We'll be looking to get this merged this week.

dukeofphilberg commented 7 months ago

Hi @nagisa, thanks for your patience. I've finalized the PR with some additional logic to catch errors. Thanks again!