tjguk / networkzero

Making it easy for teachers to use simple networking in Python
MIT License
47 stars 14 forks source link

Thread safety? #9

Closed waveform80 closed 8 years ago

waveform80 commented 8 years ago

My one concern, reading through the design guidelines, is "Thread-safety is not a priority". I'd argue that, at least on the Raspberry Pi side of things, thread-safety is vaguely important. This is primarily because anything involving GPIO event handling necessarily involves threading (RPi.GPIO, RPIO, and even GPIO Zero's native pin library all fire up background threads to poll for edge detection in a variety of ways; they also all fire event handlers from those background threads). Furthermore, GPIO Zero goes a little thread-crazy when you start playing with the source/values stuff (in that model every component effectively becomes its own thread).

This probably only matters where the Pi is an initiator in network transactions; i.e. if it's just receiving commands from a PC then I doubt threading matters much on the PC side and it certainly won't matter on the Pi. However, if a Pi sends network messages in response to, say, events triggered by a motion sensor or a button, those messages will be processed in background threads rather than the main thread.

For that reason alone, I'd beg to make thread-safety at least a consideration, if not an absolute priority.

tjguk commented 8 years ago

On 10/04/2016 18:24, Dave Jones wrote:

My one concern, reading through the design guidelines, is "Thread-safety is not a priority". I'd argue that, at least on the Raspberry Pi side of things, thread-safety is vaguely important. This is primarily because anything involving GPIO event handling necessarily involves threading (RPi.GPIO, RPIO, and even GPIO Zero's native pin library all fire up background threads to poll for edge detection in a variety of ways; they also all fire event handlers from those background threads). Furthermore, GPIO Zero goes a little thread-crazy when you start playing with the source/values stuff (in that model every component effectively becomes its own thread).

Ok. Good points there. I was trying to keep my own code complexity down by assuming that, focusing on simple learning, I wouldn't be encountering people doing thread-crazy things. (Or, if I did, that they would be on their own). I didn't think of implicit threading.

I'm about to revisit some of the design anyway, having done my first trial across 4 machines (most dev./test happens in multiple processes on one Windows box). There are definite flaws in the advert/discovery when there's any kind of latency in the system. So I'll have an eye to threading issues when I re-test.

Mostly, the issues will be around the fact that I'm passing global caches to-and-fro and, if they're hitting different threads, that's considered a no-no in ZeroMQ terms. Getting all the tests passing (and they do use threads!) was a nightmare to debug. So I definitely want the whole thing to be as robust as possible.

waveform80 commented 8 years ago

Fun stuff :) Unfortunately I'm off doing picademy stuff for the next few days but after the craziness has died down a bit I'll try and spend some time familiarizing myself with 0mq and nw0 as I can imagine this'll become another of our go to libraries (there's always one group who want to do robotics, and usually when networking comes up the answer is "don't go there - it's too hard for a one day project - just use a keyboard for now").

tjguk commented 8 years ago

Ok; I've now implemented a fairly simple thread-safety mechanism in the sockets module. As all socket creation is channelled through the "get_socket" method, the socket cache it uses is a thread-local dictionary. There is also a thread-global set of all sockets used.

A speaking/publishing socket (ie one which will eventually connect) can be created in as many threads as you like, each thread creating & cacheing a new socket.

A listening/subscribing socket (ie one which will eventually bind) can only be created once: any instance of it is thread-local, but there can be only one. [I did briefly consider calling them Highlander sockets, but thought it was too geeky!]. The thread-global cache is checked and a specific error is raised if there is an existing such socket in another thread.

There are tests for both these situations in the test_threading module.

tjguk commented 8 years ago

Reopening as the bind test is failing on Linux

tjguk commented 8 years ago

Closing again as Travis is happy once more. Looks like it's a slight difference in thread scheduling between Windows & Llinux, exposing a race condition in the test code.