jgehrcke / gipc

gevent-cooperative child processes and inter-process communication
https://gehrcke.de/gipc
MIT License

Select()-ing on gipc pipe? #16

Closed jgehrcke closed 7 years ago

jgehrcke commented 9 years ago

Originally reported by: Ivan Voras (Bitbucket: ivoras, GitHub: ivoras)


It looks like gipc pipe objects are missing fileno(), which makes them unusable with select()?


jgehrcke commented 9 years ago

Original comment by Ivan Voras (Bitbucket: ivoras, GitHub: ivoras):


Closing

jgehrcke commented 9 years ago

Original comment by Ivan Voras (Bitbucket: ivoras, GitHub: ivoras):


That's actually a neat idea, for simple one-way communication. Not applicable to my scenario but sure, thanks. I consider this issue closed.

jgehrcke commented 9 years ago

Original comment by Jan-Philip Gehrcke (Bitbucket: jgehrcke, GitHub: jgehrcke):


So you need a mechanism to interrupt/abort a greenlet that is currently waiting for resumption. Have you looked into Greenlet.kill(exception=CustomException)? In your child, you could have a greenlet with the sole purpose to listen for a certain 'interrupt' event sent by your master process. Upon retrieval, send a custom exception to your target greenlet. Handle this exception. This is a simple way of communicating an event. Other than that, you might want to make use of gevent.event.
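A minimal sketch of the listener-plus-kill() pattern described above, using gevent only. The `Interrupt` exception, the `worker` and `listener` functions, and the `gevent.queue.Queue` standing in for the channel to the master process are all hypothetical names chosen for illustration; in a real gipc child, the listener would presumably wait on a gipc pipe handle's get() instead.

```python
import gevent
from gevent.queue import Queue


class Interrupt(Exception):
    """Hypothetical custom exception used to signal the worker greenlet."""


def worker(results):
    try:
        # Stand-in for a long cooperative wait, e.g. recv() on a socket.
        gevent.sleep(60)
        results.append("finished")
    except Interrupt:
        # Handle the event sent by the listener greenlet.
        results.append("interrupted")


def listener(control, target):
    # Stand-in for waiting on an 'interrupt' message from the master process.
    control.get()
    # Raise Interrupt inside the target greenlet without waiting for it to exit.
    target.kill(exception=Interrupt, block=False)


results = []
control = Queue()
w = gevent.spawn(worker, results)
l = gevent.spawn(listener, control, w)

control.put("interrupt")  # simulate the master sending the event
gevent.joinall([w, l], timeout=5)
print(results)
```

The worker never polls and never uses a timeout; it is simply resumed with an exception when the listener receives the event.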

Earlier you wrote that you have considered, but do not like, implementing a custom polling- and/or timeout-based solution. I totally agree, and there surely is a way to stay within the canonical boundaries of the framework.

jgehrcke commented 9 years ago

Original comment by Ivan Voras (Bitbucket: ivoras, GitHub: ivoras):


Sure, but you did not cover the case I described.

Let me simplify it like this:

How would you solve it without select()ing on two points of entry: #1: the socket to the remote service, #2: a pipe or socket to the master process?

Note that this simplified case is mostly useless in practice, where a single OS process would spawn a large number of greenlets to communicate with a large number of remote services.

jgehrcke commented 9 years ago

Original comment by Jan-Philip Gehrcke (Bitbucket: jgehrcke, GitHub: jgehrcke):


Thanks for explaining your use case in more detail. Please forgive my ignorance, but it sounds like this is the canonical use case that gevent handles for you automatically, under the hood, and out of the box, and gipc integrates perfectly fine with this concept. That is exactly the magic of gevent, and you might not really have understood so far how that works (which wouldn't be a shame, of course!). At least, your comment "enable the master process to communicate with this greenlet which is normally blocked in read() on a socket" indicates that you need to better understand how gevent works under the hood.

Gevent works on top of greenlets. Greenlets allow for fast context switches. In terms of gevent, a context is a Python function. That is, gevent can just switch between greenlets (functions). When does it switch out of a greenlet? Whenever a normally blocking system call is used. Instead of blocking the current OS thread (which is what a blocking system call normally does), the execution flow leaves the greenlet, and the gevent hub takes control, taking notice of the I/O event that this greenlet is waiting for. The hub requests a notification about this event from the underlying event loop. As soon as this event occurs, the hub is notified and resumes the previously interrupted greenlet. The magic is: gevent allows for "synchronous-style" programming, but the execution flow actually jumps between functions, non-deterministically. These jumps are coordinated by the gevent hub, which always runs under the hood. You do not normally interact with the hub in application code.

Read this carefully: in an I/O-bound application (which your child process clearly is), many greenlets can co-exist simultaneously, and they can simultaneously wait for certain I/O-events to happen. Whenever an I/O event happens, the gevent hub makes sure that the corresponding greenlet (which previously told the hub "hey, I am waiting for a certain event") resumes execution. Another quite important insight: all this happens in one OS thread, so greenlets never execute simultaneously, they just wait simultaneously. While they are waiting, the gevent hub takes control under the hood and delegates execution depending on incoming I/O events.

Hence, all you need to do in your child process is to spawn one greenlet for the socket interaction and another greenlet for the pipe interaction. Implement them in a way that looks as if it were blocking. Gevent makes sure that both the socket and the pipe are monitored simultaneously for incoming data.
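The two-greenlet layout described above could look roughly like the following sketch. It assumes gevent only: a `socketpair()` stands in for the connection to the remote service, and a `gevent.queue.Queue` stands in for the gipc pipe to the master process (with gipc, the second reader would presumably call get() on a pipe handle). The function names and messages are hypothetical.

```python
import gevent
from gevent import socket
from gevent.queue import Queue


def socket_reader(sock, log):
    # Looks blocking, but only this greenlet waits; the hub keeps running.
    data = sock.recv(1024)
    log.append(("socket", data))


def master_reader(channel, log):
    # Stand-in for get() on a pipe to the master process.
    msg = channel.get()
    log.append(("master", msg))


log = []
remote_end, local_end = socket.socketpair()  # stand-in for the remote service
master_channel = Queue()                     # stand-in for the master pipe

readers = [
    gevent.spawn(socket_reader, local_end, log),
    gevent.spawn(master_reader, master_channel, log),
]

gevent.sleep(0.01)                 # both greenlets are now waiting simultaneously
master_channel.put("reconfigure")  # the master speaks first ...
gevent.sleep(0.01)
remote_end.sendall(b"payload")     # ... then the remote service sends data

gevent.joinall(readers, timeout=5)
remote_end.close()
local_end.close()
print(log)
```

Neither greenlet calls select(); whichever source has data first is handled first, because the hub resumes the greenlet that registered interest in that event.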

jgehrcke commented 9 years ago

Original comment by Ivan Voras (Bitbucket: ivoras, GitHub: ivoras):


I'm interested in discussing alternative approaches, this is still in the design phase.

So the thing I am trying to accomplish is this: gevent simulates blocking IO in greenlets, so I can use get() on a pipe and read() on a socket, as you describe, but I also want the behaviour which will enable the master process to communicate with this greenlet which is normally blocked in read() on a socket. One possible solution here is for the socket to have a timeout, and then have the loop checking for some kind of input (other socket? pipe? doesn't matter really) from the master process. I don't like this solution.

The other solution is to use select() to determine if either the socket or the whatever-method fd to the master process have something to read, then process the arriving data in a "normal" way with read() or get().

How would you solve this requirement?

jgehrcke commented 9 years ago

Original comment by Jan-Philip Gehrcke (Bitbucket: jgehrcke, GitHub: jgehrcke):


Hey Ivan. Sure, gevent provides select(), but merely as a utility method and not as a primary way for implementing a "wait-for-input" controller.

First of all, like I said, you can use gevent's select() with the _fd attribute of gipc pipe handles for playing around with your scenario.

Just take care of the fact that gipc pipe handles are either duplex or non-duplex, i.e. in the former case one handle corresponds to two file descriptors, and in the latter case to only one. You need to use dir() to find the implementation, but it really is simple. The fact that gipc pipe handles can be either duplex or unidirectional also is a strong reason why I do not see a meaningful way for implementing a fileno() method for them. Once again, all that doesn't prevent you from select()ing on the underlying file descriptors if you really want to. Just do it.
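For experimenting with select() on raw descriptors, a sketch along these lines may help. It deliberately does not touch gipc: a plain `os.pipe()` stands in for the descriptor behind a pipe handle, and a `socketpair()` stands in for the remote service, so the variable names and the `b"stop"` message are assumptions. It also assumes a POSIX platform, where select() works on pipe descriptors. With real gipc handles, data should still be moved with put() and get() rather than read from the descriptor directly.

```python
import os
from gevent import select, socket

# Stand-ins: a raw OS pipe for the master channel, a socket pair for the remote.
master_r, master_w = os.pipe()
remote_a, remote_b = socket.socketpair()

# Simulate the master writing while the remote stays silent.
os.write(master_w, b"stop")

# gevent's cooperative select(); waits at most one second.
readable, _, _ = select.select([master_r, remote_b.fileno()], [], [], 1.0)

source = "master" if master_r in readable else "remote"
payload = os.read(master_r, 4) if source == "master" else remote_b.recv(4)
print(source, payload)

# Clean up the descriptors and sockets.
for fd in (master_r, master_w):
    os.close(fd)
remote_a.close()
remote_b.close()
```

Note that a duplex handle would contribute two descriptors to such a call, which is part of why a single fileno() does not map cleanly onto it.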

However, the scenario you are describing sounds a little bogus to me. I assume that a "gipc-friendly process" is just a process spawned by gipc. So, if you are running greenlets in a gipc child process, then the canonical way to communicate back and forth between the parent process and all the greenlets in the child process is through gipc pipes just via put() and get(), clearly without having the need to explicitly apply select(). So,

"the greenlets select() on both the pipe to the master process"

would be a design flaw. This is not necessary and, as stated in my first response, might lead to race conditions. I cannot really comment on the

"the greenlets select() on [...] the socket to the remote service"

part, but it looks like this also is not the canonical way to do this with gevent. In the gevent world, you would just monkey-patch the socket implementation and use a synchronous programming approach. This would fulfill your goal ("whichever has some data to process is processed") without using select().

jgehrcke commented 9 years ago

Original comment by Ivan Voras (Bitbucket: ivoras, GitHub: ivoras):


gevent also provides a greenlet-friendly select() (see http://www.gevent.org/gevent.select.html), so my intended design of the app is the following:

jgehrcke commented 9 years ago

Original comment by Jan-Philip Gehrcke (Bitbucket: jgehrcke, GitHub: jgehrcke):


Hello Ivan,

It is true that gipc pipe handles do not provide fileno(). However, I am not convinced that this would be a useful addition in general. What is it that you are trying to achieve with using select() on the underlying file descriptors?

gipc pipe access works in cooperative (non-blocking) mode by default, leveraging the mechanisms provided by gevent and libev. That is, the underlying system already monitors the corresponding file descriptors for I/O events, and the put() and get() methods of gipc pipes perfectly integrate with the gevent framework.

That said, the gipc pipe handles "expose" the underlying file descriptors via their _fd attribute. You can run your tests with this if you like.

However, "manually" applying select() on these descriptors in application code would work around the gevent mechanism, and potentially even collide with that mechanism, yielding mean race conditions.

You should never read or write directly from/to the underlying file descriptors. For shifting around data on gipc pipes, you must use the put() and get() methods. And this is exactly why I do not understand how you would integrate this with manually calling select(). Did I convince you already or can you elaborate?