Closed vessellaj closed 5 years ago
Using Connection.serve_all() in its own thread on Windows 10 results in calls to select.select choking out the execution of other threads when select.select is given a timeout.
What do you mean "choking out"? According to the python source code, the GIL is released in select.select, so you mean it cycles at 100% CPU?
Judging from the code in rpyc.core.stream.Win32PipeStream.poll, someone had the same idea as you (using time.sleep(timeout)).
IMO, time.sleep(0.1) is not viable, so I'd prefer to have something better. However, I haven't looked into Windows things so far.
Which Python version are you running on Windows, and do you use any special server code?
Can you provide an example server+client so I can check if I can reproduce the issue?
Best, Thomas
Using Python 3.5.2 on Windows 10.
By "choking out" I mean that the calls to select.select are somehow blocking the main thread from executing at a reasonable pace. CPU usage seems to remain at or near 0% the whole time, yet the main thread's processing massively slows down. A task that normally takes about 5-7 seconds to run without the Connection.serve_all thread running (or with my workaround) will take a minute or more to run when I have Connection.serve_all in its own thread.
Well this is interesting...
I'm not able to reproduce the issue on a small example at all. I suppose there's some strange interaction happening in the main project that doesn't behave well with the default Connection.serve_all.
Unfortunately, the main project is currently internal and hasn't gone through the legal hoops for releasing the source yet, so I can't even give you that to try.
The server code I'm using is a placeholder for now, and its code is the following:
import threading

import rpyc
from rpyc.utils import server


class ServerService(rpyc.Service):
    @staticmethod
    def exposed_push_feedback(status_dict, exception):
        print(status_dict)
        print(exception)

    @staticmethod
    def exposed_register(name, operating_system):
        print('New connection! %s running on %s.' % (name, operating_system))


srv = server.ThreadPoolServer(ServerService, port=18812)
t = threading.Thread(target=srv.start)
t.daemon = True
t.start()
I start the server with python3 -i server.py
I'm just interacting with the client by calling srv.fd_to_conn[<fd>].root.new_task(<args>), and of course the client has a service providing the new_task method.
For now I can get by using my workaround, and if it does end up released I could potentially reopen this issue later.
Quick question: are the other threads I/O bound or CPU bound? I could imagine multiple select calls in parallel not working because of some weird issue. And how many file descriptors are you waiting on? I think there can be performance issues if you have many. It's hard to say anything definite without a reproducible example.
Otherwise, feel free to keep it open or closed as you like.
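For anyone who wants to try for a reproducible example, here is a minimal timing harness using only the standard library. It is a sketch under assumptions, not the reporter's setup: busy_task, select_loop, timed and compare are hypothetical names, and a socket pair stands in for the rpyc connection. It times a pure-Python task alone, then again while a background thread sits in select.select with a timeout. As noted above, a small example did not show the slowdown, so this may well report similar timings for both runs.

```python
import select
import socket
import threading
import time


def busy_task(n=200_000):
    """Stand-in for the main thread's work (pure Python, no I/O)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def select_loop(sock, stop, timeout=0.1):
    """Repeatedly wait on one idle socket, as a serving thread might."""
    while not stop.is_set():
        select.select([sock], [], [], timeout)


def timed(fn):
    """Return the wall-clock seconds fn() takes."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def compare(timeout=0.1):
    """Time busy_task alone, then with a select thread running."""
    a, b = socket.socketpair()  # nothing is ever sent, so select just times out
    stop = threading.Event()
    baseline = timed(busy_task)
    worker = threading.Thread(target=select_loop, args=(a, stop, timeout))
    worker.daemon = True
    worker.start()
    try:
        with_select = timed(busy_task)
    finally:
        stop.set()
        worker.join()
        a.close()
        b.close()
    return baseline, with_select


if __name__ == "__main__":
    baseline, with_select = compare()
    print("alone: %.3fs, with select thread: %.3fs" % (baseline, with_select))
```

If the second number comes out many times larger than the first on Windows but not on Linux, that would point at select.select itself rather than at anything in the main project.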
I would say it's mostly I/O bound, as it's most often waiting on external programs. We're using the Windows COM system to control MS Office programs - in particular, Outlook. We're also using Selenium to control Firefox, which I think uses sockets.
I did notice that the Outlook and Firefox tasks are where this problem is readily apparent, but I was noticing the slowdown in parts of those tasks where there should be no I/O with those external programs - there's a method in each that is just validating the configuration input, and I was seeing that each line executed in that method was spaced apart by several seconds each after I had started the task through RPC. This is the client program.
None of that happens when I just load a configuration file from the disk, or use our (outdated) Boost IPC method, and of course it stops happening when I directly call the Connection.serve(0) method in a loop with time.sleep(0.1) instead of Connection.serve_all.
At the time I discovered this, I was only waiting on one file descriptor in both client and server.
So I checked out 2689759 and was able to reproduce the issue.
Confirmed the minor improvement using suggestions from this thread.
Checked out current master and tested again.
This has been resolved already; it is most likely a duplicate of #306 (where I found the test case).
Using Connection.serve_all() in its own thread on Windows 10 results in calls to select.select choking out the execution of other threads when select.select is given a timeout.
In order to work around this issue, I've implemented my own serve_all which calls Connection.serve(0) and uses time.sleep(0.1) to limit CPU spinning. This approach does not have the issue.
I've tested the issue with Python 3.4.3 on Fedora 23, and threading Connection.serve_all does not cause this issue there. I don't know if this is the select.select function causing this in general, or if it's specifically the Windows system call since the Linux polling method does not use select.select.
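The workaround described above could look roughly like the following. This is a minimal sketch, not the reporter's actual code: serve_all_polling and its interval parameter are hypothetical names, and it assumes only that the connection object has a closed attribute and a serve(timeout) method that returns a truthy value when it handled something, as rpyc's Connection does. Sleeping only when serve(0) found nothing to do is a small variation on "always sleep 0.1", so that a burst of requests is drained without added delay.

```python
import time


def serve_all_polling(conn, interval=0.1):
    """Serve requests on conn until it closes, without a blocking select.

    conn is assumed to expose `closed` and `serve(timeout)`; any object
    with that shape works, so nothing here is tied to a specific library.
    """
    try:
        while not conn.closed:
            # serve(0) polls once and returns immediately.
            handled = conn.serve(0)
            if not handled:
                # Idle: yield to the other threads instead of spinning.
                time.sleep(interval)
    except EOFError:
        # The peer closed the connection; treat that as a normal exit.
        pass
```

To use it in place of Connection.serve_all, pass it as the thread target, for example threading.Thread(target=serve_all_polling, args=(conn,)), and mark the thread as a daemon if it should not keep the process alive.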