stackless-dev / stackless

The Stackless Python programming language
http://www.stackless.com/

select.select() consuming excessive process time on Ubuntu & MacOS #234

Open adde1 opened 4 years ago

adde1 commented 4 years ago

Hi,

I have been using Stackless for some time, but now I am stuck and need to ask for help. In short, the call to select.select() consumes excessive processing time (equal to the wall clock time) in some scenarios. It seems to happen when the system gets busy, but I have not been able to narrow it down further than that.

The behaviour is not consistent across platforms and versions of Stackless. When I started using Stackless back around 2.7.2 I did not have this performance problem. I first got problems on MacOS around 2.7.9, but since I was about to finish my big project at the time anyway, I just switched to working on Ubuntu. But now I get similar symptoms on Ubuntu as well.

The core loop of my project has not changed significantly since the start. I also don't know what I could have done wrong on the Python side to make select.select() behave almost as if it were implemented with a busy loop (but only in some cases).

I would like to move to Conda because for my new project I need numpy, scipy, and pygame at the same time (as well as a Fortran compiler), but with the current issues I am kind of stuck.

The behaviour I get is as follows:

Ubuntu 12, 14, 16 - Stackless built locally - Intel 2500K

Ubuntu 18 - Conda environment - Ryzen 3700

Ubuntu 18 - Stackless built locally - Ryzen 3700

MacOS - Conda environment - Intel Core i5 (c:a 2013)

MacOS - Downloaded installer - Intel Core i5 (c:a 2013)

Sorry for the vague error report, but I just don't have a lot to go on. Any help will be appreciated.

Thank you in advance and best regards,

Andreas

kristjanvalur commented 4 years ago

Hi there. So, you are using the plain old select.select(), I gather, and no special Stackless features? From your description the problem seems limited to Ubuntu 18 on conda, with which I am not familiar. Why do you think this problem is peculiar to Stackless? Does regular Python show the same problem?

adde1 commented 4 years ago

Hi Kristjan,

I am using select.select() to switch between sockets (for interprocess/intermachine communication) and Stackless channels/tasklets. There is also a scheduling function so I rely on the timeout of select.select() for it to wake up. My guess is that you would find something similar at the core of any framework supporting inter process communication and concurrency.

The framework makes a fair amount of use of tasklets and cooperative scheduling, enough so that running on standard python is not an option and migrating to a thread based approach would be a fairly steep investment.

Of course I cannot rule out that the problem is in Ubuntu, but given the fundamental nature of select.select() and that I see the same issues on both MacOS and Linux I think it is a less likely source.

Similarly with Python, I was assuming that standard Python implements a fairly direct call to the underlying select() system call and that there should not be many sources of bugs here. But I have also not looked at the Python implementation (and to be honest it is probably beyond my skills in C anyway).

Three quick questions for trying to pin down the problem:

  1. Does the Stackless implementation do anything special that in any way affects the select.select() statement in Python?
  2. Is there any other more modern way to incorporate sockets with stackless for concurrency that does not include a call to select.select()?
  3. Back in the day, I remember seeing a reference implementation of the socket module for Stackless. Is that still around, or was that incorporated into the Stackless distribution?

Thank you in advance and best regards :-)

And oh, I used to maintain a Windows dev environment too that unfortunately died some time ago. I'll see if I can resurrect that and if the problem exists on Windows or not.

kristjanvalur commented 4 years ago

select.select() is unchanged in Stackless. It basically waits for file/socket IO and wakes up if these become readable/writable. From your description, it sounds like you are using select() to wait for socket IO, and then take these messages and send them into channels. If your CPU time is spent in the select() call, it points to some operating system issue, possibly Ubuntu on this particular platform.

A typical Stackless loop would be something like (pseudocode):

    while true:
        # run tasklets, look at custom stackless timers for wakeup time if idle
        wakeup_time = perform_scheduling_and_find_next_wakeup_time()
        io = wait_for_io_until(wakeup_time)   # essentially a select()/poll() call.

So, you need to see whether it is the wait_for_io that is causing the CPU load to remain high, or whether your sleep time is very low, possibly even 0, maybe because some delta-time computation is not being done correctly. In short, select.select() is not something within the control of Stackless. Either a) the select() system call is very inefficient in this configuration, or b) something is wrong in the scheduling code and your timeout is too low, causing unnecessary spinning in the loop.
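
To check case b), the timeout handed to select() can be computed in one place, clamped to be non-negative, and counted. The following is only a minimal sketch in plain Python: `compute_timeout` and `SpinCounter` are hypothetical names, not part of Stackless or of the framework discussed here, and the 1 ms threshold is an arbitrary assumption.

```python
import time


def compute_timeout(wakeup_time, now=None, max_wait=1.0):
    """Return the number of seconds to block in select(), never negative.

    wakeup_time and now are absolute times as returned by time.time().
    wakeup_time may be None when no timer is pending.
    """
    if now is None:
        now = time.time()
    if wakeup_time is None:
        return max_wait
    # A wakeup time in the past yields 0.0 instead of a negative value.
    return min(max(wakeup_time - now, 0.0), max_wait)


class SpinCounter(object):
    """Counts how often the loop asked for a (near-)zero timeout."""

    def __init__(self, threshold=0.001):  # threshold is an assumed value
        self.threshold = threshold
        self.total = 0
        self.near_zero = 0

    def record(self, timeout):
        self.total += 1
        if timeout < self.threshold:
            self.near_zero += 1

    def ratio(self):
        """Fraction of iterations that did not really sleep."""
        return float(self.near_zero) / self.total if self.total else 0.0
```

If `ratio()` stays close to 1.0 while the program is supposedly idle, the loop is spinning because of its own timeouts rather than because of select() itself.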

Regardless of all that, you should be using poll() rather than select if possible.
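
For reference, a minimal sketch of waiting with select.poll() instead of select.select(). It assumes a POSIX system (select.poll() is not available on Windows) and uses a local socket pair as a stand-in for real peers; `wait_for_io` is a hypothetical helper name. One detail to watch when switching: poll() takes its timeout in milliseconds, whereas select.select() takes seconds.

```python
import select
import socket

# A connected pair of sockets stands in for real peers (POSIX only).
reader, writer = socket.socketpair()

poller = select.poll()
poller.register(reader.fileno(), select.POLLIN)


def wait_for_io(poller, timeout_seconds):
    """Block until a registered descriptor is ready or the timeout expires.

    Converts seconds to the integer milliseconds that poll() expects.
    Returns a list of (fd, event_mask) tuples, empty on timeout.
    """
    return poller.poll(int(timeout_seconds * 1000))


idle = wait_for_io(poller, 0.05)     # nothing written yet, so this times out
writer.send(b"ping")
ready = wait_for_io(poller, 0.05)    # reader is now reported as readable
```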

kristjanvalur commented 4 years ago

Sorry, this should have been:

    while true:
        # run tasklets, look at custom stackless timers for wakeup time if idle
        wakeup_time = perform_scheduling_and_find_next_wakeup_time()
        io = wait_for_io_until(wakeup_time)   # essentially a select()/poll() call. This is the "idle" point in your program.
        act_on_io(io)   # send messages to tasklets, etc.
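
A runnable approximation of the loop above, written for standard Python so it can be tried without Stackless. This is only a sketch under stated assumptions: `run_loop`, the timer heap layout, `idle_timeout` and `max_iterations` are all hypothetical, and the place where a Stackless program would run its tasklets is marked with a comment only.

```python
import heapq
import select
import socket
import time


def run_loop(sockets, timers, handle_io, max_iterations, idle_timeout=1.0):
    """Run a select()-based idle loop for a bounded number of iterations.

    timers is a heap of (wakeup_time, sequence_number, callback) tuples.
    """
    for _iteration in range(max_iterations):
        # A Stackless program would let its runnable tasklets run here,
        # for example with stackless.run(); left out so this runs anywhere.
        now = time.time()
        while timers and timers[0][0] <= now:
            _when, _seq, callback = heapq.heappop(timers)
            callback()
        # Sleep until the next timer is due, or idle_timeout if none is pending.
        if timers:
            timeout = max(timers[0][0] - time.time(), 0.0)
        else:
            timeout = idle_timeout
        # This is the "idle" point of the program.
        readable, _w, _x = select.select(sockets, [], [], timeout)
        for sock in readable:
            handle_io(sock)


# Usage: one pending message and one timer that fires 50 ms from now.
a, b = socket.socketpair()
received = []
fired = []
timers = []
heapq.heappush(timers, (time.time() + 0.05, 0, lambda: fired.append("tick")))
b.send(b"hello")
run_loop([a], timers, lambda s: received.append(s.recv(16)),
         max_iterations=5, idle_timeout=0.05)
```

The sequence number in each timer tuple only breaks ties between equal wakeup times, so the heap never has to compare two callbacks.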

adde1 commented 4 years ago

Thank you Kristjan,

Thank you for the confirmation that Stackless does not modify the select.select() call!

I'll try switching to poll, and dig around a bit more.

I'll keep this issue open for a little while longer and report back my findings.

Again, thank you :-)

adde1 commented 4 years ago

Hi,

I have now:

  1. Tested the old code on Windows 10, with Stackless from conda. It works perfectly (like it used to on the other platforms as well)
  2. Instrumented the code to see that there was no bug in the delta-time calculation. I even get the same high CPU load when I lock the timeout to 0.5 seconds (resulting in 2 iterations per second when there is no communication).
  3. Tried replacing select.select() with select.poll(). It did not make any noticeable difference - the problem still persists.

The core loop looks pretty much exactly like the one Kristjan describes. And it has been working for years (until recently).

The only lead I have is that with the same version of Stackless (Python 2.7.16 Stackless 3.1b3 060516) I get different results depending on whether I use the build provided by conda or the build I made locally. They were built with different compilers (GCC 7.3.0 vs. GCC 7.4.0) and perhaps with some differences in the dependencies that got linked in. But I don't know what to make of that.

If anyone has any thoughts on what to try, please let me know.

Cheers,

Andreas

kristjanvalur commented 4 years ago

What happens if you create an artificial program that just does a select with a long timeout? Will it consume CPU? You can then compare different Pythons with and without Stackless.
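
Such a test program could look roughly like the sketch below. It blocks once in select() on a socket that never becomes readable and compares wall-clock time with the CPU time the process consumed. `measure_select` is a hypothetical name and the 0.5 second timeout is only an example; a longer value makes the difference easier to see.

```python
import os
import select
import socket
import time


def measure_select(timeout):
    """Block in select() once and return (wall_seconds, cpu_seconds)."""
    a, b = socket.socketpair()   # never written to, so select() times out
    try:
        t0 = os.times()
        wall0 = time.time()
        select.select([a], [], [], timeout)
        wall = time.time() - wall0
        t1 = os.times()
    finally:
        a.close()
        b.close()
    # User plus system CPU time consumed by this process during the wait.
    cpu = (t1[0] - t0[0]) + (t1[1] - t0[1])
    return wall, cpu


wall, cpu = measure_select(0.5)
print("wall %.3fs  cpu %.3fs" % (wall, cpu))
```

On a healthy interpreter the CPU figure should stay far below the wall-clock figure. A CPU figure close to the wall-clock one would reproduce the symptom described in this issue in isolation.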

adde1 commented 3 years ago

Hi,

After doing a bit of other work in C, I mustered up the courage to dig into the implementation of selectmodule.c

At least on Debian, the problem with the excessive load was solved when I commented out Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS. So I believe the problem is within Python and not the operating system.
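
For context, those two macros release the global interpreter lock before the blocking system call and re-acquire it afterwards, so that other Python threads can run while select() waits. The sketch below only illustrates that effect on a stock interpreter. It does not reproduce or explain the CPU load, and all names in it are illustrative.

```python
import select
import socket
import threading
import time

progress = []
stop = threading.Event()


def background():
    """Append a timestamp roughly every 10 ms until told to stop."""
    while not stop.is_set():
        progress.append(time.time())
        time.sleep(0.01)


a, b = socket.socketpair()        # idle pair, so select() simply times out
worker = threading.Thread(target=background)
worker.start()
time.sleep(0.05)                  # let the worker get going
before = len(progress)
select.select([a], [], [], 0.3)   # main thread blocks here without the GIL
after = len(progress)
stop.set()
worker.join()
a.close()
b.close()
```

With the macros in place, `after` ends up larger than `before` because the worker keeps running during the wait. With them commented out, other threads would be expected to stall until select() returns.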

I have not (yet) tried to track down why Python seems to go into some infinite loop when the other threads are allowed to run. I am worried this may be over my head. But we will see...

Cheers