MichaelAz closed this issue 10 years ago.
I'm not getting this behavior on Linux with those versions. I'll get a Windows 7 VirtualBox VM set up to try things out.
If you just run something like this, what happens?
from goless.backends import current; current.yield_()
gevent has never had an issue yielding on the last greenlet, so I don't know where this behavior is coming from. (I am adding some tests to verify this behavior.)
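To make the expected behavior concrete, here is a toy model using plain generators (not gevent or goless; the Scheduler, spawn, run and worker names are hypothetical). A bare `yield` stands in for yield_(), and when the yielding task is the only runnable one, the scheduler simply resumes it again instead of raising.

```python
from collections import deque


class Scheduler:
    """Toy round-robin scheduler. Each task is a generator; a bare `yield`
    inside it plays the role of yield_() in this hypothetical model."""

    def __init__(self):
        self.ready = deque()
        self.trace = []  # names of tasks, in the order they yielded

    def spawn(self, name, gen):
        self.ready.append((name, gen))

    def run(self):
        while self.ready:
            name, gen = self.ready.popleft()
            try:
                next(gen)  # resume the task until it yields or finishes
            except StopIteration:
                continue  # task finished; drop it
            self.trace.append(name)
            # Requeue the task. If it is the last one, it is picked again on
            # the next loop, so yielding on the last task is just a no-op.
            self.ready.append((name, gen))


def worker(n):
    for _ in range(n):
        yield  # cooperative yield


sched = Scheduler()
sched.spawn("only", worker(3))
sched.run()
print(sched.trace)  # ['only', 'only', 'only']
```

With two tasks spawned, the same loop alternates between them, which is the normal round-robin case.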
It will probably be late Sunday before I'm able to look into this on Windows; I have weekend plans.
That code runs fine. I'll investigate further and see if I can find anything useful.
So, something interesting right off the bat. The benchmark contains this code:
def main():
    prime()
    bench_channels()
    bench_selects()
prime just runs the benchmarks without writing any output, so we can ignore it. But something interesting happens when we comment out bench_channels: the error raised by bench_selects magically transforms into a Deadlock error.
The reason is that running bench_channels changes where the error happens.
When bench_channels is run, the error happens in selecting.py, line 93, in the statement _be.yield_().
When it isn't run, the error happens in selecting.py, line 92, in the statement return c, c.exec_().
exec_ causes a send/receive, which is wrapped by the _as_deadlock decorator and thus produces a sane error.
yield_ isn't wrapped by that decorator, and because of that we get the cryptic error.
So perhaps we should think about wrapping exceptions thrown in yield_. Next.
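A minimal sketch of what wrapping yield_ could look like, assuming Python 3 exception chaining. This is not goless's actual _as_deadlock: the Deadlock class, the as_deadlock(*exc_types) signature, and the stand-in yield_ are all hypothetical. Raising with `from exc` keeps the original exception and its traceback attached as `__cause__`, so the cryptic backend error is still visible underneath the friendlier one.

```python
import functools


class Deadlock(Exception):
    """Hypothetical stand-in for the library's deadlock error."""


def as_deadlock(*exc_types):
    """Hypothetical decorator: translate the given backend exception types
    into Deadlock, chaining the original so its traceback is preserved."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exc_types as exc:
                raise Deadlock(f"{func.__name__} cannot make progress") from exc
        return wrapper
    return decorator


@as_deadlock(SystemError)
def yield_():
    # Stand-in for a backend yield that fails with a SystemError.
    raise SystemError("simulated backend failure")


try:
    yield_()
except Deadlock as err:
    print(type(err.__cause__).__name__)  # SystemError
```

Printing the full traceback of the Deadlock would show the SystemError first, followed by "The above exception was the direct cause of the following exception".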
Inside bench_selects, it is specifically the call to bench_select(False) that raises the exception.
The reason for this difference in behavior is that passing True to bench_select causes a dcase to be added to the case list, so when none of the other channels are ready, the script doesn't throw but instead uses that default case.
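A toy, single-threaded model of why the default case hides the failure (not goless's API: BufferedChannel, select, Deadlock and the "dcase" marker string are hypothetical). If no case is ready, select returns the default when one is given; otherwise it would have to block, and in a toy with no other tasks that wait can never end, so it raises.

```python
from collections import deque


class Deadlock(Exception):
    """Hypothetical error: no case is ready and nothing else can run."""


class BufferedChannel:
    """Toy channel: a deque with a fixed capacity."""

    def __init__(self, size):
        self.size = size
        self.items = deque()

    def can_recv(self):
        return len(self.items) > 0

    def can_send(self):
        return len(self.items) < self.size


def select(cases, default=None):
    """Return (case, value) for the first ready case.

    Cases are ("recv", chan) or ("send", chan, value). If none is ready,
    return (default, None) when a default is given; otherwise raise
    Deadlock, since this toy has no other task that could make a case ready.
    """
    for case in cases:
        kind, chan = case[0], case[1]
        if kind == "recv" and chan.can_recv():
            return case, chan.items.popleft()
        if kind == "send" and chan.can_send():
            chan.items.append(case[2])
            return case, None
    if default is not None:
        return default, None
    raise Deadlock("no case is ready and no default case was given")


c = BufferedChannel(size=2)
print(select([("recv", c)], default="dcase"))  # ('dcase', None)
```

Passing a default here corresponds to bench_select(True); leaving it out corresponds to bench_select(False), which is the path that surfaces the error.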
There's some subtle race condition here, I believe, with sending to a full channel, because switching to a buffered channel with buffer size 2. I honestly have no idea what's going on here, but I rewrote it from scratch and it seems to work now. Unless you find a better explanation for this behavior, I think I'll commit the rewritten version.
OK, I've improved the behavior of _as_deadlock to include the original stack trace, and yield_ should not raise if it's the last tasklet. I'll dig into this on Windows now.
It may take a while to get my Windows box set up for development. In the meantime, could you try with the tip of gevent on GitHub?
> There's some subtle race condition here, I believe, with sending to a full channel, because switching to a buffered channel with buffer size 2.
Yes, very likely. We suspect this is also why the pypystackless tests don't work. I will work through this code and see.
Also, going by the gevent docs, it appears libev has some problems on Windows: not just bugs but also unknown errors. There could also be some bugs in how gevent uses libev on Windows.
OK, so here's some progress from this morning. It's a bit of a mind-dump; maybe writing it out will help uncover something.
I can repro easily (on Windows only) by copying the bench_select code into a script and running that. Unfortunately, the behavior disappears within a test framework or under the debugger!
This has nothing to do with a deadlock, so I've removed the as_deadlock catch for SystemError. We are putting gevent/libev into a bad state somehow, and I suspect the same thing is happening that puts pypystackless into a bad state. It's the same sort of thing: the symptom is that there's no runnable tasklet, but that cannot really be the case. Solving one may solve the other! (See #2.)
This is where it gets interesting. On my machine, I consistently fail at iteration 997. However, if instead of:
def sender():
    while True:
        c.send(0)
        c.recv()
I have (you may need to import backends first):
def sender():
    while True:
        c.send(0)
        backends.current.yield_()
        c.recv()
I fail on iteration 499, which is about half of 997. Do you get the same behavior, @MichaelAz, or is that just a coincidence on my end? I suspect you are spot on that the problem is send/recv on a full channel and the behavior that goes on there. The semantics are not totally clear: a blocked send will of course yield, but what about an unblocked send? I can't remember if it's tested, or even defined. There are some potential problems to work through. I will keep the thread updated over the next few days.
Updates:
from gevent.queue import Channel; Channel().get()
will raise a SystemError on Windows but LoopExit on Linux.

Okay, confirmed a few things. Basically, something that will deadlock or run perfectly well on Linux will raise on Windows:
from gevent.queue import Channel
import gevent

c = Channel()

def sender():
    while True:
        c.put(0)

gevent.spawn(c.put, 1)
for i in range(1000):
    gevent.sleep(0)
This exits fine on Linux but errors on Windows. I also cannot reproduce it in all cases, for example under a test runner.
I can catch the error in select and ignore it to replicate the Linux behavior on Windows. Apart from the possible performance cost and the risk of more Windows bugs in the future, I'm not sure what else we can do on our side. It's up to libev/gevent to fix.
It's been a crazy week; I'll go over your updates more thoroughly tomorrow evening or Friday morning.
> On my machine, I consistently fail at iteration 997. However, if [I add backends.current.yield_() between c.send(0) and c.recv() in sender] I fail on iteration 499.
I'm getting the same behavior.
If this is really a bug in gevent on Windows, we ought to open an issue with them. But since this is (probably) related to the pypystackless bug, perhaps we're at fault. I really don't know. Could you link to the docs you mentioned about gevent having problems on Windows?
Having dug into it, I don't think the pypystackless and gevent-on-Windows problems are related. I think this is genuinely a bug in gevent/libev on Windows, as I was able to repro it in a purely gevent environment (see my previous comment).
Regarding the links, I wish I had taken better notes. I can only find a few pages, mostly concerning gevent's switch from libevent to libev, and libev's inferior Windows support:
Specifically, there was a page I can't find now that said something like "There should be fewer unknown errors on Windows". I think it was for gevent (a changelog?), but it could have been for libev as well. I will open a ticket on the gevent repo.
As discussed in surfly/gevent#459, adding an import socket seems to solve the issue.
I'm creating a PR for this, even though the solution is extremely hacky.
I'd say calling WSAStartup through ctypes is less hacky, but it would just require us to reimplement the relevant part of socket, and that's not DRY.
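A minimal sketch of the import-socket workaround. On Windows, CPython's socket extension module calls WSAStartup when it is first imported, which is presumably why the bare import changes gevent's behavior here. Placing this near the top of the gevent backend module is an assumption for illustration, not a description of the actual PR.

```python
import sys

if sys.platform == "win32":
    # Imported only for its side effect: loading CPython's _socket module
    # on Windows initializes Winsock (WSAStartup) before gevent/libev use it.
    # Workaround discussed in surfly/gevent#459; the name itself is unused.
    import socket  # noqa: F401
```

The ctypes alternative would mean calling ws2_32.WSAStartup with a hand-built WSADATA structure, which is exactly the duplicated work the comment above wants to avoid.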
Confirmed this fixed the issue on my Windows VirtualBox VM. I am flabbergasted by this issue. Hopefully gevent fixes the actual problem. Any solution on our side is "hacky", so don't worry about importing socket not being optimal.
When running benchmark.py as found in master, a SystemError is raised.
would_deadlock passes, and this consistently happens at the 495th iteration (at least for me). I'm running the code on Windows 7 with the gevent backend, gevent==1.0.1 and greenlet==0.4.2.