Closed GoogleCodeExporter closed 9 years ago
Whoops, forgot to mention that it's created with robustness enabled.
Original comment by d...@sherk.me
on 6 Dec 2011 at 12:53
It does look like there is a problem here.
To reproduce in top of tree Chromium, open the JavaScript console and run e.g.
sdk/tests/extra/slow-shader-example.html. Note that the loss of context is
detected and reported to the console.
Then navigate to e.g. the San Angeles demo. First, a "sad tab" page is
displayed. This is likely a bug in Chromium. Reload the page. The system will
claim that WebGL is not enabled.
Killing the GPU process in Chrome's Task Manager and reloading the page allows
it to continue. This indicates the GPU process is probably wedged.
It looks like this might have regressed in
http://code.google.com/p/angleproject/source/detail?r=807 .
Original comment by kbr@chromium.org
on 6 Dec 2011 at 1:18
Thanks for the more in-depth repro. I'm using the Khronos
lots-of-polys-example.html test. Fx doesn't have a separate GPU process, so
this bug locks up the whole browser.
Original comment by d...@sherk.me
on 6 Dec 2011 at 2:26
I don't think this is a regression from r807 -- We had the same polling loop
previously it just moved around. In theory the GetData call should return
D3DERR_DEVICELOST instead of S_FALSE if the device has been lost, and then the
loop should break out...
Original comment by dan...@transgaming.com
on 6 Dec 2011 at 4:01
[deleted comment]
I was able to reproduce the GPU process hang, but it's not stuck in
Context::sync. Instead there are continuous glGetError calls, which return
GL_OUT_OF_MEMORY because the context is lost. The proper response to
GL_OUT_OF_MEMORY is to recreate the context.
ANGLE r902
Chrome 17.0.963.1 canary
Windows 7 x64
ATI FirePro V5700 (FireGL)
Original comment by nicolas....@gmail.com
on 7 Dec 2011 at 4:45
I was using an NVIDIA NVS 4200M. I haven't tried it on ANGLE r902 yet, though.
Original comment by d...@sherk.me
on 8 Dec 2011 at 1:32
Note that Chromium bug 95740 tracks a similar issue.
Original comment by kbr@chromium.org
on 8 Dec 2011 at 7:29
jbauman, since you're working on the Chromium side bug and I expect part of the
fix will be in ANGLE, could you take this? Let me know if I can help with any
WebGL specific issues.
Original comment by kbr@chromium.org
on 8 Dec 2011 at 7:34
Please note that r903 and r905 contain important fixes which likely affect
device recovery.
Original comment by nicolas....@gmail.com
on 8 Dec 2011 at 7:56
Looks like chrome is spinning in a loop because ANGLE continually returns
GL_OUT_OF_MEMORY, which seems to be against the GL spec: "When GetError is
called,
the code is returned and the flag is cleared, so that a further error will
again record
its code. If a call to GetError returns NO ERROR, then there has been no
detectable
error since the last call to GetError (or since the GL was initialized)".
Original comment by jbauman@chromium.org
on 9 Dec 2011 at 11:25
Yes -- we're repeatedly returning GL_OUT_OF_MEMORY if the context is lost. As
you pointed out this isn't correct and will break any application that polls
until glGetError returns GL_NONE.
The problem is that we're manufacturing the error in glGetError. If we want to
keep the OOM error for GL calls with a lost context, the error will instead
have to be set when calling other GL functions, not from glGetError itself.
I'll fix this issue.
Original comment by dan...@transgaming.com
on 12 Dec 2011 at 3:57
r905 fixes the OOM error reporting when the context is lost. The OOM error
will only be returned on a lost device when another GL call has been made
between sets of calls to glGetError, and thus it should be safe to call
glGetError in a loop until GL_NO_ERROR is returned.
Recovery is still not fully working in Chrome though because the device reset
is failing. Looks like there are still some dangling references.
Original comment by dan...@transgaming.com
on 12 Dec 2011 at 9:08
We found what prevented the Reset from succeeding, and we're currently auditing
the code for similar issues. We'll have a fix ASAP.
Original comment by nicolas....@gmail.com
on 13 Dec 2011 at 4:45
r908 and r909 should get context recovery working again.
With these changes I can do the following in Chrome Canary 17.0.936.6.
- run San Angeles Demo in one tab
- run the lots-of-polys example in another tab. This produces a TDR and the driver is reset.
- after the driver recovers, I can go to the San Angeles tab, reload and it works fine.
This has been verified on Win7 with d3d9 and d3d9ex.
(Note that I've never been able to reproduce the original issue reported by
Doug. Doug - can you retest with r909 to see if it works better for you now?)
Original comment by dan...@transgaming.com
on 13 Dec 2011 at 5:37
This is not fixed on r912.
Here's a tiny stack trace (although I don't suspect this will help):
> libGLESv2.dll!gl::Context::sync(bool block) Line 2905 C++
libGLESv2.dll!glFinish() Line 2028 C++
xul.dll!mozilla::gl::GLContext::raw_fFinish() Line 1845 C++
My guess is that you can't repro it because of our difference in hardware. My
hardware is the following:
Thinkpad T520
NVIDIA NVS 4200M
Driver version 8.17.12.8562
Original comment by d...@sherk.me
on 15 Dec 2011 at 9:21
Ok, chrome (or most other programs) rarely uses glFinish, so that's probably
why we haven't been hitting this.
Original comment by jbauman@chromium.org
on 15 Dec 2011 at 10:10
It probably doesn't hurt to test for context loss in the polling loop.
Original comment by nicolas....@gmail.com
on 15 Dec 2011 at 10:17
Yeah, busy-waiting slightly less efficiently probably won't be a problem.
Original comment by jbauman@chromium.org
on 15 Dec 2011 at 10:20
r924 adds in a check for context loss while busy waiting. Should fix this
problem.
Original comment by dan...@transgaming.com
on 16 Dec 2011 at 11:39
I can confirm that r924 fixes this problem for me.
Original comment by d...@sherk.me
on 17 Dec 2011 at 12:31
Original issue reported on code.google.com by
d...@sherk.me
on 6 Dec 2011 at 12:48