Closed Fabxx closed 1 year ago
Update: apparently this crash does not depend on the EEPROM, since restoring the backup no longer fixes it. I've only experienced these crashes in SC:DA and CT, so I'm changing the title to better fit the context.
I was able to reproduce this with @revix-0's help, also on an NVIDIA card in Linux (1070 with an older driver, 470.103.01).
In my case this did not crash, however, if I restart xemu (close the program entirely and reopen), then load the state, then repeat steps 3 and 4, I get a consistent crash.
Furthermore, once this has happened, it will happen on a new game without using the save state (indicating that something is likely corrupted in the cache).
Running in the debugger, I get a segfault in pgraph.c's pgraph_upload_surface_data on the final glTexImage2D call.
width: 447
height: 447
surface->fmt:
fmt = {SurfaceFormatInfo}
    bytes_per_pixel = {unsigned int} 2 [0x2]
    gl_internal_format = {GLint} 33189 [0x81a5] (GL_DEPTH_COMPONENT16)
    gl_format = {GLenum} 6402 [0x1902] (GL_DEPTH_COMPONENT)
    gl_type = {GLenum} 5123 [0x1403] (GL_UNSIGNED_SHORT)
    gl_attachment = {GLenum} 36096 [0x8d00] (GL_DEPTH_ATTACHMENT)
gl_read_buf is 0x7fff2b613460 and contains plausible looking data.
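As a side note on the values above, here is a minimal sketch of how many bytes OpenGL would read from gl_read_buf for such an upload. It assumes GL_UNPACK_ROW_LENGTH, GL_UNPACK_SKIP_ROWS and GL_UNPACK_SKIP_PIXELS are all 0; I have not checked which pixel store state xemu actually sets, and the helper names are made up for illustration. It is not a diagnosis, only arithmetic that may help when checking the buffer size in the debugger.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be > 0). */
static size_t round_up(size_t n, size_t align)
{
    assert(align > 0);
    return (n + align - 1) / align * align;
}

/*
 * Hypothetical helper (not part of xemu): minimum number of bytes OpenGL
 * reads from client memory for a 2D upload, ASSUMING GL_UNPACK_ROW_LENGTH
 * and the GL_UNPACK_SKIP_* parameters are 0. Each row starts at a multiple
 * of unpack_alignment (1, 2, 4 or 8); the last row is counted unpadded.
 */
static size_t gl_unpack_bytes_read(size_t width, size_t height,
                                   size_t bytes_per_pixel,
                                   size_t unpack_alignment)
{
    if (width == 0 || height == 0) {
        return 0;
    }
    size_t row_bytes = width * bytes_per_pixel;
    size_t row_stride = round_up(row_bytes, unpack_alignment);
    return row_stride * (height - 1) + row_bytes;
}
```

For the reported 447x447 surface at 2 bytes per pixel, gl_unpack_bytes_read(447, 447, 2, 1) gives 399618 bytes, while the OpenGL default alignment of 4 gives gl_unpack_bytes_read(447, 447, 2, 4) = 400510 bytes, because each 894-byte row is padded to 896.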
The master
.
Update: it looks like reinstalling the drivers fixes this for the first run of xemu, but as soon as you restart it the crashes start again. I tested with 3XX, 4XX and 5XX drivers.
Update 2: other steps to reproduce the crash:
- Reinstall the NVIDIA driver.
- Run xemu for the first time; it will not crash on the first run after a clean install.
- Close it entirely, even without loading a game first.
- Reopen xemu, load Double Agent, and follow abaire's steps.
- Crash.
Importantly, you do not need to mess with save states at all and the bad state persists even with a clean HDD image.
xemu will crash reliably with the stack trace above.
@revix-0 mentioned that if you reinstall the driver, use a blank HDD image, play any game and then load SC:DA without restarting xemu, the crash is not reproducible.
Once in a crashing state, the crash seems mostly but not 100% reproducible on my machine. I have had a couple instances where I was able to get all the way back to the starting area without a crash, but more cases where it does crash in the same way.
Update: a user tested this on an AMD RX 560 and it doesn't crash. Also, a note on the HDD: you don't need a blank image to avoid the crash. Reinstalling the driver and running xemu for the first time is sufficient; on the second run it will crash anyway. We suspect that this is a native NVIDIA issue.
I personally could not reproduce this with XEMU 0.6.6, NVIDIA 512.59 Drivers and a 2080ti
I only tried the snapshot method, although after retrying from a clean HDD three times I was not able to trigger this.
UPDATE: by removing this if condition in pgraph.c and making the rendering slower, the race condition no longer happens and the libnvidiagl crash doesn't occur on the levels that cause it. An alternative is to play with upscaling at 2x, which slows things down a bit and doesn't crash either.
    if (!pgraph_surface_to_texture_can_fastpath(surface, texture_shape)) {
        pgraph_render_surface_to_texture_slow(d, surface, texture,
                                              texture_shape, texture_unit);
        return;
    }
UPDATE: it looks like it always crashes at this instruction:
GLDBG[MARKER][NOTIFICATION]> nv2a: pgraph method (0): 0x97 -> 0x1d94 NV097_CLEAR_SURFACE[0] 0x1
where a z24s8-fixed integer is being handled. This is possibly a race condition while the game swaps between 32-bit and 16-bit texture buffers, or an incorrect handling of the fixed integer.
UPDATE 2: it's a 16-bit texture and the Z buffer state remains at 1, which is correct.
nv2a: [RAM->GPU] ZETA (lin) surface @ 28dc000 (w=457,h=457,p=1024,bpp=2)
The crash always happens with a 16-bit texture with a pitch of 1024.
We suspect that the texture is being destroyed before upload.
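To compare surfaces across runs, the log line above can be picked apart mechanically. The following is a minimal sketch, assuming the log format is exactly as quoted; SurfaceLogEntry, parse_surface_line and surface_span_bytes are hypothetical names for illustration and do not exist in xemu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one "surface @ ..." log line (not an xemu type). */
typedef struct {
    unsigned int addr, width, height, pitch, bpp;
} SurfaceLogEntry;

/* Parse the surface fields out of a log line. Returns 1 on success, 0 if
 * the line does not contain a surface entry in the assumed format. */
static int parse_surface_line(const char *line, SurfaceLogEntry *out)
{
    assert(line != NULL && out != NULL);
    const char *p = strstr(line, "surface @ ");
    if (p == NULL) {
        return 0;
    }
    return sscanf(p, "surface @ %x (w=%u,h=%u,p=%u,bpp=%u)",
                  &out->addr, &out->width, &out->height,
                  &out->pitch, &out->bpp) == 5;
}

/* Bytes spanned in guest memory by a linear surface: full pitch for every
 * row except the last, which only needs width * bpp bytes. */
static size_t surface_span_bytes(const SurfaceLogEntry *s)
{
    if (s->height == 0) {
        return 0;
    }
    return (size_t)s->pitch * (s->height - 1) + (size_t)s->width * s->bpp;
}
```

For the line quoted above this yields addr 0x28dc000 and a span of 467858 bytes (1024 * 456 + 914), so each 1024-byte row carries only 914 bytes of depth data.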
UPDATE 3: the main functions where it crashes, following the GDB stack:
pgraph_upload_surface_data(d, pg->zeta_binding, false);
glTexImage2D(GL_TEXTURE_2D, 0, surface->fmt.gl_internal_format, width, height, 0, surface->fmt.gl_format, surface->fmt.gl_type, gl_read_buf);
pgraph_update_surface(d, true, write_color, write_zeta)
Adding the log with the 6 GDB Frames of the nvidia driver while processing the data: libnvidia frames.txt
Bug Description
While playing SC:DA, on the Sea of Okhotsk level or JBA HQ part 1/2, I always get this stack in the same spots no matter what: Nvidia_GL_crash_log.txt
(log is from 0.6.2 master, but it happens on the latest master as well, with the same nv2a instructions)
Screenshots of the spots: JBA HQ downstairs
Crawl space at the beginning of Sea of Okhotsk:
Expected Behavior
This crash shouldn't happen, and it should not affect the NVIDIA GL driver.
xemu Version
0.6.3-8-g30a872fa83
System Information
CPU: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
OS Platform: Linux
OS Version: Manjaro Linux
Manufacturer: NVIDIA Corporation
GPU Model: NVIDIA GeForce GTX 970/PCIe/SSE2
Driver: 4.0.0 NVIDIA 510.60.02
Shader: 4.00 NVIDIA via Cg compiler
Additional Context
No response