microsoft / wslg

Enabling the Windows Subsystem for Linux to include support for Wayland and X server related scenarios

GUI apps exit with a segmentation fault in the NVIDIA GPU driver #715

Open huiminghao opened 2 years ago

huiminghao commented 2 years ago

Version

Microsoft Windows [Version 10.0.22598.1]

WSL Version

Kernel Version

5.10.102.1

Distro Version

Debian 11

Other Software

Default distribution: Debian
Default version: 2
WSL version: 0.58.1.0
Kernel version: 5.10.102.1
WSLg version: 1.0.33
MSRDC version: 1.2.2924
Direct3D version: 1.601.0
Windows version: 10.0.22598.1

Repro Steps

1. Run wsl from cmd.
2. Run glxinfo -B from WSL.
3. Run qtcreator from WSL.

Expected Behavior

glxinfo exits normally, and other GUI/Qt apps work normally.

Actual Behavior

~$ glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (NVIDIA GeForce GTX 1650) (0xffffffff)
    Version: 22.0.1
    Accelerated: yes
    Video memory: 20024MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 3.3
    Max compat profile version: 3.3
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce GTX 1650)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 22.0.1 - kisak-mesa PPA
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.3 (Compatibility Profile) Mesa 22.0.1 - kisak-mesa PPA
OpenGL shading language version string: 3.30
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.0.1 - kisak-mesa PPA
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

Segmentation fault

Run qtcreator, close the app, and get:

~$ qtcreator
Segmentation fault

Diagnostic Logs

Everything worked well before running wsl --update and updating Windows (Windows 11 Insider Preview 22598.1 (ni_release)). dmesg:

qtcreator[341]: segfault at 7f9e2c5e4000 ip 00007f9e18405aca sp 00007f9e264e5db0 error 4 in libnvwgf2umx.so[7f9e17aba000+1415000]
[  235.082142] Code: 01 be 00 00 00 00 31 d2 45 31 f6 eb 1d 90 90 90 8b 45 10 48 8b 54 24 10 48 83 c2 01 89 c1 83 c6 01 48 39 ca 0f 83 56 01 00 00 <41> 8b 4c 95 00 0f b7 c9 3b 8c 95 44 04 00 00 74 dd 89 8c 95 44 04
[  235.083786] potentially unexpected fatal signal 11.
[  235.084897] CPU: 3 PID: 341 Comm: qtcreator Not tainted 5.10.102.1-microsoft-standard-WSL2 #1
[  235.084897] RIP: 0033:0x7f9e18405aca
[  235.085212] Code: 01 be 00 00 00 00 31 d2 45 31 f6 eb 1d 90 90 90 8b 45 10 48 8b 54 24 10 48 83 c2 01 89 c1 83 c6 01 48 39 ca 0f 83 56 01 00 00 <41> 8b 4c 95 00 0f b7 c9 3b 8c 95 44 04 00 00 74 dd 89 8c 95 44 04
[  235.086602] RSP: 002b:00007f9e264e5db0 EFLAGS: 00010246
[  235.087094] RAX: 000000000000000e RBX: 000000000135f000 RCX: 00000000013ca020
[  235.087889] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000135f000
[  235.088617] RBP: 00000000013c8be0 R08: 0000000000000000 R09: 00007ffe18a5c080
[  235.089205] R10: 000000000000aea4 R11: 0000000000000000 R12: 00007f9e264e5e28
[  235.089835] R13: 00007f9e2c5e4000 R14: 0000000000000000 R15: 00007f9e16ec2101
[  235.090474] FS:  00007f9e264e6700 GS:  0000000000000000
[  235.090978] dxgk:err: wait_for_complition failed: fffffe00
[  235.091020] dxgk:err: wait_for_complition failed: fffffe00
[  235.092278] dxgk:err: did not find packet to complete
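
For readers comparing crash reports: in the segfault line of the dmesg output above, the suffix after the library name is the mapping base and size, so the offset of the faulting instruction inside libnvwgf2umx.so is simply ip minus base. A minimal sketch of that arithmetic in shell (the helper name `fault_offset` is made up for illustration):

```shell
#!/usr/bin/env bash
# fault_offset IP BASE: print the offset of the faulting instruction
# inside the mapped library, in hex. Both arguments are hex digits
# without a "0x" prefix. Hypothetical helper; it only computes ip - base.
fault_offset() {
  printf '0x%x\n' "$(( 0x$1 - 0x$2 ))"
}

# Values copied from the segfault line in the dmesg output above.
fault_offset 00007f9e18405aca 7f9e17aba000   # prints 0x94baca
```

The offset is only meaningful for comparison between reports that load the same driver build, since the library layout can differ between driver versions.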
CarlosNihelton commented 2 years ago

This issue caught my attention because Flutter apps started crashing on exit right after I updated Windows to build 22598.200. I'm now on 22610.1, and any Flutter app, as well as the other apps mentioned above, still crashes on exit with a segmentation fault, so something related to OpenGL seems to have changed recently.

To add more context about my development machine, here it is:

wsl --version:

WSL version: 0.58.3.0
Kernel version: 5.10.102.1
WSLg version: 1.0.33
MSRDC version: 1.2.2924
Direct3D version: 1.601.0
Windows version: 10.0.22610.1

I experimented with Ubuntu versions 20.04 and 22.04.

CarlosNihelton commented 2 years ago

Also, this bare-bones OpenGL sample crashes in the very same way (whether compiled with gcc or clang):

cc tetrahedron.cpp -o tetrahedron -lglut -lGL

// This is a simple introductory program; its main window contains a static
// picture of a tetrahedron, whose top vertex is white and whose bottom
// vertices are red, green and blue.  The program illustrates viewing by
// defining an object at a convenient location, then transforming it so that
// it lies within the view volume.  This is a lousy way to do things (it's
// easier to use gluLookAt()), but it's nice to see how viewing is done at
// a very low level.

#ifdef __APPLE_CC__
#include <GLUT/glut.h>
#else
#include <GL/glut.h>
#endif

// Clears the window and draws the tetrahedron.  The tetrahedron is  easily
// specified with a triangle strip, though the specification really isn't very
// easy to read.
void display() {
  glClear(GL_COLOR_BUFFER_BIT);

  // Draw a white grid "floor" for the tetrahedron to sit on.
  glColor3f(1.0, 1.0, 1.0);
  glBegin(GL_LINES);
  for (GLfloat i = -2.5; i <= 2.5; i += 0.25) {
    glVertex3f(i, 0, 2.5); glVertex3f(i, 0, -2.5);
    glVertex3f(2.5, 0, i); glVertex3f(-2.5, 0, i);
  }
  glEnd();

  // Draw the tetrahedron.  It is a four sided figure, so when defining it
  // with a triangle strip we have to repeat the last two vertices.
  glBegin(GL_TRIANGLE_STRIP);
    glColor3f(1, 1, 1); glVertex3f(0, 2, 0);
    glColor3f(1, 0, 0); glVertex3f(-1, 0, 1);
    glColor3f(0, 1, 0); glVertex3f(1, 0, 1);
    glColor3f(0, 0, 1); glVertex3f(0, 0, -1.4);
    glColor3f(1, 1, 1); glVertex3f(0, 2, 0);
    glColor3f(1, 0, 0); glVertex3f(-1, 0, 1);
  glEnd();

  glFlush();
}

// Sets up global attributes like clear color and drawing color, enables and
// initializes any needed modes (in this case we want backfaces culled), and
// sets up the desired projection and modelview matrices. It is cleaner to
// define these operations in a function separate from main().
void init() {

  // Set the current clear color to sky blue and the current drawing color to
  // white.
  glClearColor(0.1, 0.39, 0.88, 1.0);
  glColor3f(1.0, 1.0, 1.0);

  // Tell the rendering engine not to draw backfaces.  Without this code,
  // all four faces of the tetrahedron would be drawn and it is possible
  // that faces farther away could be drawn after nearer to the viewer.
  // Since there is only one closed polyhedron in the whole scene,
  // eliminating the drawing of backfaces gives us the realism we need.
  // THIS DOES NOT WORK IN GENERAL.
  glEnable(GL_CULL_FACE);
  glCullFace(GL_BACK);

  // Set the camera lens so that we have a perspective viewing volume whose
  // horizontal bounds at the near clipping plane are -2..2 and vertical
  // bounds are -1.5..1.5.  The near clipping plane is 1 unit from the camera
  // and the far clipping plane is 40 units away.
  glMatrixMode(GL_PROJECTION);
  glLoadIdentity();
  glFrustum(-2, 2, -1.5, 1.5, 1, 40);

  // Set up transforms so that the tetrahedron which is defined right at
  // the origin will be rotated and moved into the view volume.  First we
  // rotate 70 degrees around y so we can see a lot of the left side.
  // Then we rotate 50 degrees around x to "drop" the top of the pyramid
  // down a bit.  Then we move the object back 3 units "into the screen".
  glMatrixMode(GL_MODELVIEW);
  glLoadIdentity();
  glTranslatef(0, 0, -3);
  glRotatef(50, 1, 0, 0);
  glRotatef(70, 0, 1, 0);
}

// Initializes GLUT, the display mode, and main window; registers callbacks;
// does application initialization; enters the main event loop.
int main(int argc, char** argv) {
  glutInit(&argc, argv);
  glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB);
  glutInitWindowPosition(80, 80);
  glutInitWindowSize(800, 600);
  glutCreateWindow("A Simple Tetrahedron");
  glutDisplayFunc(display);
  init();
  glutMainLoop();
}
antasp commented 2 years ago

I have the same issue.

WSL version: 0.61.4.0
Kernel version: 5.10.102.1
WSLg version: 1.0.39
MSRDC version: 1.2.3213
Direct3D version: 1.601.0
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows: 10.0.22000.739
glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (NVIDIA RTX A3000 Laptop GPU) (0xffffffff)
    Version: 21.2.6
    Accelerated: yes
    Video memory: 38412MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 3.3
    Max compat profile version: 3.1
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.0
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA RTX A3000 Laptop GPU)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 21.2.6
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.1 Mesa 21.2.6
OpenGL shading language version string: 1.40
OpenGL context flags: (none)

OpenGL ES profile version string: OpenGL ES 3.0 Mesa 21.2.6
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00

Segmentation fault
facboy commented 2 years ago

I'm getting segfaults in libnvwgf2umx.so when running IntelliJ IDEA.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f8d5a7c73a1, pid=1065, tid=1120
#
# JRE version: OpenJDK Runtime Environment JBR-11.0.15.10-2043.56-jcef (11.0.15+10) (build 11.0.15+10-b2043.56)
# Java VM: OpenJDK 64-Bit Server VM JBR-11.0.15.10-2043.56-jcef (11.0.15+10-b2043.56, mixed mode, tiered, compressed oops, parallel gc, linux-amd64)
# Problematic frame:
# C  [libnvwgf2umx.so+0x1a463a1]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/chrisng/core.1065)
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

and THREAD details:

Host: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz, 12 cores, 15G, Ubuntu 22.04 LTS
Time: Fri Jul  8 10:23:08 2022 BST elapsed time: 711.917453 seconds (0d 0h 11m 51s)

---------------  T H R E A D  ---------------

Current thread is native thread

Stack: [0x00007f8d3685c000,0x00007f8d3705b000],  sp=0x00007f8d37059d90,  free space=8183k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libnvwgf2umx.so+0x1a463a1]

[error occurred during error reporting (printing native stack), id 0xb, SIGSEGV (0xb) at pc=0x00007f8e927ba292]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007f8d530b1000
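
When triaging several JVM crash reports like the one above, the line after "# Problematic frame:" is the quickest indicator of which library faulted. A small sketch that pulls it out of a saved report (the helper name `problematic_frame` is made up for illustration, and it relies only on the report layout shown above):

```shell
#!/usr/bin/env bash
# problematic_frame: read a JVM fatal-error report on stdin and print the
# line that follows "# Problematic frame:", without its leading "# ".
# Hypothetical helper name, not part of any JDK tooling.
problematic_frame() {
  awk '/^# Problematic frame:/ { getline; sub(/^# +/, ""); print; exit }'
}

# Example using the relevant lines from the report above.
problematic_frame <<'EOF'
# Problematic frame:
# C  [libnvwgf2umx.so+0x1a463a1]
#
EOF
```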
PhyX-Meow commented 1 year ago

I have the same issue; in fact, I get a segfault when I exit any GUI app.

CarlosNihelton commented 1 year ago

I've made some upgrades recently:

Still crashing on exit in the very same way:

❯ glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (NVIDIA GeForce MX130) (0xffffffff)
    Version: 22.0.1
    Accelerated: yes
    Video memory: 10126MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 3.3
    Max compat profile version: 3.3
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce MX130)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 22.0.1
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.3 (Compatibility Profile) Mesa 22.0.1
OpenGL shading language version string: 3.30
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.0.1
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

[1]    1519 segmentation fault  glxinfo -B
sarim commented 1 year ago

WSL version: 0.64.0.0 Kernel version: 5.10.102.1 WSLg version: 1.0.40 MSRDC version: 1.2.3213 Direct3D version: 1.601.0 DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp Windows version: 10.0.22621.232

$ glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (NVIDIA GeForce RTX 2060 SUPER) (0xffffffff)
    Version: 22.0.1
    Accelerated: yes
    Video memory: 24371MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 3.3
    Max compat profile version: 3.3
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce RTX 2060 SUPER)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 22.0.1
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.3 (Compatibility Profile) Mesa 22.0.1
OpenGL shading language version string: 3.30
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.0.1
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

Segmentation fault

Has anyone found a workaround? I had to fall back to VcXsrv and disable WSLg :(

Venorcis commented 1 year ago

Same issue with IntelliJ crashes on my NVIDIA laptop. On my AMD desktop I don't experience this.

sarim commented 1 year ago

Another observation: when using a systemd bottle (genie / distrod), the segfault occurs with more apps and on more occasions. For example, the Android emulator always segfaults for me inside systemd, but outside systemd it opens once or twice in ~20 tries. Since the crash is happening in libnv... .so, I wonder if it's even fixable from WSLg. Who maintains those NVIDIA Linux drivers for WSL?

catkira commented 1 year ago

any progress here? I have the same problem.

4x7y commented 1 year ago

Same issue for my NVIDIA GeForce GTX 1650 card with driver version 516.94.

lucmann commented 1 year ago

same issue when I ran glmark2 or glmark2-es2

some backtraces, HTH

Reading symbols from glmark2...
(gdb) r -b build:nframes=1
Starting program: /usr/bin/glmark2 -b build:nframes=1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after vfork from child process 6542]
[New Thread 0x7fffec81e700 (LWP 6544)]
[New Thread 0x7fffe7a9e700 (LWP 6545)]
[New Thread 0x7fffe729d700 (LWP 6546)]
[New Thread 0x7fffe6a9c700 (LWP 6547)]
[New Thread 0x7fffe629b700 (LWP 6548)]
** GLX does not support GLX_EXT_swap_control or GLX_MESA_swap_control!
** Failed to set swap interval. Results may be bounded above by refresh rate.
=======================================================
    glmark2 2021.12
=======================================================
    OpenGL Information
    GL_VENDOR:      Microsoft Corporation
    GL_RENDERER:    D3D12 (NVIDIA GeForce MX250)
    GL_VERSION:     3.1 Mesa 21.2.6
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0
    Surface Size:   800x600 windowed
=======================================================
** GLX does not support GLX_EXT_swap_control or GLX_MESA_swap_control!
** Failed to set swap interval. Results may be bounded above by refresh rate.
[build] nframes=1: FPS: 31 FrameTime: 32.258 ms
=======================================================
                                  glmark2 Score: 31
=======================================================
Thread 2 "glmark2" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffec81e700 (LWP 6544)]
0x00007fffee5113aa in ?? () from /usr/lib/wsl/drivers/nvdm.inf_amd64_201e30fdba70f061/libnvwgf2umx.so
(gdb) bt
#0  0x00007fffee5113aa in ?? () from /usr/lib/wsl/drivers/nvdm.inf_amd64_201e30fdba70f061/libnvwgf2umx.so
#1  0x00007fffee50f94c in ?? () from /usr/lib/wsl/drivers/nvdm.inf_amd64_201e30fdba70f061/libnvwgf2umx.so
#2  0x00007fffee50f8d6 in ?? () from /usr/lib/wsl/drivers/nvdm.inf_amd64_201e30fdba70f061/libnvwgf2umx.so
#3  0x00007ffff7309609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#4  0x00007ffff7996133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) info shared
From                To                  Syms Read   Shared Object Library
0x00007ffff7fd0100  0x00007ffff7ff2684  Yes         /lib64/ld-linux-x86-64.so.2
0x00007ffff7fb2220  0x00007ffff7fb3179  Yes         /lib/x86_64-linux-gnu/libdl.so.2
0x00007ffff7f30540  0x00007ffff7f73469  Yes (*)     /lib/x86_64-linux-gnu/libjpeg.so.8
0x00007ffff7ef9510  0x00007ffff7f1d29d  Yes (*)     /lib/x86_64-linux-gnu/libpng16.so.16
0x00007ffff7dd00c0  0x00007ffff7e59766  Yes (*)     /lib/x86_64-linux-gnu/libX11.so.6
0x00007ffff7c73160  0x00007ffff7d5b452  Yes (*)     /lib/x86_64-linux-gnu/libstdc++.so.6
0x00007ffff7a933c0  0x00007ffff7b39fa8  Yes         /lib/x86_64-linux-gnu/libm.so.6
0x00007ffff7a6c5e0  0x00007ffff7a7d045  Yes (*)     /lib/x86_64-linux-gnu/libgcc_s.so.1
0x00007ffff7899630  0x00007ffff7a0e27d  Yes         /lib/x86_64-linux-gnu/libc.so.6
0x00007ffff785d280  0x00007ffff786df3b  Yes (*)     /lib/x86_64-linux-gnu/libz.so.1
0x00007ffff783c620  0x00007ffff784f699  Yes (*)     /lib/x86_64-linux-gnu/libxcb.so.1
0x00007ffff782c360  0x00007ffff782d052  Yes (*)     /lib/x86_64-linux-gnu/libXau.so.6
0x00007ffff78231a0  0x00007ffff7824a03  Yes (*)     /lib/x86_64-linux-gnu/libXdmcp.so.6
0x00007ffff780be40  0x00007ffff7819e69  Yes (*)     /lib/x86_64-linux-gnu/libbsd.so.0
0x00007ffff77be1c0  0x00007ffff77c101d  Yes (*)     /lib/x86_64-linux-gnu/libGL.so
0x00007ffff7703240  0x00007ffff77054eb  Yes (*)     /lib/x86_64-linux-gnu/libGLdispatch.so.0
0x00007ffff7692700  0x00007ffff76ac48c  Yes (*)     /lib/x86_64-linux-gnu/libGLX.so.0
0x00007ffff7630d80  0x00007ffff7672382  Yes (*)     /lib/x86_64-linux-gnu/libGLX_mesa.so.0
0x00007ffff75e81a0  0x00007ffff75f5a60  Yes (*)     /lib/x86_64-linux-gnu/libglapi.so.0
0x00007ffff75cac40  0x00007ffff75d4e19  Yes         /lib/x86_64-linux-gnu/libdrm.so.2
0x00007ffff75b30a0  0x00007ffff75bbb29  Yes (*)     /lib/x86_64-linux-gnu/libxcb-glx.so.0
0x00007ffff75a4040  0x00007ffff75a411f  Yes (*)     /lib/x86_64-linux-gnu/libX11-xcb.so.1
0x00007ffff759e0a0  0x00007ffff759f309  Yes (*)     /lib/x86_64-linux-gnu/libxcb-dri2.so.0
0x00007ffff758b5e0  0x00007ffff759584e  Yes (*)     /lib/x86_64-linux-gnu/libXext.so.6
0x00007ffff7581300  0x00007ffff75836ea  Yes (*)     /lib/x86_64-linux-gnu/libXfixes.so.3
0x00007ffff7579240  0x00007ffff757b9f9  Yes (*)     /lib/x86_64-linux-gnu/libXxf86vm.so.1
0x00007ffff75740e0  0x00007ffff7574c37  Yes (*)     /lib/x86_64-linux-gnu/libxcb-shm.so.0
0x00007ffff7549230  0x00007ffff7564b07  Yes (*)     /lib/x86_64-linux-gnu/libexpat.so.1
0x00007ffff75410e0  0x00007ffff7541fd7  Yes (*)     /lib/x86_64-linux-gnu/libxcb-dri3.so.0
0x00007ffff753b0a0  0x00007ffff753b97a  Yes (*)     /lib/x86_64-linux-gnu/libxcb-present.so.0
0x00007ffff7533100  0x00007ffff7535416  Yes (*)     /lib/x86_64-linux-gnu/libxcb-sync.so.1
0x00007ffff732e960  0x00007ffff732ec2c  Yes (*)     /lib/x86_64-linux-gnu/libxshmfence.so.1
0x00007ffff73270a0  0x00007ffff73298de  Yes (*)     /lib/x86_64-linux-gnu/libxcb-xfixes.so.0
0x00007ffff7307ae0  0x00007ffff7317535  Yes         /lib/x86_64-linux-gnu/libpthread.so.0
0x00007ffff09b6e30  0x00007ffff330f633  Yes (*)     /lib/x86_64-linux-gnu/libLLVM-12.so.1
0x00007fffefe64230  0x00007fffefe69a46  Yes (*)     /lib/x86_64-linux-gnu/libffi.so.7
0x00007fffefe32fb0  0x00007fffefe4cf24  Yes (*)     /lib/x86_64-linux-gnu/libedit.so.2
0x00007fffefe22720  0x00007fffefe25d70  Yes         /lib/x86_64-linux-gnu/librt.so.1
0x00007fffefdfe6a0  0x00007fffefe0c17c  Yes (*)     /lib/x86_64-linux-gnu/libtinfo.so.6
0x00007fffefd62720  0x00007fffefde23af  Yes (*)     /usr/lib/wsl/lib/libdxcore.so
0x00007fffefc99df0  0x00007fffefd114d3  Yes (*)     /usr/lib/wsl/lib/libd3d12.so
0x00007fffed73f010  0x00007fffeea0a8f1  Yes (*)     /usr/lib/wsl/drivers/nvdm.inf_amd64_201e30fdba70f061/libnvwgf2umx.so
(*): Shared library is missing debugging information.
(gdb) info thread
  Id   Target Id                                  Frame
  1    Thread 0x7ffff7805dc0 (LWP 6538) "glmark2" 0x00007ffff798583b in __GI___close (fd=4) at ../sysdeps/unix/sysv/linux/close.c:27
* 2    Thread 0x7fffec81e700 (LWP 6544) "glmark2" 0x00007fffee5113aa in ?? () from /usr/lib/wsl/drivers/nvdm.inf_amd64_201e30fdba70f061/libnvwgf2umx.so
  3    Thread 0x7fffe7a9e700 (LWP 6545) "glmark2" futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffe0000b88) at ../sysdeps/nptl/futex-internal.h:183
  4    Thread 0x7fffe729d700 (LWP 6546) "glmark2" futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffe0000b88) at ../sysdeps/nptl/futex-internal.h:183
  5    Thread 0x7fffe6a9c700 (LWP 6547) "glmark2" futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffe0000b88) at ../sysdeps/nptl/futex-internal.h:183
  6    Thread 0x7fffe629b700 (LWP 6548) "glmark2" futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffe0000b88) at ../sysdeps/nptl/futex-internal.h:183

/mnt/wslg/versions.txt

WSLg ( x86_64 ): 1.0.42+Branch.main.Sha.78fa15d6ffd8fad62343b194d39c142e57cc1869
Mariner: VERSION="2.0.20220426"
DirectX-Headers:
mesa:
pulseaudio: 2f0f0b8c3872780f15e275fc12899f4564f01bd5
FreeRDP: c574044a10003e50453acb4cf42801c5833fb572
weston: d57e0de202fd049f6cc0ce8c25b7120feea3468f
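
The backtraces posted in this thread all have their top frames inside libnvwgf2umx.so. A rough way to check that on a saved gdb `bt` listing, sketched as a shell function (the helper name `count_nv_frames` is made up for illustration; it only pattern-matches the text gdb printed):

```shell
#!/usr/bin/env bash
# count_nv_frames: read gdb "bt" output on stdin and print how many
# frames resolve to the NVIDIA user-mode driver library.
# Hypothetical helper name. "|| true" keeps the exit status zero when
# grep finds no match (it still prints 0 in that case).
count_nv_frames() {
  grep -c '^#[0-9].*libnvwgf2umx\.so' || true
}

# Example with three frames shaped like the ones above; two of them
# are in the driver library.
count_nv_frames <<'EOF'
#0  0x00007fffee5113aa in ?? () from /usr/lib/wsl/drivers/nvdm.inf_amd64_201e30fdba70f061/libnvwgf2umx.so
#1  0x00007fffee50f94c in ?? () from /usr/lib/wsl/drivers/nvdm.inf_amd64_201e30fdba70f061/libnvwgf2umx.so
#3  0x00007ffff7309609 in start_thread (arg=<optimized out>) at pthread_create.c:477
EOF
```

A listing to feed into it could be captured non-interactively with something like `gdb -batch -ex run -ex bt --args glxinfo -B`, assuming gdb is installed in the distro.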
fc4525 commented 1 year ago

Hello,

I am also running into a segmentation fault when trying any of the graphics CUDA Sample files (e.g. fluidGL). The GUI window opens briefly before exiting with a segmentation fault. My setup is below.

I also tried the same samples from Windows 11 using VS and they work. The graphics segmentation fault happens only in WSL2 in my case.

- CUDA 11.6 (also tried 11.7)
- Ubuntu 22.04 on WSL2
- Windows 11  22H2
- Thinkpad P1 gen2 with nVidia Quadro T2000
- nVidia drivers tried (516.94 and 517.40)

  /mnt/wslg/versions.txt: 
  WSLg ( x86_64 ): 1.0.45+Branch.main.Sha.254cbbc0eb26fe4bf82e9656470ebb5546a84a23
  Mariner: VERSION="2.0.20220426"
  DirectX-Headers:
  mesa:
  pulseaudio: 2f0f0b8c3872780f15e275fc12899f4564f01bd5
  FreeRDP: c574044a10003e50453acb4cf42801c5833fb572
  weston: 2270ceb3cf75a03e8b3f073eca2c5dc04b12e504

   glxinfo -B:
   name of display: :0
   display: :0  screen: 0
   direct rendering: Yes
   Extended renderer info (GLX_MESA_query_renderer):
        Vendor: Microsoft Corporation (0xffffffff)
        Device: D3D12 (NVIDIA Quadro T2000) (0xffffffff)
        Version: 22.0.5
        Accelerated: yes
        Video memory: 36583MB
        Unified memory: no
        Preferred profile: core (0x1)
        Max core profile version: 3.3
        Max compat profile version: 3.3
        Max GLES1 profile version: 1.1
        Max GLES[23] profile version: 3.1
   OpenGL vendor string: Microsoft Corporation
   OpenGL renderer string: D3D12 (NVIDIA Quadro T2000)
   OpenGL core profile version string: 3.3 (Core Profile) Mesa 22.0.5
   OpenGL core profile shading language version string: 3.30
   OpenGL core profile context flags: (none)
   OpenGL core profile profile mask: core profile

   OpenGL version string: 3.3 (Compatibility Profile) Mesa 22.0.5
   OpenGL shading language version string: 3.30
   OpenGL context flags: (none)
   OpenGL profile mask: compatibility profile

   OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.0.5
   OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

   Segmentation fault

Felix

junaire commented 1 year ago

Having similar issues:

wsl --version
WSL version: 0.70.0.0
kernel version: 5.15.68.1
WSLg version: 1.0.45
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.675

Running a hotspot build from recent source and doing nothing but exiting:

* thread #25, name = 'hotspot', stop reason = signal SIGSEGV: invalid address (fault address: 0x7fffc80b4000)
    frame #0: 0x00007fff7f4ce4ba libnvwgf2umx.so`___lldb_unnamed_symbol185521 + 106
libnvwgf2umx.so`___lldb_unnamed_symbol185521:
->  0x7fff7f4ce4ba <+106>: movl   (%r13,%rdx,4), %ecx
    0x7fff7f4ce4bf <+111>: movzwl %cx, %ecx
    0x7fff7f4ce4c2 <+114>: cmpl   0x444(%rsi,%rdx,4), %ecx
    0x7fff7f4ce4c9 <+121>: je     0x7fff7f4ce4a8            ; <+88>
(lldb) bt
* thread #25, name = 'hotspot', stop reason = signal SIGSEGV: invalid address (fault address: 0x7fffc80b4000)
  * frame #0: 0x00007fff7f4ce4ba libnvwgf2umx.so`___lldb_unnamed_symbol185521 + 106
    frame #1: 0x00007fff7f4cc8ee libnvwgf2umx.so`___lldb_unnamed_symbol185494 + 126
    frame #2: 0x00007fff7f4cc866 libnvwgf2umx.so`___lldb_unnamed_symbol185493 + 6
    frame #3: 0x00007ffff54b7b43 libc.so.6`start_thread(arg=<unavailable>) at pthread_create.c:442:8
    frame #4: 0x00007ffff5549a00 libc.so.6`__clone3 at clone3.S:81
(lldb)

Got a segmentation fault...

alonbl commented 1 year ago

Hello Microsoft,

Why has there been no response from anyone at Microsoft? This is not an open source project in which contributors can fix issues themselves. The issue is easy to reproduce, and it is easy to confirm that it exists and is a bug.

Also reproduced using:

WSL version: 0.70.4.0
Kernel version: 5.15.68.1
WSLg version: 1.0.45
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.674

glxgears and glxinfo can be used to reproduce it; nothing special is required. Below is an example using glxinfo from mesa-utils-8.4.0-1ubuntu1.

Recent Windows, Ubuntu 22.04.

$ gdb --args glxinfo -B
<snip>
Thread 2 "glxinfo" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffea468640 (LWP 507)]
0x00007fffec15a0ca in ?? () from /usr/lib/wsl/drivers/nvlt.inf_amd64_922792e2d39c47ae/libnvwgf2umx.so
(gdb) bt
#0  0x00007fffec15a0ca in ?? () from /usr/lib/wsl/drivers/nvlt.inf_amd64_922792e2d39c47ae/libnvwgf2umx.so
#1  0x00007fffec15866c in ?? () from /usr/lib/wsl/drivers/nvlt.inf_amd64_922792e2d39c47ae/libnvwgf2umx.so
#2  0x00007fffec1585f6 in ?? () from /usr/lib/wsl/drivers/nvlt.inf_amd64_922792e2d39c47ae/libnvwgf2umx.so
#3  0x00007ffff7c49b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#4  0x00007ffff7cdba00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Regards,

PhyX-Meow commented 1 year ago

Hey, the issue seems to have suddenly disappeared on my machine today. My environment:

WSL version: 0.70.5.0
Kernel version: 5.15.68.1
WSLg version: 1.0.45
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.25236.1010

GPU: NVIDIA RTX 2060 Super
GPU driver version: game ready driver 526.47

Distro: Archlinux
mesa version: 22.2.2-1 (Updated today)

glxinfo -B output:

name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (NVIDIA GeForce RTX 2060 SUPER) (0xffffffff)
    Version: 22.2.2
    Accelerated: yes
    Video memory: 24365MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.2
    Max compat profile version: 4.2
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce RTX 2060 SUPER)
OpenGL core profile version string: 4.2 (Core Profile) Mesa 22.2.2
OpenGL core profile shading language version string: 4.20
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.2 (Compatibility Profile) Mesa 22.2.2
OpenGL shading language version string: 4.20
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.2.2
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
huiminghao commented 1 year ago

Not fixed yet. Versions:

WSL version: 0.70.8.0
Kernel version: 5.15.74.2
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22622.586

Just use the following to avoid it:

cat /etc/profile.d/wslg.sh 
#export MESA_D3D12_DEFAULT_ADAPTER_NAME="AMD"
export MESA_D3D12_DEFAULT_ADAPTER_NAME="llvm"
export LIBGL_ALWAYS_SOFTWARE=true
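
A slightly more guarded variant of that profile script, as a sketch only: it sets `LIBGL_ALWAYS_SOFTWARE` (the variable from the snippet above) only when an NVIDIA user-mode driver library is present under /usr/lib/wsl/drivers, which is the directory that shows up in the backtraces in this thread. The helper name `has_nv_driver` is made up for illustration, and keying the workaround on the presence of that file is an assumption, not something confirmed here.

```shell
#!/usr/bin/env bash
# has_nv_driver [DIR]: succeed if libnvwgf2umx.so exists in any
# subdirectory of DIR (default: /usr/lib/wsl/drivers).
# Hypothetical helper name. If the glob matches nothing it stays
# literal, the -e test fails, and the function returns 1.
has_nv_driver() {
  for _f in "${1:-/usr/lib/wsl/drivers}"/*/libnvwgf2umx.so; do
    [ -e "$_f" ] && return 0
  done
  return 1
}

# Force Mesa's software rasterizer only when that library is present.
if has_nv_driver; then
  export LIBGL_ALWAYS_SOFTWARE=true
fi
```

As with the original workaround, this trades GPU-accelerated OpenGL for software rendering, so it avoids the crash rather than fixing it.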
NinovanderMark commented 1 year ago

I have this same issue on my new Thinkpad P16S laptop running Ubuntu 22.04 in WSL on Windows 10.

PS C:\Users\NinovanderMark> wsl --version
WSL version: 1.0.0.0
Kernel version: 5.15.74.2
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.2311
ninovdmark@LAPTOP-M3OIV1UR:~$ glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (NVIDIA T550 Laptop GPU) (0xffffffff)
    Version: 22.0.5
    Accelerated: yes
    Video memory: 20201MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 3.3
    Max compat profile version: 3.3
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA T550 Laptop GPU)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 22.0.5
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.3 (Compatibility Profile) Mesa 22.0.5
OpenGL shading language version string: 3.30
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.0.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

Segmentation fault

Running from gdb indicates that the issue is coming from the NVIDIA driver.

OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.0.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

[Thread 0x7fffe26ad640 (LWP 2906) exited]

Thread 2 "glxinfo" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffea480640 (LWP 2899)]
0x00007fffec0f7391 in ?? () from /usr/lib/wsl/drivers/nvlt.inf_amd64_0e3c491f201d0b53/libnvwgf2umx.so
mflagg2814 commented 1 year ago

I’m also seeing this issue. I’ve been forced to completely disable my Dell Precision 5550’s Quadro T1000 dGPU so I can do my job. My issue matches the reproduction steps and backtraces already posted.

IkonOne commented 1 year ago

Still present for me as well.

PS C:\Users\ikono> wsl --version
WSL version: 1.0.3.0
Kernel version: 5.15.79.1
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.963
ikonone@devbot:~$ glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (NVIDIA GeForce RTX 3060 Laptop GPU) (0xffffffff)
    Version: 22.0.5
    Accelerated: yes
    Video memory: 26335MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 3.3
    Max compat profile version: 3.3
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce RTX 3060 Laptop GPU)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 22.0.5
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.3 (Compatibility Profile) Mesa 22.0.5
OpenGL shading language version string: 3.30
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.0.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

Segmentation fault
ikonone@devbot:~$
(gdb) run -B
Starting program: /usr/bin/glxinfo -B
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
name of display: :0
[Detaching after vfork from child process 161]
[New Thread 0x7fffe9699640 (LWP 163)]
[New Thread 0x7fffe3fff640 (LWP 164)]
[New Thread 0x7fffe37fe640 (LWP 165)]
[New Thread 0x7fffe2ffd640 (LWP 166)]
[New Thread 0x7fffe27fc640 (LWP 167)]
[New Thread 0x7fffe16fb640 (LWP 168)]
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (NVIDIA GeForce RTX 3060 Laptop GPU) (0xffffffff)
    Version: 22.0.5
    Accelerated: yes
    Video memory: 26335MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 3.3
    Max compat profile version: 3.3
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce RTX 3060 Laptop GPU)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 22.0.5
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
[New Thread 0x7fffe0efa640 (LWP 169)]
[Thread 0x7fffe16fb640 (LWP 168) exited]

OpenGL version string: 3.3 (Compatibility Profile) Mesa 22.0.5
OpenGL shading language version string: 3.30
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
[New Thread 0x7fffe16fb640 (LWP 170)]
[Thread 0x7fffe0efa640 (LWP 169) exited]

OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.0.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

[Thread 0x7fffe16fb640 (LWP 170) exited]

Thread 2 "glxinfo" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe9699640 (LWP 163)]
0x00007fffec15bf0a in ?? () from /usr/lib/wsl/drivers/nvamig.inf_amd64_e3bf6f587f5b65de/libnvwgf2umx.so
(gdb)
timweckx commented 1 year ago

Same issue

glxinfo and glxgears both run, but always end with a segmentation fault. Other graphics applications that require OpenGL fail to run altogether.

WSL version: 1.0.3.0
Kernel version: 5.15.79.1
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.2486
> LIBGL_DEBUG=VERBOSE glxinfo -B
name of display: :0
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/tweckx/.drirc: No such file or directory.
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/tweckx/.drirc: No such file or directory.
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (NVIDIA GeForce RTX 2070 Super) (0xffffffff)
    Version: 22.0.5
    Accelerated: yes
    Video memory: 24258MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 3.3
    Max compat profile version: 3.3
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce RTX 2070 Super)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 22.0.5
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.3 (Compatibility Profile) Mesa 22.0.5
OpenGL shading language version string: 3.30
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.0.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

Segmentation fault
afilp commented 1 year ago

Could the investigation of this issue be related? It seems that Microsoft is looking at it right now. I hope they can look into our "Segmentation fault" issues too, as we cannot work with some GUI apps.

https://github.com/microsoft/WSL/issues/8696

sarim commented 1 year ago

Could the investigation of this issue be related? It seems that Microsoft is looking at it right now. I hope they can look into our "Segmentation fault" issues too, as we cannot work with some GUI apps.

microsoft/WSL#8696

It's not related to hibernate in any way.

dshadowwolf commented 1 year ago

Same issue, no OpenGL required -- just a simple display of an image using SDL2. On application exit there is a crash in the NVidia driver, at exactly the place shown in the lldb output provided by @junaire.

wsl --version results:

WSL version: 1.1.0.0
Kernel version: 5.15.83.1
WSLg version: 1.0.48
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.25284.1000

And the error is literally:

* thread #2, name = 'game-thing', stop reason = signal SIGSEGV: invalid address (fault address: 0x7fffea7bc000)
    frame #0: 0x00007fffed23ef0a libnvwgf2umx.so`___lldb_unnamed_symbol196498$$libnvwgf2umx.so + 106
libnvwgf2umx.so`___lldb_unnamed_symbol196498$$libnvwgf2umx.so:
->  0x7fffed23ef0a <+106>: movl   (%r13,%rdx,4), %ecx
    0x7fffed23ef0f <+111>: movzwl %cx, %ecx
    0x7fffed23ef12 <+114>: cmpl   0x44c(%rsi,%rdx,4), %ecx
    0x7fffed23ef19 <+121>: je     0x7fffed23eef8            ; <+88>

I added some extra debugging to the program I was testing (just a quick&dirty bit to re-familiarize myself with SDL and get comfortable with SDL2 after a long period of doing a lot of systems work) and it reaches the final return 0 of main() (or thereabouts) and then crashes. This occurs after releasing everything that was allocated and making output about such.

Note: I did add a breakpoint in lldb and single-step and it did not crash on that run, so this might be a timing issue where one of the threads tries a use-after-free, but... I have no clue, as I didn't write any code for extra threads in the quick test program.

alonbl commented 1 year ago

This is fixed for me today with ubuntu-22.04 latest update, glxinfo no longer crashes.

mshvyndya commented 1 year ago

This is fixed for me today with ubuntu-22.04 latest update, glxinfo no longer crashes.

For me too

CarlosNihelton commented 1 year ago

I can confirm I can no longer reproduce the issue. I made sure not to change the NVidia driver on Windows and tested glxinfo -B before upgrading my instance. It was still crashing before the upgrade but stopped after it. The fix is likely carried not by the package providing glxinfo (mesa-utils) but by one of the libgl dependencies. Testing on an Ubuntu 20.04 instance (which has the same version of mesa-utils, but older libgl1 and dependencies) still reproduces the crash.

For reference, the non-crashing distro shows me:

c@Zero01:~$ apt list --installed | grep mesa

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libegl-mesa0/jammy-updates,now 22.2.5-0ubuntu0.1~22.04.1 amd64 [installed,automatic]
libgl1-mesa-dri/jammy-updates,now 22.2.5-0ubuntu0.1~22.04.1 amd64 [installed,automatic]
libglapi-mesa/jammy-updates,now 22.2.5-0ubuntu0.1~22.04.1 amd64 [installed,automatic]
libglx-mesa0/jammy-updates,now 22.2.5-0ubuntu0.1~22.04.1 amd64 [installed,automatic]
mesa-utils-bin/jammy,now 8.4.0-1ubuntu1 amd64 [installed,automatic]
mesa-utils/jammy,now 8.4.0-1ubuntu1 amd64 [installed]
CarlosNihelton commented 1 year ago

I also tested the sample code referred in a previous comment of this issue and it also doesn't crash anymore. :tada:

Artem-B commented 1 year ago

I still see the crash on exit with glxinfo.

$ wsl --version
WSL version: 1.1.0.0
Kernel version: 5.15.83.1
WSLg version: 1.0.48
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.2486
$apt list --installed | grep mesa

libegl-mesa0/jammy-updates,now 22.2.5-0ubuntu0.1~22.04.1 amd64 [installed,automatic]
libgl1-mesa-dri/jammy-updates,now 22.2.5-0ubuntu0.1~22.04.1 amd64 [installed]
libglapi-mesa/jammy-updates,now 22.2.5-0ubuntu0.1~22.04.1 amd64 [installed]
libglu1-mesa/jammy,now 9.0.2-1 amd64 [installed,automatic]
libglx-mesa0/jammy-updates,now 22.2.5-0ubuntu0.1~22.04.1 amd64 [installed]
mesa-utils-bin/jammy,now 8.4.0-1ubuntu1 amd64 [installed,automatic]
mesa-utils/jammy,now 8.4.0-1ubuntu1 amd64 [installed]
mesa-va-drivers/jammy-updates,now 22.2.5-0ubuntu0.1~22.04.1 amd64 [installed,automatic]
mesa-vdpau-drivers/jammy-updates,now 22.2.5-0ubuntu0.1~22.04.1 amd64 [installed,automatic]
mesa-vulkan-drivers/jammy-updates,now 22.2.5-0ubuntu0.1~22.04.1 amd64 [installed]
$ nvidia-smi
Thu Feb  2 08:44:26 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.05    Driver Version: 528.24       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
bersbersbers commented 1 year ago

likely one of the libgl dependencies carries the fix

Can confirm. sudo apt upgrade did not fix this due to held-back updates. Updating these manually (list is below) solved the issue. For anyone interested in bisecting, this is what I updated manually:

alsa-ucm-conf kbd libdrm-amdgpu1 libdrm-common libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libdrm2 libegl-mesa0 libgbm1 libgl1-mesa-dri libglapi-mesa libglx-mesa0 libxatracker2 python3-software-properties software-properties-common ubuntu-advantage-tools update-notifier-common
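For anyone who does want to bisect, a rough sketch of how one could diff the `apt list --installed` output of a crashing instance against a non-crashing one. This is Python with helper names I made up (`parse_apt_list`, `differing_packages`), and it assumes the "name/suite,now version arch [status]" line layout seen in the listings in this thread. apt itself warns that this output is not a stable interface, so treat it as a quick aid rather than a robust tool.

```python
def parse_apt_list(text: str) -> dict:
    """Map package name -> version from `apt list --installed` output."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        # Package lines look like "name/suite,now version arch [status]".
        # Anything else (the apt WARNING line, "Listing...", blanks) is skipped.
        if len(fields) < 3 or "/" not in fields[0]:
            continue
        name = fields[0].split("/", 1)[0]
        versions[name] = fields[1]
    return versions


def differing_packages(a: dict, b: dict) -> dict:
    """Packages whose version differs (or is missing) between two instances."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Feeding it the saved output of both instances and printing `differing_packages(...)` should narrow the list of candidate packages considerably.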

Artem-B commented 1 year ago

Didn't help in my case. All those packages are the latest version on my system. All packages are up to date and I still see the crashes. Could the crash be specific to the GPU, drivers, or system configuration?

One oddity that I've just noticed is that when run under GDB, glxinfo sometimes runs without crashing, which points to some sort of race condition.

Update: To make a clean experiment, I've installed a fresh instance of Ubuntu-22.04 LTS from the store, did apt update && apt upgrade, and still see random crashes on exit with glxinfo.

Also updated to the latest WSL pre-release 1.1.2 -- no change from 1.1.0 or stock WSL before that.

sarim commented 1 year ago

I already had the latest mesa libs from the kisak/kisak-mesa ppa, and it didn't help. I purged the ppa and moved to the latest from the ubuntu repo. Didn't help. My strong guess is that the issue is on the nvidia driver side, not inside the ubuntu distro.

CarlosNihelton commented 1 year ago

I already had the latest mesa libs from the kisak/kisak-mesa ppa, and it didn't help. I purged the ppa and moved to the latest from the ubuntu repo. Didn't help. My strong guess is that the issue is on the nvidia driver side, not inside the ubuntu distro.

I've long considered it to be a host driver issue, but was surprised by the results of yesterday's tests. It might actually depend, case by case and version by version, on the combination of host and guest drivers... :(

jonaskuske commented 1 year ago

which points to some sort of race condition.

Yeah, I think so too. Actually, if I keep running glxinfo -B it outputs the segfault most of the time (with varying numbers at the start), but from time to time it does actually work.

sarim commented 1 year ago

Interesting. Obviously glxinfo -B is not the mission-critical app that needs to succeed; it's just a reliable, easy reproduction of the actual underlying issue, which is that graphical programs are very unstable in wslg. For me personally, the first mission-critical app I tried was the android emulator, which segfaults most of the time, not only at start but also while running (if it somehow manages to run). I've long since given up on using wslg for any mission-critical (gui) app and just check glxinfo -B after each update.

dshadowwolf commented 1 year ago

One oddity that I've just noticed is that when run under GDB, glxinfo sometimes runs without crashing, which points to some sort of race condition.

As I'd noted in my own comment on this, when I single stepped through the end of a test program I was unable to reliably trigger the issue. One out of five runs walks through to confirm that it was dying, at the latest, at the closing return 0; -- or, potentially, in a background thread run by SDL2 after the SDL_Quit(); call. It does appear to be a crash in the driver -- libnvwgf2umx.so + 106 is the location of the crash in all my testing, and this is part of the wslg driver stuff, for nvidia drivers. (specifically, in my tests, /usr/lib/wsl/drivers/nvddi.inf_amd64_770ff19e856298b8/libnvwgf2umx.so is the file mentioned)

I had thought this might have been a timing issue, so I went back through and started adding delays, but even at a five second wait between each step I was unable to trigger the same behavior as seen when single-stepping through the cleanup&exit portion of the program I noticed this with. I'm going to move to a much larger terminal wait and remove the ones between each step of the cleanup to triple check things, but I do not know that this will make a difference at this point. And... yeah, even with a 10 second wait at the very end it still crashes when not in a single-step debugging run.

Artem-B commented 1 year ago

It's also possible that libnvwgf2umx.so may be just a messenger here and just happens to fail if some data has been corrupted by something else, long gone by the time crash happens. This actually makes me wonder if reproducibility flakiness may be due to ASLR....

One more observation -- on my system the crash happens considerably more often if I run glxinfo under taskset 1, limiting it to CPU 0. In that case I get a 90+% failure rate. When it runs on any other core, the typical failure rate is about 60-80%. With one specific CPU mask (0x101), failures happen in only about half of the runs. And the weirdest thing about this experiment is that the Windows task manager always showed CPU load on the same cores, regardless of the taskset used for running the tests. So, even though the effect of the taskset on the failure rate is reproducible for me, I have no plausible theory why/how it makes a difference.
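To make the masks above easier to read, here is a small Python sketch (my own helper names, not part of taskset) that decodes a taskset-style bitmask into CPU indices and builds the matching command line. It only assumes the standard `taskset MASK command...` form, where bit N of the mask selects CPU N.

```python
def mask_to_cpus(mask: int) -> set:
    """Return the CPU indices selected by a taskset-style affinity bitmask."""
    if mask < 0:
        raise ValueError("affinity mask must be non-negative")
    cpus = set()
    bit = 0
    while mask:
        if mask & 1:  # bit N set means CPU N is allowed
            cpus.add(bit)
        mask >>= 1
        bit += 1
    return cpus


def taskset_command(mask: int, cmd: list) -> list:
    """Prefix cmd so that taskset pins it to the CPUs selected by mask."""
    return ["taskset", hex(mask)] + list(cmd)
```

So `mask_to_cpus(1)` is CPU 0 only, and `mask_to_cpus(0x101)` is CPUs 0 and 8; `taskset_command(0x101, ["glxinfo", "-B"])` gives the argument list to hand to a process launcher.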

Does anyone have any suggestions on how to get NVIDIA's attention to this bug? @hideyukn88 : Can someone from WSLg team help with that?

sarim commented 1 year ago

For what it's worth, I ran while true; do sleep 1; glxinfo -B >/dev/null && echo success; done for 4-5 hours, not a single success. So it's definitely 100%-time segfault in my machine.
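If someone wants an actual failure rate instead of eyeballing a loop like that, a minimal Python sketch could look like the following. The function names are mine; the only thing it relies on is that `subprocess` reports a child killed by a signal with the negative signal number as its return code (POSIX only).

```python
import signal
import subprocess
from typing import Sequence


def is_segfault(returncode: int) -> bool:
    """True if a subprocess return code means the child died from SIGSEGV."""
    # On POSIX, subprocess reports death-by-signal as the negative signal number.
    return returncode == -signal.SIGSEGV


def count_segfaults(cmd: Sequence[str], runs: int = 20) -> int:
    """Run cmd `runs` times, discarding its output, and count SIGSEGV exits."""
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(
            list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if is_segfault(result.returncode):
            crashes += 1
    return crashes
```

For example, `count_segfaults(["glxinfo", "-B"], runs=50)` returns how many of 50 runs ended in a segfault, which makes results from different machines or WSL versions easier to compare.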

dbPhilips commented 1 year ago

For what it is worth. Started having these segfaults at gl app exits (such as glxinfo) today after performing an ubuntu update. However, it only happens when selecting the Nvidia GPU. Setting either LIBGL_ALWAYS_SOFTWARE=1 or MESA_D3D12_DEFAULT_ADAPTER_NAME=Intel (my machine has an intel IGP + dedicated Quadro p1000) makes the apps exit without crash.
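A sketch of applying that workaround per process rather than globally, in Python. `workaround_env` is a made-up helper; the two variable names are the ones mentioned above, and the adapter string ("Intel" here) depends on what your machine actually has.

```python
import os


def workaround_env(software: bool = False, adapter: str = "", base=None) -> dict:
    """Copy an environment and add the Mesa overrides mentioned in this thread."""
    env = dict(os.environ if base is None else base)
    if software:
        # Force Mesa's software rasterizer instead of the D3D12 backend.
        env["LIBGL_ALWAYS_SOFTWARE"] = "1"
    if adapter:
        # Ask the D3D12 backend to pick the adapter whose name matches this string.
        env["MESA_D3D12_DEFAULT_ADAPTER_NAME"] = adapter
    return env
```

Passing the result as `env=` to `subprocess.run(["glxinfo", "-B"], env=workaround_env(adapter="Intel"))` keeps the override scoped to that one program, so everything else still uses the default adapter.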

gdb also points to libnvwgf2umx.so causing the segfault for me.

Have tried updating the nvidia windows driver, but this does not seem to fix it.

WSL version: 1.0.3.0
Kernel version: 5.15.79.1
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19044.2486

dshadowwolf commented 1 year ago

Have done a lot more digging and this looks to, actually, be coming up out of the event poll loop -- quite literally the top call on that specific thread (apparently created just for event polling by LibSDL2 without me specifically doing it) is a call to the glibc poll function -- see:

libc.so.6!__GI___poll(struct pollfd * fds, nfds_t nfds, int timeout) (\build\glibc-SzIz7B\glibc-2.31\sysdeps\unix\sysv\linux\poll.c:29)
libnvwgf2umx.so![Unknown/Just-In-Time compiled code] (Unknown Source:0)
libpthread.so.0!start_thread(void * arg) (\build\glibc-SzIz7B\glibc-2.31\nptl\pthread_create.c:477)
libc.so.6!clone() (\build\glibc-SzIz7B\glibc-2.31\sysdeps\unix\sysv\linux\x86_64\clone.S:95)

So it looks like about 5 threads are started for handling various bits of DirectX interaction and in the one that interacts directly with the NVidia driver code we get the exception. I grabbed the above from a breakpoint (and thanks, MS, for the nice debugger integration in VSCode - first time I've used a GUI debugger that worked this well) and, well... that is what I've found. The other 4 extra threads (that do not interact with the actual root thread of the code) are all a lot of calls to libpthread and a single call to libd3d12core.so

Basically... This looks more and more like a race condition or conflict -- though I've been unable to replicate the "successful completion without exception after single-stepping" again.

ivanocj commented 1 year ago

This is fixed for me today with ubuntu-22.04 latest update, glxinfo no longer crashes.

Well, glxinfo looks good but what about glxgears? This is what happens to me.

image

dshadowwolf commented 1 year ago

This is fixed for me today with ubuntu-22.04 latest update, glxinfo no longer crashes.

And what about those of us who are using, say, ubuntu-20.04 because it's a pain to migrate a large $HOME to a new WSL setup and make sure all the tools and libraries that are in use have gotten installed proper?

I mean... It's not an issue if a migration is actually required, but I really would prefer to keep using the WSL install I've been on for close to 3 years now instead of having to take a half days work to migrate. But still... Some of us will stick with using LTS sets for the life of the support because we get comfortable and don't like having to take the time for the upgrade. (hell, if I wanted that kind of pain, I'd have stuck with running Linux as a daily driver back (1997 to 2017) instead of switching to Windows+Cygwin/MSYS2/WSL (depending on the era))

sarim commented 1 year ago

And what about those of us who are using, say, ubuntu-20.04 because it's a pain to migrate a large $HOME to a new WSL setup and make sure all the tools and libraries that are in use have gotten installed proper?

First, don't worry about it because it seems like it didn't actually solve it. You can create another fresh ubuntu 22.04 instance to check if wslg is really working there for your hardware.

Second, if the issue really does get solved by newer mesa libraries, you can get those in ubuntu 20.04 too, from a ppa. Ex: https://launchpad.net/~kisak/+archive/ubuntu/kisak-mesa

ngc7331 commented 1 year ago

I think I have encountered the same problem when using SDL2.

Test code:

#include <SDL2/SDL.h>
int main() {
    SDL_Init(SDL_INIT_VIDEO);
    SDL_Quit();
    return 0;
}

Compile & run using gcc test.c -lSDL2 -g && gdb a.out gives the following result:

Thread 2 "a.out" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe6ee1640 (LWP 7593)]
0x00007fffe998ca6c in ?? () from /usr/lib/wsl/drivers/nvhmi.inf_amd64_5f5af535a7a77378/libnvwgf2umx.so
(gdb) bt
#0  0x00007fffe998ca6c in ?? () from /usr/lib/wsl/drivers/nvhmi.inf_amd64_5f5af535a7a77378/libnvwgf2umx.so
#1  0x00007fffe998b70e in ?? () from /usr/lib/wsl/drivers/nvhmi.inf_amd64_5f5af535a7a77378/libnvwgf2umx.so
#2  0x00007fffe998b686 in ?? () from /usr/lib/wsl/drivers/nvhmi.inf_amd64_5f5af535a7a77378/libnvwgf2umx.so
#3  0x00007ffff7c7cb43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#4  0x00007ffff7d0ea00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
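Since several backtraces in this thread point at the same library under different driver-store directories, here is a rough Python sketch for pulling the "from <library>" part out of gdb frames like the ones above. The regex and function names are my own and only cover the "0x... in ?? () from /path/lib.so" frame shape seen here, not gdb output in general.

```python
import posixpath
import re

# Matches frames such as:
#   #0  0x00007fffe998ca6c in ?? () from /usr/lib/wsl/drivers/.../libnvwgf2umx.so
# The leading "#N" is optional because gdb omits it on the initial stop line.
FRAME_RE = re.compile(
    r"^(?:#\d+\s+)?0x[0-9a-fA-F]+ in \S+ \(.*\) from (?P<lib>\S+)\s*$"
)


def libraries_in_backtrace(gdb_output: str) -> list:
    """Return the distinct shared libraries named in 'from' frames, in order."""
    libs = []
    for line in gdb_output.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group("lib") not in libs:
            libs.append(match.group("lib"))
    return libs


def is_wsl_nvidia_umd(path: str) -> bool:
    """True if path is libnvwgf2umx.so under the WSL driver directory."""
    return (
        path.startswith("/usr/lib/wsl/drivers/")
        and posixpath.basename(path) == "libnvwgf2umx.so"
    )
```

Frames that resolve to a source location ("at ./nptl/pthread_create.c:442") are ignored on purpose, so only the unsymbolized driver frames are reported.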

And this works for me

Setting either LIBGL_ALWAYS_SOFTWARE=1 or MESA_D3D12_DEFAULT_ADAPTER_NAME=Intel (my machine has an intel IGP + dedicated Quadro p1000) makes the apps exit without crash.

wsl --version:

WSL version: 1.1.5.0
Kernel version: 5.15.90.1
WSLg version: 1.0.50
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22624.1465

apt search libsdl | grep installed:

libsdl2-2.0-0/jammy-updates,now 2.0.20+dfsg-2ubuntu1.22.04.1 amd64 [installed,automatic]
libsdl2-dev/jammy-updates,now 2.0.20+dfsg-2ubuntu1.22.04.1 amd64 [installed]
libsdl2-image-2.0-0/jammy,now 2.0.5+dfsg1-3build1 amd64 [installed,automatic]
libsdl2-image-dev/jammy,now 2.0.5+dfsg1-3build1 amd64 [installed]
nedsociety commented 1 year ago

Another observation: if we use a systemd bottle (genie / distrod), the segfault occurs for more apps / occasions. For example, the android emulator always segfaults for me when inside systemd, but outside systemd it opens once or twice over ~20 tries. As the crash is happening in libnv... .so, I wonder if it's even fixable from wslg? Who maintains those nvidia linux wsl drivers?

I also encountered this problem on IntelliJ IDEA right after enabling systemd. It never did for months when systemd wasn't there.

ClaudioCimarelli commented 1 year ago

It is solved after sudo chmod 666 /dev/dri/*. I hope this helps. https://github.com/microsoft/WSL/issues/9523#issuecomment-1403671300
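Before changing permissions, it may be worth checking which device nodes are actually missing world read/write access. A minimal Python sketch (helper names are mine; the default pattern is just the path from the command above):

```python
import glob
import os
import stat


def is_world_rw(path: str) -> bool:
    """True if 'other' users have both read and write permission on path."""
    mode = os.stat(path).st_mode
    wanted = stat.S_IROTH | stat.S_IWOTH
    return (mode & wanted) == wanted


def nodes_missing_world_rw(pattern: str = "/dev/dri/*") -> list:
    """List paths matching pattern that are not world-readable and -writable."""
    return [p for p in sorted(glob.glob(pattern)) if not is_world_rw(p)]
```

An empty list from `nodes_missing_world_rw()` would suggest the chmod is not what makes the difference on that system. Whether loosening these permissions is acceptable is a separate question for each setup.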

Update: There is still the same error inside a Docker container in WSL2.

Update 2: Docker is working following this https://github.com/microsoft/wslg/blob/main/samples/container/Containers.md but with unofficial mesa drivers update. https://github.com/microsoft/WSL/issues/7507#issuecomment-950235017

sarim commented 1 year ago

Apparently v1.3.10.0 fixed the segfault. I only tried glxinfo -B a couple of times, and it ended successfully without a segfault at the end. (off-topic: Though this version nukes systemd user sessions).

Others who have extensive graphics tests ready might want to run them in v1.3.10.0

nedsociety commented 1 year ago

Apparently v1.3.10.0 fixed the segfault. I only tried glxinfo -B a couple of times, and it ended successfully without a segfault at the end. (off-topic: Though this version nukes systemd user sessions).

Others who have extensive graphics tests ready might want to run them in v1.3.10.0

Yup, I can also confirm that installing 1.3.11.0 fixed the problem even with systemd enabled.

ivanocj commented 1 year ago

Apparently v1.3.10.0 fixed the segfault. I only tried glxinfo -B a couple of times, and it ended successfully without a segfault at the end. (off-topic: Though this version nukes systemd user sessions). Others who have extensive graphics tests ready might want to run them in v1.3.10.0

Yup, I can also confirm that installing 1.3.11.0 fixed the problem even with systemd enabled.

FYI, they are talking about WSL's pre-release version https://github.com/microsoft/WSL/releases
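To check whether an install is already at or past that release, one could parse the "WSL version:" line of the `wsl --version` output. A hedged Python sketch with my own function names follows; the 1.3.10.0 threshold is only what was reported in this thread, and the parser assumes English output already decoded to text (localized output, like the listing in the original report, would not match).

```python
import re


def parse_wsl_version(text: str):
    """Return the 'WSL version' as a tuple of ints, or None if not found."""
    match = re.search(r"^WSL version:\s*(\d+(?:\.\d+)*)\s*$", text, re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def at_least(version, minimum=(1, 3, 10, 0)) -> bool:
    """True if version is known and is not older than minimum."""
    return version is not None and version >= minimum
```

For the output posted earlier with "WSL version: 1.1.0.0" this gives `(1, 1, 0, 0)`, which `at_least` reports as older than 1.3.10.0.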

nsubordin81 commented 1 year ago

Apparently v1.3.10.0 fixed the segfault. I only tried glxinfo -B a couple of times, and it ended successfully without a segfault at the end. (off-topic: Though this version nukes systemd user sessions). Others who have extensive graphics tests ready might want to run them in v1.3.10.0

Yup, I can also confirm that installing 1.3.11.0 fixed the problem even with systemd enabled.

Do I have to be on Windows Insider to be able to upgrade my wslg to this version? After doing wsl --upgrade and checking apps and features, it looks like I still have v1.2.5.0. The instructions for installation seem to stop at "get windows subsystem for linux and make sure you are on version 2."

For context, I am trying to use the helloscala starter app to verify my environment is set up for android development with scala through wsl2, following this guide https://docs.scala-lang.org/tutorials/scala-on-android.html, and this segfault is the latest snag I've hit. I'm using WSL2 with Ubuntu 22.04.2 on a Windows 11 host.