ogra1 / zoom-snap

zoom-client closes immediately after start on Nvidia (again) #105

Open dimitry-dukhovny opened 2 years ago

dimitry-dukhovny commented 2 years ago

This appears to be the same issue as https://github.com/ogra1/zoom-snap/issues/2, but it is happening with the current edge version 5.9.1.1380.

My symptoms are identical to the original poster's.

This happens after snap refresh --edge zoom-client as well.

zoom_stdout_stderr.log

ahmogit commented 2 years ago

Fwiw: Seeing exactly the same failure shown in your logfile with zoom 5.9.1-1 from the Arch AUR, launched from the commandline. (/opt/zoom/version.txt says 5.9.1.1380, same as yours).

Tried on two nearly identical older laptops, one has the Intel i915 GPU and is using the i915 driver, the other has an Nvidia GPU and is using the nouveau driver. Identical failure behavior on both, just as shown in your logfile. So my guess is that this issue is either unrelated to Nvidia, or perhaps caused somehow by the mere presence of Nvidia libs on both machines (even though they are not being used at runtime on either machine.)

Just grasping at straws, really. I have no expertise here. So far, I know of no workaround and have no way to get zoom up and running. Pretty serious problem, yet doesn't seem to be getting reported very often. A little strange.

zelch commented 2 years ago

Same deal on Debian sid, also nVidia drivers with an RTX 2060.

zoom_stdout_stderr.log

If I had to take a guess, I'd say that snapd is not letting zoom access various chunks of the environment, specifically including stuff required for GLX. But the inability to see the pulse daemon is also interesting to me.

I'd really love to have a log file from when this was working correctly, even from a non-nVidia system.

ahmogit commented 2 years ago

@zelch Imo, the problem seems unlikely to be related to snap. In the two-laptop experiment I mentioned above, which was based on the Arch AUR package zoom-5.9.1-1-x86_64.pkg.tar.zst (which encapsulates zoom-5.9.1.1380), snap was not used; it's not even installed on either of those machines. In the failures I saw, zoom was launched directly from the commandline, i.e. just

$ zoom

That "zoom" command maps (via $PATH) to /usr/bin/zoom, which is a symlink to /opt/zoom/ZoomLauncher. On both machines, the key error events reported in zoom_stdout_stderr.log (attached below) seem to be essentially identical to both yours and @dimitry-dukhovny's, i.e.

QGLXContext: Failed to create dummy context

followed by

Failed to create OpenGL context for format QSurfaceFormat(....)

and then death on signal 6 (SIGABRT).

Fwiw, I have been seeing the same behavior since 5.8.3.

Here's my logfile (mildly sanitized):

zoom_stdout_stderr.log
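
For anyone comparing their own log against the ones attached in this thread, here is a minimal sketch of a check for the two error lines quoted above. The function name `has_glx_failure` is made up for illustration, and the log path is whatever file you captured zoom's output into; nothing here is part of zoom or the snap.

```shell
# Hypothetical helper (not part of zoom or zoom-snap): succeeds only if the
# given log contains both GLX error lines quoted in this thread.
has_glx_failure() {
    log_file="$1"
    grep -q 'QGLXContext: Failed to create dummy context' "$log_file" &&
        grep -q 'Failed to create OpenGL context' "$log_file"
}

# Example use, assuming the output was saved to zoom_stdout_stderr.log:
#   zoom > zoom_stdout_stderr.log 2>&1
#   has_glx_failure zoom_stdout_stderr.log && echo "same GLX failure as in this issue"
```

A match only says the symptom is the same; as the rest of the thread shows, the underlying cause can differ between setups.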

pisarik commented 2 years ago

I have exactly the same problem as was mentioned by @ahmogit. I have Ubuntu 20.04, NVIDIA drivers 340.108, and Zoom installed from snap. I tried both versions 5.9.1 and 5.9.1; neither of them is working.

Also, I can't find older versions of zoom like 5.8.3. Does anyone know where I can find .deb packages?

zelch commented 2 years ago

Alright, this is in part related to the snap or snapd, because the following works on my system:

LD_LIBRARY_PATH=/snap/zoom-client/current/zoom:/snap/zoom-client/current/lib/:/snap/zoom-client/current/lib/x86_64-linux-gnu:/snap/zoom-client/current/usr/lib:/snap/zoom-client/current/usr/lib/x86_64-linux-gnu:/snap/zoom-client/current/usr/lib/x86_64-linux-gnu/pulseaudio PATH=/snap/zoom-client/current/bin/:/snap/zoom-client/current/sbin:/snap/zoom-client/current/usr/bin:/snap/zoom-client/current/usr/sbin:/snap/zoom-client/current/zoom:$PATH /snap/zoom-client/current/zoom/zoom

There are some interesting errors thrown on my Debian sid system, but zoom actually launches. I have not tried an actual call yet, but it looks quite promising.

Of course, this leaves some open questions on the subject of what exactly needs to be changed in the snap to make it work again.
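
The one-liner above is hard to read, so here is a rough sketch of the same idea with the path lists built by a small helper. `build_path` and `SNAP_ROOT` are names invented for this example; the directories are the ones from the command above, and the launch lines are left commented out so that the snippet has no side effects when pasted.

```shell
# Hypothetical helper: join "<prefix>/<dir>" entries with ':' to form a
# search path. Not part of snapd or zoom-snap.
build_path() {
    prefix="$1"
    shift
    result=""
    for rel in "$@"; do
        # Append with a ':' separator, except before the first entry.
        result="${result:+$result:}$prefix/$rel"
    done
    printf '%s\n' "$result"
}

# Assumed usage, mirroring the command above (uncomment to actually launch):
#   SNAP_ROOT=/snap/zoom-client/current
#   LD_LIBRARY_PATH="$(build_path "$SNAP_ROOT" zoom lib lib/x86_64-linux-gnu \
#       usr/lib usr/lib/x86_64-linux-gnu usr/lib/x86_64-linux-gnu/pulseaudio)" \
#   PATH="$(build_path "$SNAP_ROOT" bin sbin usr/bin usr/sbin zoom):$PATH" \
#   "$SNAP_ROOT/zoom/zoom"
```

Note that this runs the zoom binary outside snap confinement, against the host's graphics libraries, which is presumably why it gets past the GLX failure.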

zelch commented 2 years ago

Alright, at least on Debian based distributions, with the nvidia drivers installed using the distribution packages, I have found the problem:

~$ snap run --shell zoom-client -c 'ls /var/lib/snapd/lib/gl/ -l --color'
Testing for explicit PulseAudio choice...
...and PulseAudio has been explicitly chosen, so using it.
total 0
lrwxrwxrwx 1 root root 74 Jan 26 00:30 libcuda.so -> /var/lib/snapd/hostfs/etc/alternatives/nvidia--libcuda.so-x86_64-linux-gnu
lrwxrwxrwx 1 root root 76 Jan 26 00:30 libcuda.so.1 -> /var/lib/snapd/hostfs/etc/alternatives/nvidia--libcuda.so.1-x86_64-linux-gnu
lrwxrwxrwx 1 root root 82 Jan 26 00:30 libEGL_nvidia.so.0 -> /var/lib/snapd/hostfs/etc/alternatives/nvidia--libEGL_nvidia.so.0-x86_64-linux-gnu
lrwxrwxrwx 1 root root 88 Jan 26 00:30 libGLESv1_CM_nvidia.so.1 -> /var/lib/snapd/hostfs/etc/alternatives/nvidia--libGLESv1_CM_nvidia.so.1-x86_64-linux-gnu
lrwxrwxrwx 1 root root 85 Jan 26 00:30 libGLESv2_nvidia.so.2 -> /var/lib/snapd/hostfs/etc/alternatives/nvidia--libGLESv2_nvidia.so.2-x86_64-linux-gnu
lrwxrwxrwx 1 root root 82 Jan 26 00:30 libGLX_nvidia.so.0 -> /var/lib/snapd/hostfs/etc/alternatives/nvidia--libGLX_nvidia.so.0-x86_64-linux-gnu
lrwxrwxrwx 1 root root 77 Jan 26 00:30 libnvcuvid.so -> /var/lib/snapd/hostfs/etc/alternatives/nvidia--libnvcuvid.so-x86_64-linux-gnu
lrwxrwxrwx 1 root root 79 Jan 26 00:30 libnvcuvid.so.1 -> /var/lib/snapd/hostfs/etc/alternatives/nvidia--libnvcuvid.so.1-x86_64-linux-gnu
lrwxrwxrwx 1 root root 79 Jan 26 00:30 libnvidia-cfg.so.1 -> /var/lib/snapd/hostfs/etc/alternatives/glx--libnvidia-cfg.so.1-x86_64-linux-gnu
lrwxrwxrwx 1 root root 75 Jan 26 00:30 libnvidia-compiler.so.470.94 -> /var/lib/snapd/hostfs/usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.470.94
lrwxrwxrwx 1 root root 74 Jan 26 00:30 libnvidia-eglcore.so.470.94 -> /var/lib/snapd/hostfs/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.470.94
lrwxrwxrwx 1 root root 30 Jan 26 00:30 libnvidia-egl-wayland.so.1 -> libnvidia-egl-wayland.so.1.1.9
lrwxrwxrwx 1 root root 77 Jan 26 00:30 libnvidia-egl-wayland.so.1.1.9 -> /var/lib/snapd/hostfs/usr/lib/x86_64-linux-gnu/libnvidia-egl-wayland.so.1.1.9
lrwxrwxrwx 1 root root 85 Jan 26 00:30 libnvidia-encode.so.1 -> /var/lib/snapd/hostfs/etc/alternatives/nvidia--libnvidia-encode.so.1-x86_64-linux-gnu
lrwxrwxrwx 1 root root 73 Jan 26 00:30 libnvidia-glcore.so.470.94 -> /var/lib/snapd/hostfs/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.470.94
lrwxrwxrwx 1 root root 71 Jan 26 00:30 libnvidia-glsi.so.470.94 -> /var/lib/snapd/hostfs/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.470.94
lrwxrwxrwx 1 root root 76 Jan 26 00:30 libnvidia-glvkspirv.so.470.94 -> /var/lib/snapd/hostfs/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.470.94
lrwxrwxrwx 1 root root 81 Jan 26 00:30 libnvidia-ml.so.1 -> /var/lib/snapd/hostfs/etc/alternatives/nvidia--libnvidia-ml.so.1-x86_64-linux-gnu
lrwxrwxrwx 1 root root 85 Jan 26 00:30 libnvidia-opencl.so.1 -> /var/lib/snapd/hostfs/etc/alternatives/nvidia--libnvidia-opencl.so.1-x86_64-linux-gnu
lrwxrwxrwx 1 root root 93 Jan 26 00:30 libnvidia-ptxjitcompiler.so.1 -> /var/lib/snapd/hostfs/etc/alternatives/nvidia--libnvidia-ptxjitcompiler.so.1-x86_64-linux-gnu
lrwxrwxrwx 1 root root 73 Jan 26 00:30 libnvidia-rtcore.so.470.94 -> /var/lib/snapd/hostfs/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.470.94
lrwxrwxrwx 1 root root 70 Jan 26 00:30 libnvidia-tls.so.470.94 -> /var/lib/snapd/hostfs/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.470.94
drwxr-xr-x 2 root root 60 Jan 26 00:30 vdpau
~$ snap run --shell zoom-client -c 'ls --color -l /var/lib/snapd/hostfs/etc/alternatives/nvidia--libGLX_nvidia.so.0-x86_64-linux-gnu'
Testing for explicit PulseAudio choice...
...and PulseAudio has been explicitly chosen, so using it.
lrwxrwxrwx 1 root root 59 Jan  9 07:39 /var/lib/snapd/hostfs/etc/alternatives/nvidia--libGLX_nvidia.so.0-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.0
~$ snap run --shell zoom-client -c 'ls --color -l /usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.0'
Testing for explicit PulseAudio choice...
...and PulseAudio has been explicitly chosen, so using it.
ls: cannot access '/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.0': No such file or directory
~$ snap run --shell zoom-client -c 'ls --color -l /var/lib/snapd/hostfs/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.0'
Testing for explicit PulseAudio choice...
...and PulseAudio has been explicitly chosen, so using it.
lrwxrwxrwx 1 root root 23 Jan  4 19:08 /var/lib/snapd/hostfs/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.0 -> libGLX_nvidia.so.470.94
~$ snap run --shell zoom-client -c 'ls --color -l /var/lib/snapd/hostfs/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.470.94'
Testing for explicit PulseAudio choice...
...and PulseAudio has been explicitly chosen, so using it.
-rw-r--r-- 1 root root 1289616 Dec  6 14:30 /var/lib/snapd/hostfs/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.470.94

I'll try and file the snapd bug about needing to handle /etc/alternatives correctly in the next day or two.
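
The manual walk above (list the directory, follow one link, find that its target is missing) can be sketched as a small function that reports every symlink in a directory whose final target does not resolve. `list_dangling_links` is a name made up for this example, not a snapd tool.

```shell
# Hypothetical helper: print each symlink in a directory whose target chain
# does not resolve to an existing file, along with its immediate target.
list_dangling_links() {
    dir="$1"
    for link in "$dir"/*; do
        # -L is true for a symlink; -e follows the whole chain and is
        # false when the final target is missing.
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            printf '%s -> %s\n' "$link" "$(readlink "$link")"
        fi
    done
}

# Assumed usage: open a shell inside the snap, define the function there,
# then point it at the GL directory:
#   snap run --shell zoom-client
#   list_dangling_links /var/lib/snapd/lib/gl
```

Going by the output above, I would expect libGLX_nvidia.so.0 and the other entries that go through /etc/alternatives to be reported, since their absolute /usr/lib/x86_64-linux-gnu/nvidia/current/... targets only exist under /var/lib/snapd/hostfs inside the snap. The function prints only the first hop of each link, so the alternatives link still has to be followed by hand to see where the chain breaks.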

zelch commented 2 years ago

Ah, it already exists: https://bugs.launchpad.net/snapd/+bug/1866855

See the workaround at the bottom, but, really, a more proper fix should be practical, grr.

pisarik commented 2 years ago

I've contacted support and they sent me an older version of Zoom (5.8.6). They were also very willing to help me make it work with the newest version, but I didn't have time for that.

ahmogit commented 2 years ago

@zelch:

Alright, this is in part related to the snap or snapd...

Not sure I see how that can be, given that exactly the same sorts of errors you're reporting occurred in my experiments on systems in which neither snap nor any Nvidia GPU was even installed.

After a bit more farting about, it seems to me that a more general statement of the problem underlying these particular libGL "failed to create openGL context" errors is that the ZoomLauncher executable (as provided by upstream, and symlinked from /usr/bin/zoom) attempts -- for reasons unknown to me -- to utilize the Nvidia GL libs if they are merely present in the LD path, even if the system on which it is running is not using an Nvidia driver and/or does not even have an Nvidia GPU present.

One of the experiments I ran (mentioned in earlier post) was just this case: The machine had neither snap nor an Nvidia GPU, but did happen to have the Nvidia GL libs installed in /usr/lib/nvidia. My expectation in this situation would be that ZoomLauncher would somehow figure this out and hence effectively blacklist any GL libs in /usr/lib/nvidia. Yet that is not what is happening:

      $ ldd /opt/zoom/zoom  | grep nvidia
      libGL.so.1 => /usr/lib/nvidia/libGL.so.1 (0x00007f0e8cece000)....
      libnvidia-tls.so.340.108 => /usr/lib/nvidia/libnvidia-tls.so.340.108...
      libnvidia-glcore.so.340.108 => /usr/lib/nvidia/libnvidia-glcore.so...

Seems to me what ought to happen here is for ZoomLauncher to arrange for use of /usr/lib/libGL.so and ignore /usr/lib/nvidia entirely. Or maybe this is supposed to be magically figured out by libgldispatch? I don't know.

In any case, based on that hypothetical understanding, the following seems to be a viable and simple workaround, at least on my setup:

    LD_PRELOAD=/usr/lib/libGL.so /opt/zoom/ZoomLauncher

Not sure why it's necessary though. Seems like an upstream config bug to me, but maybe there is some other explanation. Seems to me that merely having "inappropriate" GL libraries present in the LD path shouldn't imply that those libs ought to wind up being used in preference to those in "standard" locations.
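
To make the workaround a little less blind, here is a sketch that applies the preload only when ldd shows libGL.so.1 resolving into an nvidia directory, as in the output above. The function name `resolves_gl_to_nvidia` is made up for illustration, and the paths are the ones from my Arch setup; they will likely differ on other distributions.

```shell
# Hypothetical helper: reads `ldd` output on stdin and succeeds if
# libGL.so.1 resolves to a path containing an "nvidia" directory.
resolves_gl_to_nvidia() {
    grep -Eq 'libGL\.so\.1 => [^ ]*/nvidia/'
}

# Assumed usage with the paths from this comment (uncomment to launch):
#   if ldd /opt/zoom/zoom | resolves_gl_to_nvidia; then
#       LD_PRELOAD=/usr/lib/libGL.so /opt/zoom/ZoomLauncher
#   else
#       /opt/zoom/ZoomLauncher
#   fi
```

This only automates the check I did by hand; it says nothing about why the loader prefers the nvidia copy in the first place.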

@pisarik: Maybe try this with 5.9.1 and see if it takes care of the issue for you.

pisarik commented 2 years ago

Keeping you updated: the newest version 5.9.6.2225 still does not start on Ubuntu.

The older version 5.8.6 that I got from support previously has now stopped working for me as well. The microphone did not work, even though in the sound settings I saw the microphone responding to sounds. So I deleted the old version and installed version 5.9.6.2225, and it still does not start.

zelch commented 1 year ago

Alright, so part of the problem is that snap-confine simply doesn't know how to handle nVidia drivers on a modern Ubuntu release.

This is specifically due to not knowing how to handle /etc/alternatives/ symlinks.

But that is only part of the problem, because once you fix it, it still doesn't work.

glxinfo and glxgears can be made to work at that point, but zoom is still failing to load things.

I'm still investigating this, so we'll see what I come up with.

zelch commented 1 year ago

Alright! I have a progress update, and some more information for people trying to figure out WTF is going on with it all.

There can be many causes for zoom failing with 'failed to open GL context', but in the context of nVidia and the zoom snap, it really boils down to one big thing:

Currently, the snap nVidia handling is unable to handle Debian-based systems with the nVidia drivers installed via packages, because (for good reasons) the library files go through /etc/alternatives, and the snap-confine mount support for nvidia drops everything on the floor.

Fixing that was actually the easy part; it involved a fairly straightforward patch to snap-confine.

However, there are some additional related problems with performance, and with getting smart virtual backgrounds working inside the snap.

The first is that zoom will attempt to use the vaapi to accelerate video encode/decode, but that's... Broken. I have it about 90% fixed, but at the moment my fix requires the ability to set an environment variable which is getting stomped on by what looks like a builtin snapcraft extension. I'm still looking into how this can be better fixed.

The second is that zoom requires the use of mq_open in order for virtual backgrounds to work (I thought it was related to vaapi, but, no, it's mq_open), and worse, it uses mq_open with a dynamically generated path, containing a PID.

Now, there is a posix-mq interface, and we can absolutely generate a snap with both a posix-mq interface and a posix-mq slot, with the same path information, so that we can use posix mq inside the same snap.

But currently, it is not possible to use a wildcard or any other pattern bits for the posix-mq path.

Which is problematic, since the path that zoom uses is dynamically generated.

I have another patch for that.

Oh, and to make the libva pieces work, I ended up having to drag zoom-client up to core22, based on Ubuntu 22.04, because otherwise we have some significant libc symbol issues with the host libva libraries.

So I have patches to zoom-snap as well.

Hopefully next weekend or so I'll have the time and energy to start making PRs for everything, but I'm expecting that it may be a little while before enough bits are merged upstream that we can start merging changes into zoom-snap without breaking things for everyone even more.

rolandd commented 5 months ago

Hi @zelch, I think I rediscovered the same issues you did (https://github.com/ogra1/zoom-snap/issues/128#issuecomment-1998260540) and I have some energy to try and merge fixes upstream, although I'm kind of a newbie around snapd stuff. But could you share the work you already did on the posix-mq dynamic path? Anything, no matter how rough, is fine; I'd just rather avoid duplicating your work from scratch. Thanks!