vmware / open-vm-tools

Official repository of VMware open-vm-tools project
http://sourceforge.net/projects/open-vm-tools/

Graphical Issues in VMware Guest Machine (mostly chromium and vscode) #674

Open ocelik94 opened 1 year ago

ocelik94 commented 1 year ago

Describe the bug

Hello,

I am experiencing a problem similar to the one in https://github.com/vmware/open-vm-tools/issues/417

As I don't know where the root cause of this problem lies, I also opened an issue for NixOS (my guest OS): https://github.com/NixOS/nixpkgs/issues/239598

Screenshots: [3 images]

A workaround is:

Facts:

Reproduction steps

  1. Run the NixOS Live ISO with Gnome
  2. Run nix-shell -p vscode (the problem also occurs on their live ISO)

Expected behavior

No glitches, bugs, artifacts

Additional context

Host: Windows 11 (22H2 build 22621.1848)
Product: VMware Workstation Pro 17.0.2 build-21581411
Guest: NixOS with 6.3.9 kernel
open-vm-tools: open-vm-tools-desktop-12.2.0
polarathene commented 1 year ago

Here are some tips from my own experience with VMware. It performs well compared to current alternatives, but I have encountered various gotchas / workarounds.

Guest Compositor

It may be worth noting your compositor / DE and whether you run X11 or Wayland, as these can change the experience. Possibly the host GPU as well?

I recall Gnome having some latency issues compared to KDE Plasma recently when run in VMware.

As for NixOS, I'm not familiar with their packaging, but open-vm-tools does not provide systemd units, nor any reference for them (I think there are some in the VMware docs). You may want to compare with another distro like Arch, if you find that works fine.

Presently there's an issue with vmware-user-suid-wrapper being autostarted from its .desktop file via systemd-xdg-autostart-generator, which KDE Plasma at least now uses by default; Gnome doesn't (at least on Arch). AFAIK that issue has nothing to do with 3D accel problems; I'm just highlighting that the implementation varies by distro packaging and by how the DE/compositor is configured.

vmwgfx kernel driver

Make sure this is loaded early during boot.

On Arch, the package provides a systemd service to start vmtoolsd, which loads the kernel module if it hasn't been loaded yet. That can be racy: the module loads, but some functionality doesn't work until the service is restarted. For example, if resizing the VM guest window already adjusts the guest display resolution automatically, this may not make much of a difference, since broken auto-resizing is a symptom of that race condition.

I describe loading the kernel module early here; for the most part, this should work:

echo vmwgfx | sudo tee /etc/modules-load.d/vmware.conf

When you next update your kernel, the initramfs tooling may detect this and include the module in the initrd, loading it even earlier. This happened with Dracut for me, at least.

inxi -G will also report the driver as version 2.20.0, which has been that version since mid 2021 I think? You might want to compare with your 5.4 kernel and note the version of the driver there. Could be a regression that was introduced since. You could perhaps try newer LTS kernels like 5.15 and 5.10 to see if those are also affected.
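To compare driver versions across kernels without inxi, a small bash sketch like this can print the vmwgfx version and compare it against a reference such as 2.20.0. It assumes the loaded module exposes /sys/module/vmwgfx/version (falling back to modinfo for the module file on disk) and relies on GNU sort -V for version ordering; the function names are hypothetical.

```shell
# version_ge A B: succeed if dotted version A >= B.
# Uses GNU `sort -V`, so 2.9.0 correctly sorts before 2.15.0.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# vmwgfx_version: print the vmwgfx version, preferring the loaded module's
# sysfs entry and falling back to the module file on disk via modinfo.
vmwgfx_version() {
  if [ -r /sys/module/vmwgfx/version ]; then
    cat /sys/module/vmwgfx/version
  elif command -v modinfo >/dev/null 2>&1; then
    modinfo -F version vmwgfx 2>/dev/null
  else
    return 1
  fi
}
```

Usage in the guest: vmwgfx_version, or version_ge "$(vmwgfx_version)" 2.20.0 && echo "2.20.0 or newer". The sort -V comparison is used instead of a plain string comparison because versions like 2.9.0 and 2.15.0 do not order correctly as strings.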

IIRC a fairly recent change in Mesa git for VMware's driver was a switch to NIR, with a related change within the past year.

Vulkan Renderer

This feature arrived in the 16.2 series I think?

In the host folder containing the VM's .vmx file, there should be some log files beside it, such as mksSandbox-0.log. These are written while the VM guest runs, and you can grep them for insights, such as whether the Vulkan Renderer is used:

98:2023-04-25T23:43:56.938Z In(05) mks  Vulkan Renderer: VKRDevice_Create trying DISPLAY=:0
99:2023-04-25T23:43:56.975Z In(05) mks  Vulkan Renderer: Pre-Turing Nvidia GPUs not supported at this time.  Falling back to GL renderer...
100:2023-04-25T23:43:56.975Z Wa(03) mks  Vulkan Renderer: No supported Vulkan device/driver found (See mks.vk.allowUnsupportedDevices or mks.vk.forceDevice configuration options).
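A quick way to check a mksSandbox log for the GL fallback shown above is a small grep wrapper, sketched here in bash. It only looks for the two message fragments quoted in these log lines ("Falling back to GL renderer" and "No supported Vulkan device"); other Workstation versions may word this differently, and mks_vk_status is a hypothetical helper name.

```shell
# mks_vk_status LOGFILE
# Prints "gl-fallback" if the log contains the Vulkan -> GL fallback messages
# quoted above, otherwise "no-fallback". Matching lines are echoed to stderr.
mks_vk_status() {
  local log="$1"
  local pattern='Falling back to GL renderer|No supported Vulkan device'
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  if grep -nE "$pattern" "$log" >&2; then
    echo "gl-fallback"
  else
    echo "no-fallback"
  fi
}
```

Usage on the host: mks_vk_status /path/to/vm/mksSandbox-0.log (the path is a placeholder for your VM's folder).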

If you see that, you can add mks.gl.allowBlacklistedDrivers = "TRUE" to the .vmx file for GL. I can't recall whether the Vulkan equivalent can also go there; on a Linux host you'd add mks.vk.allowUnsupportedDevices = "TRUE" to ~/.vmware/preferences. There might be something similar somewhere on a Windows host.

Then the log would no longer have the log lines shown above, and the Vulkan renderer is used :+1:
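Editing those keys by hand is easy, but when toggling them repeatedly a helper that replaces an existing key instead of appending duplicates can help. This is a bash sketch assuming the simple key = "value" line format shown above; set_vmx_option is a hypothetical name, and it's safest to edit the .vmx only while the VM is powered off, since Workstation may rewrite that file.

```shell
# set_vmx_option FILE KEY VALUE
# Sets KEY = "VALUE" in a VMware-style config file (.vmx or ~/.vmware/preferences).
# Any existing line for KEY is dropped and a single new line is appended.
set_vmx_option() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  if [ -f "$file" ]; then
    # Compare the trimmed text before the first "=" literally against KEY,
    # so the dots in keys like mks.gl.allowBlacklistedDrivers are not regex wildcards.
    awk -v k="$key" '{
      name = $0; sub(/=.*/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      if (name != k) print
    }' "$file" > "$tmp"
  fi
  printf '%s = "%s"\n' "$key" "$value" >> "$tmp"
  # Copy the content back instead of mv, so the original file's permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Usage: set_vmx_option ~/.vmware/preferences mks.vk.allowUnsupportedDevices TRUE on a Linux host, or set_vmx_option /path/to/vm.vmx mks.gl.allowBlacklistedDrivers TRUE (placeholder path).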

However, in my experience Firefox still had rendering issues, while Chromium didn't seem affected either way (though I recall it having rendering issues a year ago).

Chromium - Enable Vulkan rendering backend

With Chromium there is a chrome:// flag setting that allows you to enable an experimental feature to use Vulkan for rendering, I remember that fixing rendering issues.

SVGA3D driver - Limited GL

I'm not sure about the proper name for this, but there is an environment variable (ENV) you can set that removes support for newer versions of OpenGL / ES, which may be acceptable for some 3D accel use. This also fixed the Chromium rendering issues.

Adding export SVGA_VGPU10=0 into your shell profile (eg: ~/.bashrc) should change OpenGL ES 3.0 to 2.0, and OpenGL 4.1 to 2.1. You can verify this with glxinfo | grep -i opengl.
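To see the effect side by side, a small bash helper can run glxinfo -B twice, once as-is and once with SVGA_VGPU10=0 set just for that command, keeping only the version lines. A sketch: compare_svga_vgpu10 is a hypothetical name, and the GLXINFO override variable exists only so the helper can be tried without a display.

```shell
# compare_svga_vgpu10: print the OpenGL and OpenGL ES version lines reported by
# glxinfo -B with the default settings and with SVGA_VGPU10=0.
# GLXINFO may name an alternative command (defaults to glxinfo).
compare_svga_vgpu10() {
  local glx="${GLXINFO:-glxinfo}"
  local pattern='^OpenGL (ES profile )?version string:'
  echo "== default =="
  "$glx" -B | grep -E "$pattern"
  echo "== SVGA_VGPU10=0 =="
  # The assignment prefix applies to this one command only.
  SVGA_VGPU10=0 "$glx" -B | grep -E "$pattern"
}
```

Run it from the guest's graphical session, since glxinfo needs a display. Setting the variable per command makes it easy to compare before committing it to ~/.bashrc.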

Not perfect

Despite these tips, you can still encounter issues. I did with KWinFT on Arch with X11, but that also happened with QEMU + KVM, which has its own 3D accel, so it's probably not VMware-specific.

zackr commented 1 year ago

Is Arch running on the same host? If so can I see output of "journalctl -b" and "glxinfo -B" from both guests and vmware.log for both?

ocelik94 commented 1 year ago

Arch:

name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: VMware, Inc. (0x15ad)
    Device: SVGA3D; build: RELEASE;  LLVM; (0x405)
    Version: 23.1.2
    Accelerated: no
    Video memory: 1MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.3
    Max compat profile version: 4.3
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: VMware, Inc.
OpenGL renderer string: SVGA3D; build: RELEASE;  LLVM;
OpenGL core profile version string: 4.3 (Core Profile) Mesa 23.1.2
OpenGL core profile shading language version string: 4.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL version string: 4.3 (Compatibility Profile) Mesa 23.1.2
OpenGL shading language version string: 4.30
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 23.1.2
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

NixOS

> glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: VMware, Inc. (0x15ad)
    Device: SVGA3D; build: RELEASE;  LLVM; (0x405)
    Version: 23.1.2
    Accelerated: no
    Video memory: 1MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.3
    Max compat profile version: 4.3
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: VMware, Inc.
OpenGL renderer string: SVGA3D; build: RELEASE;  LLVM;
OpenGL core profile version string: 4.3 (Core Profile) Mesa 23.1.2
OpenGL core profile shading language version string: 4.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL version string: 4.3 (Compatibility Profile) Mesa 23.1.2
OpenGL shading language version string: 4.30
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 23.1.2
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

The output seems to be identical

I will post the journalctl -b logs when I am at home

ocelik94 commented 1 year ago

I tried everything you listed there. Sadly it did not help. Still thanks!

I am having this issue on all 3 of my hosts. Their hardware:

A: i9 12900K & NVIDIA RTX 2080 Ti

B: i7 1260P & NVIDIA T550

C: AMD Ryzen 5800X3D & NVIDIA RTX 4070 Ti

All of them have the same problem in NixOS but work fine in Arch.

polarathene commented 1 year ago

I tried everything you listed there. Sadly it did not help. Still thanks!

That seems odd. I am definitely familiar with the Chrome browser artifacts, and the ENV or chrome:// flag (provided vulkan is working) workarounds resolved that issue.

All of them have the same problem in NixOS but work fine in Arch.

You could try copying over the arch kernel and booting that on nixos :man_shrugging:

ocelik94 commented 1 year ago

Can you try disabling the NVIDIA GPUs on systems that have an iGPU to fall back to? (It should be doable via BIOS.) It would be good to confirm whether it's related to the host GPU drivers, since all systems are using NVIDIA.

Using only the iGPU also resolves the problem, so it must be related to NVIDIA.

Did you check glxinfo for the ENV workaround to ensure it rolled back the OpenGL / ES versions?

It did.

On your 5.4 kernel, what version is the vmwgfx kernel module?

2.15.0

For kernel 6+ it is 2.20.0.

Did you compare Gnome vs KDE on NixOS (different startup process, like XDG autostart .desktop files; the compositor may play a role too)?

I didn't test KDE, but in Gnome I am facing the same problem with everything web-based (WebGL?), like VS Code or the browser.

I will test higher LTS kernel versions, up to 5.15, and give you feedback.

Vulkan is already enabled on my side in Brave Browser, and I couldn't see any log lines like yours in my VMware logs.

I will also test Wayland and compare the open-vm-tools package.

ocelik94 commented 1 year ago

I will test higher LTS kernel versions, up to 5.15, and give you feedback.

I am facing this issue with 5.10, too

ocelik94 commented 1 year ago

I have the following logs in journalctl, and I can find some similar graphical issues reported in other repositories/forums:

Jun 30 12:03:22 w0rk xsession[15074]: [15111:15111:0630/120322.800208:ERROR:gbm_wrapper.cc(258)] Failed to export buffer to dma_buf: No such file or directory (2)
Jun 30 12:04:23 w0rk rofi[16460]: g_string_insert_len: assertion 'len == 0 || val != NULL' failed
Jun 30 12:04:35 w0rk picom[1801]: [ 06/30/2023 12:04:35.635 x_log_error WARN ] X error 3 WINDOW request 2 minor 0 serial 69339
Jun 30 12:05:35 w0rk xsession[15074]: Warning: remove_all_non_valid_override_layers: Failed to get executable path and name
Jun 30 12:05:35 w0rk xsession[15074]: Warning: vkCreateInstance: Found no drivers!
Jun 30 12:05:35 w0rk xsession[15074]: Error: vkCreateInstance failed with VK_ERROR_INCOMPATIBLE_DRIVER
Jun 30 12:05:35 w0rk xsession[15074]:     at CheckVkSuccessImpl (../../third_party/dawn/src/dawn/native/vulkan/VulkanError.cpp:88)
Jun 30 12:05:35 w0rk xsession[15074]:     at CreateVkInstance (../../third_party/dawn/src/dawn/native/vulkan/BackendVk.cpp:416)
Jun 30 12:05:35 w0rk xsession[15074]:     at Initialize (../../third_party/dawn/src/dawn/native/vulkan/BackendVk.cpp:302)
Jun 30 12:05:35 w0rk xsession[15074]:     at Create (../../third_party/dawn/src/dawn/native/vulkan/BackendVk.cpp:232)
Jun 30 12:05:35 w0rk xsession[15074]:     at operator() (../../third_party/dawn/src/dawn/native/vulkan/BackendVk.cpp:492)
Jun 30 12:05:35 w0rk xsession[15074]: Error: eglChooseConfig returned zero configs
Jun 30 12:05:35 w0rk xsession[15074]:     at Create (../../third_party/dawn/src/dawn/native/opengl/ContextEGL.cpp:53)
Jun 30 12:05:35 w0rk xsession[15074]: Error: EGL_EXT_create_context_robustness must be supported
Jun 30 12:05:35 w0rk xsession[15074]:     at Create (../../third_party/dawn/src/dawn/native/opengl/ContextEGL.cpp:67)

edit:

SVGA3D driver - Limited GL I'm not sure about the proper name for this, but there is an ENV you can use that removes support of new versions of OpenGL / ES, which may be acceptable for some 3D Accel. This also fixed Chromium rendering issues. Adding export SVGA_VGPU10=0 into your shell profile (eg: ~/.bashrc) should change OpenGL ES 3.0 to 2.0, and OpenGL 4.1 to 2.1. You can verify this with glxinfo | grep -i opengl.

This works too. I made a mistake on my first try, as I am new to NixOS :D

zackr commented 1 year ago

In order for us to be able to diagnose any graphical glitches, you still need to provide the logs I mentioned before. We need the "journalctl -b" output from both NixOS and Arch, and vmware.log from both. You can also attach mksSandbox.log from both so we don't have to ask for it later.
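For the guest-side part of that request, a small bash collection helper keeps the outputs together. A sketch with hypothetical function and file names: it runs journalctl -b and glxinfo -B if they are installed and records their absence otherwise. vmware.log and mksSandbox.log live next to the .vmx on the host, so those still have to be copied from there by hand.

```shell
# _diag_run OUTDIR FILE CMD [ARGS...]
# Runs CMD with stdout+stderr captured in OUTDIR/FILE, or records that CMD is missing.
_diag_run() {
  local out="$1" file="$2"
  shift 2
  if command -v "$1" >/dev/null 2>&1; then
    "$@" > "$out/$file" 2>&1 || true
  else
    echo "$1: not installed" > "$out/$file"
  fi
}

# collect_vm_diag [OUTDIR]
# Collects guest-side diagnostics into OUTDIR (default: vm-diag-<timestamp>)
# and prints the directory path.
collect_vm_diag() {
  local out="${1:-vm-diag-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$out" || return 1
  _diag_run "$out" journalctl-b.txt journalctl -b --no-pager
  _diag_run "$out" glxinfo-B.txt glxinfo -B
  _diag_run "$out" uname-a.txt uname -a
  echo "$out"
}
```

Run it from the graphical session so glxinfo can reach the display; without extra privileges, journalctl may only show part of the system journal.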

polarathene commented 1 year ago

I have the following logs in journalctl

This appears to be using Wayland (GBM) and a compositor other than Gnome or KDE (picom)?

Since it's logged from the VM guest, I assume it's an issue with the Vulkan drivers there. Can you confirm which kernel these logs are from? I'm not sure what Vulkan support was like in older vmwgfx versions; I think there's a command to get Vulkan info, similar to glxinfo, which might help with that.
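The "vkCreateInstance: Found no drivers!" line in those logs typically means the Vulkan loader found no usable driver (ICD). vulkaninfo --summary (from vulkan-tools) is the glxinfo-like command for Vulkan; as a lighter check, this bash sketch lists ICD manifest files in a few common directories. The directory list is an assumption: the loader also searches XDG paths and honors environment overrides, and on NixOS drivers are usually exposed under /run/opengl-driver, so treat an empty result as a hint rather than proof. list_vk_icds is a hypothetical name.

```shell
# list_vk_icds [DIR...]
# Lists Vulkan ICD manifest files (*.json) in the given directories, or in some
# common default locations if none are given. Returns 1 if nothing was found.
list_vk_icds() {
  local dirs=("$@") d f found=0
  if [ "${#dirs[@]}" -eq 0 ]; then
    dirs=(/usr/share/vulkan/icd.d /etc/vulkan/icd.d \
          /usr/local/share/vulkan/icd.d /run/opengl-driver/share/vulkan/icd.d)
  fi
  for d in "${dirs[@]}"; do
    for f in "$d"/*.json; do
      [ -e "$f" ] || continue   # the glob matched nothing in this directory
      echo "$f"
      found=1
    done
  done
  [ "$found" -eq 1 ]
}
```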

Remember there are the two host-side settings (in different config files IIRC) for allowing Vulkan/OpenGL support if VMware blacklists your GPU, which was the case for my Nvidia GTX 1070.


This works too. I made a mistake on my first try, as I am new to NixOS :D

You may want to go over my other suggestions too, then, in case NixOS handles anything differently from other distros. Or identify another distro that has the same issues without NixOS's quite different approach; that may make it easier to troubleshoot first, and then figure out how to do the equivalent on NixOS.

Make sure vmwgfx is properly loaded early; perhaps include the kernel module in the initrd, and check your journalctl logs to ensure it's loaded early on. You probably want it handled before the compositor starts.
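For distros that build the initramfs with dracut, a drop-in like this bash sketch forces vmwgfx into the initramfs. It assumes dracut's force_drivers option; the file name vmwgfx.conf and the helper name are hypothetical, and the ROOT argument only exists so it can be tried in a scratch directory. On NixOS the equivalent is declarative, e.g. boot.initrd.kernelModules = [ "vmwgfx" ]; in configuration.nix, and with Arch's mkinitcpio it is the MODULES array in /etc/mkinitcpio.conf.

```shell
# write_dracut_vmwgfx_conf [ROOT]
# Writes ROOT/etc/dracut.conf.d/vmwgfx.conf so dracut adds vmwgfx to the
# initramfs and loads it early. ROOT defaults to "/" (the live system; needs root).
write_dracut_vmwgfx_conf() {
  local root="${1:-/}"
  local dir="${root%/}/etc/dracut.conf.d"
  mkdir -p "$dir" || return 1
  # dracut config files are shell fragments; the surrounding spaces keep the
  # name separate when several drop-ins append to the same variable.
  printf '%s\n' 'force_drivers+=" vmwgfx "' > "$dir/vmwgfx.conf"
}
```

After writing the file, regenerate the initramfs (sudo dracut --force), reboot, and check how early the driver initialized with journalctl -b -k | grep -i vmwgfx.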

If you resolve the vulkan issue then the chrome:// flag for vulkan renderer enabling should also work well (might be possible to enable for electron apps via CLI arg).

Otherwise, that ENV works well if you don't need newer OpenGL / ES. I'm not sure, but I think Wayland / GBM might want something newer than OpenGL ES 2.0, and compositors might expect newer OpenGL than 2.1 on X11 too (but there are presumably some bugs in vmwgfx support, at least when the host GPU is NVIDIA).

Using only the iGPU also resolves the problem, so it must be related to NVIDIA.

Great. If you can try with VMware Player/Workstation older than 16.2, that might help. Or, if one of the VMware devs can advise on disabling the Vulkan rendering backend, that may also be an option.


I didn't test KDE, but in Gnome I am facing the same problem with everything web-based (WebGL?), like VS Code or the browser.

I am familiar with some other Electron apps not working well. They would sometimes even fail to start unless launched via the CLI, requiring the sandbox to be disabled (unless using the ENV workaround, IIRC). I think it had something to do with initializing the GPU backend.

I had some similar problems on my host (Linux, Arch) without VMware, on the NVIDIA GPU, when the third-party VA-API support for video acceleration broke after a kernel update (IBT enabled by default); it required booting with ibt=off until NVIDIA updated their drivers to support ibt=on. However, your Windows host doesn't seem to behave any differently from my Linux host, so it's not likely an issue similar to that one.

All I know is that Arch Linux definitely had the Chrome rendering problem in the past, at least (I can't verify it any time soon, but as I last recall, Chrome no longer had the issues while Firefox did). Could you also check with Firefox? That might reveal a graphics glitch with vmwgfx common to both Arch and NixOS.


I am facing this issue with 5.10, too

Was the vmwgfx version newer than the 5.4 kernel? It was released Dec 2020, so probably older than 2.20.0, but maybe newer than 2.15.0? Might help the VMWare devs pinpoint when a regression in the driver happened.

ocelik94 commented 1 year ago

In order for us to be able to diagnose any graphical glitches, you still need to provide the logs I mentioned before. We need the "journalctl -b" output from both NixOS and Arch, and vmware.log from both. You can also attach mksSandbox.log from both so we don't have to ask for it later.

Oh yeah, forgot them. I added them now :) NixOs.log ArchLinux.log

ocelik94 commented 1 year ago

Was the vmwgfx version newer than the 5.4 kernel? It was released Dec 2020, so probably older than 2.20.0, but maybe newer than 2.15.0? Might help the VMWare devs pinpoint when a regression in the driver happened.

Yeah, in 5.10 it is using 2.18.0

drolevar commented 12 months ago

Just for reference, this is how the corresponding part of the glxinfo output looks in WSL 2 on Windows 10:

Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (Intel(R) UHD Graphics 630) (0xffffffff)
    Version: 23.0.4
    Accelerated: yes
    Video memory: 32744MB
    Unified memory: yes
    Preferred profile: core (0x1)
    Max core profile version: 4.1
    Max compat profile version: 4.1
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.0

Here, Accelerated is set to yes and the video memory size is correctly detected.
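To compare those fields across guests (e.g. the "Accelerated: no" and "Video memory: 1MB" reported by both the Arch and NixOS guests above against this WSL output), a small bash filter can pull one field out of glxinfo -B. A sketch: renderer_info is a hypothetical name, and it simply matches the trimmed text before the first ": " on each line, so values that themselves contain ": " are kept whole.

```shell
# renderer_info FIELD
# Reads `glxinfo -B` output on stdin and prints the value of the first line
# whose label (text before the first ": ", leading whitespace trimmed) equals FIELD.
renderer_info() {
  awk -v f="$1" '{
    i = index($0, ": ")
    if (i == 0) next
    k = substr($0, 1, i - 1)
    sub(/^[ \t]+/, "", k)
    if (k == f) { print substr($0, i + 2); exit }
  }'
}
```

Usage: glxinfo -B | renderer_info Accelerated, or glxinfo -B | renderer_info "Video memory".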

Acters commented 7 months ago

Same issue on a Windows host using an NVIDIA GPU and DX12, so this isn't isolated to Linux hosts. It very much happens to Linux guests on this driver, mostly with Xwayland applications. Native Wayland applications work fine for the most part.

liangqi commented 5 months ago

It is perhaps a bug in Xwayland in mutter (the GNOME window manager); you could try KDE Plasma 6.

See also https://gitlab.gnome.org/GNOME/mutter/-/issues/3383 .

Acters commented 2 months ago

It is perhaps a bug in Xwayland in mutter (the GNOME window manager); you could try KDE Plasma 6.

See also https://gitlab.gnome.org/GNOME/mutter/-/issues/3383 .

I should have mentioned that I was trying out various distros on Wayland only; X11 was working fine for me. Is the issue you linked about Xorg's X11? KDE Plasma 5.x on Wayland was showing these issues. However, Gnome 46 does not, and KDE Plasma 6.1 seems to have fewer issues for me now.

idk, it is a little of column A and little of column B.

I've noticed that in a GNOME 46 Wayland session with 3D accel, the VMware host app crashes:

VMware Workstation unrecoverable error: (mks)

ISBRendererComm: Lost connection to mksSandbox

If I turn off 3D acceleration, the gnome-shell in the guest crashes instead. I wonder if this is what causes mks to crash; however, I don't know why it happens.

I reinstalled the GNOME setup and now it doesn't crash. I'm guessing something was wrong before and now it is stable; sometimes the solution is to retry. It would be nice to get a more helpful hint about why it crashed.