JoeSalmeri opened 10 months ago
You forgot to mention which source you used to build the modules. As I have actually seen this warning before, I suspect that it was either unpatched source from VMware or an older snapshot of the workstation-17.5.0 branch without commit 4c2a103fd2d7 ("vmmon: use get_user_pages to get page PFN").
(I also reported this issue on the VMware Communities website but nobody seems to care.)
SORRY, my bad !
I stopped bothering with vmware communities a while back because as you said nobody seems to care.
Since Tumbleweed is a rolling distro, I usually try the source modules provided by VMware first, and then if they have an issue, I replace them with your modules. ( THANKS for maintaining them ! )
Looking at my notes, I see that when I updated to kernel 6.5.9.1 I also updated to VMware 17.5.0 at the same time, and when I did that I switched back to the VMware modules and they worked ( after I signed them ).
The latest TW release now has the new 6.6.1-1.1 kernel and that's where the problem happened. I'll pull your latest 17.5 modules, see if that resolves it, and report back.
THANK YOU !
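For readers following the compile-and-sign steps mentioned above, here is a minimal sketch. The MOK key names (MOK.priv / MOK.der) and the sign-file path are illustrative assumptions, not details from this thread, so the script only prints the commands it would run:

```shell
# Sketch: print the rebuild-and-sign commands for the out-of-tree modules.
# MOK.priv / MOK.der are placeholder key names; enroll your own key via mokutil.
KREL=$(uname -r)
SIGN_FILE="/lib/modules/${KREL}/build/scripts/sign-file"  # shipped with the kernel devel package
echo "make && sudo make install"                          # run in the vmware-host-modules checkout
for mod in vmmon vmnet; do
    echo "sudo ${SIGN_FILE} sha256 MOK.priv MOK.der /lib/modules/${KREL}/misc/${mod}.ko"
done
```

With Secure Boot enabled, unsigned vmmon/vmnet will be rejected at load time, which is why the signing step has to be repeated after every rebuild.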
Ok, I got the latest workstation 17.5.0 modules, compiled them, and signed them.
The VM ( Win10, in case it matters ) comes up and seems to work fine, just like before. But instead of the errors above, which occurred when I shut down the VM and forced me to reboot to recover, with the latest workstation 17.5.0 modules shutting down the VM now coredumps but does not hang Linux or force me to reboot.
Here are the systemd journal entries:
Nov 19 09:21:16 vmnetBridge[5293]: RTM_NEWLINK: name:eno1 index:2 flags:0x00011043
Nov 19 09:21:16 kernel: e1000e 0000:00:19.0 eno1: entered promiscuous mode
Nov 19 09:21:16 kernel: bridge-eno1: enabled promiscuous mode
Nov 19 09:21:16 kernel: Lockdown: vmx-vcpu-0: /dev/mem,kmem,port is restricted; see man kernel_lockdown.7
Nov 19 09:22:40 vmnetBridge[5293]: RTM_NEWLINK: name:eno1 index:2 flags:0x00011043
Nov 19 09:22:40 kernel: e1000e 0000:00:19.0 eno1: left promiscuous mode
Nov 19 09:22:40 kernel: bridge-eno1: disabled promiscuous mode
Nov 19 09:22:41 plasmashell[5672]: Unexpected signal: 11.
Nov 19 09:22:41 systemd[1]: Started Process Core Dump (PID 5841/UID 0).
Nov 19 09:22:41 systemd-coredump[5842]: [??] Process 5840 (vmplayer) of user 1000 dumped core.
Module libcds.so without build-id.
Stack trace of thread 5840:
#0 0x00007f1d221161bd syscall (libc.so.6 + 0x1161bd)
#1 0x00007f1d1e24d723 n/a (libvmwarebase.so + 0x24d723)
#2 0x00007f1d1e24da59 n/a (libvmwarebase.so + 0x24da59)
#3 0x00007f1d1e153993 Panic_Panic (libvmwarebase.so + 0x153993)
#4 0x00007f1d1e153a2c Panic (libvmwarebase.so + 0x153a2c)
#5 0x00007f1d1e24c510 n/a (libvmwarebase.so + 0x24c510)
#6 0x00007f1d1e24d337 n/a (libvmwarebase.so + 0x24d337)
#7 0x00007f1d2203f190 __restore_rt (libc.so.6 + 0x3f190)
#8 0x00007f1d1d060fa0 _ZNK3cui3MKS22GetGuestTopologyLimitsERjS1_S1_S1_Rc (libvmwareui.so + 0x1060fa0)
#9 0x00007f1d1cf8c67d _ZN3cui19IsTopologySupportedERKNS_2VMERKSt6vectorINS_4RectESaIS4_EERbS9_ (libvmwareui.so + 0xf8c67d)
#10 0x00007f1d1cd4ef63 _ZNK3cui13FullscreenMgr18CompatibleTopologyEPKNS_2VMERKSt6vectorIjSaIjEERbS9_ (libvmwareui.so + 0xd4ef63)
#11 0x00007f1d1cd4ff88 _ZN3cui13FullscreenMgr11CanMultiMonEPNS_2VMEPSt6vectorIN3utf6stringESaIS5_EEb (libvmwareui.so + 0xd4ff88)
#12 0x00007f1d1d40cc4a _ZN3lui13FullscreenMgr11CanMultiMonEPN3cui2VMEPSt6vectorIN3utf6stringESaIS6_EEb (libvmwareui.so + 0x140cc4a)
#13 0x00007f1d2159fb1b _ZN6player6Window16UpdateAddMonitorEv (libvmplayer.so + 0x11eb1b)
#14 0x00007f1d1cfa971d _ZNK3cui10Capability8EvaluateEv (libvmwareui.so + 0xfa971d)
#15 0x00007f1d1cfa9947 _ZN3cui10Capability18OnTestDisconnectedEPv (libvmwareui.so + 0xfa9947)
#16 0x00007f1d1cd30e11 _ZN3cui16ClearConnectionsISt4listIN4sigc10connectionESaIS3_EEEEvRT_ (libvmwareui.so + 0xd30e11)
#17 0x00007f1d1cea6ef7 _ZN3cui14VMCapabilities10ConnectMKSEv (libvmwareui.so + 0xea6ef7)
#18 0x00007f1d1ce73aaa _ZN3cui2VM8UnsetMKSEPNS_3MKSE (libvmwareui.so + 0xe73aaa)
#19 0x00007f1d1d45f0dd _ZN3lui2VM8UnsetMKSEPN3cui3MKSE (libvmwareui.so + 0x145f0dd)
#20 0x00007f1d215679ed _ZN6player6Player7CloseVMEN4sigc4slotIvbRKN3cui5ErrorENS1_3nilES7_S7_S7_S7_EENS2_IvS7_S7_S7_S7_S7_S7_S7_EE (libvmplayer.so + 0xe69ed)
#21 0x00007f1d2156f4a2 _ZN4sigc8internal10slot_call2INS_18bound_mem_functor2IvN6player6PlayerENS_4slotIvbRKN3cui5ErrorENS_3nilESA_SA_SA_SA_EENS5_IvSA_SA_SA_SA_SA_SA_SA_EEEEvSB_SC_E7call_itEPNS0_8slot_repERKSB_RKSC_ (libvmplayer.so + 0xee4a2)
#22 0x00007f1d1d0f3faa _ZN3cui15LoggedSlotChain11SlotWrapperEN4sigc4slotIvbRKNS_5ErrorENS1_3nilES6_S6_S6_S6_EENS2_IvS6_S6_S6_S6_S6_S6_S6_EERKN3utf6stringENS2_IvS7_S8_S6_S6_S6_S6_S6_EE (libvmwareui.so + 0x10f3faa)
#23 0x00007f1d1d0f4b4b _ZN4sigc8internal10slot_call2INS_12bind_functorILin1ENS_18bound_mem_functor4IvN3cui15LoggedSlotChainENS_4slotIvbRKNS4_5ErrorENS_3nilESA_SA_SA_SA_EENS6_IvSA_SA_SA_SA_SA_SA_SA_EERKN3utf6stringENS6_IvSB_SC_SA_SA_SA_SA_SA_EEEESE_SH_SA_SA_SA_SA_SA_EEvSB_SC_E7call_itEPNS0_8slot_repERKSB_RKSC_ (libvmwareui.so + 0x10f4b4b)
#24 0x00007f1d1cfb34b8 _ZN3cui9SlotChain8NextSlotEj (libvmwareui.so + 0xfb34b8)
#25 0x00007f1d2156c19a _ZN4sigc8internal10slot_call0INS_19bind_return_functorIbNS_4slotIvNS_3nilES4_S4_S4_S4_S4_S4_EEEEbE7call_itEPNS0_8slot_repE (libvmplayer.so + 0xeb19a)
#26 0x00007f1d2104a93d n/a (libglibmm-2.4.so.1 + 0x4a93d)
#27 0x00007f1d21397924 n/a (libglib-2.0.so.0 + 0x5e924)
#28 0x00007f1d21394f30 n/a (libglib-2.0.so.0 + 0x5bf30)
#29 0x00007f1d21396b58 n/a (libglib-2.0.so.0 + 0x5db58)
#30 0x00007f1d2139742f g_main_loop_run (libglib-2.0.so.0 + 0x5e42f)
#31 0x00007f1d201f6c2d gtk_main (libgtk-3.so.0 + 0x1f6c2d)
#32 0x00007f1d21531fea main (libvmplayer.so + 0xb0fea)
#33 0x000055df0d42fa50 n/a (appLoader + 0x1ca50)
#34 0x000055df0d42bba0 n/a (appLoader + 0x18ba0)
#35 0x00007f1d220281b0 __libc_start_call_main (libc.so.6 + 0x281b0)
#36 0x00007f1d22028279 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x28279)
#37 0x000055df0d42c045 n/a (appLoader + 0x19045)
ELF object binary architecture: AMD x86-64
Nov 19 09:22:41 systemd[1]: systemd-coredump@2-5841-0.service: Deactivated successfully.
Nov 19 09:22:41 plasmashell[5672]: VMware Player Error:
Nov 19 09:22:41 plasmashell[5672]: VMware Player unrecoverable error: (vmplayer)
Nov 19 09:22:41 plasmashell[5672]: Unexpected signal: 11.
Nov 19 09:22:41 plasmashell[5672]: A log file is available in "/tmp/vmware-joe/vmware-vmplayer-5672.log".
Nov 19 09:22:41 plasmashell[5672]: You can request support.
Nov 19 09:22:41 plasmashell[5672]: To collect data to submit to VMware technical support, run "vm-support".
Nov 19 09:22:41 plasmashell[5672]: We will respond on the basis of your support entitlement.
Nov 19 09:22:41 systemd[1670]: app-vmware\x2dplayer-bb6650976f1b44908e4d7bb4a508213c.scope: Consumed 2min 21.436s CPU time.
I also saved the "/tmp/vmware-joe/vmware-vmplayer-5672.log" file in case you want to see that too.
I was a bit afraid there might be some problem like this. Unfortunately this is a closed source application so there is no way to debug this for anyone except VMware. And VMware won't care until a "supported host operating system" with 6.6+ kernel appears. :-(
On the other hand, based on function names from the stack trace, it rather looks like a problem between the GUI and your desktop environment, i.e. not really related to what the kernel modules are doing.
It sucks that they don't consider openSUSE Tumbleweed a supported host operating system, since it is now using the 6.6 kernel.
I'm curious, which function names in the stack trace make it look like a GUI / desktop problem?
FWIW, it also coredumped on the 6.5.9.1 kernel at shutdown too. But when 6.6.1.1 was installed that caused Linux to slowly become unresponsive forcing a reboot.
With your patch ( THANKS ! ) I've been using 6.6.1.1 now for a few days with no issues other than the coredump at shutdown.
Might be time to reconsider moving to KVM again.
Same problem for me: crashing the whole host with similar messages as above, with VMware 17.5 and both the p17.0.1 and w17.0.2 modules ( I replaced pte_offset_map with pte_offset_kernel to make it compile ). I had to roll it all back, and I now stay on the Tumbleweed from August, which has the 6.4.11 kernel, for this reason. With 6.4.11 everything still works fine, but I now have a backlog of 4013 packages to update... any advice? Switch to VirtualBox?
I'm curious, which function names in the stack trace makes it look like a GUI / Desktop problem ?
Most of all, frames 10-13, in particular the FullscreenMgr, Window, UpdateAddMonitor and CanMultiMon parts.
FWIW, it also coredumped on the 6.5.9.1 kernel at shutdown too. But when 6.6.1.1 was installed that caused Linux to slowly become unresponsive forcing a reboot.
The coredump with 6.5.9 kernel was with unpatched modules from VMware? That would suggest it's a userspace problem unrelated to these modules.
With your patch ( THANKS ! ) I've been using 6.6.1.1 now for a few days with no issues other than the coredump at shutdown.
Far from perfect but certainly better than paralyzing the whole system as soon as you start a VM.
Might be time to reconsider moving to KVM again.
That's one of the options, sure.
same problem for me, crashing the whole host with similar messages as above with vmware 17.5 and both the p17.0.1 and w17.0.2. modules
That's a bad idea; you should use the branch for your VMware version, workstation-17.5.0 in your case. Also, w17.0.2 is a tag marking unpatched module source, i.e. exactly the same as provided by VMware. If you build those, it's the same as not using this repository at all.
I replaced pte_offset_map to pte_offset_kernel to make it compile
Another bad idea, even if that's exactly what VMware decided to do - but that's what this issue is about.
but I now have a backlog of 4013 packages to update
You can always add a lock for kernel packages (e.g. zypper addlock 'kernel-*') and update the rest.
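That lock-and-update flow, spelled out as a zypper sequence. This is a sketch that prints the commands rather than running them; zypper dup is the usual Tumbleweed update step:

```shell
# Freeze kernel packages, update the rest, release the lock later.
LOCK='kernel-*'
echo "sudo zypper addlock '${LOCK}'"     # freeze all kernel packages
echo "sudo zypper dup"                   # distribution upgrade for everything else
echo "zypper locks"                      # list active locks to verify
echo "sudo zypper removelock '${LOCK}'"  # once the modules build against the new kernel
```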
any advice?
Try the updated workstation-17.5.0 branch instead. Unless you run into the same issue as JoeSalmeri (which may not even be related), that's what I would suggest.
Hi Michal, thanks for your help! I followed your advice: I locked 'kernel-*' and the VirtualBox RPMs (I am using VirtualBox and VMware for different legacy machines), then updated 4200+ packages, which seems to have been successful. Then I tried to re-install VMware 17.5.0 (to confirm the correct modules to be used). When starting vmplayer, the module creation failed at first. Then I copied /usr/lib/vmware/modules/source/vmmon.tar and vmnet.tar into your environment, replacing the sources of vmmon and vmnet. After make and make install, everything seems to run fine, although the process complained that the same compiler was not used. Anyhow, while still on kernel 6.4.11 I can bring up the VMware virtual machines, so maybe I can keep this setup for a few weeks... until hopefully we have modules which match kernel 6.6.x.
One more question: the vmware service seems to have been installed as /etc/init.d/vmware, where I need to restart the vmware service manually after each reboot. How can I get this back to /usr/lib/systemd/system/vmware.service where it actually belongs?
Any suggestion how to investigate further? Thanks!
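The vmware.service question goes unanswered in the thread; a minimal systemd wrapper around the SysV script might look like the sketch below. This is a hypothetical unit written for illustration, not VMware's own service file; the real file would go to /etc/systemd/system/vmware.service, followed by systemctl daemon-reload and systemctl enable vmware:

```shell
# Hypothetical wrapper unit; written to a temp file here so the sketch is harmless.
UNIT="$(mktemp)"
cat > "$UNIT" <<'EOF'
[Unit]
Description=VMware host services (wrapper around the SysV script)
After=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/etc/init.d/vmware start
ExecStop=/etc/init.d/vmware stop

[Install]
WantedBy=multi-user.target
EOF
```

Type=oneshot with RemainAfterExit=yes is the usual pattern for wrapping start/stop scripts that do not leave a long-running daemon of their own.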
and installed vmware 17.5.0 with the related modules from this version
What exactly does this mean? Unless you mean the current head of the workstation-17.5.0 branch in this repository (i.e. commit 4c2a103fd2d7), you have to report that issue somewhere else.
Please understand that I do not work for VMware and have no special relation or contract with them. Thus I have no more influence on what they ship in their products than any other customer who paid for a single Workstation license (in other words: none). All I'm trying to do is my best to help their users (myself included) to work around some deficiencies in their development process.
It may be interesting to investigate further why exactly the pte_offset_kernel() hack results in these warnings, why it only appeared after switching from 6.5 to 6.6, or what makes the openSUSE kernel different from other 6.6 based distribution kernels (different config options?) so that other users with 6.6 kernels are not affected (yet?). But I'm not really an expert on RCU or memory management internals, and the amount of time I can (and want to) devote to this work is limited, so knowing that the get_user_pages approach does not suffer from it is enough for me at the moment.
Hi Michal, sorry, my bad. I originally used the vmmon and vmnet which came with VMware 17.5.0. Now I compiled yours (head of the workstation-17.5.0 branch) and installed these. What I can report is that I can start the vmware service and I can start vmplayer, but when I start a very simple VM inside vmplayer (without any operating system) I cannot even reach the BIOS of this VM. The logfile is here: vmware.log. I also tried to gdb the vmmcores file, but gdb complains "file format not recognized". Any suggestion how to proceed?
I'm sorry, this is a problem in a closed source userspace application, i.e. something I cannot possibly help you with.
Hi JoeSalmeri, mkubecek et al. I can report that I seem to have fixed the above issue for my system now. The problem has most likely been a BIOS update in my system. It is running on an ASUS Z170A which had BIOS version 1602 from 2016 before. I updated now to BIOS version 3802 from 15.3.2018, which seems to be the latest available for this board. After updating Tumbleweed (everything but the kernel), I released the kernel lock and updated the kernel to 6.6.2-1-default. Then, using "vmware-host-modules-workstation-17.5.0" (make clean, make, make install, systemctl start vmware), everything seems fine now! I can start my 2 VMware virtual machines (one Linux, one Windows) without any visible issues. Good luck for Joe now!
Just to confirm: Linux host lockups on Fedora 39 6.6.x kernels.
[root@meon:/boot]$ ltr|grep vml
-rwxr-xr-x. 1 root root 14560456 Nov  8 00:00 vmlinuz-6.5.11-300.fc39.x86_64
-rwxr-xr-x. 1 root root 14540552 Nov 20 00:00 vmlinuz-6.5.12-300.fc39.x86_64
-rwxr-xr-x. 1 root root 14661960 Nov 22 00:00 vmlinuz-6.6.2-201.fc39.x86_64
-rwxr-xr-x. 1 root root 14662792 Nov 28 00:00 vmlinuz-6.6.3-200.fc39.x86_64
Currently running this kernel, which works fine:
[root@meon:/boot]$ uname -a
Linux meon.jaa.org.uk 6.5.12-300.fc39.x86_64
Failing kernels: vmlinuz-6.6.2-201.fc39.x86_64 & 6.6.3-200.fc39.x86_64
vmware runs OK, but as soon as an attempt is made to start a virtual machine, a physical power off is required to restore control of the host machine.
Both Windows 11 and Ubuntu 22.04 clients cause the crash.
John
@ja-jaa-org-uk , I recommend also reporting the problem here: https://communities.vmware.com/t5/VMware-Workstation-Pro/Ubuntu-22-04-freezes-randomly-on-VMWare-Professional-17/td-p/2942773/page/3
Hi,
Just FYI: I had this same issue; I updated the BIOS and used these modules, and it's working fine.
Pop OS Kernel 6.6.6 workstation v17.5.0
Hi everybody,
Same situation over here: watching RCU CPU stalls every time I powered up a VM with Workstation 17.5.0. Anyway, moving to kernel 6.6.6-1, doing the latest BIOS update for my ASUS Z170 PRO GAMING, and compiling the modules again seems to solve the issue. Running now for 24 hours without trouble.
Thanks for the heads up. Beelink GTR7 Pro, Ryzen 9 7940HS. Very quick test on Fedora 39 kernel 6.6.7; seems OK for Ubuntu 22 & Windows 11 clients.
ja@meon GitHub 2$ uname -a
Linux meon.jaa.org.uk 6.6.7-200.fc39.x86_64
/global/db/sw/VMware_17/mkubeck_17.5.0
John
Experiencing the same issue: Fedora 39, Windows 10 guest, kernel 6.6.9. BIOS up to date.
The fix of missing prototypes in https://github.com/mkubecek/vmware-host-modules/commit/2c6d66f3f1947384038b765c897b102ecdb18298 seems to have solved several issues. I recommend everyone upgrade.
Updated information from when I originally reported this.
I am now running TW 20231228 and using kernel 6.6.7-1.
I just downloaded and installed the latest 17.5.0 modules with the fixes discussed above and recompiled, signed and tested out vmware.
The service starts fine, the VM comes up and appears to work fine, but when shutting down the VM, vmware coredumps. It does not crash Linux or seem to cause any other issues; however, the resulting journal errors are different now.
Jan 24 14:04:55 Server systemd-coredump[27317]: [??] Process 27315 (vmplayer) of user 1000 dumped core.
Module libcds.so without build-id.
Stack trace of thread 27315:
#0 0x00007fe93c7161bd syscall (libc.so.6 + 0x1161bd)
#1 0x00007fe93884d723 n/a (libvmwarebase.so + 0x24d723)
#2 0x00007fe93884da59 n/a (libvmwarebase.so + 0x24da59)
#3 0x00007fe938753993 Panic_Panic (libvmwarebase.so + 0x153993)
#4 0x00007fe938753a2c Panic (libvmwarebase.so + 0x153a2c)
#5 0x00007fe93884c510 n/a (libvmwarebase.so + 0x24c510)
#6 0x00007fe93884d337 n/a (libvmwarebase.so + 0x24d337)
#7 0x00007fe93c63f190 __restore_rt (libc.so.6 + 0x3f190)
#8 0x00007fe937660fa0 _ZNK3cui3MKS22GetGuestTopologyLimitsERjS1_S1_S1_Rc (libvmwareui.so + 0x1060fa0)
#9 0x00007fe93758c67d _ZN3cui19IsTopologySupportedERKNS_2VMERKSt6vectorINS_4RectESaIS4_EERbS9_ (libvmwareui.so + 0xf8c67d)
#10 0x00007fe93734ef63 _ZNK3cui13FullscreenMgr18CompatibleTopologyEPKNS_2VMERKSt6vectorIjSaIjEERbS9_ (libvmwareui.so + 0xd4ef63)
#11 0x00007fe93734ff88 _ZN3cui13FullscreenMgr11CanMultiMonEPNS_2VMEPSt6vectorIN3utf6stringESaIS5_EEb (libvmwareui.so + 0xd4ff88)
#12 0x00007fe937a0cc4a _ZN3lui13FullscreenMgr11CanMultiMonEPN3cui2VMEPSt6vectorIN3utf6stringESaIS6_EEb (libvmwareui.so + 0x140cc4a)
#13 0x00007fe93bb9fb1b _ZN6player6Window16UpdateAddMonitorEv (libvmplayer.so + 0x11eb1b)
#14 0x00007fe9375a971d _ZNK3cui10Capability8EvaluateEv (libvmwareui.so + 0xfa971d)
#15 0x00007fe9375a9947 _ZN3cui10Capability18OnTestDisconnectedEPv (libvmwareui.so + 0xfa9947)
#16 0x00007fe937330e11 _ZN3cui16ClearConnectionsISt4listIN4sigc10connectionESaIS3_EEEEvRT_ (libvmwareui.so + 0xd30e11)
#17 0x00007fe9374a6ef7 _ZN3cui14VMCapabilities10ConnectMKSEv (libvmwareui.so + 0xea6ef7)
#18 0x00007fe937473aaa _ZN3cui2VM8UnsetMKSEPNS_3MKSE (libvmwareui.so + 0xe73aaa)
#19 0x00007fe937a5f0dd _ZN3lui2VM8UnsetMKSEPN3cui3MKSE (libvmwareui.so + 0x145f0dd)
#20 0x00007fe93bb679ed _ZN6player6Player7CloseVMEN4sigc4slotIvbRKN3cui5ErrorENS1_3nilES7_S7_S7_S7_EENS2_IvS7_S7_S7_S7_S7_S7_S7_EE (libvmplayer.so + 0xe69ed)
#21 0x00007fe93bb6f4a2 _ZN4sigc8internal10slot_call2INS_18bound_mem_functor2IvN6player6PlayerENS_4slotIvbRKN3cui5ErrorENS_3nilESA_SA_SA_SA_EENS5_IvSA_SA_SA_SA_SA_SA_SA_EEEEvSB_SC_E7call_itEPNS0_8slot_repERKSB_RKSC_ (libvmplayer.so + 0xee4a2)
#22 0x00007fe9376f3faa _ZN3cui15LoggedSlotChain11SlotWrapperEN4sigc4slotIvbRKNS_5ErrorENS1_3nilES6_S6_S6_S6_EENS2_IvS6_S6_S6_S6_S6_S6_S6_EERKN3utf6stringENS2_IvS7_S8_S6_S6_S6_S6_S6_EE (libvmwareui.so + 0x10f3faa)
#23 0x00007fe9376f4b4b _ZN4sigc8internal10slot_call2INS_12bind_functorILin1ENS_18bound_mem_functor4IvN3cui15LoggedSlotChainENS_4slotIvbRKNS4_5ErrorENS_3nilESA_SA_SA_SA_EENS6_IvSA_SA_SA_SA_SA_SA_SA_EERKN3utf6stringENS6_IvSB_SC_SA_SA_SA_SA_SA_EEEESE_SH_SA_SA_SA_SA_SA_EEvSB_SC_E7call_itEPNS0_8slot_repERKSB_RKSC_ (libvmwareui.so + 0x10f4b4b)
#24 0x00007fe9375b34b8 _ZN3cui9SlotChain8NextSlotEj (libvmwareui.so + 0xfb34b8)
#25 0x00007fe93bb6c19a _ZN4sigc8internal10slot_call0INS_19bind_return_functorIbNS_4slotIvNS_3nilES4_S4_S4_S4_S4_S4_EEEEbE7call_itEPNS0_8slot_repE (libvmplayer.so + 0xeb19a)
#26 0x00007fe93b64a93d n/a (libglibmm-2.4.so.1 + 0x4a93d)
#27 0x00007fe93b997924 n/a (libglib-2.0.so.0 + 0x5e924)
#28 0x00007fe93b994f30 n/a (libglib-2.0.so.0 + 0x5bf30)
#29 0x00007fe93b996b58 n/a (libglib-2.0.so.0 + 0x5db58)
#30 0x00007fe93b99742f g_main_loop_run (libglib-2.0.so.0 + 0x5e42f)
#31 0x00007fe93a7f6a9d gtk_main (libgtk-3.so.0 + 0x1f6a9d)
#32 0x00007fe93bb31fea main (libvmplayer.so + 0xb0fea)
#33 0x0000564a3808ba50 n/a (appLoader + 0x1ca50)
#34 0x0000564a38087ba0 n/a (appLoader + 0x18ba0)
#35 0x00007fe93c6281b0 __libc_start_call_main (libc.so.6 + 0x281b0)
#36 0x00007fe93c628279 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x28279)
#37 0x0000564a38088045 n/a (appLoader + 0x19045)
ELF object binary architecture: AMD x86-64
So to summarize:
Using the 6.6.7.1 kernel with the 17.5.0 modules compiled from what I downloaded from here back on 11/19/2023, and using the 6.6.7.1 kernel with the 17.5.0 modules compiled from what I downloaded here today, both produce the above coredump when the VM is shut down.
So I still get a coredump at shutdown, but the coredump I now get with the 6.6.7.1 kernel is different than it was back when originally reported using the 6.6.1-1.1 kernel.
Since the latest 17.5.0 modules seem to work with the only issue being the coredump at shutdown I will leave them installed and see if any other issues occur.
Hope that is helpful
Joe
This is a userspace application crash, I cannot really help you with that. Looks very similar to what was discussed above on Nov 19-22, as far as I can say.
Yeah, when I saw libvmwarebase.so I knew it was in the closed-source part, but I figured I'd post an update for everyone that has been part of this thread.
I'm curious, what distro and kernel do you use?
I'm using Leap 15.5 but with a newer kernel. At the moment, it's 6.7, essentially the same as Tumbleweed (or what TW is going to get soon, I'm not sure). But I plan to test 6.8-rc1 later this evening or tomorrow morning.
So you see the same issues talked about here on Leap, right? ( since the bug is in the VMware closed source )
No, I haven't seen those yet. But I have VMware Player (17.5.0) on this machine, I only have Workstation on another which I'm using remotely most of the time.
I'm on VMWare Player 17.5.0 & kernel 6.6.9-200 (fc39). Currently vmplayer causes cpu hangs. It does actually work but other processes start to misbehave and I end up having to hard reset the host machine because it won't even shutdown cleanly afterwards.
The service starts fine, VM comes up and appears to work fine, but when doing the shutdown vmware coredumps. It does not crash linux or seem to cause any other issues...
So that would be preferable to what's happening here!
I'm on VMWare Player 17.5.0 & kernel 6.6.9-200 (fc39). Currently vmplayer causes cpu hangs. It does actually work but other processes start to misbehave and I end up having to hard reset the host machine because it won't even shutdown cleanly afterwards.
Does this happen with modules built from (up to date) source from this repository or with unpatched modules from VMware?
Does this happen with modules built from (up to date) source from this repository or with unpatched modules from VMware?
With the unpatched stock modules. I can't find any branch or tag relating to p17.5.0 here, or I would try it. Or did I miss something?
The modules have been exactly the same in Workstation and Player for years, so starting with 17.0.0, I no longer maintain two branches with identical content. Just use the head of the workstation-17.5.0 branch also for Player. (It is mentioned in the INSTALL file.)
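That advice, spelled out as commands. This is a sketch that prints the commands rather than running them; the clone URL is simply this repository:

```shell
BRANCH=workstation-17.5.0
REPO=https://github.com/mkubecek/vmware-host-modules.git
echo "git clone -b ${BRANCH} ${REPO}"     # branch head covers both Workstation and Player
echo "cd vmware-host-modules && make && sudo make install"
```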
Just use the head of workstation-17.5.0 branch also for Player.
That's much better. Got the core-dump on guest OS (Windows 10) shutdown but no more CPU hangs.
Many thanks for your help :smiley:
I'm on VMWare Player 17.5.0 & kernel 6.6.9-200 (fc39). Currently vmplayer causes cpu hangs. It does actually work but other processes start to misbehave and I end up having to hard reset the host machine because it won't even shutdown cleanly afterwards.
The service starts fine, VM comes up and appears to work fine, but when doing the shutdown vmware coredumps. It does not crash linux or seem to cause any other issues...
So that would be preferable to what's happening here!
Sounds like you are using the vmmon/vmnet modules provided with VMware 17.5.0, as that is the behavior I also saw.
Installing the vmmon/vmnet modules from here fixed that, and I only have the coredump-at-shutdown issue now. I am using kernel 6.6.7-1 right now, but will be updating to a newer TW build using kernel 6.7.1-2.1 sometime shortly after the start of next month.
The modules have been exactly the same in Workstation and Player for years so starting with 17.0.0, I no longer maintain two branches with identical content. Just use the head of workstation-17.5.0 branch also for Player. (It is mentioned in the INSTALL file.)
I have the same issue. I used your VMware modules from your latest branch (17.5.0), but after starting the guest the host kernel seems to act weird (such as TTY doesn't work, can't shut down, etc.). Here's the dmesg
@mkubecek
No, I haven't seen those yet. But I have VMware Player (17.5.0) on this machine, I only have Workstation on another which I'm using remotely most of the time.
Interesting, I wonder why you are not seeing the issue on Leap 15.5 with the newer 6.7 kernel?
I am working on a different / unrelated issue, and SUSE support created a special 6.7.2 kernel for me to test that out. Since I had it installed, I tried VMware with that kernel; it also coredumps when you shut down the VM, but as with the other kernels, the VM runs fine until shutdown.
Interesting I wonder why you are not seeing the issue on Leap 15.5 with the newer 6.7 kernel ?
As I said before, this rather looks like a userspace problem and Leap 15.5 userspace is quite different from Tumbleweed.
Just for the record, I've experienced the problems described here on Debian Testing with kernel 6.6.15 / motherboard ASUS Z170A BIOS 3802 / VMware Workstation 17.5 and up-to-date modules from this repo (thanks @mkubecek !). The system becomes increasingly unusable after closing a VM, eventually requiring a hard reset.
IIUC, with up-to-date modules I should just get a coredump at shutdown, but that's not what I'm experiencing.
I'll look into downgrading to kernel 6.5 for now.
Just for info: Fedora 39, 6.7.6-200.fc39.x86_64. Updated to 17.5.1 hoping that things had been fixed. Ran a Windows 11 VM without the mkubecek improvements: the host locked up, but I think only at shutdown of the VM. See errors attached. Ran the mkubecek modules for 17.5.1 and things appeared to be OK - thanks again!
Updated to 17.5.1 hoping that things had been fixed.
Unfortunately not. There was no update of the modules source, 17.5.1 has exactly the same modules as 17.5.0.
Today kernel 6.8.1 arrived on my Tumbleweed, and it does solve this issue for me. With 6.7.x kernels, running a local VM always broke the host system in various places (e.g. firefox, sudo) and didn't let me shut down the host completely, no matter if I used the original kernel modules or the ones from this repository (on 6.8 I have to use the latter).
As the person that originally opened this issue, I thought I would post an update which others struggling with these issues may find helpful.
I have used VMware products since around VMware 2.0 or 3.0... so quite a long time.
I have worked as a System Administrator ( mostly Windows but also some Linux and Solaris ), a database administrator ( Oracle and MS SQL Server ) and a Network Administrator with customers all over the US for 25+ years.
Overall I was pretty happy with VMware, however, up until about 3 years ago I was always using a Windows host and Linux guests.
At that point, I decided to make the switch to Linux ( openSUSE Tumbleweed ), for those that care.
I had grown tired of issues not getting resolved on Windows and after testing out my workflow on Linux I made the switch.
In order to make the move easier, I stuck with VMware since I could easily move my vms over to the Linux environment and the only new issue was the need to compile the kernel modules when new TW builds came out.
That worked ok, but there were always various issues, like the shutdown one discussed here as well as others over time.
Then Broadcom acquired VMware and started making changes to how licensing worked, and having past experience with another company they acquired, it was concerning to me.
So I decided to spend some time looking into using KVM since the modules are part of the kernel and it was also supposed to perform better.
That was 2 months ago.
My initial tests went well AND it was pretty easy to migrate ALL my vms over.
There are tools that can help with this but instead I did the following:
1) Copy the VM's *.vmdk file to the storage location for my KVM disk images
2) Create a new KVM VM and point it at the disk image
There are a few little nuances ( like Arch Linux didn't support secure boot and it took me a little while to figure out that was the issue ) but overall it was really that simple.
After running that way for a little while, I used qemu-img to convert the copied vmdk files to qcow2 files because other features are available when you use qcow2 files.
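The vmdk-to-qcow2 conversion described above, sketched per image. The file name is just an example, and the command is printed rather than executed:

```shell
SRC=win10.vmdk                  # example source image name
DST="${SRC%.vmdk}.qcow2"        # derive the qcow2 name from the vmdk name
echo "qemu-img convert -p -f vmdk -O qcow2 ${SRC} ${DST}"
```

The -p flag shows progress; -f and -O name the source and destination formats explicitly, which avoids format auto-detection surprises.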
After getting everything up and running I decided to see how far I could push this new computer ( i7-14700k 64 GB memory ).
I started up 12 KVM VMs ( 10 different Linux distros + 1 Windows 10 and 1 Windows 11 ) and started distro updates / Windows updates on all of them at the same time.
At the same time, 2 users were remote-desktopped into my PC doing various tasks, I was remote-desktopped into another server, and there was a 60 GB file transfer occurring on the network.
No one noticed ANY performance degradation, or even that all that was going on.
To push the system further, I installed and ran the s-tui tool in stress test mode so all 28 cores were pegged at 100%.
I was quite shocked to see that even with that load all the vms ran smoothly and continued their updates and no users noticed any performance issues.
I don't think I'll be going back to VMware :-)
I DEFINITELY appreciate all of @mkubecek's efforts to provide these modules for Linux to address issues that VMware / Broadcom hasn't fixed yet, but if you are using a Linux host, I would seriously consider looking into KVM.
NOTE: You cannot run multiple hypervisors at the same time, BUT you can have BOTH VMware and KVM installed at the same time.
Just shut down the VMware services and modprobe -r the vmmon and vmnet modules, and then you can use KVM; reverse the process if you need to switch back to VMware.
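The switch-over just described, as a command plan. The commands are printed rather than executed so nothing is unloaded by accident, and kvm_intel vs kvm_amd depends on the host CPU:

```shell
MODS="vmnet vmmon"                 # VMware modules to unload before using KVM
echo "sudo systemctl stop vmware"  # stop the VMware host services first
for m in $MODS; do
    echo "sudo modprobe -r $m"
done
echo "sudo modprobe kvm_intel"     # or kvm_amd; reverse these steps to go back to VMware
```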
Hope this is helpful to others !
Joe
I updated to TW 20231113 today, which is using kernel 6.6.1-1.1.
I am using VMware Workstation Player 17.5.
The modules compile fine and then I sign them with my key and everything works fine.
When I bring up a VM it also works fine until I attempt to shut it down.
At that point it starts the VM shutdown, but then the process hangs.
At that point the journal has the following messages:
TW becomes less responsive and I end up having to reboot.
Looking at the journal I found the following trace information after I rebooted
Nov 16 16:26:45 kernel: WARNING: CPU: 3 PID: 6026 at kernel/rcu/tree_plugin.h:734 rcu_sched_clock_irq+0xb2c/0x1120
Nov 16 16:26:45 kernel: Modules linked in: vmnet(O) vmmon(O) binfmt_misc snd_seq_dummy snd_hrtimer snd_seq af_packet nf_conntrack_netbios_ns nf_conntrack_b>
Nov 16 16:26:45 kernel: irqbypass wmi_bmof rfkill i2c_i801 mxm_wmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi pcspkr i2c_smbus efi_pstore uvcvideo >
Nov 16 16:26:45 kernel: CPU: 3 PID: 6026 Comm: vmware-vmx Tainted: G O 6.6.1-1-default #1 openSUSE Tumbleweed 0c6504f7d2c054731662677f280b3>
Nov 16 16:26:45 kernel: Hardware name: ASUS All Series/MAXIMUS VI FORMULA, BIOS 1603 08/15/2014
Nov 16 16:26:45 kernel: RIP: 0010:rcu_sched_clock_irq+0xb2c/0x1120
Nov 16 16:26:45 kernel: Code: 38 08 00 00 85 c0 0f 84 f2 f5 ff ff e9 98 fc ff ff c6 87 39 08 00 00 01 e9 e1 f5 ff ff 4c 89 e7 e8 b9 8e f3 ff e9 0e ff ff ff>
Nov 16 16:26:45 kernel: RSP: 0018:ffffc9000019ce08 EFLAGS: 00010082
Nov 16 16:26:45 kernel: RAX: 00000000ffffffc2 RBX: 0000000000000000 RCX: 0000000009e820b1
Nov 16 16:26:45 kernel: RDX: 000000000000c773 RSI: ffffffff9739b328 RDI: ffff8881bbe75180
Nov 16 16:26:45 kernel: RBP: ffff8888209a8200 R08: 0000000000000000 R09: 0000000000000000
Nov 16 16:26:45 kernel: R10: 0000000000000000 R11: ffffc9000019cff8 R12: ffff8888209aac80
Nov 16 16:26:45 kernel: R13: ffffc90000cabb98 R14: ffff8888209aac90 R15: ffff8888209aa740
Nov 16 16:26:45 kernel: FS: 00007fdb08868c00(0000) GS:ffff888820980000(0000) knlGS:0000000000000000
Nov 16 16:26:45 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 16 16:26:45 kernel: CR2: 00007fdb060e8000 CR3: 0000000184474005 CR4: 00000000001706e0
Nov 16 16:26:45 kernel: Call Trace:
Nov 16 16:26:45 kernel: <IRQ>
Nov 16 16:26:45 kernel: ? rcu_sched_clock_irq+0xb2c/0x1120
Nov 16 16:26:45 kernel: ? warn+0x81/0x130
Nov 16 16:26:45 kernel: ? rcu_sched_clock_irq+0xb2c/0x1120
Nov 16 16:26:45 kernel: ? report_bug+0x171/0x1a0
Nov 16 16:26:45 kernel: ? handle_bug+0x3c/0x80
Nov 16 16:26:45 kernel: ? exc_invalid_op+0x17/0x70
Nov 16 16:26:45 kernel: ? asm_exc_invalid_op+0x1a/0x20
Nov 16 16:26:45 kernel: ? rcu_sched_clock_irq+0xb2c/0x1120
Nov 16 16:26:45 kernel: ? load_balance+0x2e9/0xed0
Nov 16 16:26:45 kernel: ? reweight_entity+0x273/0x280
Nov 16 16:26:45 kernel: ? update_load_avg+0x7e/0x780
Nov 16 16:26:45 kernel: update_process_times+0x5f/0x90
Nov 16 16:26:45 kernel: tick_sched_handle+0x21/0x60
Nov 16 16:26:45 kernel: tick_sched_timer+0x6f/0x90
Nov 16 16:26:45 kernel: ? __pfx_tick_sched_timer+0x10/0x10
Nov 16 16:26:45 kernel: hrtimer_run_queues+0x112/0x2b0
Nov 16 16:26:45 kernel: hrtimer_interrupt+0xf8/0x230
Nov 16 16:26:45 kernel: __sysvec_apic_timer_interrupt+0x50/0x140
Nov 16 16:26:45 kernel: sysvec_apic_timer_interrupt+0x6d/0x90
Nov 16 16:26:45 kernel:
Nov 16 16:26:45 kernel:
Nov 16 16:26:45 kernel: asm_sysvec_apic_timer_interrupt+0x1a/0x20
Nov 16 16:26:45 kernel: RIP: 0010:rep_movs_alternative+0x4a/0x70
Nov 16 16:26:45 kernel: Code: 75 f1 c3 cc cc cc cc 66 0f 1f 84 00 00 00 00 00 48 8b 06 48 89 07 48 83 c6 08 48 83 c7 08 83 e9 08 74 df 83 f9 08 73 e8 eb c9>
Nov 16 16:26:45 kernel: RSP: 0018:ffffc90000cabc48 EFLAGS: 00010206
Nov 16 16:26:45 kernel: RAX: 00007fdb060e9010 RBX: 0000000000001000 RCX: 00000000000005e0
Nov 16 16:26:45 kernel: RDX: 0000000000000000 RSI: ffff8883221dca20 RDI: 00007fdb060e8a30
Nov 16 16:26:45 kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000135e000
Nov 16 16:26:45 kernel: R10: 000000000000000f R11: 000000000135e000 R12: ffffc90000cabe18
Nov 16 16:26:45 kernel: R13: 0000000000001000 R14: ffff8883221dc000 R15: 0000000000000000
Nov 16 16:26:45 kernel: copyout+0x20/0x30
Nov 16 16:26:45 kernel: _copy_to_iter+0x5e/0x4a0
Nov 16 16:26:45 kernel: copy_page_to_iter+0x8b/0x140
Nov 16 16:26:45 kernel: filemap_read+0x1af/0x320
Nov 16 16:26:45 kernel: vfs_read+0x1b8/0x300
Nov 16 16:26:45 kernel: ksys_read+0x67/0xe0
Nov 16 16:26:45 kernel: do_syscall_64+0x60/0x90
Nov 16 16:26:45 kernel: ? do_user_addr_fault+0x20f/0x660
Nov 16 16:26:45 kernel: ? exc_page_fault+0x71/0x160
Nov 16 16:26:45 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Nov 16 16:26:45 kernel: RIP: 0033:0x7fdb0830a3bc
Nov 16 16:26:45 kernel: Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 b7 18 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05>
Nov 16 16:26:45 kernel: RSP: 002b:00007fff1393dc10 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Nov 16 16:26:45 kernel: RAX: ffffffffffffffda RBX: 0000000000553f88 RCX: 00007fdb0830a3bc
Nov 16 16:26:45 kernel: RDX: 0000000000553f88 RSI: 00007fdb060aa010 RDI: 000000000000004c
Nov 16 16:26:45 kernel: RBP: 000055754832d8c0 R08: 0000000000000000 R09: 0000000000000000
Nov 16 16:26:45 kernel: R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000553f88
Nov 16 16:26:45 kernel: R13: 0000000000000027 R14: 00007fdb060aa010 R15: 0000000000000001
Nov 16 16:26:45 kernel:
Nov 16 16:26:45 kernel: ---[ end trace 0000000000000000 ]---
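For anyone trying to capture the same thing after being forced to reboot: this is a sketch of how the trace above can be pulled back out of the previous boot's journal on a systemd distro like Tumbleweed (the grep pattern is just my choice of filters, not anything official):

```shell
# -b -1 selects the previous boot, -k restricts to kernel ring messages;
# the grep keeps the RCU warning plus the vmmon/vmnet module context lines.
journalctl -k -b -1 --no-pager | grep -E 'vmmon|vmnet|rcu_sched_clock_irq|Call Trace'
```

Dropping the grep gives the full trace, including the register dump and call stack shown above.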
If I boot up using kernel 6.5.9.1, shutting down the same VM does not cause those issues.
That journal trace is also what made me think this is a kernel issue: 6.6 is supposed to include a new CPU scheduler that promises to improve performance and reduce latency, and those messages, especially the RIP lines, sound like they might be related to that.
Is anybody else using VMware 17.5 with kernel 6.6.1-1.1?