mkubecek / vmware-host-modules

Patches needed to build VMware (Player and Workstation) host modules against recent kernels
GNU General Public License v2.0
2.14k stars, 336 forks

VMware Workstation 17.5 Pro has been released #223

Closed stefantalpalaru closed 8 months ago

stefantalpalaru commented 8 months ago

https://docs.vmware.com/en/VMware-Workstation-Pro/17.5/rn/vmware-workstation-175-pro-release-notes/index.html

rakotomandimby commented 8 months ago

For information,

I am running Arch Linux, fully up to date, with kernel 6.5.7. I upgraded using the embedded upgrade script; that is, I did not manually uninstall the old version and then install a downloaded one.

After the upgrade, I installed vmnet from this repo's branch tmp/workstation-17.0.2-k6.5: no problem.

After a reboot, launching a VM freezes the system.

I tried compiling the modules with the embedded vmware-modconfig --console --install-all, and it worked like a charm.

After a reboot, launching a VM no longer freezes the system.

My guess, which I still need to look into more deeply to confirm: this 17.5 release already includes fixes equivalent to several of the patches in the tmp/workstation-17.0.2-k6.5 branch.

mkubecek commented 8 months ago

I'll look at 17.5 later today. Until then, you might try branch tmp/workstation-18.22060606, which was created for the recent Tech Preview and so may have a better chance of working with 17.5.

mkubecek commented 8 months ago

Branch workstation-17.5.0 created. Compared to the tech preview, they only added the net/gso.h include (might break with some RHEL/SLE backports) and a hack for pte_offset_map() (the trivial one I hoped they wouldn't use). So I just omitted the last two commits from tmp/workstation-18.22060606.
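For anyone who wants to try the branch, the build could look roughly like the sketch below. It is only an illustration: the clone URL, the `build_host_modules` and `run` helper names, and the `DRY_RUN` switch are assumptions made for this example, and the `make` / `make install` steps assume the repo's usual top-level Makefile workflow.

```shell
#!/bin/sh
# Sketch only: REPO_URL, the helper names and DRY_RUN are assumptions,
# not something taken from this thread.
REPO_URL="https://github.com/mkubecek/vmware-host-modules.git"

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build and install vmmon/vmnet from the given branch.
# The body runs in a subshell so the caller's working directory is untouched.
build_host_modules() (
    branch="$1"
    run git clone --depth 1 --branch "$branch" "$REPO_URL" vmware-host-modules &&
    run cd vmware-host-modules &&
    run make &&
    run sudo make install
)
```

Running `DRY_RUN=1 build_host_modules workstation-17.5.0` only prints the commands; without `DRY_RUN` it performs the clone, build and install. The modules typically have to be reloaded afterwards, and how to do that depends on the distribution.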

stefantalpalaru commented 8 months ago

Works for me. Thanks!

mkubecek commented 8 months ago

Well, this is interesting... Testing the original workstation-17.5.0 went well with Workstation 17.5 on the stock openSUSE Leap 15.5 kernel (15.4 + backports), but when I tried to start a WinXP VM under Player 17.5 on a 6.6-rc6 kernel, I ended up with RCU-related WARNs in scheduler code and a mostly unusable desktop. So I tried to cherry-pick commit 91e1fa732dcf (the original workaround for the pte_offset_map() issue using get_user_pages), and the same VM started and finished twice without an issue.

mkubecek commented 8 months ago

For now, I pushed the cherry-pick mentioned in the previous comment as tmp/workstation-17.5.0-k6.6. I'll run a few more tests with this, the original workstation-17.5.0, and also the unpatched 17.5.0 modules to see if VMware's hack is really broken or if it was an unrelated problem.

mkubecek commented 8 months ago

I repeated the test and I'm afraid it's pretty consistent. Both the unpatched source from 17.5.0 and branch workstation-17.5.0 at commit f29c1d7df4a2 trigger a warn check for negative rcu_preempt_depth() in rcu_flavor_sched_clock_irq() (introduced in 5.8-rc1)

[198483.256103] WARNING: CPU: 3 PID: 11129 at kernel/rcu/tree_plugin.h:734 rcu_sched_clock_irq+0xb21/0x1110
[198483.266520] Modules linked in: vmnet(OE) parport_pc vmmon(OE) tun echainiv esp4 bluetooth ecdh_generic af_packet xt_REDIRECT xt_MASQUERADE xt_nat nft_chain_nat nf_nat deflate sm4_generic sm4_aesni_avx2_x86_64 sm4_aesni_avx_x86_64 sm4 twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 ppdev parport vmw_vsock_vmci_transport serpent_avx2 vsock serpent_avx_x86_64 serpent_sse2_x86_64 xt_LOG nf_log_syslog serpent_generic vmw_vmci blowfish_generic blowfish_x86_64 blowfish_common xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 cast5_avx_x86_64 cast5_generic cast_common des_generic libdes sm3_generic sm3_avx_x86_64 sm3 cmac xcbc rmd160 af_key xfrm_algo ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_set nft_compat nf_tables ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc iscsi_ibft iscsi_boot_sysfs rfkill snd_seq_dummy snd_seq_oss snd_emu10k1_synth snd_emux_synth snd_seq_midi_emul snd_seq_virmidi snd_seq_midi
[198483.266574]  snd_seq_midi_event snd_seq dmi_sysfs msr hwmon_vid dm_crypt essiv authenc trusted asn1_encoder tee i2c_dev xfs amdgpu libcrc32c uvcvideo drm_exec amdxcp videobuf2_vmalloc drm_buddy uvc videobuf2_memops gpu_sched videobuf2_v4l2 drm_suballoc_helper snd_usb_audio videodev intel_rapl_msr drm_display_helper intel_rapl_common snd_usbmidi_lib videobuf2_common snd_ump drm_ttm_helper edac_mce_amd joydev mc ttm irqbypass gigabyte_wmi acpi_cpufreq pcspkr wmi_bmof k10temp i2c_piix4 cec igb rc_core video i2c_algo_bit tiny_power_button dca thermal button fuse configfs ip_tables x_tables ext4 mbcache jbd2 uas usb_storage hid_generic usbhid crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic sr_mod gf128mul cdrom sd_mod ghash_clmulni_intel xhci_pci sha512_ssse3 xhci_pci_renesas xhci_hcd ahci aesni_intel libahci crypto_simd nvme usbcore libata ccp cryptd sp5100_tco nvme_core t10_pi wmi snd_emu10k1 snd_hwdep snd_util_mem snd_ac97_codec ac97_bus snd_pcm snd_timer snd_rawmidi snd_seq_device snd
[198483.358943]  soundcore sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod scsi_common
[198483.450633] Unloaded tainted modules: vmnet(OE):1 vmmon(OE):1 [last unloaded: vmnet(OE)]
[198483.470142] CPU: 3 PID: 11129 Comm: vmware-vmx Kdump: loaded Tainted: G           OE      6.6.0-rc6-lp155.1.g8f5995d-default #1 openSUSE Tumbleweed (unreleased) f5847c587e58be4556074b1780a42d41742cfa1d
[198483.489147] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F38e 07/18/2023
[198483.500205] RIP: 0010:rcu_sched_clock_irq+0xb21/0x1110
[198483.506293] Code: 38 08 00 00 85 c0 0f 84 fd f5 ff ff e9 a3 fc ff ff c6 87 39 08 00 00 01 e9 ec f5 ff ff 4c 89 e7 e8 c4 a7 f3 ff e9 0e ff ff ff <0f> 0b e9 89 f5 ff ff be 03 00 00 00 e8 de e8 4a 00 e9 f8 fe ff ff
[198483.526180] RSP: 0018:ffffbbd2c0320e08 EFLAGS: 00010082
[198483.532295] RAX: 00000000ffffffd0 RBX: 0000000000000000 RCX: 000000000ae807c1
[198483.540350] RDX: 000000000000db93 RSI: ffffffffb8b9b0b6 RDI: ffff98a5a8ae0000
[198483.548403] RBP: ffff98bc5e3a8200 R08: 0000000000000000 R09: 0000000000000000
[198483.556457] R10: 0000000000000000 R11: ffffbbd2c0320ff8 R12: ffff98bc5e3aac80
[198483.564512] R13: ffffbbd2c166bba8 R14: ffff98bc5e3aac90 R15: ffff98bc5e3aa740
[198483.572572] FS:  00007f80626d1c00(0000) GS:ffff98bc5e380000(0000) knlGS:0000000000000000
[198483.581600] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[198483.588242] CR2: 00007f805dc00000 CR3: 00000001b2612000 CR4: 0000000000750ee0
[198483.596296] PKRU: 55555554
[198483.599761] Call Trace:
[198483.603114]  <IRQ>
[198483.605961]  ? rcu_sched_clock_irq+0xb21/0x1110
[198483.611370]  ? __warn+0x81/0x130
[198483.615455]  ? rcu_sched_clock_irq+0xb21/0x1110
[198483.620865]  ? report_bug+0x171/0x1a0
[198483.625391]  ? handle_bug+0x41/0x70
[198483.629738]  ? exc_invalid_op+0x17/0x70
[198483.634439]  ? asm_exc_invalid_op+0x1a/0x20
[198483.639494]  ? rcu_sched_clock_irq+0xb21/0x1110
[198483.644904]  ? srso_alias_return_thunk+0x5/0x7f
[198483.650310]  ? update_load_avg+0x7e/0x780
[198483.655187]  ? srso_alias_return_thunk+0x5/0x7f
[198483.660594]  ? srso_alias_return_thunk+0x5/0x7f
[198483.666001]  ? place_entity+0x1b/0xf0
[198483.670524]  update_process_times+0x5f/0x90
[198483.675580]  tick_sched_handle+0x21/0x60
[198483.680366]  tick_sched_timer+0x6f/0x90
[198483.685067]  ? __pfx_tick_sched_timer+0x10/0x10
[198483.690472]  __hrtimer_run_queues+0x112/0x2b0
[198483.695702]  hrtimer_interrupt+0xf8/0x230
[198483.700578]  __sysvec_apic_timer_interrupt+0x50/0x140
[198483.706426]  sysvec_apic_timer_interrupt+0x6d/0x90
[198483.712163]  </IRQ>
[198483.715097]  <TASK>
[198483.718030]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[198483.723965] RIP: 0010:copyout+0x1b/0x30
[198483.728734] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 d0 48 89 d1 31 d2 48 01 f8 0f 92 c2 48 85 c0 78 10 48 85 d2 75 0b 0f 01 cb <f3> a4 0f 1f 00 0f 01 ca 89 c8 e9 96 1a 68 00 66 0f 1f 44 00 00 90
[198483.748623] RSP: 0018:ffffbbd2c166bc50 EFLAGS: 00040246
[198483.754735] RAX: 00007f805ddc0010 RBX: 0000000000001000 RCX: 0000000000000510
[198483.762700] RDX: 0000000000000000 RSI: ffff98a716d72af0 RDI: 00007f805ddbfb00
[198483.770821] RBP: 0000000000000000 R08: 000000000000006e R09: 0000000001635000
[198483.778874] R10: 000000000000000f R11: 0000000001635000 R12: ffffbbd2c166be10
[198483.786936] R13: 0000000000001000 R14: ffff98a716d72000 R15: 0000000000000000
[198483.794992]  _copy_to_iter+0x5e/0x500
[198483.799514]  ? srso_alias_return_thunk+0x5/0x7f
[198483.804922]  copy_page_to_iter+0x8b/0x140
[198483.809799]  filemap_read+0x1af/0x320
[198483.814325]  vfs_read+0x1b8/0x300
[198483.818498]  ksys_read+0x67/0xe0
[198483.822579]  do_syscall_64+0x5f/0x90
[198483.827013]  ? srso_alias_return_thunk+0x5/0x7f
[198483.832419]  ? do_user_addr_fault+0x21d/0x660
[198483.837650]  ? srso_alias_return_thunk+0x5/0x7f
[198483.843053]  ? exc_page_fault+0x6d/0x150
[198483.847840]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[198483.853774] RIP: 0033:0x7f80626ed754
[198483.858230] Code: 84 00 00 00 00 00 41 54 55 49 89 d4 53 48 89 f5 89 fb 48 83 ec 10 e8 09 fc ff ff 4c 89 e2 41 89 c0 48 89 ee 89 df 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 38 44 89 c7 48 89 44 24 08 e8 55 fc ff ff 48
[198483.878118] RSP: 002b:00007fff7a693260 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[198483.886614] RAX: ffffffffffffffda RBX: 000000000000004a RCX: 00007f80626ed754
[198483.894669] RDX: 0000000000553f88 RSI: 00007f805daaa010 RDI: 000000000000004a
[198483.894670] RBP: 00007f805daaa010 R08: 0000000000000000 R09: 0000000000000000
[198483.894671] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000553f88
[198483.894672] R13: 0000000000000027 R14: 00007f805daaa010 R15: 0000000000000001
[198483.894675]  </TASK>

when I start a WinXP VM under VMware Player 17.5.0.
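When comparing several module builds like this, it can help to pull just the warning site out of the kernel log after each run. A minimal sketch, assuming the WARNING header has the same layout as in the trace above; `warn_sites` is a hypothetical helper name, not an existing tool:

```shell
# Print "file:line function" for each kernel WARNING header read from stdin.
# Assumes the header layout seen above:
#   WARNING: CPU: <n> PID: <n> at <file>:<line> <function>+<offset>/<size>
warn_sites() {
    sed -n 's/.*WARNING: CPU: [0-9]* PID: [0-9]* at \([^ ]*\) \([^+ ]*\).*/\1 \2/p'
}
```

For example, `sudo dmesg | warn_sites` prints one line per warning; for the trace above that line would be `kernel/rcu/tree_plugin.h:734 rcu_sched_clock_irq`.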

With PgtblVa2MPN() replaced by the version using get_user_pages_unlocked(), as in branch workstation-17.0.2, the same VM boots and shuts down cleanly. Therefore I'm going to add this commit (with an updated commit message) to branch workstation-17.5.0, and I recommend that everyone use it rather than the unpatched 17.5.0 modules or a version without this commit (even if I don't see how the warn check is related to the pte_offset_kernel() hack used by VMware).