stefantalpalaru closed this issue 8 months ago.
For information, I am running Arch Linux, fully up to date, with the 6.5.7 kernel.
I upgraded with the embedded upgrade script; that is, I did not manually uninstall and then install a downloaded version.
After the upgrade, I installed the vmnet module from this repo's tmp/workstation-17.0.2-k6.5 branch with no problem.
After a reboot, launching a VM freezes the system.
I then tried compiling the modules with the embedded vmware-modconfig --console --install-all, and it worked like a charm.
After a reboot, launching a VM doesn't freeze the system anymore.
My guess, which I still need to confirm by looking deeper: this 17.5 release already includes fixes corresponding to several of the patches in the tmp/workstation-17.0.2-k6.5 branch.
I'll look at 17.5 later today. Until then, you might try branch tmp/workstation-18.22060606, which was created for the recent Tech Preview and so may have a better chance of working with 17.5.
Branch workstation-17.5.0 created. Compared to the Tech Preview, they only added the net/gso.h include (which might break with some RHEL/SLE backports) and a hack for pte_offset_map() (the trivial one I hoped they wouldn't use). So I simply omitted the last two commits from tmp/workstation-18.22060606.
Works for me. Thanks!
Well, this is interesting... Testing the original workstation-17.5.0 branch went well with Workstation 17.5 on the stock openSUSE Leap 15.5 kernel (15.4 + backports). But when I tried to start a WinXP VM under Player 17.5 on a 6.6-rc6 kernel, I ended up with RCU-related WARNs in scheduler code and a mostly unusable desktop. So I tried cherry-picking commit 91e1fa732dcf (the original workaround for the pte_offset_map() issue, using get_user_pages), and the same VM started and finished twice without any issue.
For now, I pushed the cherry-pick mentioned in my previous comment as tmp/workstation-17.5.0-k6.6. I'll run a few more tests with this branch, with the original workstation-17.5.0, and with the unpatched 17.5.0 modules. The goal is to see whether VMware's hack is really broken or whether it was an unrelated problem.
I repeated the test, and I'm afraid the result is consistent. Both the unpatched 17.5.0 source and branch workstation-17.5.0 at commit f29c1d7df4a2 consistently trigger the warning check for a negative rcu_preempt_depth() in rcu_flavor_sched_clock_irq(), which was introduced in 5.8-rc1:
[198483.256103] WARNING: CPU: 3 PID: 11129 at kernel/rcu/tree_plugin.h:734 rcu_sched_clock_irq+0xb21/0x1110
[198483.266520] Modules linked in: vmnet(OE) parport_pc vmmon(OE) tun echainiv esp4 bluetooth ecdh_generic af_packet xt_REDIRECT xt_MASQUERADE xt_nat nft_chain_nat nf_nat deflate sm4_generic sm4_aesni_avx2_x86_64 sm4_aesni_avx_x86_64 sm4 twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 ppdev parport vmw_vsock_vmci_transport serpent_avx2 vsock serpent_avx_x86_64 serpent_sse2_x86_64 xt_LOG nf_log_syslog serpent_generic vmw_vmci blowfish_generic blowfish_x86_64 blowfish_common xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 cast5_avx_x86_64 cast5_generic cast_common des_generic libdes sm3_generic sm3_avx_x86_64 sm3 cmac xcbc rmd160 af_key xfrm_algo ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_set nft_compat nf_tables ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc iscsi_ibft iscsi_boot_sysfs rfkill snd_seq_dummy snd_seq_oss snd_emu10k1_synth snd_emux_synth snd_seq_midi_emul snd_seq_virmidi snd_seq_midi
[198483.266574] snd_seq_midi_event snd_seq dmi_sysfs msr hwmon_vid dm_crypt essiv authenc trusted asn1_encoder tee i2c_dev xfs amdgpu libcrc32c uvcvideo drm_exec amdxcp videobuf2_vmalloc drm_buddy uvc videobuf2_memops gpu_sched videobuf2_v4l2 drm_suballoc_helper snd_usb_audio videodev intel_rapl_msr drm_display_helper intel_rapl_common snd_usbmidi_lib videobuf2_common snd_ump drm_ttm_helper edac_mce_amd joydev mc ttm irqbypass gigabyte_wmi acpi_cpufreq pcspkr wmi_bmof k10temp i2c_piix4 cec igb rc_core video i2c_algo_bit tiny_power_button dca thermal button fuse configfs ip_tables x_tables ext4 mbcache jbd2 uas usb_storage hid_generic usbhid crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic sr_mod gf128mul cdrom sd_mod ghash_clmulni_intel xhci_pci sha512_ssse3 xhci_pci_renesas xhci_hcd ahci aesni_intel libahci crypto_simd nvme usbcore libata ccp cryptd sp5100_tco nvme_core t10_pi wmi snd_emu10k1 snd_hwdep snd_util_mem snd_ac97_codec ac97_bus snd_pcm snd_timer snd_rawmidi snd_seq_device snd
[198483.358943] soundcore sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod scsi_common
[198483.450633] Unloaded tainted modules: vmnet(OE):1 vmmon(OE):1 [last unloaded: vmnet(OE)]
[198483.470142] CPU: 3 PID: 11129 Comm: vmware-vmx Kdump: loaded Tainted: G OE 6.6.0-rc6-lp155.1.g8f5995d-default #1 openSUSE Tumbleweed (unreleased) f5847c587e58be4556074b1780a42d41742cfa1d
[198483.489147] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F38e 07/18/2023
[198483.500205] RIP: 0010:rcu_sched_clock_irq+0xb21/0x1110
[198483.506293] Code: 38 08 00 00 85 c0 0f 84 fd f5 ff ff e9 a3 fc ff ff c6 87 39 08 00 00 01 e9 ec f5 ff ff 4c 89 e7 e8 c4 a7 f3 ff e9 0e ff ff ff <0f> 0b e9 89 f5 ff ff be 03 00 00 00 e8 de e8 4a 00 e9 f8 fe ff ff
[198483.526180] RSP: 0018:ffffbbd2c0320e08 EFLAGS: 00010082
[198483.532295] RAX: 00000000ffffffd0 RBX: 0000000000000000 RCX: 000000000ae807c1
[198483.540350] RDX: 000000000000db93 RSI: ffffffffb8b9b0b6 RDI: ffff98a5a8ae0000
[198483.548403] RBP: ffff98bc5e3a8200 R08: 0000000000000000 R09: 0000000000000000
[198483.556457] R10: 0000000000000000 R11: ffffbbd2c0320ff8 R12: ffff98bc5e3aac80
[198483.564512] R13: ffffbbd2c166bba8 R14: ffff98bc5e3aac90 R15: ffff98bc5e3aa740
[198483.572572] FS: 00007f80626d1c00(0000) GS:ffff98bc5e380000(0000) knlGS:0000000000000000
[198483.581600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[198483.588242] CR2: 00007f805dc00000 CR3: 00000001b2612000 CR4: 0000000000750ee0
[198483.596296] PKRU: 55555554
[198483.599761] Call Trace:
[198483.603114] <IRQ>
[198483.605961] ? rcu_sched_clock_irq+0xb21/0x1110
[198483.611370] ? __warn+0x81/0x130
[198483.615455] ? rcu_sched_clock_irq+0xb21/0x1110
[198483.620865] ? report_bug+0x171/0x1a0
[198483.625391] ? handle_bug+0x41/0x70
[198483.629738] ? exc_invalid_op+0x17/0x70
[198483.634439] ? asm_exc_invalid_op+0x1a/0x20
[198483.639494] ? rcu_sched_clock_irq+0xb21/0x1110
[198483.644904] ? srso_alias_return_thunk+0x5/0x7f
[198483.650310] ? update_load_avg+0x7e/0x780
[198483.655187] ? srso_alias_return_thunk+0x5/0x7f
[198483.660594] ? srso_alias_return_thunk+0x5/0x7f
[198483.666001] ? place_entity+0x1b/0xf0
[198483.670524] update_process_times+0x5f/0x90
[198483.675580] tick_sched_handle+0x21/0x60
[198483.680366] tick_sched_timer+0x6f/0x90
[198483.685067] ? __pfx_tick_sched_timer+0x10/0x10
[198483.690472] __hrtimer_run_queues+0x112/0x2b0
[198483.695702] hrtimer_interrupt+0xf8/0x230
[198483.700578] __sysvec_apic_timer_interrupt+0x50/0x140
[198483.706426] sysvec_apic_timer_interrupt+0x6d/0x90
[198483.712163] </IRQ>
[198483.715097] <TASK>
[198483.718030] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[198483.723965] RIP: 0010:copyout+0x1b/0x30
[198483.728734] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 d0 48 89 d1 31 d2 48 01 f8 0f 92 c2 48 85 c0 78 10 48 85 d2 75 0b 0f 01 cb <f3> a4 0f 1f 00 0f 01 ca 89 c8 e9 96 1a 68 00 66 0f 1f 44 00 00 90
[198483.748623] RSP: 0018:ffffbbd2c166bc50 EFLAGS: 00040246
[198483.754735] RAX: 00007f805ddc0010 RBX: 0000000000001000 RCX: 0000000000000510
[198483.762700] RDX: 0000000000000000 RSI: ffff98a716d72af0 RDI: 00007f805ddbfb00
[198483.770821] RBP: 0000000000000000 R08: 000000000000006e R09: 0000000001635000
[198483.778874] R10: 000000000000000f R11: 0000000001635000 R12: ffffbbd2c166be10
[198483.786936] R13: 0000000000001000 R14: ffff98a716d72000 R15: 0000000000000000
[198483.794992] _copy_to_iter+0x5e/0x500
[198483.799514] ? srso_alias_return_thunk+0x5/0x7f
[198483.804922] copy_page_to_iter+0x8b/0x140
[198483.809799] filemap_read+0x1af/0x320
[198483.814325] vfs_read+0x1b8/0x300
[198483.818498] ksys_read+0x67/0xe0
[198483.822579] do_syscall_64+0x5f/0x90
[198483.827013] ? srso_alias_return_thunk+0x5/0x7f
[198483.832419] ? do_user_addr_fault+0x21d/0x660
[198483.837650] ? srso_alias_return_thunk+0x5/0x7f
[198483.843053] ? exc_page_fault+0x6d/0x150
[198483.847840] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[198483.853774] RIP: 0033:0x7f80626ed754
[198483.858230] Code: 84 00 00 00 00 00 41 54 55 49 89 d4 53 48 89 f5 89 fb 48 83 ec 10 e8 09 fc ff ff 4c 89 e2 41 89 c0 48 89 ee 89 df 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 38 44 89 c7 48 89 44 24 08 e8 55 fc ff ff 48
[198483.878118] RSP: 002b:00007fff7a693260 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[198483.886614] RAX: ffffffffffffffda RBX: 000000000000004a RCX: 00007f80626ed754
[198483.894669] RDX: 0000000000553f88 RSI: 00007f805daaa010 RDI: 000000000000004a
[198483.894670] RBP: 00007f805daaa010 R08: 0000000000000000 R09: 0000000000000000
[198483.894671] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000553f88
[198483.894672] R13: 0000000000000027 R14: 00007f805daaa010 R15: 0000000000000001
[198483.894675] </TASK>
This happens whenever I start a WinXP VM under VMware Player 17.5.0.
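The check that fires here complains when a task's RCU read-side nesting depth drops below zero, meaning more rcu_read_unlock() calls than rcu_read_lock() calls. The Python toy model below is not kernel code, and all its names are hypothetical. It illustrates how an unbalanced release path would drive such a counter negative. That is one possible explanation for this WARN, offered as an assumption rather than a diagnosis of VMware's code.

```python
"""Toy model of a preemptible-RCU nesting counter and a tick-time sanity check.

NOT kernel code: every name here is a hypothetical stand-in that only mirrors
the idea behind rcu_read_lock()/rcu_read_unlock() and rcu_preempt_depth().
"""


class Task:
    """Per-task state; the model keeps one read-side nesting counter per task."""

    def __init__(self) -> None:
        self.rcu_read_lock_nesting = 0


def model_rcu_read_lock(t: Task) -> None:
    # Entering a read-side critical section increments the nesting depth.
    t.rcu_read_lock_nesting += 1


def model_rcu_read_unlock(t: Task) -> None:
    # Leaving a read-side critical section decrements it.
    t.rcu_read_lock_nesting -= 1


def model_tick_check(t: Task) -> bool:
    """Return True if the timer-tick check would warn (depth went negative)."""
    return t.rcu_read_lock_nesting < 0


def balanced_lookup(t: Task) -> None:
    # Lookup takes the read lock, release drops it: net change is zero.
    model_rcu_read_lock(t)
    model_rcu_read_unlock(t)


def mismatched_lookup(t: Task) -> None:
    # Hypothetical bug: the lookup path takes no read lock, but the
    # release path still drops one, leaving the counter one lower per call.
    model_rcu_read_unlock(t)


if __name__ == "__main__":
    ok = Task()
    for _ in range(3):
        balanced_lookup(ok)
    print("balanced:", ok.rcu_read_lock_nesting, "warn =", model_tick_check(ok))
    # prints "balanced: 0 warn = False"

    bad = Task()
    for _ in range(3):
        mismatched_lookup(bad)
    print("mismatched:", bad.rcu_read_lock_nesting, "warn =", model_tick_check(bad))
    # prints "mismatched: -3 warn = True"
```

The model tracks only the counter. The real tick handler also looks at preemption state and deferred quiescent states, which this sketch deliberately omits.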
After replacing PgtblVa2MPN() with the version using get_user_pages_unlocked(), as in branch workstation-17.0.2, the same VM boots and shuts down cleanly. Therefore I'm going to add this commit (with an updated commit message) to branch workstation-17.5.0. I recommend that everyone use it rather than the unpatched 17.5.0 modules or a version without this commit, even though I don't see how the warning check is related to the pte_offset_kernel() hack used by VMware.
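The thread does not show the body of PgtblVa2MPN(). Judging only by its name, which is an assumption, it translates a virtual address to a machine page number by reading page-table entries directly. The workstation-17.0.2 version instead obtains the page through get_user_pages_unlocked(). The toy Python walk below has hypothetical names and uses nested dicts to stand in for x86-64 4-level page tables. It shows only what a manual walk involves. A real in-kernel walker must also access each level under the correct locking/RCU rules, which this sketch ignores.

```python
"""Toy x86-64 4-level page-table walk: virtual address -> page frame number.

Hypothetical model only: nested dicts stand in for the PGD/PUD/PMD/PTE
tables that a real walker would read from memory. 5-level paging is ignored.
"""
from typing import Optional

PAGE_SHIFT = 12  # 4 KiB pages
LEVEL_BITS = 9   # 512 entries per table on x86-64


def split_va(va: int) -> tuple[int, int, int, int, int]:
    """Return the (pgd, pud, pmd, pte) indices plus the in-page offset."""
    mask = (1 << LEVEL_BITS) - 1
    offset = va & ((1 << PAGE_SHIFT) - 1)
    pte = (va >> PAGE_SHIFT) & mask
    pmd = (va >> (PAGE_SHIFT + LEVEL_BITS)) & mask
    pud = (va >> (PAGE_SHIFT + 2 * LEVEL_BITS)) & mask
    pgd = (va >> (PAGE_SHIFT + 3 * LEVEL_BITS)) & mask
    return pgd, pud, pmd, pte, offset


def walk(root: dict, va: int) -> Optional[int]:
    """Follow one table per level; return the PFN, or None if unmapped."""
    *indices, _offset = split_va(va)
    node = root
    for idx in indices:
        if not isinstance(node, dict) or idx not in node:
            return None  # a missing entry at any level means "not mapped"
        node = node[idx]
    return node  # the last level holds the page frame number


if __name__ == "__main__":
    va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x10
    tables = {1: {2: {3: {4: 0x1234}}}}
    print(split_va(va))           # (1, 2, 3, 4, 16)
    print(hex(walk(tables, va)))  # 0x1234
```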
https://docs.vmware.com/en/VMware-Workstation-Pro/17.5/rn/vmware-workstation-175-pro-release-notes/index.html