vmware / open-vm-tools

Official repository of VMware open-vm-tools project
http://sourceforge.net/projects/open-vm-tools/
2.26k stars 427 forks source link

Linux guest unable to hibernate #686

Open polarathene opened 1 year ago

polarathene commented 1 year ago

Describe the bug

On a Linux VM guest, hibernating fails to complete, leaving the VM guest stuck until powering off via VMware Workstation Player UI. Upon reboot, resume from hibernation is not successful.

Normally I do not need this functionality, as the VMware Workstation "Suspend" works well. When I did need to hibernate, I was surprised that it does not appear functional.

Reproduction steps

  1. Install a linux guest (CachyOS was used for report).
  2. Run systemctl hibernate to hibernate the system.
  3. Failure occurs.

Expected behavior

Successfully hibernate via OS.

Additional context

vmx file ``` .encoding = "windows-1252" config.version = "8" virtualHW.version = "20" pciBridge0.present = "TRUE" pciBridge4.present = "TRUE" pciBridge4.virtualDev = "pcieRootPort" pciBridge4.functions = "8" pciBridge5.present = "TRUE" pciBridge5.virtualDev = "pcieRootPort" pciBridge5.functions = "8" pciBridge6.present = "TRUE" pciBridge6.virtualDev = "pcieRootPort" pciBridge6.functions = "8" pciBridge7.present = "TRUE" pciBridge7.virtualDev = "pcieRootPort" pciBridge7.functions = "8" vmci0.present = "TRUE" hpet0.present = "TRUE" nvram = "linux_throwaway.nvram" virtualHW.productCompatibility = "hosted" gui.exitOnCLIHLT = "FALSE" powerType.powerOff = "soft" powerType.powerOn = "soft" powerType.suspend = "soft" powerType.reset = "soft" displayName = "linux_throwaway" guestOS = "other5xlinux-64" vhv.enable = "TRUE" vpmc.enable = "TRUE" tools.syncTime = "FALSE" sound.autoDetect = "TRUE" sound.fileName = "-1" sound.present = "TRUE" numvcpus = "12" cpuid.coresPerSocket = "3" vcpu.hotadd = "TRUE" memsize = "4096" mem.hotadd = "TRUE" scsi0.virtualDev = "lsilogic" scsi0.present = "TRUE" scsi0:0.fileName = "linux_throwaway.vmdk" scsi0:0.present = "TRUE" ide1:0.deviceType = "cdrom-image" ide1:0.fileName = "D:\iso\cachyos-kde-linux-230709.iso" ide1:0.present = "TRUE" usb.present = "TRUE" ehci.present = "TRUE" svga.graphicsMemoryKB = "2097152" ethernet0.connectionType = "nat" ethernet0.addressType = "generated" ethernet0.virtualDev = "e1000" ethernet0.present = "TRUE" extendedConfigFile = "linux_throwaway.vmxf" floppy0.present = "FALSE" firmware = "efi" vmxstats.filename = "linux_throwaway.scoreboard" uuid.bios = "56 4d 35 ee 8a 05 85 bf-ce 57 43 17 6a 1b 0c 74" uuid.location = "56 4d 35 ee 8a 05 85 bf-ce 57 43 17 6a 1b 0c 74" pciBridge0.pciSlotNumber = "17" pciBridge4.pciSlotNumber = "21" pciBridge5.pciSlotNumber = "22" pciBridge6.pciSlotNumber = "23" pciBridge7.pciSlotNumber = "24" scsi0.pciSlotNumber = "16" usb.pciSlotNumber = "32" ethernet0.pciSlotNumber = "33" sound.pciSlotNumber = "34" ehci.pciSlotNumber = "35" usb_xhci.pciSlotNumber = "160" scsi0:0.redo = "" svga.vramSize = "268435456" vmotion.checkpointFBSize = "4194304" vmotion.checkpointSVGAPrimarySize = "268435456" vmotion.svga.mobMaxSize = "1073741824" vmotion.svga.graphicsMemoryKB = "2097152" vmotion.svga.supports3D = "1" vmotion.svga.baseCapsLevel = "9" vmotion.svga.maxPointSize = "1" vmotion.svga.maxTextureSize = "16384" vmotion.svga.maxVolumeExtent = "2048" vmotion.svga.maxTextureAnisotropy = "16" vmotion.svga.lineStipple = "0" vmotion.svga.dxMaxConstantBuffers = "15" vmotion.svga.dxProvokingVertex = "0" vmotion.svga.sm41 = "1" vmotion.svga.multisample2x = "1" vmotion.svga.multisample4x = "1" vmotion.svga.msFullQuality = "1" vmotion.svga.logicOps = "1" vmotion.svga.bc67 = "9" vmotion.svga.sm5 = "1" vmotion.svga.multisample8x = "1" vmotion.svga.logicBlendOps = "0" vmotion.svga.maxForcedSampleCount = "16" vmotion.svga.gl43 = "1" ethernet0.generatedAddress = "00:0c:29:1b:0c:74" ethernet0.generatedAddressOffset = "0" vmci0.id = "1780157556" monitor.phys_bits_used = "45" cleanShutdown = "TRUE" softPowerOff = "TRUE" ide1:0.startConnected = "FALSE" usb.generic.allowHID = "TRUE" svga.guestBackedPrimaryAware = "TRUE" guestInfo.detailed.data = "architecture='X86' bitness='64' buildNumber='rolling' distroName='CachyOS Linux' familyName='Linux' kernelVersion='6.4.8-2-cachyos' prettyName='CachyOS'" tools.remindInstall = "FALSE" gui.lastPoweredViewMode = "fullscreen" scsi0:1.fileName = "data.vmdk" scsi0:1.present = "TRUE" scsi0:1.redo = "" checkpoint.vmState = "" usb_xhci:6.speed = "2" usb_xhci:6.present = "TRUE" usb_xhci:6.deviceType = "hub" usb_xhci:6.port = "6" usb_xhci:6.parent = "-1" usb_xhci:7.speed = "4" usb_xhci:7.present = "TRUE" usb_xhci:7.deviceType = "hub" usb_xhci:7.port = "7" usb_xhci:7.parent = "-1" scsi0:2.fileName = "linux_throwaway-swap.vmdk" scsi0:2.present = "TRUE" scsi0:2.redo = "" usb:1.speed = "2" usb:1.present = "TRUE" usb:1.deviceType = "hub" usb:1.port = "1" usb:1.parent = "-1" mks.enable3d = "TRUE" usb_xhci.present = "TRUE" usb_xhci:4.present = "TRUE" usb_xhci:4.deviceType = "hid" usb_xhci:4.port = "4" usb_xhci:4.parent = "-1" ```

vmware.log just prior to running systemctl hibernate:

Click to expand ``` 2023-08-31T06:24:05.002Z In(05) vmx GuestInfo: HostinfoDetailedDataHeader version: 1 2023-08-31T06:24:05.002Z No(00) vmx ConfigDB: Setting guestInfo.detailed.data = 2023-08-31T06:24:05.051Z In(05) vcpu-10 TOOLS call to unity.show.taskbar failed. 2023-08-31T06:26:19.028Z In(05) vcpu-9 Guest: Mesa: SVGA3D; build: RELEASE; LLVM; 2023-08-31T06:26:19.028Z In(05) vcpu-9 Guest: Mesa: 23.1.5 2023-08-31T06:26:25.451Z In(05) vmx GuestRpcSendTimedOut: message to toolbox timed out. 2023-08-31T06:26:25.451Z In(05) vmx Tools: [AppStatus] Last heartbeat value 171 (last received 5s ago) 2023-08-31T06:26:25.451Z In(05) vmx TOOLS: appName=toolbox, oldStatus=1, status=2, guestInitiated=0. 2023-08-31T06:26:27.183Z In(05) svga SWBScreen: Screen 1 Destroyed: xywh(0, 0, 1280, 800) flags=0x2 2023-08-31T06:26:27.183Z In(05) vmx GuestRpc: Got error for channel 0 connection 7: Remote disconnected 2023-08-31T06:26:27.183Z In(05) vmx GuestRpc: GuestRpcResetVsockChannel: channel 0 2023-08-31T06:26:27.183Z In(05) vmx GuestRpc: Closing channel 0 connection 7 2023-08-31T06:26:27.183Z In(05) vmx GuestRpc: Got error for channel 1 connection 8: Remote disconnected 2023-08-31T06:26:27.183Z In(05) vmx GuestRpc: GuestRpcResetVsockChannel: channel 1 2023-08-31T06:26:27.183Z In(05) vmx GuestRpc: Closing channel 1 connection 8 2023-08-31T06:26:27.195Z In(05) vcpu-11 SCSI0: RESET BUS 2023-08-31T06:26:30.606Z In(05) vmx GuestRpcSendTimedOut: message to toolbox-dnd timed out. 2023-08-31T06:26:30.606Z In(05) vmx TOOLS: appName=toolbox-dnd, oldStatus=1, status=2, guestInitiated=0. 2023-08-31T06:26:41.201Z In(05) vcpu-0 Tools: Tools heartbeat timeout. 2023-08-31T06:26:41.201Z In(05) vcpu-0 Tools: Running status rpc handler: 1 => 0. 2023-08-31T06:26:41.201Z In(05) vcpu-0 Tools: Changing running status: 1 => 0. 2023-08-31T06:26:41.201Z In(05) vcpu-0 Tools: [RunningStatus] Last heartbeat value 171 (last received 20s ago) 2023-08-31T06:26:45.452Z In(05) vmx GuestRpcSendTimedOut: message to toolbox timed out. 2023-08-31T06:26:45.452Z In(05) vmx GuestRpc: app toolbox's second ping timeout; assuming app is down 2023-08-31T06:26:45.452Z In(05) vmx Tools: [AppStatus] Last heartbeat value 171 (last received 25s ago) 2023-08-31T06:26:45.452Z In(05) vmx TOOLS: appName=toolbox, oldStatus=2, status=0, guestInitiated=0. 2023-08-31T06:26:45.457Z In(05) vmx GuestRpc: Reinitializing Channel 0(toolbox) 2023-08-31T06:26:45.457Z In(05) vmx GuestMsg: Channel 0, Cannot unpost because the previous post is already completed 2023-08-31T06:26:45.457Z In(05) vmx Tools: [AppStatus] Last heartbeat value 171 (last received 25s ago) 2023-08-31T06:26:45.457Z In(05) vmx TOOLS: appName=toolbox, oldStatus=0, status=0, guestInitiated=0. 2023-08-31T06:26:50.608Z In(05) vmx GuestRpcSendTimedOut: message to toolbox-dnd timed out. 2023-08-31T06:26:50.608Z In(05) vmx GuestRpc: app toolbox-dnd's second ping timeout; assuming app is down 2023-08-31T06:26:50.608Z In(05) vmx TOOLS: appName=toolbox-dnd, oldStatus=2, status=0, guestInitiated=0. 2023-08-31T06:26:50.608Z In(05) vmx GuestRpc: Reinitializing Channel 1(toolbox-dnd) 2023-08-31T06:26:50.608Z In(05) vmx GuestMsg: Channel 1, Cannot unpost because the previous post is already completed 2023-08-31T06:26:50.608Z In(05) vmx TOOLS: appName=toolbox-dnd, oldStatus=0, status=0, guestInitiated=0. 2023-08-31T06:29:45.459Z In(05) vmx GuestRpcSendTimedOut: message to toolbox timed out. 2023-08-31T06:29:45.459Z In(05) vmx Vix: [guestCommands.c:1945]: Error VIX_E_TOOLS_NOT_RUNNING in VMAutomationTranslateGuestRpcError(): VMware Tools are not running in the guest 2023-08-31T06:30:30.393Z In(05) vmx GuestRpcSendTimedOut: message to toolbox-dnd timed out. ```

When 3D accel feature is not enabled. systemctl hibernate instead fails differently:

vmware.log:

Click to expand ``` 2023-08-31T06:06:14.698Z In(05) vmx GuestRpc: Got error for channel 0 connection 6: Remote disconnected 2023-08-31T06:06:14.698Z In(05) vmx GuestRpc: GuestRpcResetVsockChannel: channel 0 2023-08-31T06:06:14.698Z In(05) vmx GuestRpc: Closing channel 0 connection 6 2023-08-31T06:06:14.698Z In(05) vmx GuestRpc: Got error for channel 1 connection 7: Remote disconnected 2023-08-31T06:06:14.698Z In(05) vmx GuestRpc: GuestRpcResetVsockChannel: channel 1 2023-08-31T06:06:14.698Z In(05) vmx GuestRpc: Closing channel 1 connection 7 2023-08-31T06:06:14.699Z In(05) svga SWBScreen: Screen 1 Destroyed: xywh(0, 0, 1280, 800) flags=0x2 2023-08-31T06:06:14.700Z In(05) svga SVGA-ScreenMgr: Screen type changed to RegisterMode 2023-08-31T06:06:14.700Z In(05) svga SWBScreen: Screen 1 Defined: xywh(0, 0, 1024, 768) flags=0x2 2023-08-31T06:06:14.707Z In(05) svga SWBScreen: Screen 1 Destroyed: xywh(0, 0, 1024, 768) flags=0x2 2023-08-31T06:06:14.708Z In(05) svga SVGA-ScreenMgr: Screen type changed to ScreenTarget 2023-08-31T06:06:14.708Z In(05) svga SWBScreen: Screen 1 Defined: xywh(0, 0, 1280, 800) flags=0x2 2023-08-31T06:06:15.234Z In(05) vcpu-11 VMMouse: Disabling VMMouse mode 2023-08-31T06:06:15.234Z In(05) vcpu-11 VMMouse: CMD Read ID 2023-08-31T06:06:15.236Z In(05) vcpu-10 SOUND 235.267058 ES1371Win32GetMixerCtl mixerGetLineInfo error 1024 2023-08-31T06:06:15.951Z In(05) vcpu-1 VMCI QueuePair: VMCIQueuePairDetach failed --- handle 6a1b0c74:400 not found. 2023-08-31T06:06:15.952Z In(05) vmx GuestRpc: Got RPCI vsocket connection 8, assigned to channel 0. ```

This variation allows for some additional log info:

journalctl -b | grep -i hibernation:

Click to expand ``` Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0x00001000-0x00001fff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000bffff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0x000c0000-0x000fffff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0x0dbb7000-0x0dbb7fff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0x0dbb8000-0x0dbb8fff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0x0dbb9000-0x0dbb9fff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0x0dbbb000-0x0dbbbfff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0x0dbbc000-0x0dbbcfff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0x0dbc4000-0x0dbc4fff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0x0dbc6000-0x0dbc6fff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0x0e04b000-0x0e050fff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0x0fb6f000-0x0fbdefff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0x0fbdf000-0x0fbfafff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0x0fbfb000-0x0fbfefff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0xc0000000-0xffbfffff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0xffc00000-0xffc29fff] Aug 31 18:01:29 vm-cachyos kernel: PM: hibernation: Registered nosave memory: [mem 0xffc2a000-0xffffffff] Aug 31 18:06:04 vm-cachyos kernel: PM: hibernation: hibernation entry Aug 31 18:06:12 vm-cachyos kernel: PM: hibernation: Marking nosave pages: [mem 0x00000000-0x00001fff] Aug 31 18:06:12 vm-cachyos kernel: PM: hibernation: Marking nosave pages: [mem 0x000a0000-0x000fffff] Aug 31 18:06:12 vm-cachyos kernel: PM: hibernation: Marking nosave pages: [mem 0x0dbb7000-0x0dbb9fff] Aug 31 18:06:12 vm-cachyos kernel: PM: hibernation: Marking nosave pages: [mem 0x0dbbb000-0x0dbbcfff] Aug 31 18:06:12 vm-cachyos kernel: PM: hibernation: Marking nosave pages: [mem 0x0dbc4000-0x0dbc4fff] Aug 31 18:06:12 vm-cachyos kernel: PM: hibernation: Marking nosave pages: [mem 0x0dbc6000-0x0dbc6fff] Aug 31 18:06:12 vm-cachyos kernel: PM: hibernation: Marking nosave pages: [mem 0x0e04b000-0x0e050fff] Aug 31 18:06:12 vm-cachyos kernel: PM: hibernation: Marking nosave pages: [mem 0x0fb6f000-0x0fbfefff] Aug 31 18:06:12 vm-cachyos kernel: PM: hibernation: Marking nosave pages: [mem 0xc0000000-0xffffffff] Aug 31 18:06:12 vm-cachyos kernel: PM: hibernation: Basic memory bitmaps created Aug 31 18:06:12 vm-cachyos kernel: PM: hibernation: Preallocating image memory Aug 31 18:06:12 vm-cachyos kernel: PM: hibernation: Allocated 364819 pages for snapshot Aug 31 18:06:12 vm-cachyos kernel: PM: hibernation: Allocated 1459276 kbytes in 7.40 seconds (197.19 MB/s) Aug 31 18:06:12 vm-cachyos kernel: PM: hibernation: Basic memory bitmaps freed Aug 31 18:06:12 vm-cachyos kernel: PM: hibernation: hibernation exit Aug 31 18:06:12 vm-cachyos systemd[1]: Dependency failed for System Hibernation. ```

systemctl status systemd-hibernate

Click to expand ``` × systemd-hibernate.service - Hibernate Loaded: loaded (/usr/lib/systemd/system/systemd-hibernate.service; static) Active: failed (Result: exit-code) since Thu 2023-08-31 18:06:12 NZST; 6min ago Docs: man:systemd-hibernate.service(8) Process: 3899 ExecStart=/usr/lib/systemd/systemd-sleep hibernate (code=exited, status=1/FAILURE) Main PID: 3899 (code=exited, status=1/FAILURE) CPU: 7.427s Aug 31 18:06:04 vm-cachyos systemd[1]: Starting Hibernate... Aug 31 18:06:04 vm-cachyos systemd-sleep[3899]: Entering sleep state 'hibernate'... Aug 31 18:06:12 vm-cachyos systemd-sleep[3899]: Failed to put system to sleep. System resumed again: Device or resource busy Aug 31 18:06:12 vm-cachyos systemd[1]: systemd-hibernate.service: Main process exited, code=exited, status=1/FAILURE Aug 31 18:06:12 vm-cachyos systemd[1]: systemd-hibernate.service: Failed with result 'exit-code'. Aug 31 18:06:12 vm-cachyos systemd[1]: Failed to start Hibernate. Aug 31 18:06:12 vm-cachyos systemd[1]: systemd-hibernate.service: Consumed 7.427s CPU time. ```

Avoiding systemctl hibernate and instead trying to hibernate directly (suspend to disk) fails for the same reason:

$ echo 'disk' | sudo tee /sys/power/state

tee: /sys/power/state: Device or resource busy

Due to that failure, the VM guest is still functional, but some of the functionality is not restored. Such as the VM guest cursor has become invisible as part of the hibernation attempt.

polarathene commented 1 year ago
Resolved - Unrelated compatibility issue with update to 6.5 kernel **UPDATE:** As the 6.4 kernel was working fine, but the issues described below started after the update to 6.5 kernel, I verified with the 6.1.50 LTS kernel which is working properly (still no hibernation) performance wise. --- Not quite sure what happened. After this bug report, I powered off a VM guest stuck in the blank hibernation window (_1st scenario with 3D graphics enabled_), and booted the VM again. It booted into the desktop, but only the top quarter of the display output was rendered, the other 3 quarters were black. The visible region was not responding to mouse input. Checked the `vmware.log` and it spammed the last line repeatedly: ``` 2023-08-31T07:21:19.993Z In(05) vcpu-5 SVGA: FIFO is already mapped 2023-08-31T07:21:19.993Z In(05) vcpu-5 SVGA: Registering IOSpace at 0x2140 2023-08-31T07:21:19.993Z In(05) vcpu-5 SVGA: Registering MemSpace at 0xf0000000(0xf0000000) and 0xfb800000(0xfb800000) 2023-08-31T07:21:19.994Z In(05) vcpu-5 SVGA: FIFO is already mapped 2023-08-31T07:21:20.220Z In(05) vcpu-3 UHCI: HCReset 2023-08-31T07:21:20.335Z In(05) vcpu-4 UHCI: HCReset 2023-08-31T07:21:21.287Z In(05) vcpu-10 VIDE: (0x170) Unexpected IN for cmd 0xa3 len 2 2023-08-31T07:21:21.287Z In(05) vcpu-10 VIDE: (0x170) Unexpected IN for cmd 0xa3 len 2 2023-08-31T07:21:21.287Z In(05) vcpu-10 VIDE: (0x170) Unexpected IN for cmd 0xa3 len 2 2023-08-31T07:21:21.287Z In(05) vcpu-10 VIDE: (0x170) Unexpected IN for cmd 0xa3 len 2 ... 2023-08-31T07:21:21.876Z In(05) vcpu-10 VIDE: (0x170) Unexpected IN for cmd 0xa3 len 2 2023-08-31T07:21:21.876Z In(05) vcpu-10 VIDE: (0x170) Unexpected IN for cmd 0xa3 len 2 2023-08-31T07:21:21.876Z In(05) vcpu-10 VIDE: (0x170) Switching active drive from 0 to 1 while command 0xa0 is in progress (status 0x08). 2023-08-31T07:21:21.876Z In(05) vcpu-10 VIDE: IDE1:0: Aborting command a0, I/O was not in progress 2023-08-31T07:21:22.053Z In(05) vcpu-9 VIDE: (0x170) Unexpected IN for cmd 0xa3 len 2 2023-08-31T07:21:22.053Z In(05) vcpu-9 VIDE: (0x170) Unexpected IN for cmd 0xa3 len 2 2023-08-31T07:21:22.053Z In(05) vcpu-9 VIDE: (0x170) Unexpected IN for cmd 0xa3 len 2 2023-08-31T07:21:22.053Z In(05) vcpu-9 VIDE: (0x170) Unexpected IN for cmd 0xa3 len 2 ... 2023-08-31T07:21:22.053Z In(05) vcpu-9 VIDE: (0x170) Unexpected IN for cmd 0xa3 len 2 2023-08-31T07:21:22.053Z In(05) vcpu-9 VIDE: (0x170) Unexpected IN for cmd 0xa3 len 2 2023-08-31T07:21:26.632Z No(00) vcpu-9 <<< file Throttled >>> ``` Using the Player "Suspend" UI button, I opened VMware Workstation Player again and resumed the VM. It restored the guest with display fully rendering again and responsive to mouse input. However the `vmware.log` contains similar spam output as shown above, but for different `vcpu` (`vcpu-0`, `vcpu-7`, `vcpu-6`). Unclear if that was related to the failed hibernation attempt. A shutdown of the VM guest and starting it again seems to cause the same behaviour. Stalling on `Triggering uevents...` for about 5 minutes or so. The first time (_the quarter display issue) it proceeded to boot after the long delay. This one updated with the following and stalled: image A few minutes later that updated again: image And the system booted up as normal and quickly again after that. This time with the full display output as it should. Running the command `sudo blkid` stalled for 38 seconds until responding with output.
Ignore - Not related to the swap partition Here is the root partition (`sda2`) and the swap partition (`sdc1`): ``` /dev/sdc1: LABEL="hibernation" UUID="d3b3e4d1-0f2e-446b-8cd5-629f8c5fafbf" TYPE="swap" PARTUUID="deb27f41-49ea-4256-900d-04f9ed2a157a" /dev/sda2: UUID="fe346b77-491a-4a5b-a780-af173f112b3e" BLOCK_SIZE="512" TYPE="xfs" PARTLABEL="root" PARTUUID="7e2cfa9d-ff4d-4ed6-b147-f1300eb9ae85" ``` Swap device is functional: ```console $ swapon --show NAME TYPE SIZE USED PRIO /dev/sdc1 partition 8G 0B -2 /dev/zram0 partition 970.2M 0B 100 $ cat /etc/fstab # UUID=7DE1-3F5B /boot vfat defaults,noatime 0 2 UUID=fe346b77-491a-4a5b-a780-af173f112b3e / xfs defaults,lazytime,noatime,attr2,inode64,logbsize=256k,noquota 0 1 UUID=3406082d-37a8-4328-ac72-ccfb70528a5e /data btrfs defaults,lazytime,noatime,compress-force=zstd UUID=d3b3e4d1-0f2e-446b-8cd5-629f8c5fafbf none swap sw 0 0 $ cat /proc/cmdline initrd=\amd-ucode.img initrd=\initramfs-linux-cachyos.img root=PARTUUID=7e2cfa9d-ff4d-4ed6-b147-f1300eb9ae85 rw zswap.enabled=0 nowatchdog $ cat /boot/loader/entries/linux-cachyos.conf title Linux Cachyos linux /vmlinuz-linux-cachyos initrd /amd-ucode.img initrd /initramfs-linux-cachyos.img options root=PARTUUID=7e2cfa9d-ff4d-4ed6-b147-f1300eb9ae85 rw zswap.enabled=0 nowatchdog ``` As 2nd half of above shows, the kernel boot parameter `resume` is now missing. I updated the guest after the bug report, and was not aware that CachyOS was configured to modify the `linux-cachyos.conf` removing my change. Instead that is intended to be managed by `/etc/sdboot-manage.conf` (_appears to be a [tool of their own](https://github.com/CachyOS/systemd-boot-manager)_). Since `/etc/mkinitcpio.conf` still had the `resume` hook I added to `HOOKS`, the hibernation boot error message is due to lacking the `resume` kernel parameter. Correcting the kernel boot parameters to include `resume` did not help with the performance degradation.
--- Although that does not explain why `sudo blkid` (_a 2nd attempt took almost 3 minutes to respond_) would be stalled for so long, or the massive log spam to `vmware.log`? Looking at the boot logs via `journalctl -b` I wonder if this is related: ``` Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:15.0: BAR 15: assigned [mem 0xfef00000-0xff0fffff 64bit pref] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:10.0: BAR 6: assigned [mem 0xff100000-0xff103fff pref] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:15.0: BAR 13: assigned [io 0x3000-0x3fff] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:15.1: BAR 13: assigned [io 0x4000-0x4fff] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:15.2: BAR 13: assigned [io 0x6000-0x6fff] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:15.3: BAR 13: assigned [io 0x7000-0x7fff] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:15.4: BAR 13: assigned [io 0x8000-0x8fff] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:15.5: BAR 13: assigned [io 0x9000-0x9fff] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:15.6: BAR 13: assigned [io 0xa000-0xafff] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:15.7: BAR 13: assigned [io 0xb000-0xbfff] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:16.0: BAR 13: assigned [io 0xc000-0xcfff] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:16.1: BAR 13: assigned [io 0xd000-0xdfff] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:16.2: BAR 13: assigned [io 0xe000-0xefff] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:16.3: BAR 13: assigned [io 0xf000-0xffff] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:16.4: BAR 13: no space for [io size 0x1000] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:16.4: BAR 13: failed to assign [io size 0x1000] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:16.5: BAR 13: no space for [io size 0x1000] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:16.5: BAR 13: failed to assign [io size 0x1000] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:16.6: BAR 13: no space for [io size 0x1000] Aug 31 19:35:42 vm-cachyos kernel: pci 0000:00:16.6: BAR 13: failed to assign [io size 0x1000] ``` **UPDATE:** These logs above remain with the 6.1.50 LTS kernel, but the stalls experienced with the 6.5 kernel are gone and these logs no longer show in `vmware.log`: ``` 2023-08-31T07:21:22.053Z In(05) vcpu-9 VIDE: (0x170) Unexpected IN for cmd 0xa3 len 2 ```
polarathene commented 1 year ago

Related is systemctl suspend (S3). I was filing a separate issue for that, with logs collected but I when I returned to complete the bug report Windows had decided to force restart the system to apply an update... 😫

That took a while to prepare, and I don't have time to go through that again. Gist of it was:

rprabhud commented 1 year ago

Thank you for reporting the problem. We have informed the product team responsible for this issue. As this issue is not related to open-vm-tools, please use VMTN (https://communities.vmware.com/) or VMware Support service for further updates on this issue or other product issues unrelated to open-vm-tools.

polarathene commented 1 year ago

As this issue is not related to open-vm-tools

How do you determine when it is related? There is clearly log entries from open-vm-tools related to the failure.

rprabhud commented 1 year ago

Guest hibernate is not an open-vm-tools feature. The log messages about "Tools heartbeat timeout" or GuestRpc timeout/error are result of hibernate/power-off indicating loss of connection between vmx and tools. Those errors messages are not causing hibernate failure.