vmware / open-vm-tools

Official repository of VMware open-vm-tools project
http://sourceforge.net/projects/open-vm-tools/

SCSI UNMAP from inside a VM stops working after a snapshot is removed #581

Open Spockie opened 2 years ago

Spockie commented 2 years ago

Describe the bug

SCSI UNMAP issued from inside a VM stops working after a snapshot is removed, if an UNMAP operation was attempted while that snapshot was being removed.

We see this problem with VMware ESXi 7.0.2 build 18426014 and later 7.0.2 builds.

Reproduction steps

How to demonstrate:
1. Inside a Rocky Linux 8 (or CentOS 8, or openSUSE) VM, run fstrim -v /. It works (our VMs use thin-provisioned disks).
2. Create a VM snapshot, then, while the snapshot is being removed, run fstrim -v / inside the VM again:
[root@xxx ~]# fstrim -v /
fstrim: /: the discard operation is not supported

Unfortunately, discard only starts working again automatically after a reboot.
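After the failed fstrim, one way to see which layer has stopped accepting discards is to read the block layer's discard_max_bytes sysfs attribute for the disk and for the LV's device-mapper node; a value of 0 means the kernel currently treats that device as not supporting discard. This is a read-only sketch: the function names and the SYSFS_ROOT override are hypothetical (the override only exists so the functions can be pointed at a fake tree), and sdb / vgmain-lvroot are simply the names used in this issue.

```shell
# Sketch: inspect discard state through sysfs (read-only).
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.

# Print the discard limit in bytes for a kernel block device name (sdb, dm-0, ...).
discard_max_bytes() {
    cat "${SYSFS_ROOT:-/sys}/block/$1/queue/discard_max_bytes"
}

# Succeed if the kernel currently accepts discards for the device;
# a limit of 0 means the kernel treats the device as not supporting discard.
discard_enabled() {
    [ "$(discard_max_bytes "$1")" -gt 0 ]
}

# Map a device-mapper name (e.g. vgmain-lvroot) to its dm-N node name.
dm_node() {
    for f in "${SYSFS_ROOT:-/sys}"/block/dm-*/dm/name; do
        [ -r "$f" ] || continue          # no dm devices: glob stays unexpanded
        if [ "$(cat "$f")" = "$1" ]; then
            d="${f%/dm/name}"            # .../block/dm-N
            echo "${d##*/}"              # dm-N
            return 0
        fi
    done
    return 1
}
```

Once sourced, discard_max_bytes sdb and discard_max_bytes "$(dm_node vgmain-lvroot)" show both layers; in the state described above, both would presumably read 0 until the disk is rescanned and the LV table is reloaded.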

I know how to force the Linux kernel to retry DISCARD after a failure, but even so I would prefer that VMware not return an error while a snapshot is being deleted. If SCSI UNMAP cannot be performed during snapshot deletion, it could be queued or handled in some similar way. However, VMware support told me that the problem is not on their side and that I should contact the open-vm-tools vendor.

What works as a workaround for me (in our Rocky Linux 8 VMs):
echo 1 > /sys/block/sdb/device/rescan
dmsetup table vgmain-lvroot | dmsetup reload vgmain-lvroot && dmsetup resume vgmain-lvroot
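The two workaround commands can be wrapped as functions so that the disk and the LV name become parameters instead of being hard-coded. This is a sketch, not a tested tool: the function names are hypothetical, sdb and vgmain-lvroot are just the values from this issue, and DRY_RUN=1 (also hypothetical) only prints the commands. Applying it for real requires root.

```shell
# Sketch: the workaround from this issue as reusable shell functions.
# Function names and DRY_RUN are hypothetical; run as root to apply for real.

# Ask the SCSI layer to re-read the disk's parameters (echo 1 > .../rescan).
rescan_scsi_disk() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "echo 1 > /sys/block/$1/device/rescan"
    else
        echo 1 > "/sys/block/$1/device/rescan"
    fi
}

# Reload the LV's existing device-mapper table unchanged and resume it, so the
# dm device recomputes its limits (including discard) from the disk underneath.
reload_dm_table() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "dmsetup table $1 | dmsetup reload $1"
        echo "dmsetup resume $1"
        return 0
    fi
    dmsetup table "$1" | dmsetup reload "$1" && dmsetup resume "$1"
}

# Usage: restore_discard <disk> <dm-name>, e.g. restore_discard sdb vgmain-lvroot
restore_discard() {
    rescan_scsi_disk "$1" && reload_dm_table "$2"
}
```

Once sourced, restore_discard sdb vgmain-lvroot runs the same commands as above, and DRY_RUN=1 restore_discard sdb vgmain-lvroot only prints them.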

Expected behavior

UNMAP should keep working.

Maybe the open-vm-tools kernel driver could itself check whether UNMAP works again (after it failed for a short while during snapshot removal).

Additional context

No response

PaTHml commented 2 years ago

Hi, storage UNMAP issues are not open-vm-tools issues. Please open a service/support request with VMware support to diagnose the issue further.

If you already have a ticket number, we'll need it (via private message) to move this along.

Spockie commented 2 years ago

I have already contacted VMware support; according to them, the issue is not on the VMware side: ... I wanted to update you that we are still investigating the issue.

We found out that open-vm-tools is responsible for the automatic reclamation call inside the guest OS when a snapshot deletion is done. As a result, we are investigating whether there is a workaround/resolution at the VMware level, or whether to ask you to engage the guest OS vendor, as the Linux guest OS vendor is responsible for distributing open-vm-tools per KB https://kb.vmware.com/s/article/2073803?lang=en_US

We will continue investigating and get back to you as soon as possible ... After internal investigation, I am afraid that nothing can be done at the VMware level for the automatic reclamation, as it is blocked at the guest OS level, not at the VMware level.

As a result, kindly engage the guest OS vendor to check, at their level and in their logging, why automatic reclamation is not working from the guest while it works fine at the storage level.

I will be available at the VMware level if the guest OS vendor needs our assistance at any time. ... The issue is not reproducible in our labs.

I am afraid that VMware no longer manages VMware Tools or releases native VMware Tools for Linux operating systems; they are managed by the guest OS vendor, per the KB sent before.

I tried to find a workaround at the VMware level in our internal resources, but unfortunately this issue is at the guest OS level, not the VMware virtual machine level.

As a result, guest OS vendor engagement is needed, and if they require anything from the VMware level, we will be there to collaborate and assist ...

Spockie commented 2 years ago

So for now I see only one "solution": a cron job, e.g. daily at 09:00, that runs:
echo 1 > /sys/block/sdb/device/rescan
dmsetup table vgmain-lvroot | dmsetup reload vgmain-lvroot && dmsetup resume vgmain-lvroot
fstrim /
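The daily job could be installed as a cron.d entry. This sketch assumes a cronie-style /etc/cron.d file (which carries a user field); the file name discard-workaround and the CRON_DIR override are hypothetical, and sdb / vgmain-lvroot are the names from this issue. The commands are chained with && so fstrim only runs if the rescan and table reload succeed, which is slightly stricter than running the three commands independently.

```shell
# Sketch: write a cron.d entry that applies the workaround daily at 09:00.
# CRON_DIR and the file name are hypothetical; adjust sdb / vgmain-lvroot.
install_discard_cron() {
    cron_dir="${CRON_DIR:-/etc/cron.d}"
    # Quoted 'EOF' keeps the commands literal in the generated file.
    cat > "$cron_dir/discard-workaround" <<'EOF'
# cron's default PATH usually lacks /usr/sbin, where dmsetup and fstrim live
PATH=/usr/sbin:/usr/bin:/sbin:/bin
# min hour day-of-month month day-of-week user command
0 9 * * * root echo 1 > /sys/block/sdb/device/rescan && dmsetup table vgmain-lvroot | dmsetup reload vgmain-lvroot && dmsetup resume vgmain-lvroot && fstrim /
EOF
}
```

Run install_discard_cron once as root; cron picks up new files in /etc/cron.d without a restart.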

Spockie commented 2 years ago

Btw, how can I send you a private message (with the VMware support issue number)?

PaTHml commented 2 years ago

> Btw, how can I send you a private message (with the VMware support issue number)?

That I don't know; I thought there was something in the menus. Try reaching me at phamel at vmware. That should be the most expedient.

Spockie commented 2 years ago

Btw, if VMware support was not able to reproduce the issue, I think the bug may be in a storage driver specific to our environment (and different from the one used in their test lab).

Spockie commented 2 years ago

I have now confirmed that the issue also occurs in ESXi 7.0.3 build 19193900.

PaTHml commented 2 years ago

Thank you for the ticket numbers, I received them yesterday.

For the VMs with thin disks, which SCSI controller is configured?

Spockie commented 2 years ago

Yes, VMs with thin disks; SCSI controller: VMware Paravirtual.

Spockie commented 1 year ago

Btw, this bug is still not fixed. It happens even in Rocky Linux 9.

Spockie commented 1 week ago

Btw, is there any plan to fix this bug?