xcp-ng / xcp

Entry point for issues and wiki. Also contains some scripts and sources.
https://xcp-ng.org

Feature request: enable issue_discards=1 in /etc/lvm/lvm.conf #123

Closed · sammcj closed this issue 2 years ago

sammcj commented 5 years ago

issue_discards should be enabled by default in /etc/lvm/lvm.conf. This allows TRIM/DISCARD and SCSI UNMAP IOCTLs through the LVM layer, which is very important if you use SSDs or, in many cases, thin-provisioned iSCSI storage.

From lvm.conf:

"Configuration option devices/issue_discards. Issue discards to PVs that are no longer used by an LV. Discards are sent to an LV's underlying physical volumes when the LV is no longer using the physical volumes' space, e.g. lvremove, lvreduce. Discards inform the storage that a region is no longer used. Storage that supports discards advertise the protocol-specific way discards should be issued by the kernel (TRIM, UNMAP, or WRITE SAME with UNMAP bit set). Not all storage will support or benefit from discards, but SSDs and thinly provisioned LUNs generally do. If enabled, discards will only be issued if both the storage and kernel provide support."

Patch for change:

--- /etc/lvm/lvm.conf   2019-01-03 11:36:22.379998897 +1100
+++ /etc/lvm/lvm.conf.discards  2019-01-03 14:35:14.624804878 +1100
@@ -298,7 +298,7 @@
    # benefit from discards, but SSDs and thinly provisioned LUNs
    # generally do. If enabled, discards will only be issued if both the
    # storage and kernel provide support.
-   issue_discards = 0
+   issue_discards = 1
 }

 # Configuration section allocation.

This is an improvement suggestion, following a brief discussion on the XCP-ng forums, based on my XenServer and CentOS tuning experience.

sammcj commented 5 years ago

FYI we have been doing this with puppet as such:

  # Enable discard / unmap / trim at the LVM level
  file_line {'issue_discards':
    ensure   => 'present',
    line     => '  issue_discards = 1',
    path     => '/etc/lvm/lvm.conf',
    match    => '  issue_discards = 0',
    multiple => false,
    replace  => true,
  }
stormi commented 5 years ago

Do you know why it's not the default in lvm.conf? Are there drawbacks to this setting?

sammcj commented 5 years ago

No idea. I wouldn't be surprised if it was just a historical thing; it certainly annoys me having to change it on every machine I build.

AFAIK there are no drawbacks because if the underlying block device is not capable of the IOCTL it's simply not issued.
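
For what it's worth, whether a block device advertises discard support at all can be checked from sysfs before enabling this. A minimal sketch (a generic Linux check, not XCP-ng specific; a non-zero discard_max_bytes means the device accepts discards):

  # Minimal sketch: list block devices and whether they advertise discard
  # support. The kernel reports a non-zero discard_max_bytes for devices
  # that accept TRIM/UNMAP; zero means discards simply won't be issued.
  from pathlib import Path

  def supports_discard(dev: str) -> bool:
      path = Path("/sys/block") / dev / "queue" / "discard_max_bytes"
      try:
          return int(path.read_text().strip()) > 0
      except (FileNotFoundError, ValueError):
          return False

  for dev in sorted(p.name for p in Path("/sys/block").iterdir()):
      state = "supported" if supports_discard(dev) else "not supported"
      print(f"{dev}: discard {state}")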

Ultra2D commented 5 years ago

Do you make any other changes for this to have any effect? Because this does not seem to work on lvmoiscsi when removing a VDI from an SR.

According to https://discussions.citrix.com/topic/378764-reclaiming-freed-space/#comment-1931283 lvremove is called without discard, unless you do a "Reclaim freed space" which sends discards even if issue_discards = 0 in lvm.conf. Reclaim freed space does work.

It also suggests changing lvutil.py, and indeed, after changing lvutil.py it does work for me.

adding the config_param "issue_discards=1" to the lvutil.py remove function, which makes it automatic and transparent

It would be nice to have a SR parameter (maybe an sr-config key) auto-trim=true or something so folks with cooperative storage platforms can maintain the benefit of thin provisioning at scale.
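
For reference, the LVM tools accept a per-command --config override, so a change along those lines does not have to touch lvm.conf globally. A rough sketch of the idea (the function and names here are illustrative, not the actual sm/lvutil.py code):

  # Illustrative sketch only (not the actual sm/lvutil.py code): remove an
  # LV while forcing discards for this one command via LVM's --config
  # override, instead of editing /etc/lvm/lvm.conf globally.
  import subprocess

  def remove_lv_with_discard(vg_name: str, lv_name: str) -> None:
      subprocess.run(
          [
              "lvremove", "-f",
              "--config", "devices { issue_discards = 1 }",
              f"{vg_name}/{lv_name}",
          ],
          check=True,
      )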

sammcj commented 5 years ago

Ah yes - sorry, lvutil.py is the other part I forgot in this suggestion - thank you for picking that up 👍

Supermathie commented 5 years ago

@sammcj I believe you are misinterpreting this option - issue_discards tells LVM to actively discard any PV extents no longer mapped to an LV.

Discard commands issued on a filesystem can be passed down through an LV regardless of this setting.

benapetr commented 4 years ago

I am now testing it with a CEPH RBD backend. It is certainly promising, as CEPH also benefits greatly from this.

benapetr commented 4 years ago

It most definitely works with CEPH and is very useful. Just changing this option in both /etc/lvm/lvm.conf and master/lvm.conf made space reclaiming possible: deleting any virtual disk now immediately frees space in the CEPH pool, which didn't happen before unless you ran "reclaim free space" on the SR. Enabling this makes that manual free-space reclaiming unnecessary.

benapetr commented 4 years ago

I think the main reason why Xen doesn't go with this is simply that, in theory, discarding unused blocks may have a "performance impact" on some vendors' storage, or at least that's what they say:

Note: Reclaiming freed space is an intensive operation and can affect the performance of the storage array. You should perform this operation only when space reclamation is required on the array. Citrix recommends that you schedule this work outside the peak array demand hours

So in theory, you may not want to automatically discard the unused blocks left by a deleted VDI (LV) immediately, but rather during some scheduled night job. However, I believe that in most other cases (SSD disks, or even CEPH as in my case) this is irrelevant, as SSD disks don't suffer any performance penalty when TRIM is issued. I don't even know of any other enterprise storage arrays that actually do; this may very well be an ancient left-over from times when storage arrays really had issues with discards.

stormi commented 4 years ago

I have raised your questions to the developers of the storage stack in the XAPI project: https://github.com/xapi-project/sm/issues/514

stormi commented 4 years ago

Quoting the answer:

It is set to 0 by default due to data corruption caused by SANs that implement the discard functionality incorrectly so it isn't safe to have on by default. The performance issue is also valid. Some SANs are understood to lock an entire LUN or RAID set in order to complete discard which is obviously undesirable with active dynamic workloads.

More on the linked issue.

nagilum99 commented 4 years ago

Any chance to know which ones work safely and which don't? Edit: The QuickSpecs for our HPE MSA2040 say: "UNMAP reclaims space that is no longer on a thinly provisioned VMFS volume", so technically it should be fine. But from what I read, it's limited to unmaps of (at least) 4 MB per block. (https://community.hpe.com/t5/msa-storage/msa2042-unmap-on-esx6-5/m-p/6976465#M10916)

stormi commented 4 years ago

Any chance to know which ones work safely and which don't?

No, there is no such list.

benapetr commented 4 years ago

They do have valid points though. I do believe that some SAN storage (especially older arrays) that works fine now with discards disabled may start malfunctioning when you suddenly enable them, so indeed, forcing this on everyone via an update or patch is probably not wanted here.

It would be best if there was an option to enable/disable this (probably at the storage level, cluster-wide?) so that users can decide per storage whether they want this enabled.

I was reading more about this topic as it relates to SSDs, and I strongly recommend this: https://web.archive.org/web/20100227234254/http://www.devwhy.com/blog/2009/8/4/from-write-down-to-the-flash-chips.html which describes in detail how TRIM works on SSDs. After reading this and many more articles, I came to the conclusion that issuing discards too frequently could indeed have some negative performance effect on SSD drives, which is also why most Linux filesystems do not recommend continuous trim, but rather a scheduled fstrim run weekly or monthly (but that is a different use-case; here we are talking about deleting a whole VHD, rather than deleting files on a filesystem, which would probably happen much less often).

I have to admit I didn't have SSDs in mind at all when writing about this, as my use-case is a CEPH backend, which seems to only benefit from this, with no performance impact whatsoever and no data-corruption issues (or at least I haven't observed any so far; CEPH has supported this for quite some time, and other virtualization platforms such as Proxmox or OpenStack use it): https://ceph.io/geen-categorie/openstack-and-ceph-rbd-discard/

nagilum99 commented 4 years ago

It would also be interesting to know which storages XCP-ng/CH 'cleans up'. All block storages? iSCSI, FC... local (SAS)? AFAIK TRIM is the (S)ATA command, and in the SCSI world (SAS, FC, iSCSI) it's UNMAP.

nagilum99 commented 4 years ago

Any chance to know which ones work safely and which don't?

No, there is no such list.

But it's reasonable to assume that if it works with VMware, there's no reason it should break things under Linux (XCP-ng...)?

benapetr commented 4 years ago

No, the behaviour would be the same, no matter the OS.

Trim and discard are virtually the same operation from the OS's point of view; as you said, "trim" is just the term used in the SSD world. Technically you are just notifying the underlying storage that a data segment (marked by a starting block and a length) is no longer used and can be discarded, which on thin-provisioned storage means reducing used space (space that could then be used by another volume, resulting in much more efficient storage usage).
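
To make that concrete: at the Linux block layer a discard is just an ioctl carrying a (start offset, length) pair. A minimal sketch (BLKDISCARD is the standard Linux ioctl value; the device path is a placeholder, and running this destroys data in that range):

  # Minimal sketch: issue a discard for a byte range on a block device.
  # BLKDISCARD is the standard Linux ioctl _IO(0x12, 119); its argument is
  # two 64-bit values, start offset and length, in bytes.
  # WARNING: this irreversibly destroys the data in that range.
  import fcntl
  import os
  import struct

  BLKDISCARD = 0x1277  # _IO(0x12, 119)

  def discard_range(device: str, start: int, length: int) -> None:
      fd = os.open(device, os.O_WRONLY)
      try:
          fcntl.ioctl(fd, BLKDISCARD, struct.pack("QQ", start, length))
      finally:
          os.close(fd)

  # Example with a placeholder device: discard the first 1 MiB of /dev/sdX
  # discard_range("/dev/sdX", 0, 1024 * 1024)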

brendanhoar commented 2 years ago

In LVM, as stated earlier, issue_discards=1 only performs a trim when deallocating extents/chunks from a PV. In practice, this means that if issue_discards=1:

  1. When a normal LV is removed from a VG, discards are issued for the space in the PV that the LV relinquished.
  2. When a thin LV is removed from a thin pool, discards ARE NOT ISSUED for any space the LV relinquished, as that space is still allocated to the thin pool. A workaround is to explicitly invoke blkdiscard on the thin LV device before removing it from the pool, after the associated VM has shut down (see the sketch after this list).
  3. When a thin pool is removed from a VG, discards are issued for the space in the PV that the thin pool relinquished (a thin pool effectively lives inside an object similar to a normal LV).
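
A minimal sketch of that workaround from point 2 (VG/LV names are placeholders; blkdiscard irreversibly destroys the LV's data, so the VM must already be shut down and the LV no longer needed):

  # Illustrative sketch of the workaround from point 2: discard a thin LV's
  # blocks before removing it, so the thin pool (and, with discard
  # passdown, the storage underneath) actually gets the space back.
  # VG/LV names are placeholders; this destroys the LV's data.
  import subprocess

  def remove_thin_lv(vg: str, lv: str) -> None:
      # Activate the LV so its device node exists.
      subprocess.run(["lvchange", "-ay", f"{vg}/{lv}"], check=True)
      # Discard every block still mapped to this thin LV.
      subprocess.run(["blkdiscard", f"/dev/{vg}/{lv}"], check=True)
      # Remove the now-empty thin LV from the pool.
      subprocess.run(["lvremove", "-f", f"{vg}/{lv}"], check=True)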

Why is the default 0?

Likely to prevent long I/O stalls when a large object (a normal LV or a thin pool) is removed and the space is discarded, along with the serious side effects long I/O stalls can cause in production systems.

B

Fohdeesha commented 10 months ago

Do you make any other changes for this to have any effect? Because this does not seem to work on lvmoiscsi when removing a VDI from a SR.

According to https://discussions.citrix.com/topic/378764-reclaiming-freed-space/#comment-1931283 lvremove is called without discard, unless you do a "Reclaim freed space" which sends discards even if issue_discards = 0 in lvm.conf. Reclaim freed space does work.

It also suggests changing lvutil.py, and indeed, after changing lvutil.py it does work for me.

adding the config_param "issue_discards=1" to the lvutil.py remove function, which makes it automatic and transparent

It would be nice to have a SR parameter (maybe an sr-config key) auto-trim=true or something so folks with cooperative storage platforms can maintain the benefit of thin provisioning at scale.

@Ultra2D Can you outline exactly where/what changes you made to lvutil.py? In usual Citrix fashion, the thread you linked with the details (https://discussions.citrix.com/topic/378764-reclaiming-freed-space/#comment-1931283) no longer exists.

Ultra2D commented 10 months ago

No, sorry, I'm no longer using XCP-ng and have no notes on this either.