FYI we have been doing this with puppet as such:
# Enable discard / unmap / trim at the LVM level
file_line { 'issue_discards':
  ensure   => 'present',
  path     => '/etc/lvm/lvm.conf',
  line     => ' issue_discards = 1',
  match    => '^\s*issue_discards\s*=\s*0',
  multiple => false,
  replace  => true,
}
Do you know why it's not the default in lvm.conf? Are there drawbacks to this setting?
No idea; I wouldn't be surprised if it's just a historical thing. It certainly annoys me having to change it on every machine I build.
AFAIK there are no drawbacks: if the underlying block device is not capable of the IOCTL, it's simply not issued.
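For what it's worth, you can check whether a given device would even accept a discard by looking at its queue limits in sysfs. A rough Python sketch (device names are just examples):

# Rough check of whether a block device will honour discards at all: the
# kernel exports its discard limits in sysfs, and discard_max_bytes == 0
# means discards are never sent to that device.
from pathlib import Path


def supports_discard(dev: str) -> bool:
    """Return True if e.g. 'sda' or 'dm-0' reports a non-zero discard limit."""
    limit = Path("/sys/block") / dev / "queue" / "discard_max_bytes"
    try:
        return int(limit.read_text().strip()) > 0
    except (FileNotFoundError, ValueError):
        return False


for dev in ("sda", "dm-0"):
    print(dev, "discard capable:", supports_discard(dev))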
Do you make any other changes for this to have any effect? Because this does not seem to work on lvmoiscsi when removing a VDI from a SR.
According to https://discussions.citrix.com/topic/378764-reclaiming-freed-space/#comment-1931283, lvremove is called without discard unless you do a "Reclaim freed space", which sends discards even if issue_discards = 0 in lvm.conf. "Reclaim freed space" does work.
It also suggests changing lvutil.py, and indeed, after changing lvutil.py it does work for me.
Adding the config_param "issue_discards=1" to the lvutil.py remove function makes it automatic and transparent.
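For anyone wondering what that change boils down to, here is a rough, hypothetical sketch (not the actual lvutil.py code) of removing an LV with a per-invocation issue_discards override via lvremove's --config option:

# Illustration only: pass an lvm.conf override on the lvremove command line
# so the PV extents freed by the removal are discarded, without touching the
# global /etc/lvm/lvm.conf.
import subprocess


def remove_lv(lv_path: str, issue_discards: bool = True) -> None:
    cmd = ["lvremove", "-f", lv_path]
    if issue_discards:
        # --config overrides lvm.conf for this invocation only;
        # issue_discards lives in the devices{} section.
        cmd += ["--config", "devices { issue_discards = 1 }"]
    subprocess.run(cmd, check=True)


# e.g. remove_lv("/dev/VG_XenStorage-<sr-uuid>/VHD-<vdi-uuid>")  # example path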
It would be nice to have a SR parameter (maybe an sr-config key) auto-trim=true or something so folks with cooperative storage platforms can maintain the benefit of thin provisioning at scale.
Ah yes - sorry lvutil.py is the other part I forgot in this suggestion - thank you for picking that up 👍
@sammcj I believe you are misinterpreting this option: issue_discards tells LVM to actively discard any PV extents no longer mapped to an LV.
Discard commands issued on a filesystem can be passed down through an LV regardless of this setting.
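To illustrate the distinction, something like the following will trim through the LV regardless of issue_discards (a hedged sketch; /mnt/data is an assumed example mountpoint backed by an LV on a discard-capable device):

# Filesystem-level discards (fstrim) on a mountpoint sitting on top of an LV.
# This path does not depend on lvm.conf's issue_discards at all; that setting
# only matters when LVM itself frees PV extents (e.g. lvremove / lvreduce).
import subprocess

result = subprocess.run(
    ["fstrim", "--verbose", "/mnt/data"],
    capture_output=True, text=True, check=True,
)
print(result.stdout, end="")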
I am now testing it with a CEPH RBD backend; it is certainly promising, as CEPH also benefits from this greatly.
It most definitely works with CEPH and is very useful. Just changing this option in both /etc/lvm/lvm.conf and master/lvm.conf made space reclaiming possible: deletion of any virtual disk immediately frees space in the CEPH pool, which didn't happen before unless you ran "Reclaim freed space" on the SR. Enabling this makes manual free-space reclaiming unnecessary.
I think the main reason Xen doesn't go with this is simply that, in theory, discarding unused blocks may have a "performance impact" on some vendors' storage, or at least that's what they say:
Note: Reclaiming freed space is an intensive operation and can affect the performance of the storage array. You should perform this operation only when space reclamation is required on the array. Citrix recommends that you schedule this work outside the peak array demand hours
So in theory, you may not want to discard the unused blocks left by a deleted VDI (LV) automatically and immediately, but rather during some scheduled night job. However, I believe that in most other cases (SSD disks, or even CEPH as in my case) this is irrelevant, as SSD disks don't suffer any performance penalty when TRIM is issued. I don't even know of any other enterprise storage arrays that actually do; this may very well be some ancient left-over from the old times when storage arrays really had issues with discards?
I have raised your questions with the developers of the storage stack in the XAPI project: https://github.com/xapi-project/sm/issues/514
Quoting the answer:
It is set to 0 by default due to data corruption caused by SANs that implement the discard functionality incorrectly so it isn't safe to have on by default. The performance issue is also valid. Some SANs are understood to lock an entire LUN or RAID set in order to complete discard which is obviously undesirable with active dynamic workloads.
More on the linked issue.
Any chance to know which ones work safely and which don't? Edit: Quickspecs for our HPE MSA2040 says: "UNMAP reclaims space that is no longer on a thinly provisioned VMFS volume" so technically it should be fine. But from what I read it's limited to unmaps of (at least) 4 MB per block. (https://community.hpe.com/t5/msa-storage/msa2042-unmap-on-esx6-5/m-p/6976465#M10916)
Any chance to know which ones work safely and which don't?
No, there is no such list.
They do have valid points, though. I do believe that some SAN storage (especially older arrays) that works fine now with discards disabled may start malfunctioning when you suddenly enable them, so indeed, forcing this on everyone via an update or patch is probably not wanted here.
It would be best if there were an option to enable/disable this (probably at the storage level, cluster-wide?) so that users can decide per storage whether they want it enabled.
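Something along these lines, purely as a hypothetical sketch; the "auto-trim" key and the helper it calls are made up for illustration and are not an existing XAPI/SM interface:

# Hypothetical per-SR toggle: gate the lvremove discard override on an
# SR-level flag so admins can opt in per storage backend. The "auto-trim"
# key and lookup_sr_other_config() are invented; they are not part of the
# existing SM/XAPI code.
import subprocess


def lookup_sr_other_config(sr_uuid: str) -> dict:
    """Placeholder for however the SR's other-config map would be read."""
    raise NotImplementedError


def remove_vdi_lv(sr_uuid: str, lv_path: str) -> None:
    cmd = ["lvremove", "-f", lv_path]
    if lookup_sr_other_config(sr_uuid).get("auto-trim") == "true":
        cmd += ["--config", "devices { issue_discards = 1 }"]
    subprocess.run(cmd, check=True)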
I was reading more about this topic as it relates to SSDs, and I strongly recommend this: https://web.archive.org/web/20100227234254/http://www.devwhy.com/blog/2009/8/4/from-write-down-to-the-flash-chips.html which describes in detail how TRIM works on SSDs. After reading this and many more articles, I came to the conclusion that issuing discards too frequently could indeed have some negative performance effect on SSD drives, which is also why most Linux filesystems do not recommend continuous trim, but rather scheduling an fstrim weekly or monthly (but that is a different use-case; here we are talking about deleting a whole VHD rather than deleting files on a filesystem, which probably happens much less often).
I have to admit I didn't have SSDs in mind at all when writing about this, as my use-case is a CEPH backend, which seems to only benefit from this, with no performance impact whatsoever and no data-corruption issues (or at least I haven't observed any so far; CEPH has supported this for quite some time, and other virtualization platforms such as Proxmox or OpenStack use it): https://ceph.io/geen-categorie/openstack-and-ceph-rbd-discard/
It would also be interesting to know which storages XCP-ng/CH 'cleans up'. All block storages? iSCSI, FC... local (SAS)? AFAIK TRIM is the (S-)ATA command, and in the SCSI world (SAS, FC, iSCSI) it's UNMAP; Linux calls both 'discard'.
Any chance to know which ones work safely and which don't?
No, there is no such list.
But it's reasonable to assume that if it works with VMware, there's no reason it should break stuff under Linux (XCP-ng...)?
No - the behaviour would be the same, no matter the OS.
Trim and discard are virtually the same operation from the OS-level point of view; just as you said, "trim" is the term used in the SSD/ATA world. Technically you are just notifying the underlying storage that a data segment (marked by a starting block and a length) is no longer used and can be discarded, which on thin-provisioned storage means reducing used space (which can then be used by another volume, resulting in much more efficient storage usage).
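Concretely, at the Linux block layer that notification is just an ioctl carrying an (offset, length) pair. A minimal sketch (requires root, and it really does throw away the data in that range, so the values below are examples only):

# What a "discard" boils down to at the Linux block layer: an ioctl telling
# the device that a byte range no longer holds useful data.
# BLKDISCARD is _IO(0x12, 119) from <linux/fs.h>.
import fcntl
import os
import struct

BLKDISCARD = 0x1277  # _IO(0x12, 119)


def discard_range(device: str, offset: int, length: int) -> None:
    """Discard `length` bytes starting at byte `offset` on a block device."""
    fd = os.open(device, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, BLKDISCARD, struct.pack("QQ", offset, length))
    finally:
        os.close(fd)


# e.g. discard_range("/dev/sdX", 0, 1 << 20)  # first 1 MiB -- destructive!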
In LVM, as stated earlier, issue_discards=1 only performs a trim when deallocating extents/chunks from a PV. In practice, this means that with issue_discards=1 a discard is sent to the underlying device only when LVM itself returns space to the PV, e.g. when an LV is removed or reduced.
Why is the default 0?
Likely to prevent long I/O stalls when a large object (normal LV or thin pool) is removed and the space is discarded... along with the serious side effects long I/O stalls can cause in production systems.
@Ultra2D Can you outline exactly where/what changes you made to lvutil.py? In usual Citrix fashion, the thread you linked with the details (https://discussions.citrix.com/topic/378764-reclaiming-freed-space/#comment-1931283) no longer exists.
No, sorry, I'm no longer using XCP-ng and have no notes on this either.
issue_discards should be enabled by default in /etc/lvm/lvm.conf; this allows TRIM/DISCARD and SCSI UNMAP IOCTLs through the LVM layer, which is very important if you use SSDs or, in many cases, thin-provisioned iSCSI storage. From lvm.conf:
Patch for change:
This is an improvement suggestion as per a brief discussion on the XCP-ng forums based on my XenServer and CentOS tuning experiences.