letsencrypt / openzfs-nvme-databases

Creative Commons Zero v1.0 Universal

About ZIL #7

Closed jimklimov closed 3 years ago

jimklimov commented 3 years ago

Hello, and first of all, thanks for this write-up and the articles on the LE site, and for popularizing ZFS in practical large-scale production (even though my background is in another, original, branch of the project ;) )

One point that caught my eye here was that you did not use a separate ZIL because all devices are already fast. While that reasoning holds, a ZIL on separate hardware can bring other benefits due to a couple of points:

The main idea with ZIL is that it journals sync writes, working as a ring buffer, and in a perfect world your system never crashes and so never has to read from it. This allows some devices to be much more efficient than others at this job.
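To make the "written always, read only after a crash" point concrete, here is a toy Python model of a sync-write journal. It is only a sketch of the idea described above: the class `ToyIntentLog` and all of its method names are hypothetical, and it does not reflect how ZFS actually lays out or processes its intent log.

```python
class ToyIntentLog:
    """Toy model of a sync-write journal (hypothetical, not the ZFS on-disk format)."""

    def __init__(self):
        self.pool = {}       # stands in for data committed to the main pool
        self.dirty = {}      # in-memory copy of writes not yet committed
        self.log = []        # journal records kept on the log device
        self.log_reads = 0   # how many journal records were ever read back

    def sync_write(self, key, value):
        # The write is journaled first, so it can be acknowledged right away.
        self.log.append((key, value))
        self.dirty[key] = value

    def commit(self):
        # Normal operation: data reaches the pool from memory and the
        # journal records are discarded without ever being read.
        self.pool.update(self.dirty)
        self.dirty.clear()
        self.log.clear()

    def crash_and_replay(self):
        # A crash loses the in-memory copy; only now is the journal read.
        self.dirty.clear()
        for key, value in self.log:
            self.log_reads += 1
            self.pool[key] = value
        self.log.clear()
```

In this model a device that is very good at small sequential writes serves the journal well, since reads from it happen only on the rare recovery path.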

Practical benefits can be:

On a separate note, people often partition not 100% but some 80%-90% of their SSD for use as storage, to ensure there are always "unused" logical pages available for complete reprogramming, in addition to whatever hardware redundancy the vendor cooked into the device.
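The arithmetic behind that partitioning habit can be sketched as follows. The function name, the 85% default, and the 1 MiB alignment are illustrative assumptions, not values taken from this thread or from any vendor guidance.

```python
MIB = 1024 * 1024


def partition_size_bytes(device_bytes, use_percent=85, align=MIB):
    """Return how many bytes to partition, rounded down to a multiple of `align`.

    `use_percent` is the share of the device handed to the filesystem;
    the rest is left unpartitioned as spare area (an assumed default).
    """
    if not 0 < use_percent <= 100:
        raise ValueError("use_percent must be in (0, 100]")
    usable = device_bytes * use_percent // 100  # integer math avoids float rounding
    return usable - usable % align
```

For example, `partition_size_bytes(1000 * MIB, use_percent=80)` returns `800 * MIB`, leaving the remaining 200 MiB untouched.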

Also note that since ZFS never overwrites currently referenced data in-place, it can succumb to free space fragmentation and take much longer to find free spots in the data tree to put new writes into after some percentage of the pool is used (and in the case of HDDs, it also involves much more seeking for small writes). The particular number/percentage is different for every pool depending on its write and delete history, but keeping somewhere around 10%-25% always free is regarded as a safe zone without looking closer at a particular pool. That said, I've had an 8TB server back in the day whose performance only took a hit as it was chewing through the last 100GB, so mileage really varies a lot.
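A simple headroom check along those lines might look like the sketch below. The function name and the 20% default are assumptions, picked from within the 10%-25% range mentioned above; the inputs are whatever total size and allocated byte counts the pool reports.

```python
def free_space_report(size_bytes, allocated_bytes, min_free_percent=20):
    """Report free space and whether it fell below the chosen safety margin."""
    if size_bytes <= 0 or not 0 <= allocated_bytes <= size_bytes:
        raise ValueError("expected 0 <= allocated_bytes <= size_bytes, size_bytes > 0")
    free_bytes = size_bytes - allocated_bytes
    free_percent = 100 * free_bytes / size_bytes
    return {
        "free_bytes": free_bytes,
        "free_percent": free_percent,
        # True means the pool is past the (assumed) comfortable fill level.
        "below_threshold": free_percent < min_free_percent,
    }
```

Plugging in the anecdote above, an 8 TB pool with 100 GB left has 1.25% free, far below any such margin, which is why that case counts as unusually lucky rather than typical.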

bdaehlie commented 3 years ago

Hi @jimklimov - I am not an engineer who can appropriately respond to your comment, but I want to thank you for your kind offer of advice here! We really appreciate it.

jprenken commented 3 years ago

Likewise, thank you very much! This is very useful. We'll try to experiment with ZIL as we improve our database servers, and for now, I've added a direct reference to this issue in the document.

jimklimov commented 3 years ago

Good luck with your experiments. If hardware permits, do stress-test in non-production first :) All systems differ (numbers and widths of buses playing a role), so general theory might not visibly materialize in a particular practical case :)