letsencrypt / openzfs-nvme-databases

Creative Commons Zero v1.0 Universal
572 stars · 36 forks

ZFS compression vs InnoDB PageSize / built in compression. #2

Closed zytek closed 3 years ago

zytek commented 3 years ago

Not sure if this is the right place to ask, but let's try. :)

You've mentioned explicitly matching InnoDB page size to ZFS recordsize - both at 16K.

But how does that interact with ZFS compression? You might, or might not, squeeze two InnoDB pages into one record, but what if you can't? You lose the "alignment", though with mostly sequential reads that shouldn't be a problem? And what about writes?

Have you considered using InnoDB's built-in compression? You would have to compress in userspace, but you might save some IOPS thanks to better alignment, I guess?

Just curious. Thanks for the great article and for the repo's thorough clarification of each setting used. If you ever look at the impact of different InnoDB page sizes and compression settings, please post some benchmarks; I would love to see them :)
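For reference, a matched setup along the lines the article describes might look like this. This is a sketch under my own assumptions; the pool and dataset names are placeholders, and the exact option values should be taken from the repo itself:

```shell
# Hypothetical pool/dataset names. ashift=13 means 8 KiB is the
# smallest allocation ZFS will make, suited to NVMe sector sizes.
zpool create -o ashift=13 mysqlpool /dev/nvme0n1

# Match the ZFS recordsize to InnoDB's 16 KiB page size, with
# ZFS-level compression enabled.
zfs create -o recordsize=16k -o compression=lz4 mysqlpool/mysql

# my.cnf fragment: InnoDB's default 16 KiB page matches the recordsize above.
# [mysqld]
# innodb_page_size = 16384
```

With this layout, each InnoDB page write maps to exactly one ZFS record, which is the alignment the article is after.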

pcd1193182 commented 3 years ago

I don't work at letsencrypt, but I am a ZFS developer. You may be slightly misunderstanding how ZFS-level compression works, if I'm reading your question right.

You can't squeeze two pages into a single record, because a ZFS record is the logical size of a chunk of data. When the InnoDB page size and the ZFS recordsize match, each InnoDB page is exactly one ZFS record. The compression happens after that matching-up.

The question then becomes whether ZFS can shrink that 16k chunk of logical data below 8k of physical size; if so, the 16k record takes up only one 8k sector on disk (since ashift is set to 13, 8k is the smallest region ZFS will allocate).

If you use InnoDB compression instead, pages will no longer be aligned with the start of ZFS records, which will probably cause a large number of read/modify/write cycles at the ZFS level.
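The sector math above can be illustrated with a tiny calculation. This is not ZFS source code, just a sketch of how a compressed record's on-disk allocation rounds up to whole sectors given ashift=13 (the function name is my own):

```python
import math

def allocated_bytes(compressed_size: int, ashift: int = 13) -> int:
    """Round a compressed record's size up to whole 2**ashift sectors.

    Illustrative only: real ZFS allocation also involves checks like
    whether compression saved enough to be worth keeping.
    """
    sector = 1 << ashift  # 8192 bytes when ashift=13
    return math.ceil(compressed_size / sector) * sector

# A 16 KiB record that compresses to 6 KiB fits in one 8 KiB sector:
print(allocated_bytes(6 * 1024))   # 8192
# If it only compresses to 9 KiB, it still needs two sectors,
# so the full 16 KiB is allocated and compression saved nothing:
print(allocated_bytes(9 * 1024))   # 16384
```

This is why, with a 16k recordsize and 8k minimum allocation, a record either compresses below 8k and halves its footprint, or gains nothing at all.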

zytek commented 3 years ago

Thank you for the clarification.