Open bmatican opened 1 year ago
In general, fsync forces the OS to flush the dirty pages for a given inode/file, as well as the filesystem journal. For more info see: https://dev.to/yugabyte/the-anatomy-of-xfs-fsync-4ael
There is an alternative to fsync, "fdatasync", which flushes only the dirty pages and skips the journal when it is safe to do so. The journal can be a serialisation point, as the article linked above makes clear. The fdatasync call is safe because, if it detects a file structure change that requires a journal flush, it performs that flush automatically.
YugabyteDB pre-allocates its WAL files, and therefore the WAL write that needs flushing/persistence for transactions can use the fdatasync call without requiring the journal write, because there is no inode/file structure change.
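To make the pattern above concrete, here is a minimal sketch in Python (not YugabyteDB's actual WAL code). The helper names `preallocate`, `append_record` and `demo` are hypothetical, and pre-allocation is done here by simply writing zeros, which is an assumption for illustration. The point is that once the file has its final size, later writes only change data pages, so a data-only flush is sufficient.

```python
import os
import tempfile

# os.fdatasync is not available on every platform (e.g. macOS);
# falling back to os.fsync there is an assumption made for this sketch.
_fdatasync = getattr(os, "fdatasync", os.fsync)


def preallocate(path, size):
    """Create a zero-filled file of `size` bytes and persist it completely."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o600)
    try:
        zeros = b"\0" * (1024 * 1024)
        remaining = size
        while remaining > 0:
            remaining -= os.write(fd, zeros[:remaining])
        # Growing the file changes its size and block allocation,
        # so use a full fsync once, at pre-allocation time.
        os.fsync(fd)
    finally:
        os.close(fd)


def append_record(fd, offset, record):
    """Overwrite pre-allocated space at `offset`, then flush the data only."""
    view = memoryview(record)
    while view:
        written = os.pwrite(fd, view, offset)
        view = view[written:]
        offset += written
    # The file size does not change here, so a data-only flush is enough.
    _fdatasync(fd)
    return offset


def demo():
    """Write two records into a pre-allocated file; return (end offset, file size)."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "wal-segment")
        preallocate(path, 64 * 1024)
        fd = os.open(path, os.O_WRONLY)
        try:
            offset = append_record(fd, 0, b"first record")
            offset = append_record(fd, offset, b"second record")
        finally:
            os.close(fd)
        return offset, os.path.getsize(path)
```

Calling `demo()` returns the offset after the last record together with the file size, which stays at the pre-allocated size because the writes never extend the file.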
PostgreSQL also pre-allocates its WAL files and uses fdatasync. Its fsync options and implementation were looked at thoroughly in 2018, when Linux was found to not report IO errors in some cases; that has since been fixed.
I performed a simple test in a VM on my laptop to see the difference between calling fsync() and fdatasync() after a write() call:
This shows tests for calling fdatasync() (Fsync/fdatasync), fsync() (Fsync/fsync) and no synchronisation call (Fsync/no sync). The violin plot shows the variance and the relationship between the tests.
The fdatasync() test shows a mean of 443 us, the fsync() test shows a mean of 1,061 us, and the no sync test shows a mean of 9.8 us.
These tests use a pre-allocated file of 64M and a buffered write of 24k, in a VM on my laptop (!)
The source code for this test is here: https://github.com/fritshoogland-yugabyte/benchmark
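For readers who want a quick feel for the measurement without building the linked benchmark, the following is a rough Python sketch of the same idea. It is not the linked code, and the names `time_writes` and `run_all` are hypothetical. It assumes the 64M file and 24k write size described above, pre-allocates by writing zeros, and falls back to fsync where `os.fdatasync` does not exist. Absolute numbers will differ from the ones reported here, and if the temporary directory is on tmpfs the sync calls are close to free, so pass a `directory` on a real disk.

```python
import os
import statistics
import tempfile
import time

FILE_SIZE = 64 * 1024 * 1024  # 64M pre-allocated file, as in the test above
WRITE_SIZE = 24 * 1024        # 24k buffered write


def time_writes(sync, iterations=200, file_size=FILE_SIZE,
                write_size=WRITE_SIZE, directory=None):
    """Return latencies in microseconds for a write followed by sync(fd)."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Pre-allocate by writing zeros, then persist data and metadata once.
        zeros = b"\0" * (1024 * 1024)
        remaining = file_size
        while remaining > 0:
            remaining -= os.write(fd, zeros[:remaining])
        os.fsync(fd)

        buf = b"x" * write_size
        latencies = []
        offset = 0
        for _ in range(iterations):
            if offset + write_size > file_size:
                offset = 0  # wrap around so the file never grows
            start = time.perf_counter()
            os.pwrite(fd, buf, offset)
            if sync is not None:
                sync(fd)
            latencies.append((time.perf_counter() - start) * 1e6)
            offset += write_size
        return latencies
    finally:
        os.close(fd)
        os.unlink(path)


def run_all(**kwargs):
    """Return the mean latency in microseconds for each of the three modes."""
    modes = {
        # Assumption: fall back to fsync on platforms without fdatasync.
        "fdatasync": getattr(os, "fdatasync", os.fsync),
        "fsync": os.fsync,
        "no sync": None,
    }
    return {name: statistics.mean(time_writes(sync, **kwargs))
            for name, sync in modes.items()}
```

For example, `run_all(directory="/path/on/a/real/disk")` returns a dict of mean latencies keyed by mode; the path is a placeholder.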
Jira Link: DB-5995
Description
As per discussions with @fritshoogland-yugabyte, given that we pre-allocate our WAL files, we might be able to relieve some disk pressure by using fdatasync instead of fsync.
Frits is working on a good test bed for us to validate the win from this, but implementation-wise, this should be relatively easy to pull off and put behind a new gflag. FYI @rthallamko3 @Huqicheng @yusong-yan @ttyusupov