Open dcoutts opened 6 months ago
Sounds reasonable to me. Were you preparing a PR?
Yes, we would intend to prepare a PR.
New information: it turns out that the platforms that do support direct I/O or an equivalent can all set it via fcntl. It doesn't have to be set at file open time.
In particular, Linux, FreeBSD and NetBSD all support setting `O_DIRECT` via `fcntl`, and as noted above, OSX only supports it via `fcntl` and not at file open.
So this may well be the better way to go: add something like this to the `System.Posix.Fcntl` module:
fileCaching :: Fd -> Bool -> IO ()
Again, it would be a no-op on platforms that do not support such hints (e.g. Solaris, OpenBSD).
Or maybe `fileSetCaching` / `fileGetCaching`. Names are hard.
The general pattern of `fcntl` is get/set. The two existing functions in the module don't follow the get/set pattern, but they also do not actually use `fcntl`.
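For concreteness, here is a rough sketch of the C-level calls such a function might wrap. The helper name `set_file_caching` and its return convention are made up for illustration and are not part of any existing library. It assumes that a platform defining `F_NOCACHE` behaves like OSX, and that a platform defining `O_DIRECT` accepts it via `F_SETFL`.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical helper (not an existing API).
 * enable == true  : ask for normal caching
 * enable == false : ask for reduced/no caching
 * Returns 0 on success or where there is nothing to do, -1 with errno set on failure. */
int set_file_caching(int fd, bool enable)
{
#if defined(F_NOCACHE)
    /* OSX: F_NOCACHE takes 1 to turn caching off, 0 to turn it back on. */
    return fcntl(fd, F_NOCACHE, enable ? 0 : 1) == -1 ? -1 : 0;
#elif defined(O_DIRECT)
    /* Linux, FreeBSD, NetBSD: toggle O_DIRECT among the file status flags. */
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    flags = enable ? (flags & ~O_DIRECT) : (flags | O_DIRECT);
    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
#else
    /* No known hint on this platform: do nothing. */
    (void)fd;
    (void)enable;
    return 0;
#endif
}
```

Note that setting `O_DIRECT` can still fail at runtime (for example, on a filesystem that does not support direct I/O), so the wrapper would need to decide whether to surface that error or treat it as "hint not honoured".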
Advice and opinions welcome.
Further update: OSX actually does not support a way to get the caching mode, only a way to set it.
So a portable API would be just `fileSetCaching` with no `fileGetCaching`. Or other alternative names (`fileSetNoCaching`, `fileNoCaching`).
And the CI test would be just: does the call complete without throwing an exception (i.e. the syscall did not return -1). So there is no (portable) way to test that the value we get back is the value we set.
This arguably makes sense for a portable API anyway, given that it's supposed to be a no-op on platforms where it's not supported, and on non-supporting platforms there is no such state to get.
> So a portable API would be just fileSetCaching with no fileGetCaching.
We do have a number of APIs that are sort of platform specific. The pattern is:
So I don't see a problem with adding `fileGetCaching` too.
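As a rough C-level illustration of what a platform-specific getter could look like: the sketch below reads `O_DIRECT` back via `F_GETFL` where that is possible and reports "not supported" elsewhere. The name `get_file_caching`, the return convention, and the choice of `ENOTSUP` are assumptions for illustration only, not an existing API or the library's established pattern.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper (not an existing API).
 * Returns 1 if caching is on, 0 if it is off, -1 on error with errno set. */
int get_file_caching(int fd)
{
#if defined(O_DIRECT) && !defined(F_NOCACHE)
    /* Linux, FreeBSD, NetBSD: O_DIRECT is visible in the file status flags. */
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return (flags & O_DIRECT) ? 0 : 1;
#else
    /* OSX offers no way to read the setting back, and on platforms
     * without any hint there is no such state to report. */
    (void)fd;
    errno = ENOTSUP;
    return -1;
#endif
}
```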
Is your feature request related to a problem? Please describe.
The problem is trying to use modern SSDs to their maximum performance for random I/O (particularly random reads) on normal files (not raw block devices), across multiple cores/capabilities. To do this one needs two things: good async I/O APIs and opening files in a mode that bypasses the page cache. Bypassing the page cache is needed to achieve the maximum IOPS, especially when submitting I/O operations from many OS threads at once (so from many RTS capabilities). Good async I/O APIs are out of scope for this feature request.
A similar problem is wanting to do lots of random I/O while optimising the memory of the host system by not polluting the page cache with disk pages that will only be used once (to make best use of the page cache for other files that are used). Again for this use case one wants to open a file in a mode that bypasses or suppresses the page cache.
Another similar problem is wanting to do disk I/O performance benchmarking, and one needs to work around the caching that the OS does: either by dropping caches before a run and avoiding re-reading the same page twice, or avoiding caching altogether.
Describe the solution you'd like
The solution is to allow opening a file in a mode that attempts to suppress or eliminate the use of disk/page caching for this use of this file. This is a feature that all widely used unix-like OSs support, but it is not standardised by POSIX:

- Linux: `O_DIRECT` flag to `open(2)`.
- FreeBSD: `O_DIRECT` flag to `open(2)`.
- OSX: `F_NOCACHE` to `fcntl(2)` (link here is to the iPhoneOS man page version because Apple removed the online rendered version of the desktop man pages)

For platforms that do not support any of these methods, the fallback should simply be to do nothing. The semantics of continuing to do caching is contained within the semantics of no caching (but with different performance characteristics).
Note also that given we will document the semantics as trying to do less/no caching, we also don't need to worry about the slight difference in behaviour between OSX and FreeBSD and Linux on the use of the page cache. (OSX will use cached pages for the file if they are present already, while Linux will ignore cached pages even if there are cached pages already. This difference is only relevant for I/O benchmarks, and such programs need to be aware of a lot of platform-specific details already.)
The feature should be implemented as an extra boolean flag in `OpenFileFlags`. The name of this field should be descriptive since there is no POSIX name to follow (and different platforms call it different things, so e.g. `direct` would be inappropriate). Suggestions include `noCache :: Bool`, since that's simply descriptive (though it happens to be what OSX uses too).

Additionally (and this is a matter of API design tastes where reasonable people may differ) one may wish to provide some feature flag that one can test to see if support is present (since no exception will be thrown if it is not present).
The documentation for the feature should also clearly describe that when using this feature, some platforms impose additional constraints on the alignment of file reads/writes and the memory buffers used for reads/writes. Optionally it may also make sense to provide some constants to give the most portable values for disk and memory alignment, or an action to obtain these alignment hints. Feedback on this aspect of the API is welcome.
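To illustrate the kind of constraint meant here, this is a minimal C sketch of aligning the buffer address, the file offset, and the transfer length to a common boundary. The value 4096 is an assumption chosen as a commonly safe figure. The real requirement depends on the platform, filesystem and device, and the helper names are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L  /* for posix_memalign */
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Assumed alignment for illustration only; the true value is platform-,
 * filesystem- and device-specific. */
enum { IO_ALIGN = 4096 };

/* Round a (non-negative) file offset down to the alignment boundary. */
off_t align_down(off_t off)
{
    return off - (off % IO_ALIGN);
}

/* Round a transfer length up to a whole number of aligned blocks. */
size_t align_up(size_t len)
{
    return (len + IO_ALIGN - 1) / IO_ALIGN * IO_ALIGN;
}

/* Allocate a buffer whose address and size are both multiples of IO_ALIGN.
 * Returns NULL on failure; release with free(). */
void *alloc_io_buffer(size_t len)
{
    void *buf = NULL;
    if (posix_memalign(&buf, IO_ALIGN, align_up(len)) != 0)
        return NULL;
    return buf;
}
```

A read of 100 bytes at offset 5000 would then be issued as a 4096-byte read at offset 4096 into such a buffer, with the caller picking out the bytes it wanted.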
Describe alternatives you've considered
The alternative is an extension package, `unix-odirect` or something, with just the file open support and nothing else.

Additional context
My colleagues and I are happy to implement this feature, including docs etc and shepherd it through PR review.
Related older tickets: #48 and #6. But these propose just using and exposing the non-portable `O_DIRECT` rather than trying to provide portable support.

API breaking changes
It would be an extra member of the `OpenFileFlags` record, with a default (normal caching behaviour) in the `defaultFileFlags` value. So this should not break most existing library users, which create the `OpenFileFlags` record value by overriding `defaultFileFlags` rather than using the raw constructor.

Posix compliance
This is a feature available in all major POSIX-compatible OSs (even Windows) but it is not standardised by POSIX.
Relevant excerpts from man pages (linked above):
Linux `open` `O_DIRECT`:

FreeBSD `open` `O_DIRECT`:

OSX `fcntl` `F_NOCACHE`: