mirage / ocaml-vhd

Read and write .vhd format data

replace O_DIRECT by fadvise() #62

Open: nraynaud opened this issue 6 years ago

nraynaud commented 6 years ago

Hi all, I am working on ZFS SR support, and I keep running into O_DIRECT, which ZFS does not support (the symptom is EINVAL from open()).

This time, it is XAPI that calls vhd-tool with --direct hard-coded,

and from vhd-tool the flag ends up here in ocaml-vhd. I suspect the intent of --direct is to avoid filling the disk cache in this case. Both operations are simple: one complete, in-order read and one complete, in-order write.

I think those cases could be covered by posix_fadvise() and fsync().
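A minimal C sketch of the write side of this idea, assuming Linux with glibc: each chunk is written with pwrite(), flushed with fsync(), and the flushed range is then passed to posix_fadvise() with POSIX_FADV_DONTNEED. The helper name write_and_drop is hypothetical and is not part of vhd-tool or ocaml-vhd. The flush comes first because, on Linux, DONTNEED only drops pages that are already clean.

```c
#define _GNU_SOURCE /* expose pwrite(), fsync() and posix_fadvise() regardless of -std */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes of `buf` at `offset`, flush them to
 * stable storage, then tell the kernel the range will not be read again.
 * Returns 0 on success, -1 with errno set on failure. */
static int write_and_drop(int fd, const char *buf, size_t len, off_t offset)
{
    assert(buf != NULL || len == 0);
    size_t done = 0;

    while (done < len) {
        ssize_t n = pwrite(fd, buf + done, len - done, offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t)n;
    }

    /* On Linux, DONTNEED only discards clean pages, so flush first. */
    if (fsync(fd) != 0)
        return -1;

    /* posix_fadvise() returns an error number instead of setting errno. */
    int rc = posix_fadvise(fd, offset, (off_t)len, POSIX_FADV_DONTNEED);
    if (rc != 0) {
        errno = rc;
        return -1;
    }
    return 0;
}
```

Calling fsync() after every small chunk is expensive, so a real patch would probably flush and drop in larger batches.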

I am ready to create a patch, but I would like to know where you think I should make the change before I get to work.

Thanks, Nicolas.

lindig commented 6 years ago

I am not familiar enough with file I/O to understand why O_DIRECT is desirable in the first place. But I want to caution that we need to be careful to detect when certain options are available only on some file systems. We previously had a problem when we tried to use SEEK_DATA with lseek(2), which is not supported on all file systems.
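One way to handle this portability concern, sketched in C under the assumption of Linux with glibc: try O_DIRECT first and fall back to a normal open() when the file system rejects the flag with EINVAL, as reported for ZFS above. open_maybe_direct is a hypothetical helper name, not an existing vhd-tool or ocaml-vhd function.

```c
#define _GNU_SOURCE /* O_DIRECT is Linux-specific; glibc only defines it with _GNU_SOURCE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open `path` with O_DIRECT if the file system accepts
 * it, otherwise retry without it. `*direct` is set to 1 or 0 accordingly.
 * Returns the file descriptor, or -1 with errno set.
 * Assumption: EINVAL here means "O_DIRECT unsupported"; other invalid flag
 * combinations would also be retried, which is a simplification. */
static int open_maybe_direct(const char *path, int flags, mode_t mode, int *direct)
{
    assert(path != NULL && direct != NULL);

    int fd = open(path, flags | O_DIRECT, mode);
    if (fd >= 0) {
        *direct = 1;
        return fd;
    }
    if (errno != EINVAL)
        return -1; /* a real error (ENOENT, EACCES, ...): do not mask it */

    /* With O_CREAT the failed attempt may already have created the file,
     * so do not combine this fallback with O_EXCL. */
    fd = open(path, flags, mode);
    if (fd >= 0)
        *direct = 0;
    return fd;
}
```

The mode actually obtained is reported back because a descriptor opened with O_DIRECT also imposes alignment requirements on buffers, offsets and lengths, so the caller still has to know which kind of I/O it is doing.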

djs55 commented 6 years ago

IIRC the original rationale for O_DIRECT was to avoid filling up the disk cache as you suggest. The thinking was that the data being streamed wouldn't be read more than once, so there was no point evicting other cache entries just to cache it.

nraynaud commented 6 years ago

Thanks for your input. After searching a bit, I understand why you gave up on the POSIX interfaces and used O_DIRECT: there is basically no reasonable solution to this problem on Linux.

I think I will make my changes in vhd-tool, only on the streaming operations, and evict the pages from the cache with POSIX_FADV_DONTNEED. My reasoning is that when writing a VHD file received from the network, nobody can already depend on the cache for a file that did not exist yet; this path is mostly used to import new VMs. The reads are mostly sequential, and nobody really needs a random-access cache for them either; they are mostly used to export snapshots over the network.
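The read side of this proposal might look like the following C sketch (Linux with glibc assumed; stream_and_drop is a hypothetical name): the file is marked POSIX_FADV_SEQUENTIAL to encourage readahead, and each range is handed to POSIX_FADV_DONTNEED once it has been copied out. Both calls are only hints, so their return values are ignored here.

```c
#define _GNU_SOURCE /* expose pread() and posix_fadvise() regardless of -std */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the whole file behind `fd` to `out`, `chunk`
 * bytes at a time, dropping each range from the page cache once copied.
 * Returns the number of bytes copied, or -1 on error. */
static long long stream_and_drop(int fd, FILE *out, char *buf, size_t chunk)
{
    assert(out != NULL && buf != NULL && chunk > 0);
    off_t pos = 0;

    /* Advisory only: ask for aggressive readahead on a sequential scan. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    for (;;) {
        ssize_t n = pread(fd, buf, chunk, pos);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry the same read */
            return -1;
        }
        if (n == 0)
            break; /* end of file */
        if (fwrite(buf, 1, (size_t)n, out) != (size_t)n)
            return -1;
        /* These pages were only read, so they are clean and can be dropped now. */
        (void)posix_fadvise(fd, pos, (off_t)n, POSIX_FADV_DONTNEED);
        pos += n;
    }
    return (long long)pos;
}
```

Because DONTNEED drops the pages for every process, this relies on the assumption stated above that nobody else needs the cached data for these files.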

What do you think?