[RFC] `ReadVolatile` and `WriteVolatile` traits for efficient data transfer from/to guest memory

Summary of the PR

With https://github.com/rust-vmm/vm-memory/pull/217, various read_from and write_to methods across vm-memory incurred a performance penalty due to an additional copy of all data read and written. This is due to these functions operating on arbitrary Read/Write implementations, which cannot generally be assumed to respect the volatile semantics of guest memory. Additionally, they required taking slices to guest memory, which was undefined behavior.

This patch series fixes those short-comings by introducing a pair of new traits, ReadVolatile and WriteVolatile which are identical to the standard library's Read and Write traits, with the difference being that they operate on VolatileSlices instead of &mut [u8]. Using these traits will therefore not incur the performance penalty of https://github.com/rust-vmm/vm-memory/pull/217. Additionally, this will make using the vm-memory library easier for users familiar with std::io, as the APIs will be more familiar (in particular, the unintuitive "swapping" of reader/writer src and dst of the old read_from/write_to APIs is resolved).

Please consider this as a not-completely-polished proposal for which I am seeking feedback. The code is based on what we wrote in firecracker to mitigate the performance impact of upgrading to 0.10.0 as a result of #217.

Open Questions

this patch series does not attempt to implement vectored I/O via readv/writev. I think there is definitely a usecase for this, since both firecracker and virtio-fsd use vectored I/O in various places. It would be nice to upstream something similar to https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/src/file_traits.rs to ReadVolatile/WriteVolatile. Maybe someone from that team could chime in?
I'd honestly love to cut down the entire jungle of read and write functions in this crate, but considered that change to be somewhat out of scope here (with this PR only aiming to address the inability to transfer data efficiently between a file descriptor and guest memory). I feel however that deprecating all read/write functions would greatly simplify this library at the cost of users having to call get_slice(). Maybe via turning the Bytes trait into something that represents "self can be turned into a VolatileSlice", as read -> read_volatile, read_slice -> read_exact_volatile, write -> write_volatile and write_slice -> write_all_volatile.

Requirements

Before submitting your PR, please make sure you addressed the following requirements:

[x] All commits in this PR are signed (with git commit -s), and the commit message has max 60 characters for the summary and max 75 characters for each description line.
[x] All added/changed functionality has a corresponding unit/integration test.
[x] All added/changed public-facing functionality has entries in the "Upcoming Release" section of CHANGELOG.md (if no such section exists, please create one).
[x] Any newly added unsafe code is properly documented.

We definitely need some kind of abstraction like this! I am just worrying a bit about the right level to introduce it... This implements the traits on VolatileSlice. I wonder whether it would not be more useful to implement this on a higher level? We only have such crazy number of read/write calls because we lack read/write abstraction on a descriptor chain. Also, if we would implement these traits on a descriptor chain level of abstraction, vectored read and writes could be implemented fairly straight forward.

So: Do we need to have this on this abstraction layer? Or do we only have it here because we lack a higher-level abstraction layer? My feeling is that we will need an abstraction on descriptor-chain level. Mostly for preventing people to use these low-level APIs incorrectly, but it may also allow to solve two problems at once: Zero-overhead access and a lot simpler and less error-prone API. rust-vmm/vm-virtio#33 proposed such an API and may be a natural abstraction level to add these "0-copy" variant traits.

Do we have a use-case that we cannot solve with such a higher-level API?

APIs for transfering data/from guest memory are currently used outside of virtio stacks. For example, linux-loader uses these to copy the kernel to guest memory [1], and firecracker uses them to save guest memory to disk for snapshot purposes [2]. Neither of these could be solved if these traits only lived in vm-virtio. So in that sense, I think it makes sense to have some Read/Write abstractions at the vm-memory level. Having them at VolatileSlice level made sense to me, since it is essentially the guest memory version of &[u8]. Do you see any case where not having the functions from this PR at a high abstraction level will pose a problem?

In a sense, this PR does not add anything new - it simply provides replacements for the APIs that got broken by #217, and deprecates the broken ones.

As for vectored reads and writes, I definitely agree that they are a lot more naturally expressible at the virtio descriptor chain level (we actually have an implementation similar to what you describe in firecracker [3]). I actually was not able to come up with a nice solution here for how to offer them at the vm-memory level without restricting what the virtio level is able to do (for example, if we only offer write_vectored_volatile with &[VolatileSlice] as an argument, then write_vectored_volatile would have to allocate a vector of iovec as VolatileSlice is not ABI compatible with writev. But the caller will already have allocated a vector of VolatileSlice from a descriptor chain, whereas if we directly did vectored writes at the virtio level, only a single allocation would be needed. But a vm-memory-level API that takes &[iovec] is also not very useful because then the caller would be responsible for updating for example the dirty bitmap, and the function would have to be unsafe, so its just libc::writev in a trenchcoat). Hence the "RFC" status of this PR. If there's no way to offer them at the vm-memory level, then we simply won't do it. However, I think it'd be great if we could share some of these primitives between the different virtio stack implementations out there, so if we can express it outside of the virtio-framework, then vm-memory would be the place to do that.

As for why we have all the read/write, I think its mainly convenience. People did not want to convert everything to VolatileSlice before being able to call .read/.write, so everything that could be converted to VolatileSlice instead gained a .read/.write method that does the conversion and then delegates. Over time it just got kinda out of control because the semantics/naming between the various functions got out of sync.

How about also implementing ReadVolatile/WriteVolatile for OwnedFd/BorrowedFd? Otherwise LGTM!

How about also implementing ReadVolatile/WriteVolatile for OwnedFd/BorrowedFd?

Ahh, yes, I forgot about that! Originally I wanted to implement these traits for all T: AsRawFd (which would cover OwnedFd/BorrowedFd, and is slightly more general than T: Into<Ownedfd>). There was some issue of overlapping impls on firecracker's side, but I just found a different solution on our side for that, so let me update this patch to have the general impls (should also make the patch slightly smaller!) I remembered wrongly, implementing for as_raw_fd breaks because "upstream crates may add a new impl of trait std::os::fd::AsRawFd for type &[u8] in future versions", which is nonsense but the orphan rules dont know about that. I'll add the OwnedFd impl then!

How about also implementing ReadVolatile/WriteVolatile for OwnedFd/BorrowedFd? Otherwise LGTM!

Done, please have another look :)

@jiangliu @JonathanWoollett-Light sorry, I had to manually rebase on top of the last changes, could you have one last look please?

@roypat: The Rust Write trait has a flush(). That this does not seem to provide. Also, trying to port the virtio-console from vm-virtio to this would benefit from some kind of Rust Read/Write <> VolatileRead/Write adapter... Would that be in scope of vm-memory? How would you propose to handle the update of vm-memory in vm-virtio?

@roypat: The Rust Write trait has a flush(). That this does not seem to provide. Also, trying to port the virtio-console from vm-virtio to this would benefit from some kind of Rust Read/Write <> VolatileRead/Write adapter... Would that be in scope of vm-memory? How would you propose to handle the update of vm-memory in vm-virtio?

Mhh, for flush, I think we can add that to VolatileWrite and in the implementations just have it forward it to Write's flush implementation. I don't think there's anything volatile-memory-specific that needs to happen there. I actually also noticed that right now this is lacking a VolatileRead/VolatileWrite implementation for Cursor that is needed for some linux-loader tests >.>.

As for adapters for Read/Write, I'd like to avoid that because it comes with all the same problems that the old read_* and write_* family of functions had: Since Read and Write dont know about volatile memory semantics, every operation would need to be buffered in hypervisor memory before it can be forwarded to the traits. For vm-console, I think the only other thing that would be needed to avoid this is a VolatileWrite implementation for std::io::Stdout, which could be generated using the RawFd macros in this PR.

Tbh, I probably was a bit overzealous with deprecating the old functions here, if you want I'll be happy to just revert that part of this PR and make a patch release.

Mhh, for flush, I think we can add that to VolatileWrite and in the implementations just have it forward it to Write's flush implementation. I don't think there's anything volatile-memory-specific that needs to happen there. I actually also noticed that right now this is lacking a VolatileRead/VolatileWrite implementation for Cursor that is needed for some linux-loader tests >.>.

Right... it probably does not make a lot of sense to support these...

As for adapters for Read/Write, I'd like to avoid that because it comes with all the same problems that the old read_* and write_* family of functions had: Since Read and Write dont know about volatile memory semantics, every operation would need to be buffered in hypervisor memory before it can be forwarded to the traits. For vm-console, I think the only other thing that would be needed to avoid this is a VolatileWrite implementation for std::io::Stdout, which could be generated using the RawFd macros in this PR.

Do you maybe want to draft a PR for that? Currently this is blocking the updates to latest vm-memory for a lot of crates.

Tbh, I probably was a bit overzealous with deprecating the old functions here, if you want I'll be happy to just revert that part of this PR and make a patch release.

Well, eventually we want to deprecate them anyway I guess? So we might as well just do it now and fix the code that depends on the old API.

Mhh, for flush, I think we can add that to VolatileWrite and in the implementations just have it forward it to Write's flush implementation. I don't think there's anything volatile-memory-specific that needs to happen there. I actually also noticed that right now this is lacking a VolatileRead/VolatileWrite implementation for Cursor that is needed for some linux-loader tests >.>.

Right... it probably does not make a lot of sense to support these...

I think it might make sense, to avoid people having to do Write + VolatileWrite bounds, maybe?

As for adapters for Read/Write, I'd like to avoid that because it comes with all the same problems that the old read_* and write_* family of functions had: Since Read and Write dont know about volatile memory semantics, every operation would need to be buffered in hypervisor memory before it can be forwarded to the traits. For vm-console, I think the only other thing that would be needed to avoid this is a VolatileWrite implementation for std::io::Stdout, which could be generated using the RawFd macros in this PR.

Do you maybe want to draft a PR for that? Currently this is blocking the updates to latest vm-memory for a lot of crates.

Please have a look at https://github.com/rust-vmm/vm-memory/pull/256 :)

Tbh, I probably was a bit overzealous with deprecating the old functions here, if you want I'll be happy to just revert that part of this PR and make a patch release.

Well, eventually we want to deprecate them anyway I guess? So we might as well just do it now and fix the code that depends on the old API.

Sure!

rust-vmm / vm-memory