rust-vmm / vm-memory

Virtual machine's guest memory crate
Apache License 2.0

Xen/mmapv2 #231

Closed: vireshk closed this 1 year ago

vireshk commented 1 year ago

Xen allows two memory mapping mechanisms:

  • Foreign memory mapping
  • Grant memory mapping

Both of these mechanisms require additional mapping support; for example, an ioctl() needs to be issued (with arguments passed from the backend) along with mmap().

Also, for grant memory mapping, the mapping can't be done in advance; it is done on the fly, hence the changes to VolatileSlice.

Add memory mapping support for both of these cases under a new "xen" feature.
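For illustration, here is a rough sketch of the on-the-fly mapping idea; the names (GrantRegion, TempMapping, map_on_access) are hypothetical stand-ins, not the types added by this series:

```rust
// Hypothetical sketch only: the mapping is created when the memory is
// actually accessed and torn down again as soon as the access guard drops.
struct GrantRegion;

struct TempMapping {
    // Stand-in for the area obtained via mmap() plus the mapping ioctl().
    data: Vec<u8>,
}

impl GrantRegion {
    /// Called from the access path: only when a read or write actually
    /// happens do we issue mmap() and the mapping ioctl().
    fn map_on_access(&self, len: usize) -> TempMapping {
        TempMapping { data: vec![0; len] }
    }
}

impl Drop for TempMapping {
    fn drop(&mut self) {
        // The real guard munmaps here, so the grant is released as soon as
        // the access is finished and before the buffer goes back to the guest.
    }
}

fn main() {
    let region = GrantRegion;
    {
        let mapping = region.map_on_access(4096); // mapping created on access
        let _first_byte = mapping.data[0];        // memory accessed
    } // guard dropped -> unmapped right after the access
}
```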

Signed-off-by: Viresh Kumar viresh.kumar@linaro.org

vireshk commented 1 year ago

Doc tests currently fail, as the following doesn't work and some definitions aren't found.

#[cfg(all(any(test, doctest), feature = "xen"))]
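One workaround sometimes used for feature-gated doctests (a sketch only; whether it fits here depends on how the examples are written) is to hide the feature gate inside the doctest body, so the example code is compiled only when the "xen" feature is enabled:

````rust
/// ```
/// # #[cfg(feature = "xen")]
/// # {
/// // xen-only example code goes here; with the feature off, this block is
/// // compiled out, so missing definitions no longer break the doctest.
/// # }
/// ```
pub fn documented_item() {}
````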

Will it be possible to get some review on this now? @andreeaflorescu

roypat commented 1 year ago

Mh, before I dive more into the implementation details here, I have a sorta high-level design question: If I understand this implementation correctly, the on-demand mapping nature of the Xen regions happens in VolatileSlice. This means the flow is: get volatile slice -> call some read/write on it -> mmap is called -> memory is accessed -> munmap is called in drop impl of XenMmapSlice.

This means the VolatileSlice has to know about the details of the Xen memory model. I don't think it should. In my mind, a VolatileSlice is a VMM-safe version of a rust-slice. This means to me it should always be pointing to existing memory, and it should be completely agnostic about whether that is standard unix mmap'd memory or some xen grant. So I propose we change the flow to be the following:

VMM knows it wants to access xen memory -> do the xen mmap/ioctl -> get a volatile slice to the now existing memory -> use volatile slice to access it -> drop volatile slice and do the munmap (to ensure we don't unmap anything while volatile slices are alive, set the lifetime of the volatile slice to the lifetime of a reference to the XenMmapSlice)

This way, none of the changes to volatile_slice.rs should be needed, and the xen specific code gets "contained" a bit more.

Please let me know if I am missing something here though, as I have never worked with Xen before and am only going off of information contained in this PR
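To make the proposed flow concrete, here is a minimal sketch under placeholder names (XenMapping stands in for whatever type would own the mapping; this is not crate code):

```rust
// Map first, hand out a slice tied to the mapping's lifetime, unmap on drop.
struct XenMapping {
    // Stand-in for memory obtained via the Xen mmap + ioctl; a real
    // implementation would keep the raw mapping here instead of a Vec.
    backing: Vec<u8>,
}

impl XenMapping {
    /// "VMM knows it wants to access xen memory -> do the xen mmap/ioctl"
    fn new(len: usize) -> Self {
        XenMapping { backing: vec![0; len] }
    }

    /// "get a volatile slice to the now existing memory": the returned slice
    /// borrows from the mapping, so it can never outlive it.
    fn as_slice(&mut self) -> &mut [u8] {
        &mut self.backing
    }
}

impl Drop for XenMapping {
    fn drop(&mut self) {
        // "drop volatile slice and do the munmap": the real type would munmap
        // here; the borrow checker guarantees no slices are still alive.
    }
}

fn main() {
    let mut mapping = XenMapping::new(4096);
    let slice = mapping.as_slice();
    slice[0] = 0xaa; // "use volatile slice to access it"
} // mapping dropped here -> unmap
```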

vireshk commented 1 year ago

Thanks for your comments, Patrick. And yes, I still need to reply to your UB-related comment; I haven't had enough time to get to it yet :)

Mh, before I dive more into the implementation details here, I have a sorta high-level design question: If I understand this implementation correctly, the on-demand mapping nature of the Xen regions happens in VolatileSlice. This means the flow is: get volatile slice -> call some read/write on it -> mmap is called -> memory is accessed -> munmap is called in drop impl of XenMmapSlice.

Right. I think this is going to be required for other use cases as well, like pKVM, since full access to the guest memory may not always be available, but Xen is at least the first on our list.

From Xen's (and the guest kernel's) standpoint, whenever a buffer is passed to the backend, a grant mapping is created in the guest to allow it to be accessed by the backend running on the host. The backend can only map the regions which are allowed by the guest. As soon as access to the buffers is finished, and before they are returned to the guest, they should be unmapped by the backend. Also note that the mapping has memory protection in place; that is, most of the time you can map memory as READ only or WRITE only, not both.

This means the VolatileSlice has to know about the details of the Xen memory model. I don't think it should. In my mind, a VolatileSlice is a VMM-safe version of a rust-slice. This means to me it should always be pointing to existing memory, and it should be completely agnostic about whether that is standard unix mmap'd memory or some xen grant. So I propose we change the flow to be the following:

VMM knows it wants to access xen memory -> do the xen mmap/ioctl -> get a volatile slice to the now existing memory -> use volatile slice to access it -> drop volatile slice and do the munmap (to ensure we don't unmap anything while volatile slices are alive, set the lifetime of the volatile slice to the lifetime of a reference to the XenMmapSlice)

The problem here is that when the slice is created, we aren't sure if the user is going to read or write to the slice, and so the mapping can't be done there. It is only when we try to access the memory within the slice that we get to know about it.

Also, about "VMM knows it wants to access xen memory": this is a bit different in our case. The backends run as standalone applications and are mostly unaware of which hypervisor they are running with, KVM or Xen. They are supposed to be hypervisor agnostic, with no special code for Xen.

Right now the volatile slice can be used by the vm-memory crate itself, where it just returns the data without the user knowing about the VolatileSlice at all. Or, as in the case of virtiofsd, the user (backend) can take a big VolatileSlice and process it as it wishes. These are all very complicated usage patterns, and the only single-place solution I could think of was to put the mapping logic in VolatileSlice, as that is central to the entire thing.
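For context, a rough sketch of those two usage patterns against the existing API (assuming the backend-mmap feature; exact method names and signatures may differ between vm-memory versions):

```rust
use vm_memory::{Bytes, GuestAddress, GuestMemory, GuestMemoryMmap};

fn example(mem: &GuestMemoryMmap) -> Result<(), Box<dyn std::error::Error>> {
    // Pattern 1: the crate builds and consumes the slice internally; the
    // caller only ever sees the copied-out data.
    let value: u32 = mem.read_obj(GuestAddress(0x1000))?;

    // Pattern 2: the backend (virtiofsd-style) takes a large slice and then
    // works on it directly, largely bypassing the crate's helpers.
    let slice = mem.get_slice(GuestAddress(0x2000), 4096)?;
    slice.write(&value.to_le_bytes(), 0)?;

    Ok(())
}
```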

This way, none of the changes to volatile_slice.rs should be needed, and the xen specific code gets "contained" a bit more.

Please let me know if I am missing something here though, as I have never worked with Xen before and am only going off of information contained in this PR

I hope I was able to clarify this a bit. Thanks.

roypat commented 1 year ago

Thanks for your comments, Patrick. And yes, I still need to reply to your UB-related comment; I haven't had enough time to get to it yet :)

No worries, I'm not exactly fast with following up on things myself right now >.> everything's kinda hectic :(

Mh, before I dive more into the implementation details here, I have a sorta high-level design question: If I understand this implementation correctly, the on-demand mapping nature of the Xen regions happens in VolatileSlice. This means the flow is: get volatile slice -> call some read/write on it -> mmap is called -> memory is accessed -> munmap is called in drop impl of XenMmapSlice.

Right. I think this is going to be required for other use cases as well, like pKVM, since full access to the guest memory may not always be available, but Xen is at least the first on our list.

From Xen's (and the guest kernel's) standpoint, whenever a buffer is passed to the backend, a grant mapping is created in the guest to allow it to be accessed by the backend running on the host. The backend can only map the regions which are allowed by the guest. As soon as access to the buffers is finished, and before they are returned to the guest, they should be unmapped by the backend. Also note that the mapping has memory protection in place; that is, most of the time you can map memory as READ only or WRITE only, not both.

Mh, so the guest creates the grant mapping, which I presume sets up the memory protection, too? And then the flow would be "guest creates grant mapping/memory protection" -> "guest passes details of mapping (addr/size?) to backend" -> "backend mmaps in the region to do its accesses"? Sorry for asking so many questions >.>

This means the VolatileSlice has to know about the details of the Xen memory model. I don't think it should. In my mind, a VolatileSlice is a VMM-safe version of a rust-slice. This means to me it should always be pointing to existing memory, and it should be completely agnostic about whether that is standard unix mmap'd memory or some xen grant. So I propose we change the flow to be the following: VMM knows it wants to access xen memory -> do the xen mmap/ioctl -> get a volatile slice to the now existing memory -> use volatile slice to access it -> drop volatile slice and do the munmap (to ensure we don't unmap anything while volatile slices are alive, set the lifetime of the volatile slice to the lifetime of a reference to the XenMmapSlice)

The problem here is that when the slice is created, we aren't sure if the user is going to read or write to the slice, and so the mapping can't be done there. It is only when we try to access the memory within the slice that we get to know about it.

Mh, why doesn't it? If the memory protection is done in the guest, then the backend should know ahead of time whether it can read or write to the memory region, so it could use the information to map it ahead of time. Because if the user backend just decides whether it can access via read/write on the fly (and if the guest doesn't enforce anything), then we might as well just mmap it as read/write ahead-of-time because there'd be no enforcement of "read/write-only" in place anyway :thinking:.

Also, isn't doing it on-the-fly like this really inefficient? Won't we end up doing mmap/munmap on every single call to a volatile slice method (which I presume we'll have many of)?

Also, about "VMM knows it wants to access xen memory": this is a bit different in our case. The backends run as standalone applications and are mostly unaware of which hypervisor they are running with, KVM or Xen. They are supposed to be hypervisor agnostic, with no special code for Xen.

gotcha, makes sense.

Right now the volatile slice can be used by the vm-memory crate itself, where it just returns the data without the user knowing about the VolatileSlice at all. Or, as in the case of virtiofsd, the user (backend) can take a big VolatileSlice and process it as it wishes. These are all very complicated usage patterns, and the only single-place solution I could think of was to put the mapping logic in VolatileSlice, as that is central to the entire thing.

Mh, I'm still not convinced it has to be in VolatileSlice :sweat_smile:

This way, none of the changes to volatile_slice.rs should be needed, and the xen specific code gets "contained" a bit more. Please let me know if I am missing something here though, as I have never worked with Xen before and am only going off of information contained in this PR

I hope I was able to clarify this a bit. Thanks.

Yes, thank you so much for taking the time to explain this to me!

vireshk commented 1 year ago

Mh, so the guest creates the grant mapping, which I presume sets up the memory protection, too? And then the flow would be "guest creates grant mapping/memory protection" -> "guest passes details of mapping (addr/size?) to backend" -> "backend mmaps in the region to do its accesses"?

Right.

Mh, why doesn't it? If the memory protection is done in the guest, then the backend should know ahead of time whether it can read or write to the memory region, so it could use the information to map it ahead of time. Because if the user backend just decides whether it can access via read/write on the fly (and if the guest doesn't enforce anything), then we might as well just mmap it as read/write ahead-of-time because there'd be no enforcement of "read/write-only" in place anyway :thinking:.

When I said "when the slice is created, we aren't sure ...", I meant from vm-memory's perspective. Right now the API provided by the vm-memory crate lets a user (like a backend) get part of guest memory as a slice (volatile memory) and then access it. The vm-memory crate doesn't have any information about read/write permissions for the slice.

Also, we don't want to add special code to the backends to do the extra mappings; we want to make this all work magically for the user.

There are a few complex cases as well, like virtiofsd. It takes a large area as a slice, I guess, and then does a lot of operations on it by itself, without using the APIs provided by the vm-memory crate. I am not sure if it does both reads and writes to separate areas after a volatile slice is created.

Also, isn't doing it on-the-fly like this really inefficient? Won't we end up doing mmap/munmap on every single call to a volatile slice method (which I presume we'll have many of)?

Yes, we are worried about that as well, but that was decided by Xen's upstream community and it is all merged in the kernel now. I guess such inefficiencies will come in anyway with guest memory protection. I'm not sure how the pKVM folks are planning this.

roypat commented 1 year ago

Mh, why doesn't it? If the memory protection is done in the guest, then the backend should know ahead of time whether it can read or write to the memory region, so it could use the information to map it ahead of time. Because if the user backend just decides whether it can access via read/write on the fly (and if the guest doesn't enforce anything), then we might as well just mmap it as read/write ahead-of-time because there'd be no enforcement of "read/write-only" in place anyway :thinking:.

When I said "when the slice is created, we aren't sure ...", I meant from vm-memory's perspective. Right now the API provided by the vm-memory crate lets a user (like a backend) get part of guest memory as a slice (volatile memory) and then access it. The vm-memory crate doesn't have any information about read/write permissions for the slice.

Also, we don't want to add special code to the backends to do the extra mappings; we want to make this all work magically for the user.

There are a few complex cases as well, like virtiofsd. It takes a large area as a slice, I guess, and then does a lot of operations on it by itself, without using the APIs provided by the vm-memory crate. I am not sure if it does both reads and writes to separate areas after a volatile slice is created.

Mh, this brings up a separate point that I missed so far. This PR makes the as_ptr() method private, but a lot of the ecosystem depends on this function (for example to pass pointers to guest memory to ioctls or normal syscalls). Removing it breaks a lot of dependencies without providing a workaround (for example, the very virtiofsd operation you describe wouldn't work anymore, since it would now be forced to use vm-memory functions). I understand why you did this (because it allows the user to circumvent the mapping mechanisms for Xen), but really that just ties back to my original concern: Conceptually, when we have a VolatileSlice, the memory it points to needs to already be mapped.
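For example, a backend today typically does something along these lines (a hedged sketch assuming a libc dependency and the current get_slice()/as_ptr() signatures, not code taken from any particular project):

```rust
use std::os::unix::io::RawFd;

use vm_memory::{GuestAddress, GuestMemory, GuestMemoryMmap};

// Build an iovec from the slice and hand it straight to a syscall, bypassing
// vm-memory's read/write helpers entirely.
fn read_into_guest(
    mem: &GuestMemoryMmap,
    fd: RawFd,
    addr: GuestAddress,
    len: usize,
) -> std::io::Result<usize> {
    let slice = mem
        .get_slice(addr, len)
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?;
    let iov = libc::iovec {
        iov_base: slice.as_ptr() as *mut libc::c_void,
        iov_len: slice.len(),
    };
    // SAFETY: iov describes `len` bytes of mapped guest memory that remain
    // valid for the duration of the call.
    let ret = unsafe { libc::readv(fd, &iov, 1) };
    if ret < 0 {
        Err(std::io::Error::last_os_error())
    } else {
        Ok(ret as usize)
    }
}
```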

I understand not wanting to have to explicitly do the mapping in the backend, but I've had a look at the accompanying PR in the vhost crate, and it looks like even with this implementation you still need Xen-specific code there, too. So unless I misunderstand something (which I very much might; both Xen and vhost-user are unfamiliar to me), just doing all the required mapping in the backend would simplify the code here (with vm-memory providing the required utility functions for creating the mappings in the mmap_xen.rs files, and doing all the heavy lifting), while not changing the status quo for the backend much. Surely we must know which sections of memory we are granted read access to and which we are granted write access to, right?

vireshk commented 1 year ago

Mh, this brings up a separate point that I missed so far. This PR makes the as_ptr() method private, but a lot of the ecosystem depends on this function (for example to pass pointers to guest memory to ioctls or normal syscalls). Removing it breaks a lot of dependencies without providing a workaround (for example, the very virtiofsd operation you describe wouldn't work anymore, since it would now be forced to use vm-memory functions). I understand why you did this (because it allows the user to circumvent the mapping mechanisms for Xen), but really that just ties back to my original concern: Conceptually, when we have a VolatileSlice, the memory it points to needs to already be mapped.

Key takeaways from yesterday's discussion with Patrick (for others):

  • Viresh agreed that we can't really remove as_ptr() and it is still required by many, but we need a separate implementation for Xen at least. Will try something for that in the next iteration.

vireshk commented 1 year ago

@roypat

I have preserved the existing functionality of as_ptr() for non-Xen consumers and provided a different API for everyone in 01bc496f2848.

Ablu commented 1 year ago

Should we maybe mark as_ptr() as deprecated? Then we can steer people towards the Xen-compatible API without breaking all the existing code.
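For reference, a minimal sketch of what that would look like mechanically (placeholder struct and note text, not the actual vm-memory code):

```rust
// Existing callers keep compiling but get a compile-time warning pointing
// them towards the newer, Xen-compatible accessor.
pub struct Slice {
    ptr: *mut u8,
}

impl Slice {
    #[deprecated(note = "use the Xen-compatible guard-based accessor instead")]
    pub fn as_ptr(&self) -> *mut u8 {
        self.ptr
    }
}

fn main() {
    let s = Slice { ptr: std::ptr::null_mut() };
    #[allow(deprecated)] // existing users can opt out of the warning while migrating
    let _p = s.as_ptr();
}
```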

vireshk commented 1 year ago

Should we maybe mark as_ptr() as deprecated? Then we can steer people towards the Xen-compatible API without breaking all the existing code.

@roypat?

vireshk commented 1 year ago

Superseded by https://github.com/rust-vmm/vm-memory/pull/241