Add replaceable-mmapped bitmap support

germag commented 10 months ago

Summary of the PR

The vhost user protocol supports live migration, and during live migration, the vhost-user frontend needs to track the modifications the vhost-user backend makes to the memory mapped regions, marking the dirty pages in a log (i.e., a bitmap).

If the backend has the VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature, it will receive the VHOST_USER_SET_LOG_BASE message with a file descriptor of the dirty-pages log memory region. This log covers all known guest addresses, and must be manipulated atomically.

For further info please see https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#migration

This PR adds support for creating an atomic-memory-mapped bitmap, and being able to replace it in runtime. The vhost user protocol does not specify whether the previous bitmap is still active after replying to the VHOST_USER_SET_LOG_BASE message, so we use an RwLock to be sure that the in-flight requests are using the new bitmap after the message reply.

For further info please see https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#migration

Note to reviewers

This code is part of the effort to support live migration in virtiofsd and is currently being used in a version of virtiofsd that supports migration: https://gitlab.com/virtiofsd-live-migration

germag commented 10 months ago

/cc @XanClic

germag commented 7 months ago

This new version adds the bitmap support that was originally intended to be part of for the vm-memory crate: https://github.com/rust-vmm/vm-memory/pull/264

germag commented 7 months ago

@stefano-garzarella @Ablu

XanClic commented 7 months ago

How is this supposed to be used? Compared to the previous proposal (in vm-memory), if I replace vm_memory::bitmap::BitmapRegion by vhost_user_backend::BitmapMmapRegion, it works – but that type isn’t public (and neither is the bitmap module), so it doesn’t work as-is with this PR.

What does compile and apparently also kind of works is to continue to use () as the bitmap, but I don’t think that’s really intentional: The BitmapReplace implementation for that is a no-op, so the AtomicBitmapMmap objects created by set_log_base() end up just being dropped. And that seems a bit dangerous, that nothing complains about having no bitmap during migration.

It’s questionable to me whether () can really implement BitmapReplace, when actually it won’t do anything. I also don’t quite understand why set_log_base() creates AtomicBitmapMmap objects specifically, and why BitmapReplace requires being able to replace any bitmap that implements it by specifically an AtomicBitmapMmap.

It feels like ideally we’d have a trait (maybe LogBitmap) with some inner type that allows for set_log_base() support, i.e. the trait provides one method for replacing (with the inner type, not a concrete one), and one for creating the inner type from the information we get through SET_LOG_BASE. AtomicBitmapMmap would then implement that trait, but () would not. set_log_base() is only fully implemented if the bitmap type implements LogBitmap (returns an error otherwise), and get_protocol_features() should probably automatically set LOG_SHMFD if that trait is implemented.

The problem with “is trait implemented or not” is that it’s not so easy in Rust and would require a whole new line of VhostUserBackendReqHandler traits, I believe. However, what can work would be to do the test at runtime, e.g. with an associated const:

pub trait LogBitmap: Sized {
    fn new<R: GuestMemoryRegion>(region: &R, logmem: MmapLogReg) -> io::Result<Self>;
}

pub trait LogBitmapRegion: Bitmap {
    const CAN_LOG_DIRTY_WRITES: bool; // Can you really use this bitmap type for migration?
    type Inner: LogBitmap;

    fn replace(&self, bitmap: Self::Inner);
}

And then this can be implemented for (), setting the const to false:

impl LogBitmap for () {                                                                                                                                                         
    fn new<R: GuestMemoryRegion>(_region: &R, _logmem: MmapLogReg) -> io::Result<Self> {
        Err(io::Error::new(
            io::ErrorKind::Unsupported,
            "Back-end does not support logging memory modification, cannot migrate",
        ))
    }
}

impl LogBitmapRegion for () {
    const CAN_LOG_DIRTY_WRITES: bool = false;
    type Inner = ();

    fn replace(&self, _bitmap: ()) {}
}

(And you just impl LogBitmap for AtomicBitmapMmap and impl LogBitmapRegion for BitmapMmapRegion.)

set_log_base() may want to double-check <T::Bitmap as LogBitmapRegion>::CAN_LOG_DIRTY_WRITES before doing anything, but may also just stay as-is, relying on <T::Bitmap as LogBitmapRegion>::Inner::new(region, logmem) to return an error if the bitmap doesn’t really work.

get_features() may want to auto-set VhostUserVirtioFeatures::LOG_ALL.bits() if CAN_LOG_DIRTY_WRITES is true, same for get_protocol_features() and VhostUserProtocolFeatures::LOG_SHMFD. (Though set_features() will need to take care to filter out LOG_ALL from the bits sent to the back-end implementation if it was auto-set.)

Alternatively, the quick fix would be to just make BitmapMmapRegion public and have ()::replace() return a runtime error that prevents migration, because having no real bitmap seems to forbid migration.

germag commented 7 months ago

v2:

Fix module access
Add assoc. type to prevent () to be used as a replaceable bitmap

germag commented 7 months ago

v3:

Address Stefano feedback
- Add TODO comments
- Add copy right notice

stefano-garzarella commented 7 months ago

LGTM. Please, can you add an entry on both vhost and vhost-user-backend changelog for this new feature.

Nit: maybe we can remove vhost-user-backend: prefix from the last commit where we touch both crates, and add it to the first commit.

germag commented 7 months ago

LGTM. Please, can you add an entry on both vhost and vhost-user-backend changelog for this new feature.

I knew I was forgetting something, sorry, yes I'll do it

Nit: maybe we can remove vhost-user-backend: prefix from the last commit where we touch both crates, and add it to the first commit.

Ok

germag commented 7 months ago

v4:

Changelog entries
Fix commit message
Check for 0-sized mem regions
Include inner error on io::Error

germag commented 7 months ago

v5:

wrap MmapLogReg on Arc to avoid multiple mmaps

XanClic commented 7 months ago

Looks good to me, thanks! (Can’t resolve my discussions again…)

(Only nit, but I won’t add it as a thread, because I don‘t really mind and I wouldn’t be able to resolve the thread myself: Where the tests currently create logmem0 and logmem1, they could just create a single one and Arc::clone() it now.)

germag commented 7 months ago

Looks good to me, thanks! (Can’t resolve my discussions again…)

(Only nit, but I won’t add it as a thread, because I don‘t really mind and I wouldn’t be able to resolve the thread myself: Where the tests currently create logmem0 and logmem1, they could just create a single one and Arc::clone() it now.)

Ops, true I'll fix it

XanClic commented 7 months ago

Thanks again! (Still looks good to me 😄)

germag commented 7 months ago

v6:

(Trying to) Explain the bitmap structure
Add comments in tests
Replace size arithmetic with literals on tests

germag commented 7 months ago

Rebase

XanClic commented 7 months ago

Looks good to me still.

Ablu commented 7 months ago

Ok. I played a bit with it and I think I mostly got confused by the slightly generic names. It exactly does what I thought that it does not do, I just mixed up two two types :). So I am happy with the overall concept!

But while playing with it I came up with a few further nitpicks (sorry :P):

By moving the construction of the inner type into the impl of our BitmapReplace then we could avoid the public MemRegionBitmap trait and make BitmapMmapRegion (apparently?) and AtomicBitmapMmap private: https://github.com/Ablu/vhost/commit/3f790d91dc208ea6b73a76f0bdfa0aef33c23b08

Any thoughts on that (ignoring my naming changes there)? The error handling may need to be reconsidered. But if that would then enable us to mark all of that private then that may be worth it?

Regarding naming, we have a few new public symbols:

BitmapReplace
BitmapMmapRegion
MemRegionBitmap
AtomicBitmapMmap
MmapLogReg

By just reading the names I find it hard to figure out what they do - especially when also dealing with the confusing Bitmap, WithBitmapSlice, BitmapSlice types of vm-memory.

BitmapReplace really models a bitmap that may start logging in the future? We do not replace the well-known Bitmap type. Our internal type does not implement Bitmap in the end... I could not come up with particulary great names that describe the actual behaviour. But we already name quite a few things "Vhost[UserBackend]" so maybe we could just go with VhostUserBackendBitmap? Then we could also extend it once we need other features in there. replace would have to be turned into something like start_logging of course. But IMHO it would just make it more expressive?

BitmapMmapRegion could get removed with my hacks above.

MemRegionBitmap if we can make this private then I do not care too much...

AtomicBitmapMmap confused me a bit since it does not really implement Bitmap. It really just is our inner part of MemRegionBitmap.

MmapLogReg will have to stay public. But it is unlikely to be seen by the ordinary user. So I do not care too much.

germag commented 7 months ago

Ok. I played a bit with it and I think I mostly got confused by the slightly generic names. It exactly does what I thought that it does not do, I just mixed up two two types :). So I am happy with the overall concept!

But while playing with it I came up with a few further nitpicks (sorry :P):

By moving the construction of the inner type into the impl of our BitmapReplace then we could avoid the public MemRegionBitmap trait and make BitmapMmapRegion (apparently?) and AtomicBitmapMmap private: Ablu@3f790d9

Any thoughts on that (ignoring my naming changes there)? The error handling may need to be reconsidered. But if that would then enable us to mark all of that private then that may be worth it?

That will make it possible to use () as log backing memory that is worse, IMO.

Regarding naming, we have a few new public symbols:
* `BitmapReplace`

* `BitmapMmapRegion`

* `MemRegionBitmap`

* `AtomicBitmapMmap`

* `MmapLogReg`
By just reading the names I find it hard to figure out what they do - especially when also dealing with the confusing Bitmap, WithBitmapSlice, BitmapSlice types of vm-memory.

I also plan to change some of these traits in vm-memory because are quite inflexible, or do more thing than necessary, and some could be made useless.

BitmapReplace really models a bitmap that may start logging in the future? We do not replace the well-known Bitmap type. Our internal type does not implement Bitmap in the end... I could not come up with particulary great names that describe the actual behaviour. But we already name quite a few things "Vhost[UserBackend]" so maybe we could just go with VhostUserBackendBitmap? Then we could also extend it once we need other features in there. replace would have to be turned into something like start_logging of course. But IMHO it would just make it more expressive?

I'm terrible at choosing names, but I think start_logging is not accurate either, because the bitmap doesn't start logging at that moment, I still think new and replace is better, since we are just constructing the thing. Also, prefix something with VhostUser... inside a vhost-user-specific crate feels an unnecessary duplication. BackendBitmap is also no accurate either, is not really just the backend bitmap.

As I said before, replacing the AtomicBitmapMmap in BitmapMmapRegion is just an artifact to prevent users to use () (also the use of an associated type is to deal with the current API)

About the internal bitmap implementing Bitmap I don't think that is important, for instance, we cannot implement it safely because there is no way to implement a valid slice_at() for the internal bitmap (IMO a design error in vm-memory). And the idea is to change vm-memory so we can reuse AtomicBitmap so AtomicBitmapMmap will be removed.

BitmapMmapRegion could get removed with my hacks above.

MemRegionBitmap if we can make this private then I do not care too much...

AtomicBitmapMmap confused me a bit since it does not really implement Bitmap. It really just is our inner part of MemRegionBitmap.

MmapLogReg will have to stay public. But it is unlikely to be seen by the ordinary user. So I do not care too much.

I'm not happy making these types public, and I plan to make the whole vhost-user bitmap implementation private and transparent for the user (and removing the () impl), but to do that I need to change too many things, so I prefer to do it in steps, but delivering the feature first so we can work on top of it. And I think even if this PR is not ideal, at least the types are safe to expose in the public API.

Ablu commented 7 months ago

That will make it possible to use () as log backing memory that is worse, IMO.

I do not understand the difference here. The existing code also implements BitmapReplace for (). So I do not think my hacks/proposal changes anything in that regard?

germag commented 7 months ago

That will make it possible to use () as log backing memory that is worse, IMO.

I do not understand the difference here. The existing code also implements BitmapReplace for (). So I do not think my hacks/proposal changes anything in that regard?

The difference is ()::new() will fail, so the front-end will be notified. If we just use a non-failing replace() (this is required), panicking is our only option.

I missed one extra problem with your proposal, replace()must not fail, otherwise we cannot safely replace a bitmap that is currently in use (we do not always go from an empty bitmap to a functional bitmap, we can got from a functional bitmap to another functional bitmap). Replacing the whole memory (like in add/remove_mem_region()) will not work due to how ArcSwap works, basically the in-flight requests (those that were sent prior to the switch but have not yet been completed) will continue to use the old bitmap instead of the new one.

XanClic commented 7 months ago

By moving the construction of the inner type into the impl of our BitmapReplace then we could avoid the public MemRegionBitmap trait and make BitmapMmapRegion (apparently?) and AtomicBitmapMmap private: Ablu@3f790d9

I don’t know how important unifying new() + replace() into a single new function is to your proposal, but it was my impression that separating both was important for error handling: Creating the bitmaps may fail (e.g. because mmap() may fail), but replacing must not. We don’t want to end up in a state where some bitmaps for some regions have been replaced and others have not, and this can easily be done by separating the potentially failing new() loop from the infallible replace() loop.

A unified function could exist if it returned the old bitmap on success, so we could collect and reinstate all prior bitmaps if replacing any bitmap failed. But then we’d still need a separate (infallible) replace() function that can reinstate those old bitmaps, so it sounds like basically the same thing.

If it’s just about moving functions into the BitmapReplace trait, it’d be possible to move MemRegionBitmap::new() into it as fn new_inner<R: GuestMemoryRegion>(region: &R, logmem: Arc<MmapLogReg>) -> io::Result<Self::InnerBitmap>. Seems a bit ugly to me, but, well.

Ablu commented 7 months ago

The difference is ()::new() will fail, so the front-end will be notified. If we just use a non-failing replace() (this is required), panicking is our only option.

Sure, with the constructor trait being gone ()::new() no longer exists. I hoped we can just make the "replace" failable...

I missed one extra problem with your proposal, replace()must not fail, otherwise we cannot safely replace a bitmap that is currently in use (we do not always go from an empty bitmap to a functional bitmap, we can got from a functional bitmap to another functional bitmap). Replacing the whole memory (like in add/remove_mem_region()) will not work due to how ArcSwap works, basically the in-flight requests (those that were sent prior to the switch but have not yet been completed) will continue to use the old bitmap instead of the new one.

I indeed glossed over that since it looked to me like it would mostly fail in cases where pages are set up in fairly weird ways. The spec also does not seem to mandate a particular behaviour here :/. But if this is a major concern then sure, that won't work the way I hacked things up.

I don’t know how important unifying new() + replace() into a single new function is to your proposal,

It is not important to me at all. My first priority was reducing the number of public traits that become user-visible. I just went one step further and also tried to make the inner type private. rust-vmm is already pretty complex with all the abstraction layers where is is impossible to tell which are really needed and used. "Too complex API" is the main complaint that I heard (and felt) in general.

Now... While thinking about options, I skimmed the spec again:

To start/stop logging of data/used ring writes, the front-end may send messages VHOST_USER_SET_FEATURES with VHOST_F_LOG_ALL and VHOST_USER_SET_VRING_ADDR with VHOST_VRING_F_LOG in ring’s flags set to 1/0, respectively.

All the modifications to memory pointed by vring “descriptor” should be marked. Modifications to “used” vring should be marked if VHOST_VRING_F_LOG is part of ring’s flags.

This sounds to me like SET_LOG_BASE should only set the log, but SET_FEATURES should enable the logging? Or do I misread things here? Is there a reason why we do it on setting the log here?

If setting the log and starting logging are two separate commands, then I think a single trait that provides the contract for that (set_log, enable/disable_log) would sound natural to me?

germag commented 7 months ago

This sounds to me like SET_LOG_BASE should only set the log, but SET_FEATURES should enable the logging? Or do I misread things here? Is there a reason why we do it on setting the log here?

I'm not sure if I understood this, but in SET_LOG_BASE we are just constructing the log, as I said before, we are not starting it

And also what qemu (and others) do and what the spec says are two different things :)

germag commented 7 months ago

It is not important to me at all. My first priority was reducing the number of public traits that become user-visible. I just went one step further and also tried to make the inner type private. rust-vmm is already pretty complex with all the abstraction layers where is is impossible to tell which are really needed and used. "Too complex API" is the main complaint that I heard (and felt) in general.

I agree with the "Too complex API", so I plan to work on that in the near future. But that requires more profound changes and documentation

XanClic commented 7 months ago

It is not important to me at all. My first priority was reducing the number of public traits that become user-visible. I just went one step further and also tried to make the inner type private. rust-vmm is already pretty complex with all the abstraction layers where is is impossible to tell which are really needed and used. "Too complex API" is the main complaint that I heard (and felt) in general.

Reducing the number of traits should then be possible by moving MemRegionBitmap::new() into BitmapReplace() as I’ve described, but keeping the general model as-is.

This sounds to me like SET_LOG_BASE should only set the log, but SET_FEATURES should enable the logging? Or do I misread things here? Is there a reason why we do it on setting the log here?

I'm not sure if I understood this, but in SET_LOG_BASE we are just constructing the log, as I said before, we are not starting it

We aren’t starting it anywhere, then, though. Attaching the bitmaps to the memory region does begin logging immediately, if I’m not mistaken, because there is no way to mark bitmaps as disabled.

I’d say there’s no harm in starting logging before SET_FEATURES, because there’s generally no real harm in marking pages as dirty when they aren’t really dirty. If we wanted to do this right, then we’d either have to have a way of marking bitmaps as disabled (i.e. actually disabling logging), or we’d need to delay attaching (replace()) the bitmaps until SET_FEATURES, but that sounds error-prone (we’d have to be able to handle the case where the memory regions have been modified in the meantime, even if that case is never supposed to happen).

A related question is: What happens on a cancelled migration (i.e. SET_FEATURES called with F_LOG cleared)? I think as per spec we can’t actually remove the bitmaps, because the front-end might set F_LOG again without calling SET_LOG_BASE a second time. So it sounds like we’d have to have a way to disable bitmaps, actually, I think.

germag commented 7 months ago

We aren’t starting it anywhere, then, though. Attaching the bitmaps to the memory region does begin logging immediately, if I’m not mistaken, because there is no way to mark bitmaps as disabled.

Yep, it's also how vm-memory is structured, there is no way to enable/disable marking pages. So, technically speaking, the logging "starts" as soon the guest memory is created

A related question is: What happens on a cancelled migration (i.e. SET_FEATURES called with F_LOG cleared)? I think as per spec we can’t actually remove the bitmaps, because the front-end might set F_LOG again without calling SET_LOG_BASE a second time. So it sounds like we’d have to have a way to disable bitmaps, actually, I think.

We will need to disable the InnerBitmap, while keeping the old one just in case we don't receive a new SET_LOG_BASE, because the vm-memory types will continue calling mark_dirty(), right now it seems unnecessary complexity to me since as you said there is no harm in setting dirty bits in the log.

germag commented 7 months ago

A related question is: What happens on a cancelled migration (i.e. _SETFEATURES called with _FLOG cleared)?

taking a quick look at qemu's code, I don't see any place where F_LOG_ALL is cleared in any case (there is something in vdpa but is unrelated)

XanClic commented 7 months ago

A related question is: What happens on a cancelled migration (i.e. SET_FEATURES called with F_LOG cleared)? I think as per spec we can’t actually remove the bitmaps, because the front-end might set F_LOG again without calling SET_LOG_BASE a second time. So it sounds like we’d have to have a way to disable bitmaps, actually, I think.

We will need to disable the InnerBitmap, while keeping the old one just in case we don't receive a new SET_LOG_BASE, because the vm-memory types will continue calling mark_dirty(), right now it seems unnecessary complexity to me since as you said there is no harm in setting dirty bits in the log.

Yes, there isn’t any real harm, but it will cost a bit of performance after a cancelled migration, so would be nice to look into at some point at least.

I think if desired we can introduce bitmap disabling locally here in the vhost-user-backend crate without modifications to vm-memory, just by adding a bool/AtomicBool to either AtomicBitmapMmap or MmapLogReg. (bool might give better run-time performance, but would require a new replace() cycle.)

XanClic commented 7 months ago

A related question is: What happens on a cancelled migration (i.e. _SETFEATURES called with _FLOG cleared)?

taking a quick look at qemu's code, I don't see any place where F_LOG_ALL is cleared in any case (there is something in vdpa but is unrelated)

vhost_log_global_stop() calls vhost_migration_log(..., false), which will have vhost_dev_set_features() clear VHOST_F_LOG_ALL. vhost_log_global_stop() is invoked from memory_global_dirty_log_do_stop() as a log_global_stop memory listener.

The current draft for virtiofsd does use clearing of F_LOG_ALL to recognize a cancelled migration. Note that qemu will call memory_global_dirty_log_do_stop() only when the VM is running (see its caller memory_global_dirty_log_stop()), i.e. definitely only when the migration has been cancelled, never when it’s just been completed.

germag commented 7 months ago

I think if desired we can introduce bitmap disabling locally here in the vhost-user-backend crate without modifications to vm-memory, just by adding a bool/AtomicBool to either AtomicBitmapMmap or MmapLogReg. (bool might give better run-time performance, but would require a new replace() cycle.)

I still think is unnecessary right now, and I'm not sure about the performance gain either, we will need to check an AtomicBool even if we don't have a InnerBitmap.

Ablu commented 7 months ago

I'm not sure if I understood this, but in SET_LOG_BASE we are just constructing the log, as I said before, we are not starting it

Ah... My brain assumed that that's what the replace would trigger :). But it certainly makes a lot more sense now!

Do you already have actual starting of the logging implemented somewhere? I tried to look around in https://gitlab.com/virtiofsd-live-migration, but could not immediately spot it there. Would love to see how that works out API wise.

Overall, it would have probably helped me if there would have been a more concrete mention that this is only part of the solution and other bits still have to follow (with a rough overview of what those pieces are).

XanClic commented 7 months ago

introduce

I still think is unnecessary right now, and I'm not sure about the performance gain either, we will need to check an AtomicBool even if we don't have a InnerBitmap.

Sure. mark_dirty() does get a range, and its current implementation dirties every single bit separately, but I assume the range is a single page most of the time, so it’s just a single operation. OTOH, I’m sure that fetching an atomic bool is quicker than performing an atomic OR, and as I said, we could get away without using an atomic, too (let mut disabled_bmap = region().bitmap().inner().clone(); disabled_bmap.disabled = true; region().bitmap().replace(disabled_bmap);).

germag commented 7 months ago

o you already have actual starting of the logging implemented somewhere? I tried to look around in https://gitlab.com/virtiofsd-live-migration, but could not immediately spot it there. Would love to see how that works out API wise.

No, because that is part of vm-memory, it expects the bitmap exists at compile time, as I said before

Overall, it would have probably helped me if there would have been a more concrete mention that this is only part of the solution and other bits still have to follow (with a rough overview of what those pieces are).

I think I discussed it in the previous reviews

germag commented 7 months ago

introduce

I still think is unnecessary right now, and I'm not sure about the performance gain either, we will need to check an AtomicBool even if we don't have a InnerBitmap.

Sure. mark_dirty() does get a range, and its current implementation dirties every single bit separately, but I assume the range is a single page most of the time, so it’s just a single operation. OTOH, I’m sure that fetching an atomic bool is quicker than performing an atomic OR,

Probably, but currently in the most common case (i.e., without a bitmap) we don't access any of that

and as I said, we could get away without using an atomic, too (let mut disabled_bmap = region().bitmap().inner().clone(); disabled_bmap.disabled = true; region().bitmap().replace(disabled_bmap);).

Ok, I will implement the AtomicBool approach, otherwise I think we also need to take into account that we can receive a SET_FEATURE unrelated to live migration

XanClic commented 7 months ago

I'm not sure if I understood this, but in SET_LOG_BASE we are just constructing the log, as I said before, we are not starting it

Ah... My brain assumed that that's what the replace would trigger :).

As far as I understand, it effectively does.

The vm-memory crate calls mark_dirty() for every write access, basically, so from one perspective, logging is active all the time, it’s just that mark_dirty() is effectively a no-op as long as the bitmaps don’t have an mmap-ed area to back them.

Specifically, <BitmapMmapRegion as Bitmap>::mark_dirty() is a no-op if BitmapMmapRegion.inner is None, but if let Some(bmap) = BitmapMmapRegion.inner, bmap.mark_dirty() will immediately start setting dirty bits in the mmap-ed area, i.e. as soon as we have called replace().

Do you already have actual starting of the logging implemented somewhere? I tried to look around in https://gitlab.com/virtiofsd-live-migration, but could not immediately spot it there.

Because so far I understood that attaching the bitmap to the mem regions would immediately start logging, there is nothing that virtiofsd does except to start aggregating its own internal state.

germag commented 7 months ago

The current draft for virtiofsd does use clearing of _F_LOGALL to recognize a cancelled migration.

but that is only for the internal state, keeping marking dirty pages will not interfere with it So, I'm still not sure if enabling/disabling the bitmap is it is worth doing it

XanClic commented 7 months ago

introduce

I still think is unnecessary right now, and I'm not sure about the performance gain either, we will need to check an AtomicBool even if we don't have a InnerBitmap.

Sure. mark_dirty() does get a range, and its current implementation dirties every single bit separately, but I assume the range is a single page most of the time, so it’s just a single operation. OTOH, I’m sure that fetching an atomic bool is quicker than performing an atomic OR,

Probably, but currently in the most common case (i.e., without a bitmap) we don't access any of that

I was specifically talking about the cancelled migration case, and I did agree (“Sure.”) that we don’t necessarily need to address it now. If you want to implement disabling now, fine with me; if you don’t, also fine with me. I only said I believe this is something that should be looked into at some point, not necessarily now.

and as I said, we could get away without using an atomic, too (let mut disabled_bmap = region().bitmap().inner().clone(); disabled_bmap.disabled = true; region().bitmap().replace(disabled_bmap);).

Ok, I will implement the AtomicBool approach, otherwise I think we also need to take into account that we can receive a SET_FEATURE unrelated to live migration

I don’t understand. It’s easy to find out whether F_LOG_ALL is being toggled by a SET_FEATURE call or not: We can just store in VhostUserHandler whether logging is currently done or not, and find out whether the bit is being toggled this way; or we can even just check whether region().bitmap().inner().disabled matches features & VhostUserVirtioFeatures::LOG_ALL.bits() == 0.

The current draft for virtiofsd does use clearing of F_LOG_ALL to recognize a cancelled migration.

but that is only for the internal state, keeping marking dirty pages will not interfere with it

I only gave this as an example to show that clearing F_LOG_ALL does occur.

To be entirely clear, I still do not have any objection to this PR as-is.

But from what I understand, with the PR as-is, logging does start when SET_LOG_BASE is called, and will never stop. Functionally, I don’t see any problem with that, but it is not correct as per spec, and in the case of a cancelled migration, seems like an admittedly most likely absolutely negligible waste of CPU cycles. Therefore, I do believe this should be addressed at some point in time, but not necessarily now.

germag commented 7 months ago

I was specifically talking about the cancelled migration case, and I did agree (“Sure.”) that we don’t necessarily need to address it now. If you want to implement disabling now, fine with me; if you don’t, also fine with me. I only said I believe this is something that should be looked into at some point, not necessarily now. ... To be entirely clear, I still do not have any objection to this PR as-is.

But from what I understand, with the PR as-is, logging does start when _SET_LOGBASE is called, and will never stop. Functionally, I don’t see any problem with that, but it is not correct as per spec, and in the case of a cancelled migration, seems like an admittedly most likely absolutely negligible waste of CPU cycles. Therefore, I do believe this should be addressed at some point in time, but not necessarily now.

Ahh ok, sorry for the confusion (you know Mondays), ok I take note, and I'll work on a solution for a future PR, and do some testing, also I'll measure the performance impact. Thanks

XanClic commented 7 months ago

(Just for reference for a future PR, this is how I imagine it without using an atomic: https://czenczek.de/0001-Disable-bitmaps-unless-F_LOG_ALL.patch (github wouldn’t let me attach this file 🙁))

stefano-garzarella commented 7 months ago

@germag I just rebased it and everything looks fine. IIUC we can merge it for now and improve later.

@XanClic @Ablu WDYT?

XanClic commented 7 months ago

No objections from my side, looks good to me.

Ablu commented 7 months ago

Ah, when skimming the discussion above I assumed that changes were planned.

I still do not fully understand where we are and what is missing.

Why do we immediately enable logging if the specification says that it starting logging should happen as part of SET_FEATURES?
What exactly is missing here for full functionality?
Do we see ways to simplify the API?

Probably all of this is clear to you being more familiar with the specs and implementations. Chances are that I am also just misunderstanding bits. But if this is not usable as it is yet and we do not have a full understanding of what the API needs will be in the end, then what do we gain by merging now?

I expressed my worry about the additional trait complexities and I am happy to merge this if we have a road towards simplifying that, but it would be great to have the plans spelled out somewhere at least in a rough form.

XanClic commented 7 months ago

Why do we immediately enable logging if the specification says that it starting logging should happen as part of SET_FEATURES?

German knows this better than me, but I assume just because it’s simpler: There currently is no way to attach dirty bitmaps to memory regions (which we must do on SET_LOG_BASE) but keep them disabled. So that would need to be implemented. We can’t see an actual problem that would arise from starting too early or stopping too late. I’ve posted a patch that implements disabling dirty bitmaps until SET_FEATURES sets F_LOG_ALL. German has said he’d prefer to incorporate that in a future PR rather than here, and because I don’t see a functional problem, I don’t disagree.

What exactly is missing here for full functionality?

Nothing. This MR works as-is for live migration with virtiofsd. There is the F_LOG_ALL issue, but in practice that is not a functional issue but at worst a performance issue.

Do we see ways to simplify the API?

German knows this better, but if anything, the only problem I have with the API is that it requires specifying a bitmap type at all, because vhost-user should always use BitmapMmapRegion. I don’t have a problem with the traits, because they didn’t appear when I made use of this MR in virtiofsd. The only thing I needed to do was to replace the bitmap type from being implicitly () to explicitly BitmapMmapRegion. Patch 2 in the virtiofsd migration MR makes virtiofsd ready for state-less migration*, and it really is just changing the bitmap type from () to BitmapMmapRegion, and setting the necessary feature bits.

*State-less migration means it cannot transfer its internal state yet, but that’s a completely separate topic.

Ablu commented 7 months ago

What exactly is missing here for full functionality?

Nothing. This MR works as-is for live migration with virtiofsd. There is the _F_LOGALL issue, but in practice that is not a functional issue but at worst a performance issue.

OK. That's where I got confused. Thanks! I looked at https://gitlab.com/virtiofsd-live-migration and could not tell what exactly was missing.

Why do we immediately enable logging if the specification says that it starting logging should happen as part of SET_FEATURES?

German knows this better than me, but I assume just because it’s simpler: There currently is no way to attach dirty bitmaps to memory regions (which we must do on _SET_LOGBASE) but keep them disabled. So that would need to be implemented. We can’t see an actual problem that would arise from starting too early or stopping too late. I’ve posted a patch that implements disabling dirty bitmaps until _SETFEATURES sets _F_LOGALL. German has said he’d prefer to incorporate that in a future PR rather than here, and because I don’t see a functional problem, I don’t disagree.

Then let's document that either in a code comment or the commit log and merge? :thinking:

germag commented 6 months ago

V7:

Add a TODO comment to support LOG_ALL flag on SET_FEATURE

germag commented 6 months ago

Rebase

rust-vmm / vhost

Add replaceable-mmapped bitmap support #206

Summary of the PR

Note to reviewers