rust-vmm / community

rust-vmm community content
499 stars 28 forks source link

Crate Addition Request: vhost-user-master #133

Open vireshk opened 2 years ago

vireshk commented 2 years ago

Crate Name

vhost-user-master

Short Description

Crate to manage vhost-user-master side of the protocol.

Why is this crate relevant to the rust-vmm project?

I already host a crate for this, but need space in rust-vmm now.

https://github.com/vireshk/vhost-user-master

The code here is mostly copied from cloud-hypervisor and so is relevant for rust-vmm for sure.

We, at Linaro's Project Stratos, need it for Xen's vhost master side implementation.

andreeaflorescu commented 2 years ago

Does it make sense for this to be in the same workspace with vhost-user-backend?

vireshk commented 2 years ago

I am fine with it, and it may be better to keep related things together. While at it, maybe we could have had vhost, vhost-user-backend and vhost-user-master together in the same workspace.

andreeaflorescu commented 2 years ago

Sounds good to me! We would need to create a workspace, and move the existing crates there, and then add this one as a crate as well. @slp @stefano-garzarella @jiangliu @sboeuf @rbradford do you have any concerns with this?

sboeuf commented 2 years ago

The crate name should be changed to something like vhost-user-vmm or else, but please don't use master given everybody is trying to move away from this wording. About the crate itself, I don't have anything against it, but I'm not sure Cloud Hypervisor will be able to move to it.

vireshk commented 2 years ago

but I'm not sure Cloud Hypervisor will be able to move to it.

The code I am carrying right now is copied from cloud hypervisor itself, almost unmodified (very small changes made in a separate patch).

vireshk commented 2 years ago

Sounds good to me! We would need to create a workspace, and move the existing crates there,

Great. I will wait for it.

andreeaflorescu commented 2 years ago

Sounds good to me! We would need to create a workspace, and move the existing crates there,

Great. I will wait for it.

Could you take that work as part of upstreaming this? We don't have time to invest in converting the vhost based crates in a workspace right now.

sboeuf commented 2 years ago

but I'm not sure Cloud Hypervisor will be able to move to it.

The code I am carrying right now is copied from cloud hypervisor itself, almost unmodified (very small changes made in a separate patch).

Yes I understand and I know it would be easy to rely on it at first, but I'm worried that with time the code will change and that it might be complicated to apply patches and fixes. The point is, I'm not sure there's added value in having this code into a crate and I'd rather keep it inside Cloud Hypervisor. But it's totally fine if you want to create a new crate from this code if you think this is a real added value for your project.

stefano-garzarella commented 2 years ago

The crate name should be changed to something like vhost-user-vmm or else, but please don't use master given everybody is trying to move away from this wording. About the crate itself, I don't have anything against it, but I'm not sure Cloud Hypervisor will be able to move to it.

Yep, I agree with the re-naming. Following the updated spec where we now use front-end and back-end, what about vhost-user-frontend?

Ablu commented 1 year ago

So far @vireshk only needed minimal changes on top of the frontend code from Cloud Hypervisor. But with moving towards "truly standalone daemons", we are facing the requirement to do larger refactoring in order to align the code with the proposed vhost-user spec additions. This brings up the question of how to approach upstreaming this again.

So... How to do it? Just copying the code from Cloud Hypervisor while cloud-hypervisor continues to use their own version sounds like the worst option for everyone to me... We would copy a lot of code that we would need to spend time understanding, improving and documenting. But if Cloud Hypervisor does not switch to it, we would have to put in the effort without having an (immediate) user for this. How would we test something like migration if we cannot send patches to Cloud Hypervisor in cases where API changes may be required?

To me, there only seem to be two (equally bad) solutions to this: We would take a battle-tested solution and start hacking on it without being able to test it anymore. Or we remove all the features that we have no immediate user for and then have to duplicate the effort to add all those features back once we start needing them. In both cases we would fragment the ecosystem and duplicate effort.

@sboeuf: I understand that you are afraid of churn in the code and that it may require your to jump through more hoops in order to get changes landed. But wouldn't you also be interested in support for new additions to the spec(s) and more eyes on the code from a wider community?

I think that if we start off with the current bits from Cloud Hypervisor and then collaborate on slowly and carefully massaging in some cleanups we are in a much better shape. Cloud Hypervisor can benefit from new features and we can test the more advanced features that exist there. So we would only need to do the minimal changes required to turn this into a shareable lib (if any required) and could then continue the development in a usual upstream manner.

Happy to hear thoughts on this!

sboeuf commented 1 year ago

I understand that you are afraid of churn in the code and that it may require your to jump through more hoops in order to get changes landed. But wouldn't you also be interested in support for new additions to the spec(s) and more eyes on the code from a wider community? I think that if we start off with the current bits from Cloud Hypervisor and then collaborate on slowly and carefully massaging in some cleanups we are in a much better shape. Cloud Hypervisor can benefit from new features and we can test the more advanced features that exist there. So we would only need to do the minimal changes required to turn this into a shareable lib (if any required) and could then continue the development in a usual upstream manner.

That sounds like the proper approach to me!

Ablu commented 1 year ago

That sounds like the proper approach to me!

Glad to hear that! :)

That would mean:

  1. We create a vhost-user-frontend crate in the vhost repo with the current unmodified bits from Cloud Hypervisor
  2. Make anything private the Cloud Hypervisor does not need
  3. Create a release for that crate
  4. Allow Cloud Hypervisor to switch to that release
  5. Start to do documentation/cleanups/refactoring in a careful manner, following the usual release processes.

Does this sound like a plan to the vhost maintainers (@eryugey @jiangliu @sboeuf @slp @stefano-garzarella)?

I am happy to work on this after I am back from my 2 week vacation, but if anyone else wants to work on this: Feel free :).

vireshk commented 1 year ago

I did start that work earlier (https://github.com/rust-vmm/vhost/pull/122), but the main problem I see here is the missing tests.

Are we okay to merge this stuff to vhost workspace without any tests in place ? @sboeuf ?

Ablu commented 1 year ago

Are we okay to merge this stuff to vhost workspace without any tests in place ?

To keep things simple, I would just merge it as it is (and adjust the coverage accordingly) and make test writing a priority after we have a risk-free release that Cloud Hypervisor can switch over to. But I got no strong feelings about this :).

vireshk commented 1 year ago

I think it would be good to have a decision on this before one of us try to make the effort.

@roypat @jiangliu : We are trying to move the vhost-user-frontend (master) bits out of the cloud hypervisor and merge them as a separate create in the vhost workspace. The crate won't have any tests to begin with as there are no tests written for them and it is complex stuff and someone more familiar with all that would be required to help us on that.

Are you guys okay to make this migration ?

Ablu commented 1 year ago

We can of course write tests for the tests sake and just treat coverage like a number that we need to reach, but I doubt it will make the crate significantly more stable (especially when we want to reduce the risk for Cloud Hypervisor to switch to this).

stefano-garzarella commented 1 year ago

@Ablu About test coverage and documentation, since it is a separate crate in vhost workspace, I think we can do things incrementally.

It would be nice to split the coverage for each individual crate, but I don't know if that's possible now.

stsquad commented 1 year ago

How about using a staging area. I've just prosed some guidelines for discussion at: https://github.com/rust-vmm/community/pull/153

Ablu commented 1 year ago

@stsquad Hm... This depends on whether we want to make public releases of things under staging/. The current workflow of migrating the crate over from cloud-hypervisor, then switching cloud-hypervisor over would require a release.

Ablu commented 1 year ago

@stsquad brough up this (https://github.com/rust-vmm/community/pull/153#issuecomment-1770957371):

Does cloud-hypervisor want to switch over given we are going to immediately start making changes and updates? I would have thought they could track via git trees if they are interested in testing where the changes go and then just switch when we get to production state?

My assumption was that we do an initial release with as little changes as possible. This would allow cloud-hypervisor to switch over and the further development would then happen on the new crate. We would need the cloud-hypervisor code base for testing the features that currently exist. So, it would be great to have cloud-hypervisor switch over. Then the development can happen as a joint effort.

See also https://github.com/rust-vmm/community/issues/133#issuecomment-1653193891.