gpu-socket feature causing feature unification issues

dorindabassey commented 3 days ago

since the set_gpu_socket feature is behind a feature flag, this cause feature unification issues, because vhost-device-gpu is enabling that feature and other crates are not enabling it. According to rust feature reference:

That is, enabling a feature should not disable functionality, and it should usually be safe to enable any combination of features. A feature should not introduce a SemVer-incompatible change.

dorindabassey commented 3 days ago

@stefano-garzarella is suggesting that we remove the gpu_socket feature flag. which I agree can solve the feature unification problem. more suggestions are welcomed.

stefano-garzarella commented 2 days ago

Just a bit more context. The issue was discovered while adding vhost-device-gpu crate in a workspace with other crates using the vhost-user-backend crate, see https://github.com/rust-vmm/vhost-device/pull/668

Since the new crate enable gpu-socket feature, this is enabled by cargo also for other crates in the workspace to minimize the number of copies of the crate, see https://doc.rust-lang.org/cargo/reference/features.html#feature-unification

At this point, we had the following issue on another crate in the same workspace (vhost-device-console) not enabling that feature:

   Compiling vhost-device-console v0.1.0 (/workdir/staging/vhost-device-console)
error[E0046]: not all trait items implemented, missing: `set_gpu_socket`
   --> vhost-device-console/src/vhu_console.rs:641:1
    |
641 | impl VhostUserBackendMut for VhostUserConsoleBackend {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ missing `set_gpu_socket` in implementation

The first fix we had in mind was in some way to avoid the feature unification, maybe forcing cargo to use 2 copies, but at that point we discovered the recommendation in https://doc.rust-lang.org/cargo/reference/features.html#feature-unification that also @dorindabassey reported:

That is, enabling a feature should not disable functionality, and it should usually be safe to enable any combination of features. A feature should not introduce a SemVer-incompatible change.

So maybe we should avoid that here, so I was thinking of not really removing the feature, but something like just providing default implementation for VhostUserBackend that should not be used if the feature is not enabled (maybe we can panic there):

diff --git a/vhost-user-backend/src/backend.rs b/vhost-user-backend/src/backend.rs
index ad9e950..2ba7576 100644
--- a/vhost-user-backend/src/backend.rs
+++ b/vhost-user-backend/src/backend.rs
@@ -87,13 +87,12 @@ pub trait VhostUserBackend: Send + Sync {
     /// function.
     fn set_backend_req_fd(&self, _backend: Backend) {}

-    #[cfg(feature = "gpu-socket")]
     /// Set handler for communicating with the frontend by the gpu specific backend communication
     /// channel.
     ///
     /// This method only exits when the crate feature gpu-socket is enabled, because this is only
     /// useful for a gpu device.
-    fn set_gpu_socket(&self, _gpu_backend: GpuBackend);
+    fn set_gpu_socket(&self, _gpu_backend: GpuBackend) {}

     /// Get the map to map queue index to worker thread index.
     ///
(END)

@germag @epilys any suggestion?

epilys commented 2 days ago

a device specific method in a general trait does not make much sense. I suggest adding something like a pub trait VhostUserGpuBackend: VhostUserBackend with this method and implement it for for the vhost-device-gpu crate

stefano-garzarella commented 2 days ago

@epilys it's not really device specific, IIRC it's part of the spec, but I may be wrong. In any case it's true that only the GPU device will use it, so a new trait makes sense.

@dorindabassey @mtjhrc do you remember why we did that?

epilys commented 2 days ago

@stefano-garzarella no no you're not wrong, it's part of the spec. But since it's for GPU stuff only, it doesn't make sense to have it on the general backend trait. A sub-trait for backends that use the vhost user GPU protocol sounds like a good solution to me.

dorindabassey commented 2 days ago

yes, it's part of the spec in https://qemu-project.gitlab.io/qemu/interop/vhost-user-gpu.html

@dorindabassey @mtjhrc do you remember why we did that?

this feature gpu-socket was added in the vhost dependency to ensure that it remains specific to GPU device.

I think we can make it an optional protocol feature just like set_backend_req_fd

epilys commented 2 days ago

Or use the type system with a sub-trait :) It's there, why not use it?

mtjhrc commented 2 days ago

Yes, using a sub-trait seems nicer. Though I am not sure if it is worth the extra complexity/code.

There is the trait VhostUserBackend but we also have VhostUserBackendMut

and the impls: impl<T: VhostUserBackendMut> VhostUserBackend for Mutex<T> { impl<T: VhostUserBackendMut> VhostUserBackend for RwLock<T> {

So if we want a Gpu subtrait we need have both GpuVhostUserBackendMut and GpuVhostUserBackend. We also want to have impls for Mutex<T> and RwLock<T> to be consistent.

Then VhostUserDaemon and internal code needs to be extended to also work with the extended trait. https://github.com/rust-vmm/vhost/blob/4f160320a86a27579a3a3373b590dde2c27959a6/vhost-user-backend/src/lib.rs#L98-L113

The proper way do this I imagine is to introduce yet another trait, to be able to accept both GpuVhostUserBackend and VhostUserBackend. The trait GpuVhostUserBackend wouldn't exists in the crate if gpu-socket is not enabled)

While this is all doable, it just seems quite complex, and I am not sure if it is worth it.

Two simpler solutions are: 1) Just remove the existence of the feature flag and have it enabled for all devices.

2) Keep the feature flag but have default implementation of the method set_gpu_socket even if the feature is not enabled. It should probably just panic. Note that this means we also need to have the struct GpuBackend defined (as an empty struct probably) if we want to be able to define the method signature:

fn set_gpu_socket(&self, _gpu_backend: GpuBackend) {

epilys commented 2 days ago

Side note: Having a separate Mut trait and also impls for Mutex<T> and RwLock<T> are simply bad abstractions and code. Bad abstractions are code smell, and now the smell shows up. :(

That's not for the gpu device to solve, so a default impl with a panic! if the feature is disabled should be the way to go for now then.

mtjhrc commented 2 days ago

I agree, I don't like the current abstraction either, there are also other things in vhost-user-backend - mainly the framework spawning worker threads instead of the library user having more control (It can also complicate the device code in some circumstances). I think I'll open an issue to discuss that. This is kind of related to the separate Mut and Non-Mut trait variants.

dorindabassey commented 2 days ago

While this is all doable, it just seems quite complex, and I am not sure if it is worth it.

I actually tried this solution and yes it quite a complicated approach.

Just remove the existence of the feature flag and have it enabled for all devices.

Keep the feature flag but have default implementation of the method set_gpu_socket even if the feature is not enabled. It should probably just panic. Note that this means we also need to have the struct GpuBackend defined (as an empty struct probably) if we want to be able to define the method signature:

we can remove the feature flag and have default implementation of the method set_gpu_socket like set_backend_req_fd. so if other devices do not implement the set_gpu_socket it won't be a problem.

rust-vmm / vhost

gpu-socket feature causing feature unification issues #265