rust-vmm / vhost-device

'vhost-user' device backends workspace
Apache License 2.0
68 stars 48 forks source link

vhost-device-spi: Add initial implementation #640

Closed HaixuCui closed 3 months ago

HaixuCui commented 8 months ago

Summary of the PR

This program is a vhost-user backend that emulates a VirtIO SPI bus. This program takes the layout of the spi bus and its devices on the host OS and then talks to them via the /dev/spidevX.Y interface when a request comes from the guest OS for a SPI device.

The implementation corresponds with the specification: https://github.com/oasis-tcs/virtio-spec/tree/virtio-1.4/device-types/spi

Requirements

Before submitting your PR, please make sure you addressed the following requirements:

stefano-garzarella commented 8 months ago

@HaixuCui sorry, I'm a bit confused, this PR is named "vhost-device-console: Add initial implementation" and we already have another PR #601, but looking in the code and in the PR description it looks like this PR is not covering the console device.

I suppose the PR title is wrong. Please, can you clarify?

Can you also check the CI failures? (note: cargo-audit failure is expected, see #639 )

HaixuCui commented 8 months ago

@stefano-garzarella Yes the title was wrong, I have changed, it's vhost-user-spi, thanks

HaixuCui commented 8 months ago

Hi all, my CI only has two jobs and both failed with the error log:

git clone -v -- https://github.com/rust-vmm/vhost-device.git . fatal: destination path '.' already exists and is not an empty directory. ⚠️ Warning: Checkout failed! cloning git repository: exit status 128 (Attempt 1/3 Retrying in 2s) Removing /var/lib/buildkite-agent/builds/ip-172-31-70-93-5/rust-vmm/vhost-device-ci 🚨 Error: Failed to remove "/var/lib/buildkite-agent/builds/ip-172-31-70-93-5/rust-vmm/vhost-device-ci" (unlinkat /var/lib/buildkite-agent/builds/ip-172-31-70-93-5/rust-vmm/vhost-device-ci/staging/target/debug/incremental: permission denied)

Could you please give me some clue, what should I do to resolve this? Thank you very much.

stefano-garzarella commented 8 months ago

Hi all, my CI only has two jobs and both failed with the error log:

git clone -v -- https://github.com/rust-vmm/vhost-device.git . fatal: destination path '.' already exists and is not an empty directory. ⚠️ Warning: Checkout failed! cloning git repository: exit status 128 (Attempt 1/3 Retrying in 2s) Removing /var/lib/buildkite-agent/builds/ip-172-31-70-93-5/rust-vmm/vhost-device-ci 🚨 Error: Failed to remove "/var/lib/buildkite-agent/builds/ip-172-31-70-93-5/rust-vmm/vhost-device-ci" (unlinkat /var/lib/buildkite-agent/builds/ip-172-31-70-93-5/rust-vmm/vhost-device-ci/staging/target/debug/incremental: permission denied)

Could you please give me some clue, what should I do to resolve this? Thank you very much.

It looks like this is unrelated. I'll check what is going on with our CI.

stefano-garzarella commented 8 months ago

@HaixuCui please, can you commit the changes to Cargo.lock?

HaixuCui commented 8 months ago

Hello @stefano-garzarella , I add vhost-user-spi package in Cargo.lock. Thanks!

HaixuCui commented 8 months ago

Hi @stefano-garzarella, I found the CI was workable before my last commit, and after I commit the Cargo.lock, it failed again with the same error I mentioned before. Could you please tell me what you do to make CI works? Really appreciate.

stefano-garzarella commented 8 months ago

Hi @stefano-garzarella, I found the CI was workable before my last commit, and after I commit the Cargo.lock, it failed again with the same error I mentioned before. Could you please tell me what you do to make CI works? Really appreciate.

@HaixuCui sorry about that, but we have problems with CI that we are investigating. For a moment it ran again because we manually removed the problematic file. We're figuring out why, I'll keep you updated.

HaixuCui commented 8 months ago

Hi @stefano-garzarella,

use libc::{c_ulong, ioctl};

#[cfg(target_env = "musl")]
type IoctlRequest = libc::c_int;
#[cfg(not(target_env = "musl"))]
type IoctlRequest = c_ulong;

const SPI_IOC_RD_MODE32: IoctlRequest = 0x80046b05;

The SPI_IOC_RD_MODE32 definition leads to such CI error: literal out of range for i32, with the note "the literal 0x80046b05 (decimal 2147773189) does not fit into the type i32 and will become -2147194107i32".

But now the type of SPI_IOC_RD_MODE32 is libc::c_int, it's signed and defined explicitly, why the compiler treat its value 0x80046b05 as unsigned, and try to fit 2147773189 into i32 type? Just treat as negative value -2147194107 and it can fit into the i32 range.

Solution:

  1. As the CI hint, to use as a negative number (decimal -2147194107), consider using the type u32 for the literal and cast it to i32, const SPI_IOC_RD_MODE32: IoctlRequest = 0x80046b05u32 as i32; but maybe cause other errors when target is not musl
  2. add #[deny(overflowing_literals to prevent checking?

Do you have any suggestions? Thank you very much.

stefano-garzarella commented 8 months ago

Do you have any suggestions? Thank you very much.

What about using the ioctl macros from vmm-sys-util? https://github.com/rust-vmm/vmm-sys-util/blob/main/src/linux/ioctl.rs

For example we used in https://github.com/rust-vmm/vhost/blob/2a89b283fad40047adb991ae3f5b958e5b4270d7/vhost/src/vhost_kern/vhost_binding.rs#L46

We build vhost for musl and glibc without issues, so I guess it already handles it internally.

HaixuCui commented 7 months ago

Hello @stefano-garzarella, many thanks for your support, the CI errors are resolved.

Now there are two errors, cargo-audit-x86_64 and coverage-x86_64. Are they expected? As you mentioned before cargo-audit is an known issue.

Can the review process begin now, or need to wait for the cargo audit fixed?

Thank you so much, again

stefano-garzarella commented 7 months ago

Hello @stefano-garzarella, many thanks for your support, the CI errors are resolved.

Great!

Now there are two errors, cargo-audit-x86_64 and coverage-x86_64. Are they expected? As you mentioned before cargo-audit is an known issue.

I just opened https://github.com/rust-vmm/vhost-device/pull/641 for the cargo-audit issue.

About coverage, it should be green, so you have 2 ways, add more tests or adapt the value in coverage_config_x86_64.json. The former would be preferred, but the deviation seems small, so if you don't have any tests in mind to add, it's fine to adjust the value as well.

Can the review process begin now, or need to wait for the cargo audit fixed?

It can start, but I'm not sure if I have time this week, sorry :-(

HaixuCui commented 7 months ago

Hi @stefano-garzarella, I've added some test functions, but there is still small deviation. Could you please give me some guidance on how to modify the coverage_config_x86_64.json?

Besides, the CI for the latest commit is problematical. Pipeline is wrong, but coverage_x86-64 passed with the 100% coverage, seems rather strange. Thank you so much.

stefano-garzarella commented 7 months ago

Hi @stefano-garzarella, I've added some test functions, but there is still small deviation. Could you please give me some guidance on how to modify the coverage_config_x86_64.json?

Something like this should work:

diff --git a/coverage_config_x86_64.json b/coverage_config_x86_64.json
index 346636c..26c2ba2 100644
--- a/coverage_config_x86_64.json
+++ b/coverage_config_x86_64.json
@@ -1,5 +1,5 @@
 {
-  "coverage_score": 77.55,
+  "coverage_score": 76.31,
   "exclude_path": "",
   "crate_features": ""
 }

Besides, the CI for the latest commit is problematical. Pipeline is wrong, but coverage_x86-64 passed with the 100% coverage, seems rather strange. Thank you so much.

yeah, we still have some problems. I rebuild it manually for now and now should be fine.

stefano-garzarella commented 7 months ago

@HaixuCui we merged the PR to fix cargo-audit warning, can you rebase?

HaixuCui commented 7 months ago

@HaixuCui we merged the PR to fix cargo-audit warning, can you rebase?

Hi @stefano-garzarella Okay.

HaixuCui commented 7 months ago

Hi @stefano-garzarella It works, cargo-audit passed. Thank you for your support.

HaixuCui commented 7 months ago

Hi @stefano-garzarella, @vireshk, Could you please help review the vhost-user-spi code, thank you.

HaixuCui commented 7 months ago

I did a first review and I left some comments. I'll be off till May 2nd, sorry in advance, I hope others can do next reviews, otherwise I'll check when I'm back.

Thank you for your comments, I'll check and fix.

HaixuCui commented 6 months ago

Hi @stefano-garzarella, @vireshk, @MatiasVara, I've update the project according to your comments, please help check and review. Thank you very much.

HaixuCui commented 6 months ago

Hi @stefano-garzarella , @MatiasVara , @vireshk

Can you please help to review again, I have updated the code and passed all checks. Thank you so much.

stefano-garzarella commented 5 months ago

Hi @stefano-garzarella , @MatiasVara , @vireshk

Can you please help to review again, I have updated the code and passed all checks. Thank you so much.

Sorry for the delay, I've been on vacation and have some stuff piled up, I hope to review it within this week.

epilys commented 5 months ago

A humble request: resolve old review comments that have been addressed. Thanks!

epilys commented 5 months ago

Since 1.4 has not been ratified yet, should we move this to staging/? @stefano-garzarella what do you think?

stefano-garzarella commented 5 months ago

Since 1.4 has not been ratified yet, should we move this to staging/? @stefano-garzarella what do you think?

Good question! Usually in Linux we ask to send changes to the spec first before accepting new drivers or changes. We usually accept changes once they've been voted on and merged though, so IMHO should be okay to have the device not in staging if the spec has been voted on.

epilys commented 4 months ago

@HaixuCui I resolved most of the left over comments. Would you like to resolve what is left yourself and select "Re-request review" when you're done?

By the way, it seems the CI complains about test coverage dropping by a tiny bit which must be easily fixable, and also on some yanked dependency warnings for other crates in the repository. Try rebasing the PR on top of the main branch to see if they go away.

HaixuCui commented 4 months ago

Hi @epilys, @stefano-garzarella, @MatiasVara, Thank you for your helpful comments, I've fixed most of them, and also verified on my target, vhost-user-spi daemon works well.

While the CI still failed due to coverage deviation, I would try to add some cases.

HaixuCui commented 4 months ago

Hi @stefano-garzarella , @epilys

I add some test cases, but the coverage only improved slightly.

I cannot add testcase to cover detect_supported_features function in spi.rs, because it interacts with the real hardware, while it is quite a bit function.

Another point is that, after using reader/writer mode of descriptor_util, some conditions also cannot be covered, because some invalid descriptors will be filtered by the following code at the very early stage.

let mut reader = desc_chain
                .clone()
                .reader(&mem)
                .map_err(|_| Error::DescriptorReadFailed)?;

     let writter = desc_chain
                .clone()
                .writer(&mem)
                .map_err(|_| Error::DescriptorReadFailed)?;

Now I have no testcase in mind, and there is still 0.55% deviation. Do you have any idea about improving the coverage? Thank you very much.

stefano-garzarella commented 4 months ago

Now I have no testcase in mind, and there is still 0.55% deviation. Do you have any idea about improving the coverage? Thank you very much.

I haven't had time to check, but don't worry, 0.55% deviation is acceptable, go ahead and edit coverage_config_x86_64.json in this PR to update coverage_score with the new value.

epilys commented 4 months ago

I cannot add testcase to cover detect_supported_features function in spi.rs, because it interacts with the real hardware, while it is quite a bit function.

It could be mocked but for such a low percentile point we could ignore it for now.

Another point is that, after using reader/writer mode of descriptor_util, some conditions also cannot be covered, because some invalid descriptors will be filtered by the following code at the very early stage.

We have that issue in this repository generally, it'd be nice to have mocking with error condition checks but it's a project on its own, out of the scope of this PR! So let's ignore that for now too.