spdk / spdk

Storage Performance Development Kit
https://spdk.io/

[vhost] Windows SCSI compliance test failures #3271

Open karlatec opened 4 months ago

karlatec commented 4 months ago

Sighting report

Most of the community may not know this, but SPDK CI internally runs the vhost_windows_scsi_compliance test job in nightly mode. This test runs the SPDK vhost target, configures SCSI controllers, and then attaches to them from a Windows Server 2012 VM. After MBR partition tables and NTFS filesystems are created on the attached block devices, a number of SCSI compliance tests are run.

Automation scripts for this test case can be found here.

There are two issues with this test:

  1. The automation script does not report failures properly. Builds have been in a passing state for a long time, but manual investigation shows that failures are not being detected and the builds should be failing. More details in the next point.

  2. The automation script defines a list of "allowed" failures and warnings reported by the SCSI compliance tools. This list was the set of well-known issues when the script was created, but the failure/warning list the test actually reports has since diverged greatly from what is expected.
     Expected fail/warn list defined in the script: link
     Actual fail/warn list reported by automation builds: windows_scsi_compliance_fails.txt
     Full build log: windows_scsi_compliance_log.txt
     Archive with SCSI compliance tools logs: windows_scsi_compliance_test_logs.zip

Expected Behavior

  1. The automation script should fail when discrepancies between expected and actual results are detected.
  2. Actual results should match expected results (either by fixing the issues or by adjusting the expected results list).
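
To make the first point concrete, here is a minimal Python sketch of a strict comparison between the expected and the actual fail/warn lists. It is an illustration only, not the existing automation script: the function names and the idea of passing plain lists of test names are assumptions, and the real script may store and parse its lists differently.

```python
def compare_results(expected, actual):
    """Return (unexpected, missing) as sorted lists of test names.

    unexpected: reported by the run but not on the allowed list.
    missing:    on the allowed list but no longer reported by the run.
    Both names are hypothetical; the real script may use other terms.
    """
    expected_set = set(expected)
    actual_set = set(actual)
    unexpected = sorted(actual_set - expected_set)
    missing = sorted(expected_set - actual_set)
    return unexpected, missing


def exit_code(expected, actual):
    """Print every discrepancy and return 1 if there is any, else 0."""
    unexpected, missing = compare_results(expected, actual)
    for name in unexpected:
        print(f"UNEXPECTED: {name}")
    for name in missing:
        print(f"NO LONGER REPORTED: {name}")
    return 1 if (unexpected or missing) else 0
```

A wrapper would call `sys.exit(exit_code(expected, actual))` so that any divergence in either direction turns the build red. Treating "no longer reported" entries as failures is a design choice: it forces the allowed list to be trimmed when an issue gets fixed, so the list cannot silently go stale again.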

Current Behavior

As described in report.

Possible Solution

As in expected behavior

Steps to Reproduce

N/A at the moment. Will provide if needed.

tomzawadzki commented 4 months ago

Looking at the attached zip file with full log details (there is a nice summary in results.html), most of the tests pass. There might be some relevant failures in READ/WRITE (10), only for the 4k block size Malloc-backed controller, and in PERSISTENT RESERVE for both controllers. Those are limited to SCSI Compliance Test (not SCSI Compliance Test 2.0 (LOGO)). Similar tests in 2.0 are passing.

The test script, results, and goals need to be re-evaluated. Are we mostly interested in SCSI compliance testing, or in Windows virtio-scsi driver compatibility with SPDK vhost? Should we look at alternative compliance testing suites? Does extending testing to the Windows NVMe driver backed by an SPDK vfio-user device make sense?

Unfortunately, this test is currently not maintained, and neither is the VM image. Let's gauge interest via this issue; if there is no traction, the test (and job) shall be removed.

karlatec commented 4 months ago

this test is not maintained

For now I'll take on fixing proper pass/fail reporting. I'm going to unassign myself after it's done.

BTW, we should probably adjust the labels on this issue

karlatec commented 4 months ago

Actually unassigning for now. Let's first discuss what to do with this.

tomzawadzki commented 4 months ago

[Bug scrub] No one on the call is interested in maintaining or deploying this or a similar test. I'll reach out on Slack too, to gather more attention, and leave the issue open for the time being.