sony / nmos-cpp

An NMOS (Networked Media Open Specifications) Registry and Node in C++ (IS-04, IS-05)
Apache License 2.0

Recent intermittent BCP-003-01 failures on Windows asio platform #184

Closed garethsb closed 3 years ago

garethsb commented 3 years ago

The Google Sheet shows that since c3f0cf7 five of the BCP-003-01 tests have been failing intermittently on the Windows asio build, where they haven't before. These are all tests that run testssl.sh with:

-S, --server-defaults         displays the server's default picks and certificate info

(https://github.com/drwetter/testssl.sh/wiki/Usage-Documentation)

In order to avoid running testssl.sh multiple times unnecessarily, the test suite caches the results of runs with the same command line arguments, and we can see from the results that sometimes all five tests that ask for -S report DISABLED (test_03, test_05, test_06, test_08, test_09), sometimes the first four or fewer, and once, it obviously succeeded at the first attempt.
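To illustrate why the failures can stop part-way through the five tests, here is a minimal sketch of a result cache keyed on the command line arguments. It is not the nmos-testing implementation: the names `cached_run` and `runner` are hypothetical, and it assumes that only successful runs are cached, which would be consistent with the pattern above (a failed run is retried by the next test, and the first success is reused by the rest).

```python
# Hypothetical sketch: cache results per command line, retrying after failures.
_results_cache = {}


def cached_run(args, runner):
    """Return the cached result for args, or call runner(args) to produce one.

    runner is assumed to return None when the run failed (e.g. timed out).
    Failed runs are not stored, so the next caller tries again.
    """
    key = tuple(args)
    if key in _results_cache:
        return _results_cache[key]
    result = runner(args)
    if result is not None:
        _results_cache[key] = result
    return result
```

With this shape, if the first two attempts fail and the third succeeds, the first two tests report a failure and the remaining tests share the third attempt's result.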

Since this doesn't happen on the non-asio Windows build, it would appear not to be related to how testssl.sh is run on Windows, and since it didn't happen at c3f0cf7 or earlier, it would appear to be related to PR #183. Perhaps the cert files are being read and the certs becoming available later or something?

The test suite is running testssl.sh with --openssl-timeout (see https://github.com/drwetter/testssl.sh/commit/a288954d36f80f2745ccbf0af8a4d12b16ecc070) set based on CONFIG.HTTP_TIMEOUT which is 1 second by default (and I don't think is changed by nmos-cpp CI). I wonder if the test suite could be improved to detect and distinguish timeout from other issues with testssl.sh... See https://man7.org/linux/man-pages/man1/timeout.1.html (in which timeout --preserve-status used by the above is documented) and https://github.com/AMWA-TV/nmos-testing/blob/2c8bb19fb06160be64da04041e39f997e1914d2e/nmostesting/suites/BCP00301Test.py#L63.
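As a rough sketch of what distinguishing a timeout from other failures could look like, the following uses Python's own `subprocess` timeout rather than `timeout(1)` or `--openssl-timeout`. The function name `run_with_timeout` and the outcome labels are hypothetical and not part of the test suite. (For comparison, GNU `timeout` exits with status 124 when the command times out, unless `--preserve-status` is given, in which case the command's own status is returned.)

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and classify the outcome.

    Returns (outcome, returncode, stdout), where outcome is one of
    'ok', 'failed' or 'timeout' (hypothetical labels for illustration).
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this exception propagates.
        return ("timeout", None, "")
    outcome = "ok" if proc.returncode == 0 else "failed"
    return (outcome, proc.returncode, proc.stdout)


if __name__ == "__main__":
    # A deliberately slow child process stands in for a slow testssl.sh run.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5)[0])
```

A test could then report a timeout separately from a genuine failure instead of folding both into the same DISABLED result.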

lo-simon commented 3 years ago

Thanks, Gareth, for your in-depth investigation. I have tested the Windows asio build, running the test suite (and testssl.sh) on the Ubuntu VM. It passed with no issues. In the past, when I was using WSL to run the test suite, it took a long time to run, so I think the issue could well be a timeout while running testssl.sh. I will investigate the cause of the issue.

lo-simon commented 3 years ago

Here is what we found:

  1. Running tests locally - test-suite on WSL 18.04 (WSL 1) and WSL 20.04 (WSL 2)
     Pass - using memory buffer and file-based asio interfaces to load certificates

  2. Running tests on GitHub - test-suite on WSL 20.04 with testssl http_timeout = 1s (default)
     Fail - using memory buffer asio interface
     Pass - using file-based asio interface

  3. Running tests on GitHub - test-suite on WSL 20.04 with testssl http_timeout = 600s
     Fail - using memory buffer asio interface
     Pass - using file-based asio interface

  4. Running tests on GitHub - test-suite on WSL 18.04 with testssl http_timeout = 1s (default)
     Pass - using memory buffer asio interface
     Pass - using file-based asio interface

In our case, according to Microsoft, it would be better to use WSL 1 than WSL 2:

Your project files must be stored in the Windows file system. WSL 1 offers faster access to files mounted from Windows. If you will be using your WSL Linux distribution to access project files on the Windows file system, and these files cannot be stored on the Linux file system, you will achieve faster performance across the OS files systems by using WSL 1.

See https://docs.microsoft.com/en-us/windows/wsl/compare-versions#exceptions-for-using-wsl-1-rather-than-wsl-2

jonathan-r-thorpe commented 3 years ago

Fixed by 58aa4d20ccada0636fac0d8580b1abe8635e102f