runfinch / finch

The Finch CLI is an open source client for container development
https://www.runfinch.com
Apache License 2.0

ci: add workaround for WSL hanging in CI #993

Closed austinvazquez closed 2 days ago

austinvazquez commented 1 week ago

Issue #, if available: There is a known issue, https://github.com/microsoft/WSL/issues/8529, where WSL commands can hang. When this happens, the Windows e2e tests can block until they hit the 2-hour timeout.

Description of changes: This change adds a workaround that detects the bad state and attempts to mitigate it by killing the WSL service. If the issue cannot be resolved, the test hangs for only 300 seconds before failing.
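For readers who want the shape of the mitigation without opening the diff, here is a minimal sketch of the idea described above: try a graceful shutdown under a timeout, and fall back to killing the WSL service if it hangs. It is not the code in this PR. The names `reset_wsl` and `run_with_timeout` are hypothetical, the assumption that the WSL service runs as `wslservice.exe` is mine, and the 300-second value is borrowed from the description and applied per command here.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical constants; the PR's actual implementation may differ.
SHUTDOWN_CMD = ["wsl", "--shutdown"]
# Assumption: the WSL service process on the CI host is wslservice.exe.
KILL_CMD = ["taskkill", "/F", "/IM", "wslservice.exe"]

Runner = Callable[[Sequence[str], float], bool]


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> bool:
    """Return True if cmd exits with status 0 within timeout_s seconds."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        return subprocess.run(cmd, timeout=timeout_s).returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        # A hang, or a missing executable, both count as failure.
        return False


def reset_wsl(timeout_s: float = 300, run: Runner = run_with_timeout) -> str:
    """Try a graceful shutdown first, then fall back to killing the service."""
    if run(SHUTDOWN_CMD, timeout_s):
        return "shutdown"
    if run(KILL_CMD, timeout_s):
        return "killed"
    return "failed"
```

The `run` parameter is injectable so the fallback order can be exercised on a machine without WSL. A caller in CI would fail the job when `reset_wsl()` returns `"failed"` instead of waiting for the job-level timeout.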

Testing done: CI run was successful with 8 WSL shutdown failures. https://github.com/runfinch/finch/actions/runs/9682445232/job/26715743040


Trade-off analysis: The trade-off for this approach is that the test suite can take longer when multiple reset VM calls are made. Sample runs which previously took ~15 minutes now take up to ~37 minutes with the hanging mitigation; however, this is still well below the 2-hour timeout failure which would occur without the mitigation.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

austinvazquez commented 2 days ago

Hopefully this works better than the naive shutdown command 👍

From the microsoft/WSL issue, some users reported shutdown taking 2-3 minutes. We can also consider being more aggressive than this and killing the WSL service faster.