Open andylizf opened 1 month ago
I have some Buildkite pipelines already set up for GKE that would probably adapt well to running SkyPilot smoke tests. Happy to work with the team on sharing them, since I'm depending pretty heavily on SkyPilot + k8s right now.
Another cost optimization: we can run a k8s cluster as part of our CI and move many of the cloud-agnostic tests (e.g., those that test core SkyPilot functionality) to a Kubernetes cluster provisioned in GitHub Actions for the duration of the test.
See: https://github.com/marketplace/actions/kind-kubernetes-in-docker-action
Step-by-step blog: https://dev.to/kitarp29/running-kubernetes-on-github-actions-f2c
Need to evaluate cost-benefit (test migration effort vs reduced cloud cost) before we implement the above.
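One rough way to frame that cost-benefit question is a break-even estimate: how many weeks of reduced cloud spend it takes to pay back the one-off migration effort. The sketch below is only an illustration. The function name, its parameters, and the example figures are all hypothetical, not measurements from the SkyPilot CI.

```python
def break_even_weeks(
    migration_hours: float,
    hourly_rate: float,
    weekly_cloud_cost_before: float,
    weekly_cloud_cost_after: float,
) -> float:
    """Weeks until the one-off migration effort is repaid by lower cloud spend.

    All inputs are placeholder estimates supplied by the caller (assumption).
    """
    weekly_savings = weekly_cloud_cost_before - weekly_cloud_cost_after
    if weekly_savings <= 0:
        # Migrating never pays off if the weekly cost does not go down.
        return float("inf")
    migration_cost = migration_hours * hourly_rate
    return migration_cost / weekly_savings


# Hypothetical figures: 40 engineer-hours at 100/hour, with weekly cloud
# spend dropping from 500 to 100 after moving tests to an in-CI cluster.
print(break_even_weeks(40, 100, 500, 100))
```

If the break-even horizon is longer than the expected lifetime of the current test layout, the migration is probably not worth prioritizing.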
Implement Automated Weekly Smoke Tests
Problem
Currently, smoke tests for SkyPilot (implemented in test_smoke.py) are being run manually. This process could be improved by automating these tests on a weekly basis.
Proposed Solution
Implement an automated weekly smoke test run using a suitable CI/CD platform or automation tool, leveraging the existing test_smoke.py script.
Implementation Details
Automation Setup:
test_smoke.py script in the automation
Environment Setup:
Test Execution:
pytest tests/test_smoke.py --terminate-on-failure
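A scheduled job could wrap the command above in a small driver script. The following is a minimal sketch, assuming the job runs from the repository root with pytest installed and cloud credentials already configured. The helper names (build_smoke_command, run_with_retries, main) and the retry count are hypothetical. Note that it reruns the whole suite on failure, which costs additional cloud credits, so a real setup may prefer retrying only the failed tests.

```python
import subprocess
import sys
from typing import Callable, List, Sequence


def build_smoke_command(extra_args: Sequence[str] = ()) -> List[str]:
    """Build the pytest invocation quoted in the issue, plus optional extra args."""
    return ["pytest", "tests/test_smoke.py", "--terminate-on-failure", *extra_args]


def run_with_retries(
    cmd: Sequence[str],
    max_attempts: int = 2,
    run: Callable = subprocess.run,
) -> int:
    """Run cmd up to max_attempts times and return the last exit code.

    `run` is injectable so the retry logic can be exercised without
    launching real cloud resources.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    returncode = 1
    for attempt in range(1, max_attempts + 1):
        returncode = run(list(cmd)).returncode
        if returncode == 0:
            break
        print(f"attempt {attempt}/{max_attempts} failed with exit code {returncode}")
    return returncode


def main() -> None:
    """Entry point for the scheduler; exits with the final pytest exit code."""
    sys.exit(run_with_retries(build_smoke_command()))
```

The weekly trigger itself (for example a cron-style schedule in the chosen CI platform) would call main(); that part depends on the platform and is left out here.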
Challenges to Address
Credit Control:
Test Stability and Retries:
Multi-Cloud Testing:
test_smoke.py
Next Steps
Feedback on implementation details and challenge mitigation strategies is welcome, particularly from those familiar with test_smoke.py and our current testing processes.