davidxia closed this 2 months ago
@sydneyw-spotify I just saw you updated this PR. Feel free to ping me whenever it's ready for review.
@kevin85421, thanks. This is ready for review now. Example input and output are in the gist linked in the PR description.
LGTM. I will open a follow-up to move the envtest to a better place.
Thank you!
Why are these changes needed?
Problem Statement
My ML platform team runs the kuberay ray-operator. We want to measure how long it takes for RayClusters to transition from their initial "unhealthy" state to some other state. This metric matters to us because our users want their RayClusters to start in a timely manner. Currently, neither the ray-operator nor RayClusters appear to provide this information.
Design
Add a new .status.stateTransitionTimes field to the RayCluster custom resource. This field is a map[ClusterState]*metav1.Time that indicates the time of the last state transition for each state. It is updated whenever .status.state changes.

Manual testing steps:
make manifests generate
kubectl config use-context CONTEXT
make docker-build docker-push deploy IMG=europe-west4-docker.pkg.dev/spotify-workbench/images/operator:$USER-$(git rev-parse --short=7 HEAD)
kubectl --context CONTEXT apply -f /path/to/raycluster.yaml
Check .status.stateTransitionTimes in the output of kubectl --context CONTEXT -n NAMESPACE get rayclusters NAME -o yaml