One-click machine learning deployment (LLM, text-to-image and so on) at scale on any cluster (GCP, AWS, Lambda labs, your home lab, or even a single machine).
This is a step toward logging the start time of ModelZ deployments.
Refactor
Move all Annotation and Label consts into modelzetes.
Feat
Add new events: pod-create, pod-ready and pod-timeout.
pod-create is emitted when any pod is created.
pod-ready is emitted when the pod becomes ready.
pod-timeout is emitted when the pod takes more than 5 minutes to start.
The message of each event is the pod name, so the apiserver can read these events to fetch the latest pod start time range.
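As a rough sketch of how such events could be emitted with client-go's EventRecorder (the component name, package and function names here are illustrative assumptions, not the actual ModelZ code):

```go
package events

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
	"k8s.io/client-go/tools/record"
)

// NewRecorder wires an EventRecorder that writes events back to the API server.
func NewRecorder(clientset kubernetes.Interface) record.EventRecorder {
	broadcaster := record.NewBroadcaster()
	broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{
		Interface: clientset.CoreV1().Events(""),
	})
	return broadcaster.NewRecorder(scheme.Scheme, corev1.EventSource{Component: "modelzetes"})
}

// In practice each event is emitted from a different place in the controller;
// they are grouped here only to show the reason/message convention.
func emitLifecycleEvents(rec record.EventRecorder, pod *corev1.Pod) {
	// Emitted when any pod is created; the message is the pod name.
	rec.Event(pod, corev1.EventTypeNormal, "pod-create", pod.Name)
	// Emitted once the pod reports ready.
	rec.Event(pod, corev1.EventTypeNormal, "pod-ready", pod.Name)
	// Emitted when the pod takes more than 5 minutes to start.
	rec.Event(pod, corev1.EventTypeWarning, "pod-timeout", pod.Name)
}
```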
Add a new pod_start_seconds Histogram to Prometheus with buckets:
{5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 150.0, 300.0}
It is observed from deployment-start-begin to deployment-start-finish.
This lets us draw a histogram graph at modelz-ui -> Observability for all deployments; a sketch of the metric follows below.
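A minimal sketch of the metric, assuming the buckets above; the label set and the helper function are illustrative assumptions rather than the exact ModelZ code:

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// podStartSeconds tracks how long a deployment's pod takes to start,
// using the buckets listed in this change.
var podStartSeconds = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "pod_start_seconds",
		Help:    "Time from deployment-start-begin to deployment-start-finish.",
		Buckets: []float64{5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 150.0, 300.0},
	},
	[]string{"deployment"}, // assumed label; the real metric may be labelled differently
)

func init() {
	prometheus.MustRegister(podStartSeconds)
}

// ObserveStart records one observation, given the begin/finish timestamps
// recovered from the pod-create and pod-ready events.
func ObserveStart(deployment string, begin, finish time.Time) {
	podStartSeconds.WithLabelValues(deployment).Observe(finish.Sub(begin).Seconds())
}
```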