One-click machine learning deployment (LLM, text-to-image and so on) at scale on any cluster (GCP, AWS, Lambda labs, your home lab, or even a single machine).
This is a step toward logging the start time of ModelZ deployments.
Refactor
Move all Annotation and Label consts into modelzetes.
Feat
Add new events: pod-create, pod-ready and pod-timeout.
pod-create is emitted when any pod is created.
pod-ready is emitted when the pod becomes ready.
pod-timeout is emitted when the pod takes more than 5 minutes to start.
The message of each event is the pod name, so the apiserver can read these events to fetch the latest pod start time range.
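As a rough sketch of how such events could be emitted with client-go's EventRecorder (the component name, package and function names here are illustrative assumptions, not the actual ModelZ code):

```go
package events

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
	"k8s.io/client-go/tools/record"
)

// NewRecorder wires an EventRecorder that writes events back to the API server.
func NewRecorder(clientset kubernetes.Interface) record.EventRecorder {
	broadcaster := record.NewBroadcaster()
	broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{
		Interface: clientset.CoreV1().Events(""),
	})
	return broadcaster.NewRecorder(scheme.Scheme, corev1.EventSource{Component: "modelzetes"})
}

// In practice each event is emitted from a different place in the controller;
// they are grouped here only to show the reason/message convention.
func emitLifecycleEvents(rec record.EventRecorder, pod *corev1.Pod) {
	// Emitted when any pod is created; the message is the pod name.
	rec.Event(pod, corev1.EventTypeNormal, "pod-create", pod.Name)
	// Emitted once the pod reports ready.
	rec.Event(pod, corev1.EventTypeNormal, "pod-ready", pod.Name)
	// Emitted when the pod takes more than 5 minutes to start.
	rec.Event(pod, corev1.EventTypeWarning, "pod-timeout", pod.Name)
}
```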
Add a new pod_start_seconds Histogram to Prometheus with buckets:
{5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 150.0, 300.0}
It is observed from deployment-start-begin to deployment-start-finish.
This lets us draw a histogram graph at modelz-ui -> Observability for all deployments; a sketch of the metric follows below.
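A minimal sketch of the metric, assuming the buckets above; the label set and the helper function are illustrative assumptions rather than the exact ModelZ code:

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// podStartSeconds tracks how long a deployment's pod takes to start,
// using the buckets listed in this change.
var podStartSeconds = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "pod_start_seconds",
		Help:    "Time from deployment-start-begin to deployment-start-finish.",
		Buckets: []float64{5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 150.0, 300.0},
	},
	[]string{"deployment"}, // assumed label; the real metric may be labelled differently
)

func init() {
	prometheus.MustRegister(podStartSeconds)
}

// ObserveStart records one observation, given the begin/finish timestamps
// recovered from the pod-create and pod-ready events.
func ObserveStart(deployment string, begin, finish time.Time) {
	podStartSeconds.WithLabelValues(deployment).Observe(finish.Sub(begin).Seconds())
}
```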