trustyai-explainability / trustyai-service-operator

Kubernetes operator for the TrustyAI service
Apache License 2.0

feat: Driver updates job's status periodically #280

Closed yhwang closed 2 weeks ago

yhwang commented 3 weeks ago

The driver periodically updates the LMEvalJob.Status.Message field with output from lm-eval. It captures messages that match a pattern like `Running text generation: 81%|`, so users can read this field to check the progress of the job.
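As a rough illustration of the capture step described above, the sketch below pulls the most recent progress message out of a chunk of lm-eval output. The regular expression and the helper name `latestProgress` are assumptions made for this example; the expression the driver actually uses is not shown in this thread.

```go
package main

import "regexp"

// progressPattern matches progress lines such as
// "Running text generation: 81%|". This expression is an assumption
// for illustration, not the pattern taken from the driver.
var progressPattern = regexp.MustCompile(`([A-Za-z][A-Za-z ]*): +(\d{1,3})%\|`)

// latestProgress (hypothetical helper) returns the last progress message
// found in output, for example "Running text generation: 81%".
// The boolean is false when no progress line is present.
func latestProgress(output string) (string, bool) {
	matches := progressPattern.FindAllStringSubmatch(output, -1)
	if len(matches) == 0 {
		return "", false
	}
	// Progress bars rewrite the same line many times, so only the
	// final match reflects the current state.
	last := matches[len(matches)-1]
	return last[1] + ": " + last[2] + "%", true
}
```

A driver could call a helper like this on a timer and write the result to the job's status message only when it changes, which would keep the number of status updates small.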

openshift-ci[bot] commented 3 weeks ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ruivieira

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/trustyai-explainability/trustyai-service-operator/blob/dev/lm-eval/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

yhwang commented 3 weeks ago

/retest

to see if it's an intermittent failure

ruivieira commented 2 weeks ago

@yhwang It seems the error happens due to https://github.com/trustyai-explainability/trustyai-service-operator/pull/280/files#diff-ff3da6967ee893290caa6ed4cf523cca9bf4109109e969e4c8422a4bc22054e2R114 redefining zap-devel. Is it possible to make the devel option for the logger global to all tests, for instance?
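For context on this kind of failure, Go's standard `flag` package panics when the same flag name is registered twice on one flag set. The sketch below reproduces that with only the standard library. The helper names `newQuietFlagSet` and `registerDevel` are hypothetical, and the flag is registered directly here rather than through the logger setup used in the PR, which is not shown in this thread.

```go
package main

import (
	"flag"
	"io"
)

// newQuietFlagSet (hypothetical helper) builds a flag set that discards
// the diagnostic text the flag package prints before it panics.
func newQuietFlagSet() *flag.FlagSet {
	fs := flag.NewFlagSet("suite", flag.ContinueOnError)
	fs.SetOutput(io.Discard)
	return fs
}

// registerDevel (hypothetical helper) registers a boolean "zap-devel"
// flag on fs and reports whether the registration panicked.
func registerDevel(fs *flag.FlagSet) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	fs.Bool("zap-devel", true, "enable development logging")
	return false
}
```

Calling `registerDevel` twice on the same flag set succeeds the first time and panics the second time, which is why registering the option once for the whole test suite avoids the error.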

openshift-ci[bot] commented 2 weeks ago

New changes are detected. LGTM label has been removed.

yhwang commented 2 weeks ago

@ruivieira nice catch! fixed it. Thanks!