neuro-inc / platform-api

Provides API for Neu.ro MLOps Platform.

ClusterScaleUpFailed for clusters with fixed size is confusing #1711

Open YevheniiSemendiak opened 3 years ago

YevheniiSemendiak commented 3 years ago

STR:

Actual result: the job fails with the ClusterScaleUpFailed status, which is confusing (an on-prem cluster cannot scale up at all).

Desired result: the failure status should be tied to the previous "pending" status, something like SchedulingFailed.

To make it less confusing, we could add hints about the reason and what to check, like "SchedulingFailed - not enough resources". As an alternative, in a Slack thread we agreed to describe exit codes in the docs.
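
A minimal sketch of the idea, not the actual platform-api code: pick the failure reason based on whether the cluster can scale at all. The function name `failure_reason`, the `cluster_can_scale` flag, and the hint text for the scale-up case are hypothetical; only the two status names and the "not enough resources" hint come from this issue.

```python
from typing import Tuple

# Status names taken from this issue; everything else is illustrative.
CLUSTER_SCALE_UP_FAILED = "ClusterScaleUpFailed"
SCHEDULING_FAILED = "SchedulingFailed"


def failure_reason(cluster_can_scale: bool) -> Tuple[str, str]:
    """Return (reason, hint) for a job that could not be scheduled.

    `cluster_can_scale` is a hypothetical flag: False for fixed-size
    (e.g. on-prem) clusters, True for clusters with autoscaling.
    """
    if cluster_can_scale:
        # Hypothetical hint wording for the autoscaling case.
        return CLUSTER_SCALE_UP_FAILED, "the cluster could not add nodes for this job"
    # A fixed-size cluster never scales up, so report a scheduling failure.
    return SCHEDULING_FAILED, "not enough resources"


reason, hint = failure_reason(cluster_can_scale=False)
print(f"{reason} - {hint}")
```

With this split, a fixed-size cluster would never surface ClusterScaleUpFailed to the user.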

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 14 days
