opendatahub-io / model-registry-operator

Apache License 2.0

feat: add detailed error messages from deployment and pod status, fixes RHOAIENG-8789 #108

Closed dhirajsb closed 3 months ago

dhirajsb commented 3 months ago

Description

Added logic to modelregistry_controller_status to get detailed error messages from deployment and pod status for various error cases. The operator defaults to generic error messages when specific errors cannot be found in the status conditions, so that errors are never ignored. Fixes RHOAIENG-8789.
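
A minimal sketch of the "specific message first, generic fallback otherwise" idea, not the operator's actual code. The `Condition` and `WaitingContainer` types are simplified, hypothetical stand-ins for the Kubernetes Deployment condition and container waiting state, and `unavailableMessage` and `genericMessage` are illustrative names that do not come from this PR.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a Deployment status condition.
type Condition struct {
	Type, Status, Reason, Message string
}

// WaitingContainer is a simplified, hypothetical stand-in for a pod container
// that is stuck in the waiting state.
type WaitingContainer struct {
	Name, Reason, Message string
}

// genericMessage is an assumed fallback text, used when nothing more specific is found.
const genericMessage = "Deployment is unavailable"

// unavailableMessage prefers a failed Deployment condition, then a waiting
// container with an error reason, and finally the generic message.
func unavailableMessage(conds []Condition, waiting []WaitingContainer) string {
	for _, c := range conds {
		failed := (c.Type == "ReplicaFailure" && c.Status == "True") ||
			(c.Type == "Progressing" && c.Status == "False")
		if failed && c.Message != "" {
			return fmt.Sprintf("deployment %s: %s", c.Reason, c.Message)
		}
	}
	for _, w := range waiting {
		// ContainerCreating is a normal startup state, not an error.
		if w.Reason != "" && w.Reason != "ContainerCreating" {
			return fmt.Sprintf("container %s %s: %s", w.Name, w.Reason, w.Message)
		}
	}
	return genericMessage
}

func main() {
	waiting := []WaitingContainer{
		{Name: "grpc", Reason: "ErrImagePull", Message: "failed to pull image bad:image"},
	}
	fmt.Println(unavailableMessage(nil, waiting)) // specific pod-level error
	fmt.Println(unavailableMessage(nil, nil))     // nothing specific found, generic fallback
}
```

Returning the generic message as the last step means an unavailable deployment always produces some error text, even when no condition explains why.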

How Has This Been Tested?

The deployment was broken by manually setting the grpc image property to bad:image in the model registry spec, which causes the deployment to fail gradually as it tries to resolve the missing image. The operator gracefully handles the evolving status changes.
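
To illustrate what "evolving status changes" can look like, here is a rough simulation of successive reconcile passes for a bad image. The `snapshot` type, the `summarize` function, and the message strings are hypothetical. The waiting reasons `ContainerCreating`, `ErrImagePull`, and `ImagePullBackOff` are the ones Kubernetes typically reports for a container whose image cannot be pulled, but the exact sequence seen in a given cluster may differ.

```go
package main

import "fmt"

// snapshot is a hypothetical, simplified view of what one reconcile pass observes.
type snapshot struct {
	availableReplicas int
	waitingReason     string // waiting reason of the grpc container, if any
}

// summarize maps one observation to an availability flag and a status message.
func summarize(s snapshot) (bool, string) {
	switch {
	case s.availableReplicas > 0:
		return true, "Deployment is available"
	case s.waitingReason == "ErrImagePull" || s.waitingReason == "ImagePullBackOff":
		return false, "image pull failed: " + s.waitingReason
	default:
		// No specific error is visible yet, so report the generic message.
		return false, "Deployment is unavailable"
	}
}

func main() {
	// Assumed progression after the image is set to bad:image.
	timeline := []snapshot{
		{0, ""},
		{0, "ContainerCreating"},
		{0, "ErrImagePull"},
		{0, "ImagePullBackOff"},
	}
	for i, s := range timeline {
		ok, msg := summarize(s)
		fmt.Printf("pass %d: available=%t message=%q\n", i, ok, msg)
	}
}
```

The early passes have no specific error to report, so the message only becomes more detailed once the pull failure shows up in the pod status.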

dhirajsb commented 3 months ago

@tarilabs I tested the deployment errors by using a bad grpc image and a bad db host name. Istio and gateway errors are kinda hard to create manually since they'd require some resource constraint or a bad Istio config somehow. But the operator will collect those status errors and add them to the model registry status.

dhirajsb commented 3 months ago

This PR builds on top of #108

tarilabs commented 3 months ago

> @tarilabs I tested the deployment errors by using a bad grpc image and a bad db host name. Istio and gateway errors are kinda hard to create manually since they'd require some resource constraint or a bad Istio config somehow. But the operator will collect those status errors and add them to the model registry status.

Thanks, it would be nice to eventually add tests for the former, more easily reproducible scenarios. my2c

dhirajsb commented 3 months ago

> it would be nice to eventually add tests for the former, more easily reproducible scenarios. my2c

Agreed, it's just that the current operator setupenv is not really built for e2e tests since it doesn't actually deploy anything. @tonyxrmdavidson checking the various status errors is something we should do in an openshift-ci test. wdyt?