vmware-tanzu-labs / educates-training-platform

A platform for hosting interactive workshop environments in Kubernetes, or on top of a local container runtime.
https://docs.educates.dev
Apache License 2.0

Be more obvious on errors in session-manager (maybe add events) #472

Closed (jorgemoralespou closed this 5 days ago)

jorgemoralespou commented 6 days ago

Is your feature request related to a problem? Please describe.

I created a cluster without kapp-controller and then deployed a workshop that depends on kapp-controller. Workshop sessions did not spin up, and the cause of the error was not immediately obvious. The session manager logs showed this:

INFO:educates.workshopallocation:Workshop allocation request educates-cli-w02-s001 against workshop session educates-cli-w02-s001 of workshop environment educates-cli-w02 being processed.
ERROR:kopf.objects:Handler 'workshop_allocation_create' failed temporarily: No record of variables secret educates-cli-w02-s001-session required for workshop allocation request educates-cli-w02-s001.
INFO:educates.workshopsession:Creating workshop session object educates-cli-w02-s001-admin-vcluster-values of type Secret in namespace educates-cli-w02 for workshop session educates-cli-w02-s001.
INFO:educates.workshopsession:Creating workshop session object educates-cli-w02-s001-admin-vcluster-package of type App in namespace educates-cli-w02 for workshop session educates-cli-w02-s001.
ERROR:educates.workshopsession:Unable to create workshop session objects, failed creating object educates-cli-w02-s001-admin-vcluster-package of type App in namespace educates-cli-w02 for workshop session educates-cli-w02-s001.
Traceback (most recent call last):
  File "/opt/app-root/src/handlers/workshopsession.py", line 1402, in workshop_session_create
    create_from_dict(object_body)
  File "/opt/app-root/src/handlers/objects.py", line 50, in create_from_dict
    resource = Resource(api, body)
               ^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/src/handlers/objects.py", line 10, in Resource
    return object_factory(api, body["apiVersion"], body["kind"])(api, body)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/venv/lib64/python3.12/site-packages/pykube/objects.py", line 213, in object_factory
    resource_list = api.resource_list(api_version)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/venv/lib64/python3.12/site-packages/pykube/http.py", line 386, in resource_list
    r.raise_for_status()
  File "/opt/app-root/venv/lib64/python3.12/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://10.96.0.1:443/apis/kappctrl.k14s.io/v1alpha1/
ERROR:kopf.objects:Handler 'workshop_session_create' failed permanently: Unable to create workshop session objects, failed creating object educates-cli-w02-s001-admin-vcluster-package of type App in namespace educates-cli-w02 for workshop session educates-cli-w02-s001.
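
When scanning a long session-manager log like the one above, the handler failure lines can be pulled out mechanically. The following is a minimal sketch, assuming the log lines are unwrapped and follow the `ERROR:kopf.objects:Handler '...' failed ...` shape shown above; the function name `handler_failures` is hypothetical and not part of Educates or kopf.

```python
import re

# Matches the kopf handler failure lines as they appear in the log above.
FAILURE = re.compile(
    r"^ERROR:kopf\.objects:Handler '(?P<handler>[^']+)' "
    r"failed (?P<kind>permanently|temporarily): (?P<message>.*)$"
)


def handler_failures(log_text):
    """Return a (handler, kind, message) tuple for each handler failure line."""
    failures = []
    for line in log_text.splitlines():
        match = FAILURE.match(line)
        if match:
            failures.append((match["handler"], match["kind"], match["message"]))
    return failures


def permanent_failures(log_text):
    """Keep only the failures that kopf will not retry."""
    return [f for f in handler_failures(log_text) if f[1] == "permanently"]
```

Filtering on `permanently` leaves only the failures that will not be retried, which in this log is the single `workshop_session_create` line.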

Describe the solution you'd like

The last line is the most informative one:

Unable to create workshop session objects, failed creating object educates-cli-w02-s001-admin-vcluster-package of type App in namespace educates-cli-w02 for workshop session educates-cli-w02-s001

It would be ideal if this could be surfaced to the user in a more prominent way, maybe via an event.

Describe alternatives you've considered

No response

Additional information

No response

GrahamDumpleton commented 6 days ago

What do you see when you run:

% kubectl get workshopsessions
NAME                    URL   USERNAME   PASSWORD   STATUS   MESSAGE
educates-cli-w03-s001                               Failed   Unable to create workshop session objects, failed creating object localhost-educates-cli-w03-s001 of type xService in namespace educates-cli-w03 for workshop session educates-cli-w03-s001.

after the error from session.objects?

Or, for errors from environment.objects or request.objects, when listing workshopenvironments or workshopallocations respectively?

GrahamDumpleton commented 6 days ago

FWIW, in its default configuration kopf would create events for a ridiculous number of log messages. That created noise and caused events to be garbage collected more quickly, making events pretty useless overall. Event generation was therefore dialed back.

Although the message is captured in the status message of these custom resources, some of them, such as workshopsession and workshopallocation, are themselves potentially short-lived: they get cleaned up when the workshop session startup timeout is reached, the session is orphaned or explicitly terminated, or the workshop duration expires. So I do accept that the status message may be lost sooner than events, which typically last at least an hour unless garbage collected sooner due to volume.

The kopf framework does provide an API for generating explicit events from an operator, so I can add explicit events for these errors in addition to recording them in the status of the respective custom resources.
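
As a rough illustration of that API, the sketch below builds the fields for an explicit event and posts it with `kopf.event()`. It is not the Educates implementation: the helper names and the `SessionObjectsFailed` reason are hypothetical, and the message simply mirrors the wording in the log above.

```python
def failure_event_fields(object_name, object_kind, namespace, session_name):
    """Build the keyword arguments for an explicit failure event."""
    message = (
        f"Unable to create workshop session objects, failed creating object "
        f"{object_name} of type {object_kind} in namespace {namespace} "
        f"for workshop session {session_name}."
    )
    # "SessionObjectsFailed" is a made-up reason used only for illustration.
    return {"type": "Error", "reason": "SessionObjectsFailed", "message": message}


def post_failure_event(body, **fields):
    """Post the event against the custom resource described by `body`."""
    # Imported here so the helper above can be used without kopf installed.
    import kopf

    # kopf.event() queues a Kubernetes event for the given resource; it is
    # meant to be called from within a running kopf handler.
    kopf.event(body, **fields)
```

Inside a handler this might be called as `post_failure_event(body, **failure_event_fields(...))` just before raising `kopf.PermanentError`, so the event and the status message carry the same text.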

GrahamDumpleton commented 6 days ago

These errors are also reported as part of the analytics event stream, which would be the only viable path for permanent capture using an external system.

GrahamDumpleton commented 6 days ago

Actually, analytics events are not being reported for these specific errors at the moment.

GrahamDumpleton commented 6 days ago

Checking, there should already have been an event for this error without needing to change anything. Something like:

0s          Error     Logging             workshopsession/educates-cli-w02-s002              Handler 'workshop_session_create' failed permanently: Unable to create workshop session objects, failed creating object registry-educates-cli-w02-s002 of type xService in namespace educates-cli-w02 for workshop session educates-cli-w02-s002.
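
Events like the one above can also be filtered out of the JSON that `kubectl get events -o json` prints. This is a minimal sketch, assuming the standard Kubernetes Event fields (`type`, `reason`, `message`, `involvedObject`) and assuming the custom resource kind is spelled `WorkshopSession`; the function name `error_events` is hypothetical.

```python
import json


def error_events(events_json, kind="WorkshopSession"):
    """Return (name, reason, message) for Error events on objects of `kind`."""
    items = json.loads(events_json).get("items", [])
    results = []
    for event in items:
        involved = event.get("involvedObject", {})
        # Keep only events of type "Error" attached to the requested kind.
        if event.get("type") == "Error" and involved.get("kind") == kind:
            results.append(
                (involved.get("name", ""), event.get("reason", ""), event.get("message", ""))
            )
    return results
```

The input would be the captured output of `kubectl get events -o json` for the namespace where the workshop session events are recorded.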

jorgemoralespou commented 5 days ago

My bad. Closing this issue, as there are already events that I somehow missed.