odpi / egeria-ui

User interface instance using main Egeria functionalities.
https://odpi.github.io/egeria-ui/
Apache License 2.0

Error handling needs improving when ui-chassis is not ready #574

Closed. planetf1 closed this issue 1 year ago.

planetf1 commented 1 year ago

During startup, the UI chassis may not be ready to service all requests, and using the UI during this time may hang.

For example, in our coco labs environment deployed using the odpi-egeria-lab Helm chart, run the configuration and data catalog notebooks.

Then navigate to the UI.

After login you may see this page, which indicates that the UI chassis is not ready:

[Screenshot 2023-03-29 at 08.13.35]

If we look in more detail, we may see errors fetching types:

[Screenshots 2023-03-29 at 08.38.22 and 08.39.09]

The UI chassis log makes it clear that this is because the server is not ready:

2023-03-29 07:35:57.397 - INFO 1 --- [           main] o.o.o.u.u.springboot.EgeriaUIPlatform    : Started EgeriaUIPlatform in 60.405 seconds (process running for 61.971)
2023-03-29 07:37:56.824 -ERROR 1 --- [nio-8443-exec-2] o.o.o.c.ffdc.RESTExceptionHandler        : Detected Invalid Parameter Exception in REST Response

org.odpi.openmetadata.frameworks.connectors.ffdc.InvalidParameterException: OMAG-MULTI-TENANT-404-001 The OMAG Server cocoMDS1 is not available to service a request from user erinoverview
        at org.odpi.openmetadata.commonservices.ffdc.RESTExceptionHandler.throwInvalidParameterException(RESTExceptionHandler.java:289) ~[ffdc-services-4.0.jar!/:na]
        at org.odpi.openmetadata.commonservices.ffdc.RESTExceptionHandler.detectAndThrowInvalidParameterException(RESTExceptionHandler.java:206) ~[ffdc-services-4.0.jar!/:na]
        at org.odpi.openmetadata.accessservices.assetcatalog.AssetCatalog.detectExceptions(AssetCatalog.java:307) ~[asset-catalog-client-4.0.jar!/:na]
        at org.odpi.openmetadata.accessservices.assetcatalog.AssetCatalog.getSupportedTypes(AssetCatalog.java:276) ~[asset-catalog-client-4.0.jar!/:na]
        at org.odpi.openmetadata.userinterface.uichassis.springboot.service.AssetCatalogOMASService.getSupportedTypes(AssetCatalogOMASService.java:236) ~[classes!/:na]

The UI should apply an appropriate timeout and handle these errors gracefully.
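One way to give each UI request a timeout, so that a call to a chassis that is still starting fails fast instead of hanging, is to wrap `fetch` with an `AbortController`. This is a minimal TypeScript sketch, assuming a runtime with the standard Fetch API (a browser, or Node 18+). `fetchWithTimeout`, `RequestTimeoutError` and `FetchLike` are hypothetical names, not part of egeria-ui or egeria-js-commons, and the 10-second default is an arbitrary choice.

```typescript
// Hypothetical signature so a fake fetch can be injected for testing.
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

// Hypothetical error type raised when the timeout fires.
class RequestTimeoutError extends Error {
  constructor(url: string, timeoutMs: number) {
    super(`Request to ${url} timed out after ${timeoutMs} ms`);
    this.name = "RequestTimeoutError";
  }
}

// Abort the request if no response arrives within timeoutMs.
// Note: any signal already present in `init` is replaced by the timeout signal.
async function fetchWithTimeout(
  url: string,
  init: RequestInit = {},
  timeoutMs = 10_000, // assumed default, not an Egeria setting
  fetchImpl: FetchLike = fetch,
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...init, signal: controller.signal });
  } catch (err) {
    // Report our own abort as a timeout; rethrow any other failure unchanged.
    if (controller.signal.aborted) throw new RequestTimeoutError(url, timeoutMs);
    throw err;
  } finally {
    clearTimeout(timer); // never leave the timer pending once the request settles
  }
}
```

Turning the abort into a named error lets the caller distinguish "no answer in time" from other failures and show a specific message rather than a spinner that never ends.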

planetf1 commented 1 year ago

original report: https://github.com/odpi/egeria-ui/issues/480

sarbull commented 1 year ago

Noted.

sarbull commented 1 year ago

@planetf1 I added a new page called "Server unavailable", to which the user is redirected when a fetch request fails or is blocked by CORS, either of which means the server is unreachable.

For other kinds of response I notice you get a 404, which might be because the nginx layer returns it. It would be better to return a 503, so that I know to redirect the user to the "Server unavailable" page.

API responses are currently handled here [0]. If we want the user to be redirected to the "Server unavailable" page in this case, we have to change the nginx response from 404 to 503 and update that handler to redirect to the "Server unavailable" page.

[0] - https://github.com/odpi/egeria-js-commons/blob/main/src/http/handle-response.ts#L9
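The behaviour proposed here (redirect on a failed fetch or a CORS error, and on a 503 once the proxy passes one through) could look roughly like the following. This is a hypothetical TypeScript sketch, not the actual code in `handle-response.ts`; the route `/server-unavailable` and the names `classifyStatus`, `classifyFetchError` and `fetchOrRedirect` are assumptions. It relies on standard Fetch API behaviour: network failures, including requests blocked by CORS, reject with a `TypeError`.

```typescript
// Hypothetical route for the "Server unavailable" page; the real path may differ.
const SERVER_UNAVAILABLE_ROUTE = "/server-unavailable";

type Outcome =
  | { kind: "ok" }
  | { kind: "serverUnavailable" } // redirect to the "Server unavailable" page
  | { kind: "failed"; status: number }; // show the usual failed-request alert

// Map an HTTP status to an outcome. Treating 503 as "unavailable" only works
// if the proxy passes the 503 through instead of rewriting it as a 404.
function classifyStatus(status: number): Outcome {
  if (status >= 200 && status < 300) return { kind: "ok" };
  if (status === 503) return { kind: "serverUnavailable" };
  return { kind: "failed", status };
}

// fetch() rejects with a TypeError on network failures, and a request blocked
// by CORS surfaces the same way, so both count as "server unreachable".
function classifyFetchError(err: unknown): Outcome {
  return err instanceof TypeError
    ? { kind: "serverUnavailable" }
    : { kind: "failed", status: 0 };
}

// Run a request; call `redirect` when the server looks unavailable.
// Returns the response on success, otherwise null.
async function fetchOrRedirect(
  url: string,
  redirect: (route: string) => void,
  fetchImpl: (url: string) => Promise<Response> = fetch,
): Promise<Response | null> {
  let outcome: Outcome;
  try {
    const res = await fetchImpl(url);
    outcome = classifyStatus(res.status);
    if (outcome.kind === "ok") return res;
  } catch (err) {
    outcome = classifyFetchError(err);
  }
  if (outcome.kind === "serverUnavailable") redirect(SERVER_UNAVAILABLE_ROUTE);
  return null;
}
```

A 404 stays in the generic `failed` branch here because, as described above, a 404 from nginx cannot be distinguished from a genuinely missing resource.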

sarbull commented 1 year ago

@planetf1 As a safety measure, the form is already disabled when no data reached the UI because of an unexpected API-level failure, such as the one in the attached screenshot. The UI also raises an alert in the bottom-right corner for failed requests.

planetf1 commented 1 year ago

We get a 404, certainly before a server is configured, since at that point the URL genuinely is not found: nothing such as ‘cocoMDS2’ is known yet.

That assumes the platform itself is responding to requests; if there are issues with Tomcat resources, you might get a 503.

If a server is temporarily shut down, I see a case for preferring 503 over 404, but I am not sure what the chassis returns, or what we could or should do. This approach may also be tricky to apply at startup.

Even errors such as DNS failures could be transient, especially in a k8s environment where the DNS name for the service has not yet been created.

I think in all these cases the user needs an easy way to retry, perhaps several times.
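Retrying "maybe several times" can be sketched as exponential backoff that treats network failures (reported by the Fetch API as a `TypeError`, DNS failures included) and a configurable set of statuses as transient. This is a minimal TypeScript sketch under those assumptions; `fetchWithRetry`, `RetryOptions` and the option values are hypothetical, not existing egeria-ui code.

```typescript
// Hypothetical signature so a fake fetch can be injected for testing.
type FetchLike = (url: string) => Promise<Response>;

interface RetryOptions {
  attempts: number; // total tries, including the first (must be >= 1)
  baseDelayMs: number; // wait before the first retry; doubled after each failure
  retryStatuses: number[]; // HTTP statuses treated as transient, e.g. [503]
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(
  url: string,
  opts: RetryOptions,
  fetchImpl: FetchLike = fetch,
): Promise<Response> {
  if (opts.attempts < 1) throw new RangeError("attempts must be at least 1");
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    // Back off before every retry: baseDelayMs, then 2x, 4x, ...
    if (attempt > 0) await sleep(opts.baseDelayMs * 2 ** (attempt - 1));
    try {
      const res = await fetchImpl(url);
      // Success, or an error status we do not consider transient: stop here.
      if (!opts.retryStatuses.includes(res.status)) return res;
      lastError = new Error(`HTTP ${res.status} from ${url}`);
    } catch (err) {
      // Network failures (including DNS errors) reject with a TypeError and are
      // treated as transient; anything else is rethrown immediately.
      if (!(err instanceof TypeError)) throw err;
      lastError = err;
    }
  }
  throw lastError; // every attempt hit a transient failure
}
```

During startup a caller could also add 404 to `retryStatuses`, since a 404 may only mean the server is not configured yet, and surface the final error alongside a manual retry control.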

sarbull commented 1 year ago

@planetf1 That's actually the problem: because of the reverse-proxy implementation, nginx returns those statuses instead of the actual 503. With the two servers now separate and using CORS, there is an implementation that redirects the client to a maintenance page when it receives no answer from the backend (server down); see [0] and [1].

[0] - https://github.com/odpi/egeria-ui/pull/585
[1] - https://github.com/odpi/egeria-js-commons/pull/114

sarbull commented 1 year ago

Closing this after the workshop demo.