mu-semtech / mu-python-template

Template for running Python/Flask microservices
MIT License
4 stars 8 forks source link

Exception handling #14

Open piemonkey opened 11 months ago

piemonkey commented 11 months ago

The existing start-up error handling simply logs the string message from the exception which doesn't always help, so instead, pull the stack trace using the traceback module and log it.

When debugging an endpoint that the delta-notifier was calling, I realised that an exception in a route is just returned to the caller, not logged, so this adds an environment variable to enable a catch-all handler, which logs the stack trace but re-raises the exception so it doesn't change the behaviour.

madnificent commented 11 months ago

Thanks for this addition. We think this would help but we are not sure about the approach yet.

First, a small comment on the implementation: the console now logs that 500 will be returned but that's still an application concern and therefore may not hold.

On the larger concept, there are two parts: one is handling errors, another is logging errors.

With respect to handling errors:

In mu-semtech/mu-javascript-template#53 we've made a suggestion to have a function that can be overridden. This allows to specify a default.

In mu-semtech/mu-javascript-template the errors are currently handled by using the error handler and specifying that. It may be that this approach gets replaced/upgraded by a similar approach as the one in the PR.

With respect to logging errors:

It's still unclear to us if these should always be logged, if this should be done through an environment variable in the template, or if other logging infrastructure should be offered. Preferably light-weight. Regardless of what we choose, we'd ideally have the same solution across the various microservice templates.

Either case, we'll leave this one hanging as we figure out good approaches for providing default behaviour and overriding them. Insights welcome!

piemonkey commented 11 months ago

@madnificent I'll try to reply to each comment by quoting. Since you've said that you'll leave it hanging, I thought to separate the two commits. The other PR #15 contains the part that doesn't change the API, just the usefulness of logging on start-up errors.

the console now logs that 500 will be returned but that's still an application concern and therefore may not hold.

Since this handler is called if an exception is raised and not handled within a route, unless there is a way to override the default exception behaviour in Flask, other than by setting this handler, then it will indeed return a 500. Regardless of what the application would like to do with errors. As far as my testing goes, only one errorhandler gets called, so if this is happening, a 500 will be returned.

In https://github.com/mu-semtech/mu-javascript-template/pull/53 we've made a suggestion to have a function that can be overridden. This allows to specify a default.

I like this API, but the issue here is that it seems that setting a catch-all error handler (for any Exception) in the application code does not work if this one is set. So if we follow the 'default-but-allow-overrides' approach, we force the application code to use our mechanism rather than allowing them to use the default Flask mechanism if they choose to, as our default handler would override their custom handler.

It's still unclear to us if these should always be logged, if this should be done through an environment variable in the template, or if other logging infrastructure should be offered. Preferably light-weight. Regardless of what we choose, we'd ideally have the same solution across the various microservice templates.

My personal take on this is that for developer sanity experience, the default should be to log any errors. This should of course be something that can be switched off if desired. Since we're aiming to reduce boilerplate code, I also think sensible defaults for production are a good thing.

To me the choice of logging infrastructure is not a code concern. The code used to report should be standard (if available, like for python) or follow common practices (e.g. in the case of node). Whatever library/module this is should then allow the configuration of what infrastructure is used to handle logs.