oda-hub / dispatcher-app

Other
2 stars 1 forks source link

clarify exceptions returned by the dispatcher #25

Open volodymyrss opened 3 years ago

volodymyrss commented 3 years ago

a lot of the time in the current dispatcher the exception flows are not particularly untested, and occasionally fail within the exception handling. Due to time constrains and organizational priorities no sufficient refactoring was made for this purpose.

More recently, some effort was made to make sure returned errors as somewhat meaningful, but this effort is far from complete.

In general, dispatcher has two sorts of exception situations:

we would want to explain which kinds of exceptions can happen - most of all with exception class hierarchy. Two hierarchies - for each of the two kinds.

I would also argue that we should very clearly distinguish unexpected errors from expected.

And we need to be sure that in case of expected errors, user should be explained very clearly why the request likely does not give what user wanted.

and in unexpected:

And that if the error is unexpected, user should be promised we are looking into it! And maybe we should have a chance to come back to the user by email in such a case.

volodymyrss commented 3 years ago

I previously note some general experience and principles of our condition in here: https://github.com/oda-hub/doc-ops-reporting#issue-handling-principles . It derives from the user stories and practical experience of operating and using these platforms.

burnout87 commented 3 years ago

Starting some investigation that will help me in understanding more the architecture and how to improve the exceptions handling.

One first question:

volodymyrss commented 3 years ago

Starting some investigation that will help me in understanding more the architecture and how to improve the exceptions handling.

honorable goal!

One first question:

* [Here](https://github.com/oda-hub/dispatcher-app/blob/bc2be600665c090bd0b9172a4b2cb4b1aad48dc2/cdci_data_analysis/flask_app/dispatcher_query.py#L773) a new job is built, set it as failed, but not used for anything else: 

it is used right after: https://github.com/oda-hub/dispatcher-app/blob/bc2be600665c090bd0b9172a4b2cb4b1aad48dc2/cdci_data_analysis/flask_app/dispatcher_query.py#L775

and then returned https://github.com/oda-hub/dispatcher-app/blob/bc2be600665c090bd0b9172a4b2cb4b1aad48dc2/cdci_data_analysis/flask_app/dispatcher_query.py#L789

is there some other reason why building a job is needed in that case? Also, such a function, currently, is used only in case of a failed token validation...perhaps it can be used somewhere else.

perhaps it can be.

it was added later in order to try and reduce code redundancy, but it's content is replicated, with variations, in other places, which because of incomplete refactoring.

Something I am trying to understand.

burnout87 commented 3 years ago

Starting some investigation that will help me in understanding more the architecture and how to improve the exceptions handling.

honorable goal!

On that note, I created already a dedicated branch, where I can break everything I want!

One first question:

* [Here](https://github.com/oda-hub/dispatcher-app/blob/bc2be600665c090bd0b9172a4b2cb4b1aad48dc2/cdci_data_analysis/flask_app/dispatcher_query.py#L773) a new job is built, set it as failed, but not used for anything else: 

it is used right after:

https://github.com/oda-hub/dispatcher-app/blob/bc2be600665c090bd0b9172a4b2cb4b1aad48dc2/cdci_data_analysis/flask_app/dispatcher_query.py#L775

and then returned

https://github.com/oda-hub/dispatcher-app/blob/bc2be600665c090bd0b9172a4b2cb4b1aad48dc2/cdci_data_analysis/flask_app/dispatcher_query.py#L789

is there some other reason why building a job is needed in that case? Also, such a function, currently, is used only in case of a failed token validation...perhaps it can be used somewhere else.

perhaps it can be.

it was added later in order to try and reduce code redundancy, but it's content is replicated, with variations, in other places, which because of incomplete refactoring.

Something I am trying to understand.

Ok, then I can see if code can be further reduced

burnout87 commented 3 years ago

Next question:

The module excpetion contains some class definitions for specific exceptions (currently not in use). But I noticed that also the dispatcher_query has some BadRequest class extension definition (eg exampl). A refactoring action could indeed the moving of those classes definiton in a commonplace (the exception module looks the best candidate for this)...unless there is a specific reason for those to be there?

volodymyrss commented 3 years ago

Next question:

The module excpetion contains some class definitions for specific exceptions (currently not in use).

Do you mean that some of them are not in use? Because most are. Actually that's where BadRequest comes from.

https://github.com/oda-hub/dispatcher-app/blob/bc2be600665c090bd0b9172a4b2cb4b1aad48dc2/cdci_data_analysis/analysis/exceptions.py#L24

But I noticed that also the dispatcher_query has some BadRequest class extension definition (eg exampl).

https://github.com/oda-hub/dispatcher-app/blob/bc2be600665c090bd0b9172a4b2cb4b1aad48dc2/cdci_data_analysis/flask_app/dispatcher_query.py#L46

A refactoring action could indeed the moving of those classes definiton in a commonplace (the exception module looks the best candidate for this)...unless there is a specific reason for those to be there?

Well, it might be convenience to move them to simplify some large files, like dispatcher_query.py But the meaning of defining exceptions elsewhere is to distinguish responsibility of different modules and their own excpetions. Plugins can also have their own exceptions.

It is only important still that they are inherited from a single parent, defined in a common module, like analysis.exceptions

Then, things specific to some modules of dispatcher and not for use by the other modules or also plugins can be in their own modules. E.g. nothing else should raise InstrumentNotRecognized but dispatcher_query.py, where it is defined. It's a kind of encapsulation.

It may be considered more carefully, though, which ones are currently common and which are not.

burnout87 commented 3 years ago

As far as I have seen, with a more in-depth code investigation, when initializing a new InstrumentQueryBackEnd an output message with details regarding a missing parameter or a relative wrong value assigned, a dispatcher response (failed) is generated, including some specific details on the issue.

But during the execution of a query instead. exceptions are raised (eg MissingRequestParameter).

Simply put, is this, the main difference among the main two ways of handling an exception?

volodymyrss commented 3 years ago

The exceptions implement this flow of "piercing through" many levels, which is essential case of many critical issues, when little can be or should be done anymore. This corresponds to exceptions with little details but one simple clear message.

But, in some cases, it is necessary to create more detailed failure response, specifically when job information is needed: e.g. when analysis was running for a while and then failed.

In general, I would just use exceptions as long as it is possible. That's the precise purpose of exceptions. We can also intercept some or all of the to usually finish with build_dispatcher_response. But some exceptions should be still caught but flask handler (to avoid indiscriminate 500 errors)

Ideally, we should explain which failure responses can occur. Normally, it can be put in a schema, e.g. like this: https://github.com/volodymyrss/dqueue/blob/master/dqueue/api.py#L175

But since we not want to suddenly start implementing https://github.com/oda-hub/dispatcher-app/issues/6, the schema could be a json file in the same format. It would readable according to the openapi/swagger specification. I would just start adding them, the possible exceptions, to some file.

volodymyrss commented 3 years ago

sometimes exceptions are caught indiscriminately in critical places, like this here: https://github.com/oda-hub/dispatcher-app/blob/master/cdci_data_analysis/analysis/queries.py#L532