roocs / rook

A Web Processing Service for roocs: remote operations on climate simulations.
https://rook-wps.readthedocs.io/en/latest/
Apache License 2.0

Better error reporting in rooki #166

Open JamesVarndell opened 3 years ago

JamesVarndell commented 3 years ago

Hi @cehbrecht and @agstephens,

I've just been looking over our first day of live CMIP6 requests: we received 167 requests in total. Around 84% of these succeeded, but 27 failed, and there is a consistent theme: most of the failed requests asked for a very small area subset (possibly in an attempt to extract a single point).

How we want to handle these kinds of requests is one problem, but more immediately important is that we properly report to users what went wrong with their request. Most of those failed requests came from a single user, who kept retrying because the error they received simply said 'Sorry, process failed' (inherited from rooki). If it had said something like 'Invalid spatial subset', that would have been much more useful, and the user would probably have corrected their mistake.

I assume clisops is actually raising a useful error somewhere, so the question is: can we propagate that error through to the message rooki finally shows?
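
A minimal sketch of the kind of propagation being asked about, assuming the subsetting step raises a specific exception type for problems the user can fix. All names here (`SubsetError`, `InvalidSpatialSubset`, `run_subset`, `execute_request`) are hypothetical and are not the clisops, rook or rooki APIs. The idea is to pass known, user-fixable errors through with their message, and fall back to a generic message only for unexpected failures.

```python
import logging

logger = logging.getLogger(__name__)

GENERIC_MESSAGE = "Sorry, process failed"


class SubsetError(Exception):
    """Hypothetical base class for errors a user can fix by changing the request."""


class InvalidSpatialSubset(SubsetError):
    """Hypothetical error for an area subset that cannot be satisfied."""


def run_subset(area, grid_spacing=1.0):
    """Hypothetical stand-in for the real subsetting call.

    `area` is (lon_min, lat_min, lon_max, lat_max); a 1-degree grid is assumed.
    """
    lon_min, lat_min, lon_max, lat_max = area
    if lon_max - lon_min < grid_spacing or lat_max - lat_min < grid_spacing:
        raise InvalidSpatialSubset(
            f"Invalid spatial subset: area {area} is smaller than the "
            f"{grid_spacing}-degree grid spacing"
        )
    return "subset.nc"


def execute_request(operation, *args):
    """Run an operation and return (succeeded, result_or_message) for the client."""
    try:
        return True, operation(*args)
    except SubsetError as err:
        # Known, user-fixable problem: pass the specific message through.
        return False, f"Process error: {err}"
    except Exception:
        # Unexpected failure: log the traceback server-side, show a generic message.
        logger.exception("Unexpected failure in %s", getattr(operation, "__name__", operation))
        return False, GENERIC_MESSAGE


ok, message = execute_request(run_subset, (0.0, 50.0, 0.1, 50.1))
print(ok, message)
```

Catching the user-fixable base class separately keeps internal details (tracebacks, file paths) out of the user-facing message while still telling the user what to change.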

Many thanks!

cehbrecht commented 3 years ago

Actually, we should already be capturing the errors. There are no "sorry" messages in my list from today: https://nbviewer.jupyter.org/github/roocs/rooki/blob/master/notebooks/tests/test-c3s-cmip6-subset-errors-dkrz-2021-03-23.ipynb

They might still occur when pywps itself did not accept the request (an error internal to pywps), or when we have not captured the metalink output.

rooki just displays the error message it receives, so the problem is on the rook side.

cehbrecht commented 3 years ago

For the stats: on the DKRZ side I have 126 requests:

select from pywps_requests where operation='execute' and identifier='orchestrate' and time_start>='2021-03-23' order by time_end DESC;
--
(126 rows)

Of these, 10 failed:

select from pywps_requests where operation='execute' and identifier='orchestrate' and time_start>='2021-03-23' and status=5 order by time_end DESC;
--
(10 rows)
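
Both counts can also come from a single query. This is a minimal sketch using Python's built-in `sqlite3` with an in-memory table that has only the columns used in the queries above; the real `pywps_requests` table lives in PostgreSQL and has more columns, and the rows inserted here are made-up placeholders, not the DKRZ data. Status 5 is the value the query above uses for failed requests; timestamps are assumed to be stored as ISO-8601 text so that string comparison orders them correctly.

```python
import sqlite3


def request_stats(conn, identifier, since):
    """Return (total, failed) 'execute' requests for a process since a date."""
    row = conn.execute(
        """
        SELECT COUNT(*),
               COALESCE(SUM(CASE WHEN status = 5 THEN 1 ELSE 0 END), 0)
        FROM pywps_requests
        WHERE operation = 'execute'
          AND identifier = ?
          AND time_start >= ?
        """,
        (identifier, since),
    ).fetchone()
    return row[0], row[1]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pywps_requests "
    "(operation TEXT, identifier TEXT, time_start TEXT, status INTEGER)"
)
# Placeholder rows only; status 5 marks a failed request as in the query above.
conn.executemany(
    "INSERT INTO pywps_requests VALUES (?, ?, ?, ?)",
    [
        ("execute", "orchestrate", "2021-03-23T09:00:00", 4),
        ("execute", "orchestrate", "2021-03-23T10:00:00", 5),
        ("describeprocess", "orchestrate", "2021-03-23T11:00:00", 4),
        ("execute", "orchestrate", "2021-03-22T12:00:00", 5),
    ],
)

total, failed = request_stats(conn, "orchestrate", "2021-03-23")
rate = failed / total if total else 0.0  # avoid dividing by zero on an empty day
print(f"{failed}/{total} failed ({rate:.1%})")
```

Applied to the DKRZ numbers above, 10 failures out of 126 requests is a failure rate of roughly 8%.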

cehbrecht commented 3 years ago

@agstephens do you have any failures on your side other than those we have collected, such as rejected processes?

JamesVarndell commented 3 years ago

Thanks a lot @cehbrecht - I've integrated the response.status into the exceptions raised by the CDS adaptor.
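
A minimal sketch of carrying the response status into a raised exception, so that callers see both the status and the server's message. `WPSResponse`, `RemoteProcessError` and `check_response` are hypothetical names; they are not the rooki response object or the actual CDS adaptor code, and the status strings are example values only.

```python
from dataclasses import dataclass


@dataclass
class WPSResponse:
    """Hypothetical stand-in for a client response object."""
    ok: bool
    status: str
    message: str


class RemoteProcessError(Exception):
    """Hypothetical adaptor exception that keeps the remote status and message."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def check_response(response):
    """Raise RemoteProcessError for a failed response, otherwise return it unchanged."""
    if not response.ok:
        raise RemoteProcessError(response.status, response.message)
    return response


try:
    check_response(WPSResponse(ok=False, status="ProcessFailed", message="Process error: float divmod"))
except RemoteProcessError as err:
    print(err)
```

Keeping `status` and `message` as attributes, not just in the string, lets downstream code branch on the status without parsing text.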

The one issue I would raise is that the exception I get when requesting a bad area subset is not very informative; it simply says:

Process error: float divmod

@agstephens is this inherited from clisops? Would it be possible to have something a bit more informative for the users?

Thanks!

agstephens commented 3 years ago

Hi @JamesVarndell: here is an update from our end:

  1. We haven't been able to find where the 'Sorry, process failed' message comes from. Please send us an example request if you have one, so we can track this down.
  2. We are creating a fix for the bounding box issue so that an appropriate exception is raised when no grid boxes are found.
  3. We have also seen errors when selecting up to YYYY-12-31 on a 360-day calendar (which has no 31st day in any month); we are writing a fix that nudges the selection to a valid date so that these requests just work.
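
A minimal sketch of the date nudge in item 3, assuming end dates arrive as `YYYY-MM-DD` strings. `clamp_end_date_360_day` is a hypothetical helper, not the roocs fix itself, which may also need to handle time components or start dates.

```python
DAYS_PER_MONTH_360 = 30  # every month has exactly 30 days in a 360-day calendar


def clamp_end_date_360_day(date_str):
    """Clamp an end date 'YYYY-MM-DD' to a day that exists in a 360-day calendar."""
    year, month, day = (int(part) for part in date_str.split("-"))
    if not 1 <= month <= 12 or day < 1:
        raise ValueError(f"Invalid date: {date_str!r}")
    # Day 31 does not exist in any month, so fall back to the last day (30).
    day = min(day, DAYS_PER_MONTH_360)
    return f"{year:04d}-{month:02d}-{day:02d}"


print(clamp_end_date_360_day("2015-12-31"))  # 2015-12-30
```

The clamp has to be calendar-aware: dates such as 2015-02-30 are valid in a 360-day calendar and are left unchanged, whereas they would be invalid in the standard calendar.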

agstephens commented 3 years ago

@JamesVarndell the 'Process error: float divmod' message comes from the case where the user selects a bounding box so small that it falls between the grid box centres. @ellesmith88 is creating a suitable exception for this.
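
A minimal sketch of that check, assuming 1-D, ascending, non-wrapping coordinates given as plain lists of grid-box centres. `check_area`, `select_indices` and `InvalidSpatialSubset` are hypothetical names, not the clisops implementation. Validating the area up front gives the user a clear message instead of an obscure downstream error.

```python
class InvalidSpatialSubset(ValueError):
    """Hypothetical exception for an area that contains no grid-box centres."""


def select_indices(centres, lower, upper):
    """Return the indices of grid-box centres that lie within [lower, upper]."""
    return [i for i, c in enumerate(centres) if lower <= c <= upper]


def check_area(lon_centres, lat_centres, area):
    """Return (lon_indices, lat_indices) or raise if the area selects no grid boxes."""
    lon_min, lat_min, lon_max, lat_max = area
    lon_idx = select_indices(lon_centres, lon_min, lon_max)
    lat_idx = select_indices(lat_centres, lat_min, lat_max)
    if not lon_idx or not lat_idx:
        raise InvalidSpatialSubset(
            f"Invalid spatial subset: area {area} contains no grid box centres; "
            "widen the area to include at least one grid box centre"
        )
    return lon_idx, lat_idx


lon_centres = [0.5, 1.5, 2.5]  # assumed 1-degree grid
lat_centres = [50.5, 51.5]
try:
    # Longitude range 0.6 to 1.4 falls between the centres 0.5 and 1.5.
    check_area(lon_centres, lat_centres, (0.6, 50.0, 1.4, 52.0))
except InvalidSpatialSubset as err:
    print(err)
```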