ivanprado closed this issue 3 years ago.
More on that: scrapinghub-autoextract raises a RetryError exception when retries fail, but raises the wrapped exception directly when retries are disabled.
IMO this is not the right API, because it forces the user to handle two cases for each possible error. For example, imagine the developer wants to do something when RequestError is raised. They have to catch two exceptions:
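Something like this minimal sketch (import paths, `fetch`, and `handle_request_error` are illustrative assumptions, not the library's documented API):

```python
import tenacity
# NOTE: import paths below are assumptions, for illustration only
from autoextract.aio import request_raw
from autoextract.aio.errors import RequestError

def handle_request_error(exc):
    ...  # whatever the application needs to do

async def fetch(query):
    try:
        return await request_raw(query)
    except RequestError as exc:
        # retries disabled: the underlying error arrives directly
        handle_request_error(exc)
    except tenacity.RetryError as exc:
        # retries enabled and exhausted: the same error arrives wrapped
        handle_request_error(exc.last_attempt.exception())
```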
I propose to always raise the underlying cause (e.g. RequestError).
Ways to implement it: for example, adding reraise=True to the tenacity retry configuration.
At least for now, I cannot think of a bad side effect of adding reraise=True. Let's ask @ejulio and @kmike for their opinions.
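For context, this is what tenacity's reraise flag does (a standalone sketch, unrelated to the actual client code):

```python
import tenacity

@tenacity.retry(stop=tenacity.stop_after_attempt(3), reraise=True)
def flaky_call():
    raise ValueError("simulated request failure")

try:
    flaky_call()
except ValueError:
    # with reraise=True the caller sees ValueError here;
    # without it, tenacity would raise tenacity.RetryError instead
    pass
```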
I'll just add that we should have a way to know whether a request was retried (we should be able to distinguish a non-retried error from a retried one).
To some extent, I prefer two exception types over a property on the class. One way to do it would be to have RequestError as the base and RequestRetriesExceededError(RequestError) as a subclass. Then, if the user doesn't care about the difference, they just catch RequestError; otherwise they have a way to catch each and handle them separately.
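A minimal sketch of that hierarchy (`send_request` is a hypothetical stand-in for the client call):

```python
class RequestError(Exception):
    """Base: the AutoExtract request failed."""

class RequestRetriesExceededError(RequestError):
    """The request still failed after exhausting all retries."""

def send_request(query):  # hypothetical stand-in for the client call
    raise RequestRetriesExceededError(f"gave up on {query!r} after retries")

try:
    send_request("https://example.com")
except RequestRetriesExceededError:
    print("retried and gave up")      # handle the retried case specifically
except RequestError:
    print("failed without retrying")  # everything else
```

Catching only RequestError covers both cases, so users who don't care about the distinction are unaffected.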
Is this fixed by https://github.com/scrapinghub/scrapinghub-autoextract/pull/27?
Thanks @kmike for pointing this out. This was fixed by #27. Closing.
IMO we should never raise tenacity.RetryError. Instead, we should raise the underlying RequestError for request errors, potentially a new exception for network errors (or maybe just reuse RequestError), or return request_processor.get_latest_results() otherwise. We could include the number of retries in RequestError in case we want to make this information accessible to the client (or even include the tenacity.RetryError inside it, as it can carry interesting information). More was discussed here: https://github.com/scrapinghub/scrapinghub-autoextract/pull/7#discussion_r454970102