Better API response handling:

vit- commented 3 years ago

add api call retries
add api response data/schema validation

vit- commented 3 years ago

These changes should fix the issue discussed here https://github.com/thomasgermain/vaillant-component/issues/46 The next steps would be to simplify mapper.py by dropping the data cleaning logic (parse dates, enums, etc) out of it and leveraging schema's validation features instead.

thomasgermain commented 3 years ago

Thanks for the PR.

Actually, I'm not sure if pymultimatic needs to validate responses, I'm not against of course. What would be the added value ?

One added value would be to "hide" error when vaillant API response is wrong, but then, you could have really long delay to receive a correct response (if we can receive a correct response), as I can see, there is a retry that can go up to 10 sec.

I think this retry can be useful if added in the integration instead, here is what I see:

pymultimatic validates the response, if not correct, it throws some new exception
the integration catches it and ask again for data (optonaly with the delay).

I'm having some discussion with vaillant in order to fix their issues. I reach them yesterday and they're gonna work on it

vit- commented 3 years ago

pymultimatic is the component that makes API calls, it is responsible for making sure the response from the remote API is in the expected format. It is a common pattern when a high level client incapsulates validation and retry logic. get_system() performs multiple HTTP requests. Retrying the entire get_system() call sounds like overkill if only one underlying HTTP request failed.

a retry that can go up to 10 sec

How big of a delay the integration can tolerate? We can tune the backoff interval and/or the number of retries to meet the toleration point.

I'm having some discussion with vaillant in order to fix their issues. I reach them yesterday and they're gonna work on it

Wow, I am impressed by the fact that the manufacturer reached back. Please keep me posted about the progress.

Oh, almost forgot. Here's the added value I see:

having a declarative schema is easier to maintain. Developer sees the entire picture of a response in one place; no need to implement error-prone logic with safe getting the values and casting/parsing them to appropriate types, the schema library takes care of it
HTTP calls may fail due to lots of reasons (network error, internal server error, etc). Retrying a single failed call is a common practice when building robust failure-tolerant software

thomasgermain commented 3 years ago

Ok, I'm convinced :100:, I'm glad you have such interest for the project :smiley:

How big of a delay the integration can tolerate? We can tune the backoff interval and/or the number of retries to meet the toleration point.

As far as I can see, there is no "limit" from HA point of view: see https://github.com/home-assistant/core/blob/dev/homeassistant/helpers/update_coordinator.py, there is just a log when it receives a TimeoutError.

Maybe a 0.5 backoff could be good ? (So it means 0.5 -> 1 -> 1.5 -> 2, so it fails after 5 seconds). The idea is to retry quite fast and also failed quite fast. it can also be good to have a specific error so it can be caught at integration side and have some logs.

Wow, I am impressed by the fact that the manufacturer reached back. Please keep me posted about the progress.

Yeah, I'm also impressed, but well, I send them a screenshot of the android app where the detail of the zone is blank and I send them a wrong json response I'm receiving.

By the way, it would be good to have the schema validation for every calls in SystemManager

thomasgermain commented 3 years ago

I also created a v4 branch, you can do your PR(s) against that branch, it will be easier to manage other PRs and (beta) releases. This branch will then be merged to master when this is stable

vit- commented 3 years ago

I'm glad you have such interest for the project

I've just moved to a new house, so I'm pretty enthusiastic about automatic everything about it ;)

Maybe a 0.5 backoff could be good ?

Fine by me, fixed.

it would be good to have the schema validation for every calls in SystemManager

I've added validation to all get calls. Could you provide example responses for the put/delete calls?

Also, I'm unsure why few test cases fail. Those parts of code were not changed. I double-checked aioresponses do have the mock for the corresponding request. Something weird is happening.

thomasgermain commented 3 years ago

I've added validation to all get calls. Could you provide example responses for the put/delete calls?

Most of the time, this is something like {'ok': 'ok'}, so I think for such calls (PUT and POST), validating the http status should be sufficient.

Also, I'm unsure why few test cases fail. Those parts of code were not changed. I double-checked aioresponses do have the mock for the corresponding request. Something weird is happening.

I had a quick look and I can't see why neither, I will have a deep look, but certainly during next week (same for the HACS PR)

vit- commented 3 years ago

I found the issue with the tests. In case API responses with code > 399 the Connection raises an ApiError exception. retry_async decorator retries the failed request few more times. There are few cases when response code 409 is expected and should not be retried. I believe some of ApiErrors should be retried (for example, 5xx). This requires implementing additional logic in retry_async. For now, I configured it to retry on SchemaError only.

thomasgermain commented 3 years ago

I found the issue with the tests. In case API responses with code > 399 the Connection raises an ApiError exception. retry_async decorator retries the failed request few more times. There are few cases when response code 409 is expected and should not be retried. I believe some of ApiErrors should be retried (for example, 5xx). This requires implementing additional logic in retry_async. For now, I configured it to retry on SchemaError only.

I think we should only retry when there is an 5xx error or a schema error

Otherwise, everything else sounds good. I'm just expecting some tests on the retry mechanism

thomasgermain commented 3 years ago

It looks good to me, is there any todo left ? If no, I can merge it and start some testing with HA

vit- commented 3 years ago

There's nothing else I wanted to include in this PR. Let's merge.

thomasgermain / pymultiMATIC

Better API response handling: #59