zabuldon / teslajsonpy

Apache License 2.0
57 stars 62 forks source link

feat: include TeslaExceptions for api retries #382

Closed carleeno closed 1 year ago

carleeno commented 1 year ago

To combat transient api exceptions such as 408 when the vehicle has limited connectivity, the retry logic has been updated to handle TeslaException as well the existing httpx.RequestError

Note that a 15 second max timeout is used to ensure there is plenty of buffer between the last attempt and the 30 second HA core timeout for the component to finish updating. Cars with poor connectivity often fail for a 408 with a timeout response, these failures eat up ~10 seconds for each attempt, so using a max retry window of 15 seconds means we shouldn't exceed much more than 25 seconds for a complete update request.

Tested in dev HA container with a model 3 and additional logging:

carleeno commented 1 year ago

leaving in draft while I soak test this for a couple of days with my real HA instance to ensure this doesn't cause long term issues, and provides a measurable benefit as my car is typically parked in a weak signal area.

carleeno commented 1 year ago

sorry for the delay with this one, I've encountered a bug with backoff that's making this difficult:

Backoff evaluates the elapsed timed before running the decorated function, but then determines if it should retry or not based on the elapsed time from before. In the case of a timeout (9-10 seconds) on the api request, it will retry even if it is significantly past the the maximum retry time. This is causing us to sometimes hit the 30 second maximum time for HA to wait on the component to finish updating, even though the max wait is only set to 10 seconds. If I set it any shorter, we will not have enough retries for it to be effective.

I'm going to submit a PR to backoff to fix this bug, hopefully it won't take too long to go through and get a new release, but I fear that backoff is not very maintained anymore.

carleeno commented 1 year ago

ahh actually someone already submitted that pr for backoff, a year ago and it's still sitting :(

Maybe we could use a better maintained retry library such as Tenacity? If there's not an objection to switching I'll work it into this PR and see if I come across different speed bumps with it...but there doesn't seem to be a way to make this work well with the backoff library.

Or another option is that I write a custom retry decorator just for our needs, but seems a bit wasteful.

alandtse commented 1 year ago

We can move to something else. Double check it won't conflict with core though.

carleeno commented 1 year ago

Thanks, it looks promising so far, going to test on my main HA instance the rest of the weekend to see how it does.

alandtse commented 1 year ago

I know we don't really have any test coverage on controller, but do you think given the new functionality we should start adding?

carleeno commented 1 year ago

it's never a bad time to start doing the right thing haha.

Let me take a look to see if a lot of work needs to be done to allow better tests for controller...if so maybe it could/should be tackled in a separate PR, especially given the recent ask about tests for the poll/sleep logic.

alandtse commented 1 year ago

Ok, let's put in some basic testing for the new features you're adding here and we can revisit more tests for older functionality in a separate PR.

carleeno commented 1 year ago

Ok, let's put in some basic testing for the new features you're adding here and we can revisit more tests for older functionality in a separate PR.

sorry for the delay. I added tests for the new retry logic.