lephisto / tesla-apiscraper

API Scraper for pulling Vehicle Statistics from the Tesla Owner API into InfluxDB + Grafana Dashboards
GNU Lesser General Public License v3.0

Crashes in refresh_vehicle #45

Closed ltfiend closed 5 years ago

ltfiend commented 5 years ago

I've gotten the crash below a few times and lost data because of it. It appears to be a failure to resolve the tesla.services domain. To mitigate it I've wrapped the refresh_vehicle() call on line 445 in a try/except to catch the error when it happens and prevent the crash. So far I've seen it three times, each one isolated.

--------- Error ----------
Traceback (most recent call last):
  File "apiscraper.py", line 431, in <module>
    state_monitor.refresh_vehicle()
  File "apiscraper.py", line 99, in refresh_vehicle
    self.connection.refresh_vehicle()
  File "/home/peter/tesla-apiscraper/teslajson.py", line 79, in refresh_vehicle
    self.vehicles = [Vehicle(v, self) for v in self.get('vehicles')['response']]
  File "/home/peter/tesla-apiscraper/teslajson.py", line 83, in get
    return self.post(command, None)
  File "/home/peter/tesla-apiscraper/teslajson.py", line 92, in post
    return self.open("%s%s" % (self.api, command), headers=self.head, data=data)
  File "/home/peter/tesla-apiscraper/teslajson.py", line 126, in open
    resp = opener.open(req)
  File "/usr/lib64/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 1258, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno -2] Name or service not known>

------ Patch --------

*** apiscraper.py       2019-04-15 14:27:42.433000000 -0400
--- apiscraper.py       2019-04-15 21:03:20.739000000 -0400
***************
*** 442,448 ****
        # We cannot be sleeping with small poll interval for sure.
        # In fact can we be sleeping at all if scraping is enabled?
        if poll_interval >= 64 or resume:
!           state_monitor.refresh_vehicle()
        # Car woke up
        if is_asleep == 'asleep' and state_monitor.vehicle['state'] == 'online':
            poll_interval = 0
--- 443,452 ----
        # We cannot be sleeping with small poll interval for sure.
        # In fact can we be sleeping at all if scraping is enabled?
        if poll_interval >= 64 or resume:
!           try:
!               state_monitor.refresh_vehicle()
!           except:
!               logger.info("Hostname Exception Caught")
        # Car woke up
        if is_asleep == 'asleep' and state_monitor.vehicle['state'] == 'online':
            poll_interval = 0
wishbone1138 commented 5 years ago

I also see this with a 502 error:

urllib.error.HTTPError: HTTP Error 502: Bad Gateway

I noticed that the teslajson call doesn't handle errors well. Ultimately that's probably where these should be handled, but it's a lot more work than catching it at the refresh_vehicle level. Thanks for the patch.
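
Roughly what I mean by handling it at the teslajson level; this is just a sketch in the spirit of the opener call there, not the project's actual code, and the function name, retry count and delay are invented:

import time
import logging
from urllib.request import Request, build_opener
from urllib.error import HTTPError, URLError

logger = logging.getLogger("apiscraper")

def open_with_retry(url, headers=None, data=None, retries=3, delay=5):
    """Retry transient failures (502 Bad Gateway, DNS resolution errors)
    instead of letting the exception bubble up and kill the scraper."""
    opener = build_opener()
    last_exc = None
    for attempt in range(1, retries + 1):
        try:
            req = Request(url, headers=headers or {}, data=data)
            return opener.open(req)
        except (HTTPError, URLError) as exc:
            # HTTPError covers the 502; URLError covers
            # "[Errno -2] Name or service not known" from failed DNS lookups.
            last_exc = exc
            logger.warning("request failed (%s), attempt %d/%d", exc, attempt, retries)
            time.sleep(delay)
    raise last_exc  # give up only after all retries are exhausted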

lephisto commented 5 years ago

This sometimes happens when the Owner API throws a 502. Make sure that:

from urllib.error import URLError

is not commented out; there was an intermediate release where this was the case. Try current master.

ltfiend commented 5 years ago

I just checked, that line is not commented out.

grep -R URLError *
apiscraper.py:from urllib.error import URLError
apiscraper.py:        except (KeyError, HTTPError, URLError) as details:
apiscraper.py:        except (HTTPError, URLError) as exc:

wishbone1138 commented 5 years ago

I'm running current master, verified that line is not commented out.

lephisto commented 5 years ago

Is it possible that the daemon had a network interruption?

ltfiend commented 5 years ago

There wasn't a widespread network interruption, and an attempt to refresh_vehicle immediately afterwards succeeded. I think that when urllib fails to resolve the hostname it simply errors out. Since the scraper is only acting as a stub resolver, every time it checks the car two DNS queries are made to your recursive server. I don't know if urllib could be made to cache the DNS record so you query less frequently, but with a 60-second TTL on owner-api.vn.tesla.services that probably isn't worth the effort.

Here is the output of my patch running for a few days. You'll notice it usually doesn't happen twice in a row (there is one run of three, but that's the first time I've seen that, and the Tesla servers were having issues last night). I'm running a local recursive server on the same virtual machine, so DNS responses are quick, but there are infrequent failures, whether due to the nature of UDP DNS, the tesla.services servers being offline, or some other factor.

Apr 15 21:47:13 tesla-monitor apiscraper.py: 2019-04-15 21:47:13 INFO Hostname Exception Caught
Apr 16 14:55:38 tesla-monitor apiscraper.py: 2019-04-16 14:55:38 INFO Hostname Exception Caught
Apr 17 11:13:06 tesla-monitor apiscraper.py: 2019-04-17 11:13:06 INFO Hostname Exception Caught
Apr 17 11:17:29 tesla-monitor apiscraper.py: 2019-04-17 11:17:29 INFO Hostname Exception Caught
Apr 17 11:21:51 tesla-monitor apiscraper.py: 2019-04-17 11:21:51 INFO Hostname Exception Caught
Apr 17 14:31:04 tesla-monitor apiscraper.py: 2019-04-17 14:31:04 INFO Hostname Exception Caught
Apr 18 02:47:11 tesla-monitor apiscraper.py: 2019-04-18 02:47:11 INFO Hostname Exception Caught

Regardless of the reason for the failure, there is no reason the program should crash rather than gracefully try again. If you wanted to keep a count and crash only after it happens X times in a row, maybe that would be OK, but even so my personal opinion is that data capture should take precedence above all else. The script should handle any error and retry after some period of time, so that you don't lose longer stretches of data to a momentary failure, or even to a longer-term failure caused by external forces.
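
To make that concrete, here is a minimal sketch of the "count consecutive failures, keep retrying, crash only after X in a row" idea; the threshold, delay and function name are made up, not taken from apiscraper.py:

import time
import logging
from urllib.error import HTTPError, URLError

logger = logging.getLogger("apiscraper")

MAX_CONSECUTIVE_FAILURES = 10   # hypothetical threshold, not from apiscraper.py
RETRY_DELAY = 30                # seconds to wait before the next attempt

def refresh_with_retry(state_monitor):
    """Keep calling refresh_vehicle(); give up only after a sustained outage."""
    failures = 0
    while True:
        try:
            state_monitor.refresh_vehicle()
            return                      # success: back to normal polling
        except (HTTPError, URLError) as exc:
            failures += 1
            logger.info("refresh_vehicle failed (%s), %d failure(s) in a row", exc, failures)
            if failures >= MAX_CONSECUTIVE_FAILURES:
                raise                   # only now let the program die
            time.sleep(RETRY_DELAY)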

Thanks for all the hard work on this.

Peter

lephisto commented 5 years ago

Thanks for investigating and pointing out what's causing the failure in your case. I will add error handling for DNS resolution as well when I'm back from vacation.

lephisto commented 5 years ago

Do you want to submit a PR for it?

wishbone1138 commented 5 years ago

I just want to add that this was also killing my script every 24 hours or so with a 502 Bad Gateway message. Simply catching DNS-specific issues would still leave other exceptions that should probably be handled as well.

With this said, the last exception I caught on this was on 2019-04-18.

ltfiend commented 5 years ago

> I just want to add that this was also killing my script every 24 hours or so with a 502 Bad Gateway message. Simply catching DNS-specific issues would still leave other exceptions that should probably be handled as well.
>
> With this said, the last exception I caught on this was on 2019-04-18.

I get this occasionally, but it hasn't caused crashes in my case (with or without the patch). httplib, which is used by urllib2, can throw this error. If that's what's causing your crash, then this patch should prevent it.
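
(For reference, on Python 3 those low-level httplib errors surface as http.client exceptions rather than URLError, so a catch that covers both would look roughly like this; illustrative only, not code from the scraper.)

import http.client
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

def safe_fetch(url):
    """Illustrative only: catch urllib errors plus the lower-level
    http.client exceptions (e.g. BadStatusLine, RemoteDisconnected)
    that can bubble up through urlopen()."""
    try:
        return urlopen(url)
    except (HTTPError, URLError, http.client.HTTPException) as exc:
        print("fetch failed:", exc)
        return None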

Pull request submitted.

wishbone1138 commented 5 years ago

> I get this occasionally, but it hasn't caused crashes in my case (with or without the patch). httplib, which is used by urllib2, can throw this error. If that's what's causing your crash, then this patch should prevent it.

Catching the error definitely prevents it. I had already done something similar to your patch, just in a different area (catching at the teslajson level).

lephisto commented 5 years ago

Patch merged, thanks for the contribution.