philippelt / netatmo-api-python

Netatmo connect API python client (for Netatmo information, see https://dev.netatmo.com)
GNU General Public License v3.0

Sporadic error when getting weather data #37

Closed genfersee closed 3 years ago

genfersee commented 4 years ago

Hello all,

I am facing an issue which has been happening quite often recently. I have lnetatmo==1.4.3 installed and I call the following code once every hour:

def getCurrentWeatherData():
    try:
        authorization = lnetatmo.ClientAuth(clientId = "xxx", clientSecret = "xxx", username = "xxx", password = "xxx")
        devList = lnetatmo.WeatherStationData(authorization)
        externalTemperature = round(devList.lastData('Brest')['Exterieur']['Temperature'],1)
        externalHumidity = devList.lastData('Brest')['Exterieur']['Humidity']
        internalPressure = round(devList.lastData('Brest')['Appart']['Pressure'],1)
        expirationdelay=1800
        someLost = devList.checkNotUpdated(station='Brest',delay=expirationdelay)
        if someLost and 'Exterieur' in someLost:
            currentTemperature = "--"
            currentHumidity = "--"
            currentPressure = "--"
            print ("Error handled: Could not get data from External module")
        else:
            currentTemperature = str(externalTemperature)
            currentHumidity = str(externalHumidity)
            currentPressure = str(internalPressure)
        return (currentTemperature,currentHumidity,currentPressure)
    except:
        print ("Error handled: Could not get weather data from Netatmo server")
        return ("--","--","--")

It has worked most of the time, for years. But sometimes, about once every 2 days, I get the following error:

Exception in thread Thread-12:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "webcam.py", line 193, in run
    currentTemperature,currentHumidity,currentPressure = getCurrentWeatherData()
  File "webcam.py", line 355, in getCurrentWeatherData
    devList = lnetatmo.WeatherStationData(authorization)
  File "/home/pi/.local/lib/python2.7/site-packages/lnetatmo.py", line 203, in __init__
    resp = postRequest(_GETSTATIONDATA_REQ, postParams)
  File "/home/pi/.local/lib/python2.7/site-packages/lnetatmo.py", line 650, in postRequest
    resp = urllib2.urlopen(req, timeout=timeout)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1241, in https_open
    context=self._context)
  File "/usr/lib/python2.7/urllib2.py", line 1201, in do_open
    r = h.getresponse(buffering=True)
  File "/usr/lib/python2.7/httplib.py", line 1121, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 438, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 394, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python2.7/socket.py", line 480, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/lib/python2.7/ssl.py", line 766, in recv
    return self.read(buflen)
  File "/usr/lib/python2.7/ssl.py", line 653, in read
    v = self._sslobj.read(len)
SSLError: ('The read operation timed out',)

Could you please tell me what I am doing wrong? Is it an authentication issue? Or a network issue?

Many thanks in advance :) Genfersee

philippelt commented 4 years ago

I have the same issue here, usually by night.

It looks like Netatmo is refactoring its service infrastructure and this leads to minor service disruptions.

I haven't found it to be a big issue so far.

genfersee commented 4 years ago

Hello! Thanks for your answer! The problem is that I now face the issue 2 to 3 times per day, also during the day... maybe I should not authenticate each time I want to get weather data?

philippelt commented 4 years ago

I see approximately the same failure rate. Unfortunately, the error is a timeout error. It means that no server was available to process the request (server failure, load balancer failure, etc.).

It doesn't seem to be related in any way to throttling or authentication. I don't think that any change in the client request flow would decrease the failure rate.

I haven't run any tests, but these are likely short transient failures. You could try adding a retry on failure, with a delay of 1 or 2 minutes, to see whether it improves the success rate.
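A minimal sketch of such a delay/retry wrapper, written without any dependency on lnetatmo so it can be tried in isolation. The names `call_with_retry`, `fetch`, `retries`, `delay_seconds` and `sleep` are illustrative assumptions, not part of the library; `fetch` would be whatever function performs the authentication and the data request.

```python
import time


def call_with_retry(fetch, retries=2, delay_seconds=90, sleep=time.sleep):
    """Call fetch(); on any exception wait delay_seconds and try again.

    All parameter names are hypothetical (not lnetatmo API).
    Returns the result of fetch(), or None if every attempt failed.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception as exc:  # e.g. the SSL read timeout from the traceback
            print("Attempt %d failed: %s" % (attempt + 1, exc))
            if attempt < retries:
                sleep(delay_seconds)  # wait before the next attempt
    return None
```

With the function from the first post, this could be used as `call_with_retry(getCurrentWeatherData, retries=2, delay_seconds=90)`, provided that function lets the exception propagate instead of catching it itself.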

genfersee commented 4 years ago

Many thanks! I will also update to latest version of lnetatmo and let you know if I see some improvements.

genfersee commented 4 years ago

I updated to v1.6.0 yesterday and I still face the issue: it occurred once at 11am on 29/06 and once at 7am on 30/06.

philippelt commented 4 years ago

On Jun 29th, I had the problem (in UTC+2) at 21h34, 22h04 and 23h04. I check the temperature every 30 min, at 04 and 34 past each hour.

I am also monitoring my home network internet access (real time notification) so I know that internet (and DNS) was up and running at that moment.

As I said, it is a timeout error so no upgrade or code change client side will improve the situation. Without any information from Netatmo (which is almost a "tradition" from this company) there will be no way to improve anything. Even when you find bugs in their products, you will not find anyone to talk about possible fixes 😄

Did you try a (reasonable) retry pattern? As I can live without the temperature info by using the ZWave backup data, I haven't tried it.

But now that I look more closely at the failure timing, I can see that it failed at both 21h34 and 22h04, so in the worst case it could have been totally off for half an hour, which would be quite significant! In winter, this would have been a problem for me before I switched monitoring to ZWave.

genfersee commented 4 years ago

It is strange that Netatmo does not provide a 100% reliable service, since it is not even possible to get the data directly from the device locally. I implemented a retry 10 seconds later and it also failed the second time. Today 30/06, it failed while requesting at 11:56 and 12:00 (UTC+2)...

philippelt commented 4 years ago

Well, I consider that 100% availability does not exist, even in banking systems (I do a lot of cloud/container architecture and deployments for banks).

So for Netatmo, which provides the service for the fixed initial price of the device, we could consider the current availability level not bad. Of course it is always nice to have higher availability, but this comes at an exponential cost.

Regarding local access to the device, I was asking for this capability (and I was not alone) as early as 2013, when the first generation device was released, but Netatmo never understood that it could be done simply, without interfering with the current behavior, and with a very small footprint in the firmware. Just UDP broadcast the data on the LAN at the same time as you upload it to the Netatmo server, and that's all. Easy, but obviously out of reach for Netatmo. Worse, when the protocol was reverse engineered long ago, they issued a firmware upgrade to close it again.

I think that almost every year since, someone on the Netatmo API forum has requested this capability, usually with support from many others, without any intelligent response from Netatmo. Just plain stupid behavior from Netatmo.

Their software team suffers from the common NIH syndrome (Not Invented Here). So they take great care not to add value to the product that would come from anywhere else 😄, even if it's free and comes from actual customers who bought the product.

genfersee commented 4 years ago

I fully agree with you, it is a shame Netatmo does not consider this kind of improvement request.

genfersee commented 4 years ago

Hello, On 5th of July, error happened from 12am to 18pm! No data could be obtained from server at hh:00 of each of these hours...

genfersee commented 3 years ago

Hi! It seems there have been no more occurrences for a few days now! :)

genfersee commented 3 years ago

Hello again! I am in fact still facing this issue several times a day...

Here is the code I use. Do you see anything I could improve?

def getCurrentWeatherData():
        authorization = lnetatmo.ClientAuth(clientId = "xxx", clientSecret = "xxx", username = "xxx", password = "xxx")
        weatherData = lnetatmo.WeatherStationData(authorization,unicode('Station','utf-8'))
        theWeatherData = weatherData.lastData()
        externalTemperature = round(theWeatherData[unicode('Extérieur','utf-8')]['Temperature'],1)
        externalHumidity = theWeatherData[unicode('Extérieur','utf-8')]['Humidity']
        internalPressure = round(theWeatherData['Appart']['Pressure'],1)

        isLost = weatherData.checkNotUpdated(1800)
        print("islost: "+str(isLost))
        if isLost is None:
            currentTemperature = str(externalTemperature)
            currentHumidity = str(externalHumidity)
            currentPressure = str(internalPressure)
        else:
            currentTemperature = "--"
            currentHumidity = "--"
            currentPressure = "--"
            print ("Error handled: Could not get data more recent than 30 minutes from Netatmo server")
        return (currentTemperature,currentHumidity,currentPressure) 

Thanks in advance :)

philippelt commented 3 years ago

Hello,

There are still some timeout errors from Netatmo servers (i.e. servers not responding to authentication requests).

Trying to characterize the issue, I added a signal to my event log each time my code fails to get a response. I request current temperature data every 30 minutes to decide automated actions in a greenhouse.

This is the failure record of the last month up to now:

 short_local_date |  facility   | severity | resource |    event    | Add. data 
------------------+-------------+----------+----------+-------------+-----------
 2021-05-05 16:04 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-05-05 14:04 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-05-03 17:04 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-05-03 12:04 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-26 16:04 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-24 10:34 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-23 16:34 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-23 11:04 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-22 10:34 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-21 11:34 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-20 15:34 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-18 18:34 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-13 18:04 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-12 13:04 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-09 11:34 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-09 10:34 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-08 17:34 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-01 17:34 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-01 15:34 | ALARM_SERRE | W        | netatmo  | unavailable | {}
 2021-04-01 12:04 | ALARM_SERRE | W        | netatmo  | unavailable | {}

These errors are timeout errors. This means that the Netatmo server failed to send a response to the authentication request, not that it is unreachable or that the internet connection is down (I am monitoring several servers on the Internet, so I know that the connection is up).
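To record that distinction in an event log, the exception itself can be inspected. This is only a sketch using standard-library exception types; the function name `classify_failure` and the labels `"timeout"`, `"unreachable"` and `"other"` are assumptions made for illustration, and the string check on `ssl.SSLError` simply mirrors the message seen in the traceback earlier in this thread.

```python
import socket
import ssl

try:
    from urllib.error import URLError  # Python 3
except ImportError:
    from urllib2 import URLError  # Python 2


def classify_failure(exc):
    """Return 'timeout', 'unreachable' or 'other' for a request exception."""
    # urllib wraps connection-level errors in URLError; look at the cause.
    if isinstance(exc, URLError) and isinstance(exc.reason, Exception):
        exc = exc.reason
    if isinstance(exc, socket.timeout):
        return "timeout"  # connected, but no answer in time
    if isinstance(exc, ssl.SSLError):
        # Older Python versions report a read timeout as an SSLError.
        return "timeout" if "timed out" in str(exc) else "other"
    if isinstance(exc, socket.error):
        return "unreachable"  # DNS failure, connection refused, no route...
    return "other"
```

Logging the returned label next to each failure would show at a glance whether an outage was on the server side or on the local connection.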

As you can see, it happens in a kind of 'series' of days from time to time. This is typically the behavior we could encounter when Netatmo makes changes to its infrastructure (or pushes a new version). Sometimes it works immediately (as on 2021/04/01), and sometimes it fails and requires adjustments (as on April 20, 21, 22, 23 and 24).

I tried to implement a "retry pattern" in my code, waiting for 1 min before retrying to authenticate (2 times), but it mostly failed. I suppose that when a failure is in progress, it takes more than a few minutes to recover. As I request data every 30 min, I finally decided to live with the problem and just give up when authentication fails. The next call 30 min later will probably succeed.

I don't know what your use case is, but the only change I would suggest is to implement a retry pattern with a long delay (probably more than 5 min) and to keep the last value in a cache if you need to provide an answer very fast. Temperature does not vary so quickly that the value becomes unusable within 1 or 2 hours.
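A minimal sketch of the cache part, assuming the caller is already invoked on a schedule so that the next scheduled call acts as the long-delay retry. The class name `LastGoodValue` and its parameters `fetch`, `max_age_seconds` and `clock` are hypothetical and not part of lnetatmo; the 2 hour default follows the "1 or 2 hours" estimate above.

```python
import time


class LastGoodValue(object):
    """Serve the last successful reading while the server is unavailable.

    fetch, max_age_seconds and clock are illustrative names (not lnetatmo API).
    """

    def __init__(self, fetch, max_age_seconds=2 * 3600, clock=time.time):
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._value = None
        self._stamp = None  # time of the last successful fetch

    def get(self):
        """Return (value, is_fresh); value is None if nothing usable is cached."""
        try:
            self._value = self._fetch()
            self._stamp = self._clock()
            return self._value, True
        except Exception:
            pass  # fall back to the cached reading below
        if self._stamp is not None and self._clock() - self._stamp <= self._max_age:
            return self._value, False
        return None, False
```

The `is_fresh` flag lets the caller decide whether a stale reading is acceptable, for example for display but not for switching heaters.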

For my use case, even in winter during the night, when temperature changes can occur quite fast and I need to adjust heaters in case of excessive cold, I know that I can miss one measurement and accept a delay of one hour.

I am afraid there is nothing we can do. Netatmo could have integrated, as many users have requested from day one, the capability to access local measurement data, but they systematically rejected the idea, and that will remain the case until a competitor releases a comparable product with this additional feature.

Then they will have to adapt or disappear. It requires some creativity/innovation to stay a leader.

genfersee commented 3 years ago

Many thanks for your detailed answer! I will implement a retry mechanism. Indeed, being able to get the data directly from the station locally would be a great improvement...