lafrech / oem_gateway

Part of OpenEnergyMonitor project. Gateway from data source to target database.
20 stars 19 forks source link

Data corruption after high CPU load #9

Closed modenet closed 6 years ago

modenet commented 11 years ago

I use oemgateway on a raspberry with image raspbmc, the raspberry is used occasionally also for watching movies on hd tv (= high cpu). Yesterday after saw a film I noticed that oemgateway relayed values ​​higher than 150-200w of reality (compared with emonglcd). only restarting oemgateway the values ​​passed by oemgateway to emoncms returned correct. It happenedd another time after viewing a film. thanks

[Edit: Title modification - Jérôme]

lafrech commented 11 years ago

This is interesting...

I understand it is permanent, right ? I mean once it happens, you won't get normal values back until you restart the service, even if you don't use the CPU intensively anymore ?

At least it seems sort of reproducible. I'd have a look in the logs to see where the error appears.

modenet commented 11 years ago

Yes It's permanent, even with low CPU . Il giorno 14/set/2013 13:44, "Jérôme Lafréchoux" notifications@github.com ha scritto:

This is interesting...

I understand it is permanent, right ? I mean once it happens, you won't get normal values back until you restart the service, even if you don't use the CPU intensively anymore ?

At least it seems sort of reproducible. I'd have a look in the logs to see where the error appears.

— Reply to this email directly or view it on GitHubhttps://github.com/Jerome-github/oem_gateway/issues/9#issuecomment-24443585 .

lafrech commented 11 years ago

Can you find anything relevant in the logs ?

modenet commented 11 years ago

Not yet observed Il giorno 14/set/2013 14:08, "Jérôme Lafréchoux" notifications@github.com ha scritto:

Can you find anything relevant in the logs ?

— Reply to this email directly or view it on GitHubhttps://github.com/Jerome-github/oem_gateway/issues/9#issuecomment-24443919 .

modenet commented 11 years ago

no, in the log (limited to warning issue) I can observe:

misorder data log:

2013-09-13 09:44:01,649 WARNING Send failure 2013-09-13 11:39:16,170 WARNING Send failure 2013-09-13 12:03:57,235 WARNING Send failure 2013-08-28 22:23:18,393 WARNING Misformed RX frame: ['>\x00', '8b'] 2013-08-28 22:23:18,600 WARNING Misformed RX frame: ['\x01'] 2013-08-28 02:00:53,799 WARNING Misformed RX frame: ['\x01'] 2013-08-28 02:00:54,205 WARNING Misformed RX frame: ['\x01'] 2013-09-13 15:36:50,201 WARNING Send failure 2013-09-13 16:39:39,137 WARNING Send failure 2013-09-13 17:52:38,223 WARNING Send failure ..

some buffer errors:

2013-09-17 11:50:55,362 WARNING Couldn't send to server, HTTPError: 404 2013-09-17 12:28:10,362 WARNING Couldn't send to server, HTTPException 2013-09-17 12:28:12,041 WARNING Couldn't send to server, Exception: Traceback (most recent call last): File "/root/oemgateway/oemgatewaybuffer.py", line 147, in _send_data result = urllib2.urlopen(url_string) File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen return _opener.open(url, data, timeout) File "/usr/lib/python2.7/urllib2.py", line 401, in open response = self._open(req, data) File "/usr/lib/python2.7/urllib2.py", line 419, in _open '_open', req) File "/usr/lib/python2.7/urllib2.py", line 379, in _call_chain result = func(_args) File "/usr/lib/python2.7/urllib2.py", line 1211, in http_open return self.do_open(httplib.HTTPConnection, req) File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open r = h.getresponse(buffering=True) File "/usr/lib/python2.7/httplib.py", line 1034, in getresponse response.begin() File "/usr/lib/python2.7/httplib.py", line 407, in begin version, status, reason = self._read_status() File "/usr/lib/python2.7/httplib.py", line 365, in _read_status line = self.fp.readline() File "/usr/lib/python2.7/socket.py", line 447, in readline data = self._sock.recv(self._rbufsize) error: [Errno 104] Connection reset by peer ... 2013-09-19 21:15:02,951 WARNING Send failure 2013-09-19 21:19:24,920 WARNING Send failure 2013-09-19 21:23:33,500 WARNING Send failure 2013-09-19 21:27:37,943 WARNING Couldn't send to server, Exception: Traceback (most recent call last): File "/root/oemgateway/oemgatewaybuffer.py", line 147, in _send_data result = urllib2.urlopen(url_string) File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen return _opener.open(url, data, timeout) File "/usr/lib/python2.7/urllib2.py", line 401, in open response = self._open(req, data) File "/usr/lib/python2.7/urllib2.py", line 419, in _open '_open', req) File "/usr/lib/python2.7/urllib2.py", line 379, in _call_chain result = func(_args) File "/usr/lib/python2.7/urllib2.py", line 1211, in http_open return self.do_open(httplib.HTTPConnection, req) File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open r = h.getresponse(buffering=True) File "/usr/lib/python2.7/httplib.py", line 1034, in getresponse response.begin() File "/usr/lib/python2.7/httplib.py", line 407, in begin version, status, reason = self._read_status() File "/usr/lib/python2.7/httplib.py", line 365, in _read_status line = self.fp.readline() File "/usr/lib/python2.7/socket.py", line 447, in readline data = self._sock.recv(self._rbufsize) error: [Errno 104] Connection reset by peer

2013/9/14 Jérôme Lafréchoux notifications@github.com

Can you find anything relevant in the logs ?

— Reply to this email directly or view it on GitHubhttps://github.com/Jerome-github/oem_gateway/issues/9#issuecomment-24443919 .

lafrech commented 11 years ago

Some of the send failures are probably linked to the server being stopped: http://openenergymonitor.org/emon/node/2838, although not those of the 19th of september.

I'm just surprised we get those tracebacks, which means the error wasn't caught by usual exceptions here: https://github.com/Jerome-github/oem_gateway/blob/master/oemgatewaybuffer.py#L148

Now, the first log

2013-09-13 09:44:01,649 WARNING Send failure 
2013-09-13 11:39:16,170 WARNING Send failure 
2013-09-13 12:03:57,235 WARNING Send failure 
2013-08-28 22:23:18,393 WARNING Misformed RX frame: ['>\x00', '8b'] 
2013-08-28 22:23:18,600 WARNING Misformed RX frame: ['\x01'] 
2013-08-28 02:00:53,799 WARNING Misformed RX frame: ['\x01'] 
2013-08-28 02:00:54,205 WARNING Misformed RX frame: ['\x01'] 
2013-09-13 15:36:50,201 WARNING Send failure 2013-09-13 16:39:39,137 WARNING Send failure 2013-09-13 17:52:38,223 WARNING Send failure
modenet commented 11 years ago

in every case there are not evidence in the log of higher value after high cpu on raspberry (first post)

2013/9/20 Jérôme Lafréchoux notifications@github.com

Some of the send failures are probably linked to the server being stopped: http://openenergymonitor.org/emon/node/2838, although not those of the 19th of september.

I'm just surprised we get those tracebacks, which means the error wasn't caught by usual exceptions here:

https://github.com/Jerome-github/oem_gateway/blob/master/oemgatewaybuffer.py#L148

Now, the first log

2013-09-13 09:44:01,649 WARNING Send failure 2013-09-13 11:39:16,170 WARNING Send failure 2013-09-13 12:03:57,235 WARNING Send failure 2013-08-28 22:23:18,393 WARNING Misformed RX frame: ['>\x00', '8b'] 2013-08-28 22:23:18,600 WARNING Misformed RX frame: ['\x01'] 2013-08-28 02:00:53,799 WARNING Misformed RX frame: ['\x01'] 2013-08-28 02:00:54,205 WARNING Misformed RX frame: ['\x01'] 2013-09-13 15:36:50,201 WARNING Send failure 2013-09-13 16:39:39,137 WARNING Send failure 2013-09-13 17:52:38,223 WARNING Send failure

— Reply to this email directly or view it on GitHubhttps://github.com/Jerome-github/oem_gateway/issues/9#issuecomment-24799280 .

lafrech commented 11 years ago

My former message was incomplete. I edited to complete, but I guess I screwed up. Sorry about that.

I said I didn't understand the timestamps here:

2013-09-13 12:03:57,235 WARNING Send failure 
2013-08-28 22:23:18,393 WARNING Misformed RX frame: ['>\x00', '8b'] 

How can we have those lines in this order in the log ?!

Anyway, I'd like to see what the DEBUG log provides when the data becomes corrupted. I think it would show what exactly is received. But I don't see how we could lose data in the serial port.

modenet commented 11 years ago

ok I change the loglevel to DEBUG.

2013/9/20 Jérôme Lafréchoux notifications@github.com

My former message was incomplete. I edited to complete, but I guess I screwed up. Sorry about that.

I said I didn't understand the timestamps here:

2013-09-13 12:03:57,235 WARNING Send failure 2013-08-28 22:23:18,393 WARNING Misformed RX frame: ['>\x00', '8b']

How can we have those lines in this order in the log ?!

Anyway, I'd like to see what the DEBUG log provides when the data becomes corrupted. I think it would show what exactly is received. But I don't see how we could lose data in the serial port.

— Reply to this email directly or view it on GitHubhttps://github.com/Jerome-github/oem_gateway/issues/9#issuecomment-24801048 .

lafrech commented 11 years ago

Thanks. Or perhaps you can do that only when you intend to have a high CPU load.

Note that you don't have to restart the gateway to change the DEBUG level. The config file is checked regularly.

lafrech commented 10 years ago

Hi.

Did this issue reoccur ?

Regarding the power computation, I may have an element of explanation if the power is computed in emonCMS.

Because the power is the energy divided by the time, if the energy is correctly transmitted, but with a wrong timestamp, then the power is wrongly computed.

The timestamp is added when the data is flushed from the serial port. Normally, the gateway runs fast enough to flush the serial port so that the delay is not noticed. However, if it is stuck 10 seconds trying to send something to emoncms, then the flushing of the serial port is delayed.

Obviously, if some other program stalls the Pi, there's not much we can do about that, apart from increasing the execution priority. If it is the sending process that stalls, then multi-threading would help.

Potential improvement leads: