motoz / PellMon

Logging, monitoring and configuration for pellet burners.

INFO - Timeout again, give up and return fail #85

Closed LaustL closed 6 years ago

LaustL commented 6 years ago

Hi all,

After updating the Debian software on the Raspberry and Pellmon to ver. 0.7.0, I get timeouts after some time. I get the following messages:

INFO - Timeout again, give up and return fail

INFO - error in retry [Errno 0] GetItem failed

INFO - Openweathermap update error

I can reboot the Raspberry and Pellmon runs again, but after some time I get the timeout problems again.

Can anyone help with these problems?

Thanks in advance

Laust Laursen

--

motoz commented 6 years ago

Some more info would be good to have: Which plugins are you running? Do you get any measurement values after the error, or does it stop completely?

LaustL commented 6 years ago

I don't get any measurement values after the timeout, and I run the following plugins:

INFO - Activated plugins: ScotteCom, RaspberryGPIO, OWFS, SiloLevel, Consumption, Cleaning, Onewire, Openweathermap, Exec

motoz commented 6 years ago

You could try disabling one plugin at time to try to locate the plugin that is failing.

Are you really running both OWFS and OneWire at the same time? That's of course possible, but it could also be a mistake (that's why I'm asking). What do you have connected to those plugins (hardware wise)?

The Exec plugin can of course cause all sorts of problems since you can use it to run anything. What do you use it for?

The Openweathermap plugin is not that much tested, so it could very well be that one.

LaustL commented 6 years ago

I have now deactivated the OWFS, Onewire, Openweathermap and Exec plugins, but I still get timeouts. I think it worked fine before I updated the Raspberry software and the Pellmon software, even with all the plugins activated.

motoz commented 6 years ago

Ok, that leaves scottecom and raspberrygpio then as possible problem sources, I suppose. Are you on raspbian jessie or stretch?

I'm not using scottecom anymore myself (switched to a V7 controller), so it's a bit hard to debug this. You could try to set loglevel = debug in your config; that could give some more info in the log.

motoz commented 6 years ago

I found a possible explanation. When I originally made the first version of the python script that talked to scotte burners, that later became pellmon, I found a strange problem. The burner would just stop responding every now and then. When trying to find out what happened by manually trying different things in a serial terminal connected to the burner I found out that sending a request for a specific frame kicked it into gear again and it started responding normally. I put this code into the script and it has been part of the pellmonsrv logger since then, even as the code for scotte burners was later moved out to a plugin.

The 'solution' was a bit ugly and I had no idea why it was needed, but anyway with the workaround pellmon ran without problems for years for lots of users. But now I see that I messed up the 'scotte workaround' with the big redesign of the pellmon internal database in 0.7, i.e. whatever it did before, it doesn't do anymore...
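A minimal sketch of the kind of workaround described above, not the actual pellmonsrv code. It assumes a hypothetical `request` callable that sends one request over the serial link and raises a hypothetical `BurnerTimeout` when the burner does not answer. The choice of `oxygen_regulation` as the wake-up item is also an assumption, taken from the value that the proposed fix reads.

```python
class BurnerTimeout(Exception):
    """Hypothetical error: the burner did not answer a request in time."""


def read_with_wakeup(request, item, wakeup_item="oxygen_regulation", retries=2):
    """Read `item`; on timeout, request `wakeup_item` and then try again.

    `request` is a hypothetical callable that sends one request to the
    burner and returns the reply, or raises BurnerTimeout.
    """
    for attempt in range(retries + 1):
        try:
            return request(item)
        except BurnerTimeout:
            if attempt == retries:
                raise  # out of retries: report the failure to the caller
            try:
                # Only the side effect matters here, the reply is discarded.
                request(wakeup_item)
            except BurnerTimeout:
                pass  # the wake-up request may itself time out
```

The wake-up request is only sent after a timeout, so a burner that answers normally sees no extra traffic.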

Do you think you could manage to run pellmon in debug mode from source according to this guide: https://github.com/motoz/PellMon/wiki/Contributing-to-PellMon ?

The error is here: https://github.com/motoz/PellMon/blob/master/src/Pellmonsrv/pellmonsrv.py#L256 It should be something like:

                            # Strange fix for strange problem with some scotte burners
                            scottefix = conf.database['oxygen_regulation'].value

I can make a .deb with the (possible) fix later.

My scotte was version 6.33 btw, I always assumed that the workaround wasn't needed for all scotte versions. What version do you have?

LaustL commented 6 years ago

I have now tried to run it, but I get stuck when I reach the DBUS stuff, and I don't know what to do.

I have found the pellmonsrv.py file and see the same error

[Screenshot: udklip]

I run raspbian jessie, and the scotte chip version is detected as 6.82.

I can see that the event log contains this entry: 2017-11-29 21:03:18,879 - INFO - invalid setting for plugin_dirs. I don't know if it has always been there. What does it mean?

motoz commented 6 years ago

I made a new pre-release, v0.7.1-alpha1, that contains the fix. Could you check if it helps? (It's completely untested)

LaustL commented 6 years ago

Okay thanks. I will try to install the new pre-release for testing and let you know if it helps.

LaustL commented 6 years ago

Unfortunately, I still get timeouts. After I installed the pre-release it ran smoothly during the night, but today there have been two timeouts :-(

motoz commented 6 years ago

The timeouts are expected, but does it recover from them without restarting or not? What exactly does the log look like? Did you try setting loglevel=debug in your config?
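To get an overview of how often the timeouts occur, the log can be summarised with a small script. This is only a sketch: it assumes every log line has the `date time,ms - LEVEL - message` layout of the `plugin_dirs` line quoted earlier in this thread, and the function name `timeouts_per_hour` is made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, based on one line quoted in this thread:
# 2017-11-29 21:03:18,879 - INFO - invalid setting for plugin_dirs
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2},\d{3} - "
    r"(?P<level>\w+) - (?P<message>.*)$"
)


def timeouts_per_hour(lines, needle="Timeout again"):
    """Count log lines whose message contains `needle`, grouped by hour."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and needle in match.group("message"):
            counts[f"{match.group('date')} {match.group('hour')}:00"] += 1
    return counts
```

Pass it an open log file (or any iterable of lines) and print the resulting counter. Lines that do not match the assumed layout are skipped.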

LaustL commented 6 years ago

It does not recover. Attached are a screen dump of the system without data and a document with the log file:

capture

Log.docx

motoz commented 6 years ago

Thanks, I'll have to try to start up the old controller and see if I can reproduce the problem without a burner connected. I suppose you'll have to go back to the old pellmon version in the meantime.

LaustL commented 6 years ago

Okay I will go back to the old pellmon version

LaustL commented 6 years ago

Now I am confused. It is very strange: I am now back on the old version, 0.6.1, and I still get the timeouts.

motoz commented 6 years ago

That's interesting... Then it looks like it's either a hardware problem or something actually broke in the latest updates to raspbian jessie. Both seem a bit unlikely, so I was quite willing to accept that the problem was pellmon v0.7.0. Do you have another usb-serial adapter to try with? Or if you still have the raspbian image you originally burned to the sd-card, you could try starting over with that on a spare sd-card. If it really is a problem with raspbian, then another thing to try is of course the new raspbian stretch image.

LaustL commented 6 years ago

I have bought a new usb-serial adapter and I am now running a new raspbian stretch image with pellmon 0.7.0. It has now been running for several days without any timeouts, so it seems that the adapter was the problem. I still get the following warnings:

- invalid setting for plugin_dirs
- Python module pyowm is missing
- Openweathermap plugin error: No module named pyowm
- Failed to activate plugins: Openweathermap

motoz commented 6 years ago

So it was a hardware fault after all, great that you have it running.

The warning invalid setting for plugin_dirs is harmless. You don't have that setting and it warns that the setting is empty. Quite unnecessary, I should remove that.

Python module pyowm is missing. The openweathermap plugin needs the python module 'pyowm', install it with sudo pip install pyowm.

Failed to activate plugins: Openweathermap: same as above

LaustL commented 6 years ago

It is now running perfectly. Thanks for the great support. Merry Christmas.