moroen / IKEA-Tradfri-plugin

A Python plugin for Domoticz to controll IKEA Gateway
68 stars 23 forks source link

Devices disappearing #127

Closed McMornan closed 3 years ago

McMornan commented 3 years ago

Hi,

sorry to bother again. Still latest version of Domoticz and your plugin. (did an update of libs as well just to make sure)

I register the bridge as described in the setup documentation. All my lamps, groups etc. are discovered and added as devices. Next morning they are either gone completely - except the gateway reboot switch device - or have been removed and readded.

This makes it a little hard to use the devices ...

I do a nightly restart of domoticz to make a full backup - could this be an issue ? I have also "allow new devices" enabled constantly.

thanks, Lars

Bolten88 commented 3 years ago

Have you checked the devices section? Are the devices there? Also check the hardware section, is the plugin still enabled?

McMornan commented 3 years ago

Hi, yes the plugin is enabled.

However if I click on "update" in the hardware panel (update of the ikea plugin device settings) the devices are re-added. And yes I checked the devices panel. All ikea devices have new idx numbers after having been readded.

Lars

moroen commented 3 years ago

@McMornan

So when you click "Update", the old devices are removed and re-added with new idx'es? Does the same happen if you stop and restart domoticz?

Could you please enable debug, and provide the relevant sections from the log?

McMornan commented 3 years ago

not quite. Devices are gone after the restart in the morning. When I update the ikea hardware device, the devices are added again - unfortunately wiht new device IDs.

The devices seem still to be there when I do a domoticz restart (sudo systemctl restart domoticz.service) regardless if I enable adding new devices or not in teh settings

I will enable debug and see what the log tells me overnight ...

moroen commented 3 years ago

Well, then your problem seems related to the fact that the devices disappears from the database.

When the plugin is added, updated or domoticz starts, the devicelist from the tradfri gateway is compared to the devicelist in domoticz. Any devices not present in domotivz will be added, and any devices present in domoticz but not on the gateway, will be removed from the device-list in domoticz.

BTW, why do you restart domoticz in the morning?

McMornan commented 3 years ago

I do a daily backup - doeing a backup with a live database can have unpredictable results. Thats why I stop domoticz in the early morning, do the backup and then restart.

Is it possible that the connection to the tradfri gateway has not been resumed yet (thus no devices present) when this check is done ? Is is possible to prevent removal of devices not found ?

Bolten88 commented 3 years ago

I do a daily backup - doeing a backup with a live database can have unpredictable results. Thats why I stop domoticz in the early morning, do the backup and then restart.

Is it possible that the connection to the tradfri gateway has not been resumed yet (thus no devices present) when this check is done ? Is is possible to prevent removal of devices not found ?

I do domoticz backups twice a day once in the night while domoticz is running. Never had issues restoring the database.

moroen commented 3 years ago

@McMornan

Is it possible that the connection to the tradfri gateway has not been resumed yet (thus no devices present) when this check is done ? Is is possible to prevent removal of devices not found ?

A missing connection to the gateway can't explain this. If the gateway is not responding, the plugin waits and retries. Removing devices no longer present on the gateway is a fundamental function, and I seriously doubt that this is the problem, unless the gateway actually responds with a empty list or somehow have changed the coap endpoints, which I find highly unlikely.

As you have seen yourself, restarting domoticz without performing the backup does not trigger this:

The devices seem still to be there when I do a domoticz restart (sudo systemctl restart domoticz.service) regardless if I enable adding new devices or not in teh settings

Is it possible that something in your backup procedure resets (or unwittingly restores a previous version of) the database?

McMornan commented 3 years ago

Hm, my script is quite simple:

!/bin/bash

if [ ! -d "/mnt/diskstation/Software" ]; then sudo mount -a fi

DIR=$1 TIME=date +%m-%d-%y # This Command will add date in Backup File Name. FILENAME=backup-$TIME.tar.gz # Here i define Backup file name format. SRCDIR=/home/pi # Location of Important Data Directory (Source of backup). DESDIR=/data/$DIR # Destination of backup file. tar -cpzf $DESDIR/$FILENAME $SRCDIR cp $DESDIR/$FILENAME /mnt/diskstation/Software/code/domoticz cp $SRCDIR/domoticz/Config/zwcfg_0xdc88a28a.xml $DESDIR/zwbackup-$TIME.xml cp $SRCDIR/domoticz/Config/zwcfg_0xdc88a28a.xml /mnt/diskstation/Software/Code/domoticz/zwbackup-$TIME.xml

cp $SRCDIR/domoticz/Config/ozwcache_0xdc88a28a.xml $DESDIR/zwcachebackup-$TIME.xml cp $SRCDIR/domoticz/Config/ozwcache_0xdc88a28a.xml /mnt/diskstation/Software/Code/domoticz/zwcachebackup-$TIME.xml

END

I will enable backup and check if I find out more. If I am the only one with this problem, its probably me, no the plugin :(

tomg06 commented 3 years ago

Hi,

I've had the same issue today, all tradfri lamps disappeared without restarting domoticz. There is no log of this. When I add press update in the hardware section they reappear in devices but indeed with a different IDX. Results in all blockly's not functioning any more.

moroen commented 3 years ago

Strange!

The latest version of the plugin was released 14 days ago, and unless you've recently updated, it's a bit difficult to understand why this happens now. Have domoticz been updated? (I use domoticz from source, no automatic updates, so I wouldn't know). What it the firmware version of the gateway?

I'm unable to reproduce this error, both my test and production systems works as expected.

tomg06 commented 3 years ago

Well maybe this has something to do with a update in homebridge? I've done a update on this yesterday to version 1.2.4. But I thought this had nothing to do with domoticz in a way i could effect the tradfri plugin.

Domoticz: 2020.2 IKEA Tradfri Plugin: 0.10.1 Tradfricoap: 0.0.13 Py3coap: 0.9.1

Uptime domoticz is : 13 Days, 14 Hours

McMornan commented 3 years ago

I do a daily backup - doeing a backup with a live database can have unpredictable results. Thats why I stop domoticz in the early morning, do the backup and then restart. Is it possible that the connection to the tradfri gateway has not been resumed yet (thus no devices present) when this check is done ? Is is possible to prevent removal of devices not found ?

I do domoticz backups twice a day once in the night while domoticz is running. Never had issues restoring the database.

Ah well, you'll find out if you had issues when you want to restore a backup in an emeegency situation😈. Sorry, I used to work as an admin - maybe you are right with that database - but usually its a very bad idea to back up a hot database....

moroen commented 3 years ago

@McMornan

Ah well, you'll find out if you had issues when you want to restore a backup in an emeegency situation😈. Sorry, I used to work as an admin - maybe you are right with that database - but usually its a very bad idea to back up a hot database....

Stopping domoticz and doing a backup shouldn't cause the devices to disappear. Did the devices disappear today as well? What happens if stop domoticz, runs the backupscript manually, and then restarts? Do you use homebridge?

@tomg06 Any issues, or was this i one-off? I don't see why updating homebridge should affect the plugin, but I dont't use homebridge, and if this happened after a homebridge upgrade, it seems possible

McMornan commented 3 years ago

Yes, devices have disappeared again, sorry. This time I made sure that "allow new devices" was disabled.

McMornan commented 3 years ago

After reenabling "allow devices" and updating the ikea plugin in the hardware settings devices are there again. I could live with me having to manually remove devices which I have removed from my IKEA gateway. How about that ?

moroen commented 3 years ago

After reenabling "allow devices" and updating the ikea plugin in the hardware settings devices are there again. I could live with me having to manually remove devices which I have removed from my IKEA gateway. How about that ?

Some kind of workaround could probably be implemented, but that wouldn't solve the root cause. You would also need an option to not add new devices, or you might end up with multiple devices in domoticz.

So, this only happens once a day? Does it happen when running the backup script manually?

Could you try to list the devices from the command line in the evening, and again in the morning after the devices has been removed, and post both?

$ python3 plugin.py list
Sarcas666 commented 3 years ago

Ah, someone with a similar problem! Some days ago all my IKEA-PLUGIN devices disappeared in from Domoticz. I did not notice this first, I was updating my rPI when my wife complained she had to cook in the dark for hours with only the light of her phone … (I came home from a late shift). After a reboot devices had returned, but not all of them work. Devices in the living room work normally, devices in the kitchen don’t. And I think I made it worse by trying to fix it :/ I tried to reset the lamps and re-add them, delete them from the app, and messing things up as bad as possible. So I am now planning to try to somehow reset all the kitchen stuff, and add remote and lamps one-by-one again and hopefully domoticz will pick that up somehow.

But the main thing was, devices disappeared and reappeared but did not all work.

McMornan commented 3 years ago

Hi, I can do that. I'll upload the dumps tomorrow

McMornan commented 3 years ago

@Sarcas666: my devices do work after they reappear - just with new IDX ids

McMornan commented 3 years ago

Just an afterthought - I have one group called "Küche" - kitchen. Could the special character 'ü' be a problem ? Maybe the loop collecting the devices throwing an uncaught exception, resulting in an empty list ...

moroen commented 3 years ago

Just an afterthought - I have one group called "Küche" - kitchen. Could the special character 'ü' be a problem ? Maybe the loop collecting the devices throwing an uncaught exception, resulting in an empty list ...

You might be on to something, but then I would expect it to fail also when you try stopping and restarting domoticz manually...

I've been stopping and starting domoticz intermittently for the past 24 hours, no sign of this. I've tried to click the update-button on the hardware page several times, the devices remain as they should. I've updated both my debian (production) and manjaro (development) systems with the latest packages, no problem.

As far as I can tell, there hasn't been a update of the firmware that could explain this.

The common denominator I can sew between Sarcas666 and McMornan, is that it appears that you both are using raspberry Pis? Would it be possible to try a separate domoticz instance, just with the plugin, no other devices and no backups?

McMornan commented 3 years ago

Hi, devices have disappeared again. I will stop the backup for a few days and see if that is the reason.

Sarcas666 commented 3 years ago

I removed the devices in my kitchen in the Ikea-app (lamps driver and remote) and reset them. I added the remote to the hub and the lamps to the remote. After restarting Domoticz the old kitchen devices were gone and the new ones created. The living room devices remained unchanged and remained working. So this was successful. I adapted my Domoticz scripts to the new situation and everything seems to work again as intended.

Sidenote; the disappearance only happened once for me. I have so far rebooted the rPi many times without loosing any devices. After the Big Disappearance I did a thorough update of my rPi. I usually update the normal way regularly but I never updated the python thingies. Pip. There was a long list of outdated stuff. Perhaps that helped somehow.

Currently I am running: Buster 10.6 Python 3.7.3 IKEA Tradfri Plugin: 0.10.2 Tradfricoap: 0.0.14 Py3coap: 0.9.1 coapcmd: (/home/pi/domoticz/plugins/IKEA-Tradfri/bin/coapcmd) 0.1.6 Domoticz 2020.2 12732

McMornan commented 3 years ago

Hi, I stopped the backup for last night ... and what shall I say ? The devices are still there in the morning.

I assume that at the time I do the backup the IKEA bridge ist not reacheable thus no devices are detected. I will change the time of the backup to early morning say 7am and see if that changes anything ...

Funny thing is: if I do a normal domoticz restart using the systemctl cmd the devices do not disappear ...

moroen commented 3 years ago

I still fail to see how an unreachable IKEA Gateway would result in an empty devicelist. The plugin updates (adds/removes/set status) when domoticz starts up (or when clicking "Update" on the hardware page). If the IKEA Tradfri gateway isn't available, the plugin just waits, doesn't do any updates, until the IKEA gateway once again is available. To verify this, stop domoticz, disconnect the gateway, and restart domoticz from the command line to see the log:

2020/12/12 13:32:33 COAP DTLS Error: Handshake timeout
2020-12-12 13:32:33.324  (Ikea) Connection to gateway timed out
2020-12-12 13:32:42.826  (Ikea) Pushing 'onHeartbeatCallback' on to queue
2020-12-12 13:32:42.848  (Ikea) Processing 'onHeartbeatCallback' message
2020-12-12 13:32:42.848  (Ikea) Calling message handler 'onHeartbeat'.
2020-12-12 13:32:42.848  (Ikea) Timeout flag set, retrying...
2020/12/12 13:32:48 COAP DTLS Error: Handshake timeout
2020-12-12 13:32:48.849  (Ikea) Connection to gateway timed out
2020-12-12 13:32:58.351  (Ikea) Pushing 'onHeartbeatCallback' on to queue
2020-12-12 13:32:58.372  (Ikea) Processing 'onHeartbeatCallback' message
2020-12-12 13:32:58.372  (Ikea) Calling message handler 'onHeartbeat'.
2020-12-12 13:32:58.372  (Ikea) Timeout flag set, retrying...
2020/12/12 13:33:04 COAP DTLS Error: Handshake timeout
2020-12-12 13:33:04.373  (Ikea) Connection to gateway timed out
2020-12-12 13:33:13.875  (Ikea) Pushing 'onHeartbeatCallback' on to queue
2020-12-12 13:33:13.895  (Ikea) Processing 'onHeartbeatCallback' message
2020-12-12 13:33:13.895  (Ikea) Calling message handler 'onHeartbeat'.
2020-12-12 13:33:13.895  (Ikea) Timeout flag set, retrying...
2020-12-12 13:33:14.164  (Ikea - TV-Stue - Spot - Battery) Updating device from 0:'100' to have values 0:'100'.
...

If debug is enabled, the log will show that devices are added and deleted, so if the plugin actually deletes the devices on restart after your backup, this should be reflected in the log. So if the plugin somehow gets an empty device-list, you should see a lot of lines in the log stating that devices are being removed.

Just a suggestion: Have you checked the timestamp on the domoticz.db file before and after the backup?

Bolten88 commented 3 years ago

Today i had the same issue during a normal day. No backup, no reboot but the devices where gone. Updating the plugin in the hardware section fixed it with new id's. Very strange.

A few days ago i updated the plugin, tradfricoap and pycoap.

moroen commented 3 years ago

Today i had the same issue during a normal day. No backup, no reboot but the devices where gone. Updating the plugin in the hardware section fixed it with new id's. Very strange.

A few days ago i updated the plugin, tradfricoap and pycoap.

What is the version of the IKEA Tradfri firmware? And would you by any chance have a relevant log? Have you noticed if the ID, not only the IDX, of the devices might have changed as well?

McMornan commented 3 years ago

Hi, as I said yesterday I have changed the backup time to 7am. Devices are still there today. So, not sure what was the reason - I guess I will just not touch it now :)

Gateway is running firmware 1.12.31 (not sure if this is international - I am from Germany)

Lars

Bolten88 commented 3 years ago

Today i had the same issue during a normal day. No backup, no reboot but the devices where gone. Updating the plugin in the hardware section fixed it with new id's. Very strange. A few days ago i updated the plugin, tradfricoap and pycoap.

What is the version of the IKEA Tradfri firmware? And would you by any chance have a relevant log? Have you noticed if the ID, not only the IDX, of the devices might have changed as well?

Hi Moroen, this morning it happend again. My gateway firmware version is also 1.12.31. Not sure if the id's where changed. The IDX where changed again this morning.

I noted the id's and check that if it happens again.

moroen commented 3 years ago

This is getting more and more frustrating...

My gateway has the same firmware as yours. Still no problems on my own two systems, and I really need a log with debug enabled if I'm to have any chance figuring this out.

BTW: Are you by any chance rebooting the IKEA gateway regulary, and does a gateway reboot coincide with the devices disapperaring?

I managed to create a test case for a situation that I previously assumed would work (the Gateway dropping a connection while in the process of getting the device list), and confirmed that that doesn't trigger this error...

Bolten88 commented 3 years ago

Hi Moroen,

i was able to reproduce it. Hopefully it will help you. The log (Log 8.txt) shows a simple 'deleting devices'. Where (Log 9.txt) shows that the devices are added again. There are no errors or something.

11:34 --> Turn off the IKEA gateway --> Devices where offline but still there 11:37 --> Turn on the IKEA gateway --> Devices where still there and everthing fine 11:45 --> Goto hardware, click plugin, click update --> Devices where still there and everthing fine 11:49 --> Restart domoticz --> sudo service domoticz.sh restart --> Devices where still there and everthing fine 11:51 --> Restart raspberry pi --> Devices where still there and everthing fine 11:56 --> Restart the IKEA gateway --> Devices where gone --> Log 8.txt 11:59 --> Goto hardware, click plugin, click update --> Devices are back with new IDX but ID's are the same --> Log 9.txt

The IDX's get mixxed up when the devices are removed and added again. I have one blockly that doesn't work anymore after this happens. Screen 1: After the devices where added again Screen 2: How it supposed to be Screen 3: Overview of IKEA devices

Log 1.txt Log 2.txt Log 3.txt Log 4.txt Log 5.txt Log 6.txt Log 7.txt Log 8.txt Log 9.txt

Screen 1 Screen 2 Screen 3

moroen commented 3 years ago

@Bolten88

Thank you, that was very helpfull! The log confirmed that it indeed is a timeout issue (as these kind of non-constant errors almost invariably are, and timeouts are notoriously difficult to debug). In your case it seems related to polling(?). As an preliminary precaution disabling polling should reduce the likelyhood of this, as the update functions is only run on startup and on polling for changes.

I'll get started on figuring out a fix as soon as possible. Ideally I would like to fix the root cause, but at least it should be somewhat trivial to implement a work-around.

Bolten88 commented 3 years ago

@Bolten88

Thank you, that was very helpfull! The log confirmed that it indeed is a timeout issue (as these kind of non-constant errors almost invariably are, and timeouts are notoriously difficult to debug). In your case it seems related to polling(?). As an preliminary precaution disabling polling should reduce the likelyhood of this, as the update functions is only run on startup and on polling for changes.

I'll get started on figuring out a fix as soon as possible. Ideally I would like to fix the root cause, but at least it should be somewhat trivial to implement a work-around.

That is great. Good luck! I have polling on 30 sec (yes i know longer is better). Most of the time i can use the shorter polling time without issues, i will set the polling on 120 seconds for now.

moroen commented 3 years ago

Found it! (I hope...)

When the Tradfri Gateway reboots (or is powered on), there is a small period where the gateway is available on the network, but the coap subsystem hasn't finished starting up. If the plugin requests a update in this period, and the timing is right (or wrong depending on your perspective) tradfricoap would return a incomplete list of devices, leading the plugin to delete the "missing" devices. So either a startup or a poll in exactly this period, could trigger the error.

There is an update to both tradfricoap (0.0.15) and the plugin (0.10.3) just released, that should fix this...

Bolten88 commented 3 years ago

Found it! (I hope...)

When the Tradfri Gateway reboots (or is powered on), there is a small period where the gateway is available on the network, but the coap subsystem hasn't finished starting up. If the plugin requests a update in this period, and the timing is right (or wrong depending on your perspective) tradfricoap would return a incomplete list of devices, leading the plugin to delete the "missing" devices. So either a startup or a poll in exactly this period, could trigger the error.

There is an update to both tradfricoap (0.0.15) and the plugin (0.10.3) just released, that should fix this...

Great, i just updated and will check what happens when i reboot the gateway several times.

Bolten88 commented 3 years ago

@moroen so far so good. I've set my polling intervall to 30 sec again and rebooted the gateway several times. It did not happen again for now. I will do some more reboots over the day and see if it won't happen again.

Bolten88 commented 3 years ago

@moroen i kept rebooting the gateway several times a day for 2 days now and the issue did not return. So it is solved for me.

moroen commented 3 years ago

Sounds good! The fix published a couple of days a go was somewhat of a hack, working around an error in the underlying gocoap library. This is now fixed "the proper way", by raising and handling the correct exceptions. This requires an update to all components, except the plugin itself, which is still 0.10.3

IKEA Tradfri Plugin: 0.10.3
Tradfricoap: 0.0.17
Py3coap: 0.9.2
coapcmd: (...) 0.1.7

As I have mentioned previously, testing is not my strong suite, so thanks to everyone contributing with testing and error reports!

McMornan commented 3 years ago

Hi, still stable so far ... (thumbs up). Fix seems to work.

moroen commented 3 years ago

No new errors reported, hopefully fixed. Closing issue, feel free to reopen if needed.