micolous / cbus

Use Python to communicate with a Clipsal C-Bus PCI or CNI! Includes MQTT bridge (for Home Assistant), PCI simulator, and reverse engineering notes.
http://cbus.readthedocs.org/
GNU Lesser General Public License v3.0
49 stars 27 forks source link

Lights not turning off reliably #22

Open sladey opened 3 years ago

sladey commented 3 years ago

I am integrating with Home Assistant and this is such a great addition to any CBUS setup. Logging from HA suggests that the MQTT messages are being sent and received as expected. I have a HA script that is trying to turn 4 lights on and off. The 'on' script works but the 'off' script does not turn all the lights off. Here is the debug from cmqttd:

2021-01-07 14:37:03,635 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:37:03,635 send: b'\0538000119A9p\r' 2021-01-07 14:37:03,636 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:37:03,636 send: b'\05380001259Dq\r' 2021-01-07 14:37:03,639 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:37:03,639 send: b'\0538000120A2r\r' 2021-01-07 14:37:03,640 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:37:03,640 send: b'\053800011AA8s\r' 2021-01-07 14:37:03,794 Incoming data: b'p.q.r.!' 2021-01-07 14:37:03,794 Got packet: ConfirmationPacket(code=b'p', success=True) 2021-01-07 14:37:03,794 recv: confirmation: code = b'p', success = True 2021-01-07 14:37:03,795 Incoming data: b'q.r.!' 2021-01-07 14:37:03,795 Got packet: ConfirmationPacket(code=b'q', success=True) 2021-01-07 14:37:03,795 recv: confirmation: code = b'q', success = True 2021-01-07 14:37:03,795 Incoming data: b'r.!' 2021-01-07 14:37:03,795 Got packet: ConfirmationPacket(code=b'r', success=True) 2021-01-07 14:37:03,796 recv: confirmation: code = b'r', success = True 2021-01-07 14:37:03,796 Incoming data: b'!' 2021-01-07 14:37:03,796 Got packet: PCIErrorPacket() 2021-01-07 14:37:03,796 recv: PCI cannot accept data 2021-01-07 14:37:05,197 Incoming data: b'053FFF001EFFE794058CF59F\r\n' 2021-01-07 14:37:05,198 Got packet: InvalidPacket(payload=b'\xff\x00\x1e\xff\xe7\x94\x05\x8c\xf5', exception=NotImplementedError('Unknown status request type 0x1e')) 2021-01-07 14:37:05,199 hcp: unhandled other packet: InvalidPacket(payload=b'\xff\x00\x1e\xff\xe7\x94\x05\x8c\xf5', exception=NotImplementedError('Unknown status request type 0x1e'))

As you can see I get and invalid packet. When I run the script a second time I get the same issue.

2021-01-07 14:45:01,462 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:45:01,462 send: b'\0538000119A9z\r' 2021-01-07 14:45:01,463 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:45:01,463 send: b'\05380001259Dg\r' 2021-01-07 14:45:01,465 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:45:01,465 send: b'\0538000120A2h\r' 2021-01-07 14:45:01,467 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:45:01,467 send: b'\053800011AA8i\r' 2021-01-07 14:45:01,616 Incoming data: b'z.g.h' 2021-01-07 14:45:01,616 Got packet: ConfirmationPacket(code=b'z', success=True) 2021-01-07 14:45:01,617 recv: confirmation: code = b'z', success = True 2021-01-07 14:45:01,617 Incoming data: b'g.h' 2021-01-07 14:45:01,617 Got packet: ConfirmationPacket(code=b'g', success=True) 2021-01-07 14:45:01,617 recv: confirmation: code = b'g', success = True 2021-01-07 14:45:01,617 Incoming data: b'h' 2021-01-07 14:45:01,716 Incoming data: b'h.!' 2021-01-07 14:45:01,716 Got packet: ConfirmationPacket(code=b'h', success=True) 2021-01-07 14:45:01,716 recv: confirmation: code = b'h', success = True 2021-01-07 14:45:01,717 Incoming data: b'!' 2021-01-07 14:45:01,717 Got packet: PCIErrorPacket() 2021-01-07 14:45:01,717 recv: PCI cannot accept data

To get the light to turn off I need to activate another scene that includes the light in question, then try turning off again and this mostly works.

Unfortunately this does not happen every time I run the on and off scripts. Here is what a successful lights on and then lights off looks like:

2021-01-07 14:46:38,944 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:46:38,944 send: b'\053800021AB4F3j\r' 2021-01-07 14:46:38,988 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:46:38,989 send: b'\0538000219BEEAk\r' 2021-01-07 14:46:38,989 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:46:38,989 send: b'\0538000220B4EDl\r' 2021-01-07 14:46:38,989 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:46:38,989 send: b'\053800792525m\r' 2021-01-07 14:46:39,106 Incoming data: b'j.k.l.' 2021-01-07 14:46:39,106 Got packet: ConfirmationPacket(code=b'j', success=True) 2021-01-07 14:46:39,107 recv: confirmation: code = b'j', success = True 2021-01-07 14:46:39,107 Incoming data: b'k.l.' 2021-01-07 14:46:39,107 Got packet: ConfirmationPacket(code=b'k', success=True) 2021-01-07 14:46:39,107 recv: confirmation: code = b'k', success = True 2021-01-07 14:46:39,107 Incoming data: b'l.' 2021-01-07 14:46:39,107 Got packet: ConfirmationPacket(code=b'l', success=True) 2021-01-07 14:46:39,107 recv: confirmation: code = b'l', success = True 2021-01-07 14:46:39,250 Incoming data: b'm.' 2021-01-07 14:46:39,251 Got packet: ConfirmationPacket(code=b'm', success=True) 2021-01-07 14:46:39,251 recv: confirmation: code = b'm', success = True

2021-01-07 14:46:45,611 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:46:45,611 send: b'\0538000119A9n\r' 2021-01-07 14:46:45,618 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:46:45,619 send: b'\05380001259Do\r' 2021-01-07 14:46:45,665 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:46:45,665 send: b'\0538000120A2p\r' 2021-01-07 14:46:45,665 send: <PointToMultipointPacket object: application=56, source_address=None, sals=[]> 2021-01-07 14:46:45,665 send: b'\053800011AA8q\r' 2021-01-07 14:46:45,778 Incoming data: b'n.' 2021-01-07 14:46:45,779 Got packet: ConfirmationPacket(code=b'n', success=True) 2021-01-07 14:46:45,779 recv: confirmation: code = b'n', success = True 2021-01-07 14:46:46,194 Incoming data: b'o.p.q.' 2021-01-07 14:46:46,195 Got packet: ConfirmationPacket(code=b'o', success=True) 2021-01-07 14:46:46,195 recv: confirmation: code = b'o', success = True 2021-01-07 14:46:46,195 Incoming data: b'p.q.' 2021-01-07 14:46:46,195 Got packet: ConfirmationPacket(code=b'p', success=True) 2021-01-07 14:46:46,196 recv: confirmation: code = b'p', success = True 2021-01-07 14:46:46,196 Incoming data: b'q.' 2021-01-07 14:46:46,196 Got packet: ConfirmationPacket(code=b'q', success=True) 2021-01-07 14:46:46,196 recv: confirmation: code = b'q', success = True

Any help would be really appreciated. I am running cmqttd on

Distributor ID: Ubuntu Description: Ubuntu 20.04.1 LTS Release: 20.04 Codename: focal

Python version: 3.8.5

beerygaz commented 2 years ago

I'm seeing this issue too. It seems especially prevalent when I run a script or call a light.turn_off service for multiple groups simultaneously. I see them hit MQTT and cmqttd but only the first one or two lights go off, the rest do not. I see the same PCIErrorPacket error as you do.

Could it be that there are too many messages in rapid succession?

2022-06-19 19:30:00,010 send: b'\\05380001487Aq\r'
2022-06-19 19:30:00,153 Incoming data: b'm.n.o.'
2022-06-19 19:30:00,154 Got packet: ConfirmationPacket(code=b'm', success=True)
2022-06-19 19:30:00,154 recv: confirmation: code = b'm', success = True
2022-06-19 19:30:00,154 Incoming data: b'n.o.'
2022-06-19 19:30:00,154 Got packet: ConfirmationPacket(code=b'n', success=True)
2022-06-19 19:30:00,154 recv: confirmation: code = b'n', success = True
2022-06-19 19:30:00,154 Incoming data: b'o.'
2022-06-19 19:30:00,154 Got packet: ConfirmationPacket(code=b'o', success=True)
2022-06-19 19:30:00,154 recv: confirmation: code = b'o', success = True
2022-06-19 19:30:00,297 Incoming data: b'p.'
2022-06-19 19:30:00,298 Got packet: ConfirmationPacket(code=b'p', success=True)
2022-06-19 19:30:00,298 recv: confirmation: code = b'p', success = True
2022-06-19 19:33:29,724 send: <PointToMultipointPacket object: application=223, source_address=None, sals=[ClockUpdateSAL(val=datetime.date(2022, 6, 19)), ClockUpdateSAL(val=datetime.time(19, 33, 29, 724450))]>
2022-06-19 19:33:29,724 send: b'\\05DF000E0207E60613060D0113211DFFA2r\r'
2022-06-19 19:33:29,867 Incoming data: b'!'
2022-06-19 19:33:29,867 Got packet: PCIErrorPacket()
2022-06-19 19:33:29,867 recv: PCI cannot accept data
2022-06-19 19:33:32,871 Incoming data: b'0565380002230039\r\n'
2022-06-19 19:33:32,871 Got packet: <PointToMultipointPacket object: application=56, source_address=101, sals=[<LightingRampSAL object: group_address=35, duration=0, level=0>]>
bdnstn commented 2 years ago

I'm also seeing an issue where the lights don't turn off reliably when set off from an automation with a large number of other lights. The lights show as off in Home Assistant, but some actually remain on.

sladey commented 2 years ago

I am seeing the same thing. I am putting lights in a light group then turning off the group. Some lights stay on. If I trigger the Home Assistant script multiple times I typically get additional lights turn off (from the original group).

rbswift commented 1 year ago

Did anyone have any luck resolving this? It would be really great to be able to control several groups in a batch reliably.

bdnstn commented 1 year ago

I didn’t. I went back to using my cgateweb based solution, which has been very stable.

mitchell-johnson commented 1 year ago

I am seeing the same issue on my setup. running scenes with multiple lights doesn't work consistently

rbswift commented 1 year ago

I'm using cbus for a new home and this is on my todo list to investigate and hopefully fix as I need reliable bulk group changes. I'd love to eliminate c-gate and the java runtime from my final environment as these seem quite redundant. If anyone has any hints on where to start looking that would be great! Cheers

Nick-Adams-AU commented 1 year ago

If you have a 5500SHAC or similar, check out https://github.com/autoSteve/acMqtt

rbswift commented 1 year ago

@Nick-Adams-AU that looks great but I only have the old Ethernet CNI

mitchell-johnson commented 1 year ago

After some investigation I believe that the issue is just that its sending commands too quickly for the CNI to handle. Here is the comment from pciprotocol.py ` def on_pci_cannot_accept_data(self): """ Event called whenever the PCI cannot accept the supplied data. Common reasons for this occurring:

    * The checksum is incorrect.
    * The buffer in the PCI is full.

    Unfortunately the PCI does not tell us which requests these are
    associated with.

    This error can occur if data is being sent to the PCI too quickly, or
    if the cable connecting the PCI to the computer is faulty.

    While the PCI can operate at 9600 baud, this only applies to data it
    sends, not to data it receives.

    """
    logger.debug('recv: PCI cannot accept data')`

Actions are to figure out what the best rate is and implement a message queue

rbswift commented 1 year ago

That's a great lead @mitchell-johnson thanks. I wonder if someone with a SHAC or similar could send a batch of commands through and snoop for the timing it uses? Come to think of it even the toolkit can batch switch multiple groups can't it. Maybe c-gate could be a point of reference for timing?

mitchell-johnson commented 1 year ago

It seems its a known thing. In the setup code there is already. I have forked this PR https://github.com/micolous/cbus/pull/36 that adds a throttled queue to get the status of all devices on connection. I dunno where 97 ms came from but works for that so im going to see if I can just reuse the same thing. https://github.com/CABrouwers/cbus/blob/c8c4000b81cfa1258c466d0c45b313f8ff6759d1/cbus/daemon/cmqttd.py#L174

rbswift commented 1 year ago

That sounds promising! I'm usually bit averse to the old just add a sleep as a fix but sometimes working with undocumented protocols force a bit of improvisation.

mitchell-johnson commented 1 year ago

I have pushed the changes here if you wanted to try them out. Just adds a 100ms delay between each light. I don't love it. I believe I should be able to figure out which command fails and retry it but at least its working for me now

rbswift commented 1 year ago

Awesome thanks. My cbus kit is packed up ready for install at a new build but next time I fire it up I will definitely test it. Hopefully someone else will too! Great work.

mitchell-johnson commented 1 year ago

Feel free to email me mitchell at johnson dot fyi if you need a hand with the setup

sladey commented 1 year ago

Hey Mitchell, I tried to install your fork of cmqttd and things went pretty crazy. Previously I was running the micolous code from 2021. In your fork there have been several updates around how labels are handled. I am not using a project.cbz file and with the new version I was getting errors in publish_all_lights. Are you using a project.cbz file? Do you think I can just cherry pick your queue related changes?

mitchell-johnson commented 1 year ago

Hey @sladey Yeah my code is based on the changes made in PR #36 so is pretty different to what you would be running. I dont see why not. You would need to pull in the class Periodic and the delay code. let me know if you want some help.

sladey commented 1 year ago

Thanks @mitchell-johnson I have made the changes and it is working like a charm. The delay give a nice ripple effect through the house as I operate on the big groups we have here. Thanks so much for nutting out this problem I have been frustrated by it (but not enough to debug myself) for the last two years !! Thanks again.

mitchell-johnson commented 1 year ago

There are 2 areas that I might get to researching. The CNI obviously has some sort of buffer that gets full up. In theory rather than waiting 100ms to send each command we should be able to do 2-3 at a time then wait some amount of MS then continue. Also there is confirmation code logic that is not fully implemented that should allow you to check if a given command is executed successfully. Doing both of those would be the best solution.

damon-atkins commented 1 year ago

Might find this useful for debugging/tracing communications https://github.com/robert/how-to-build-a-tcp-proxy/blob/master/tcp_proxy.py