the1laz / cgateweb

MQTT interface for C-Bus written in Node.js
MIT License
21 stars 20 forks source link

Issues with multiple commands sent in quick succession? #20

Open lockyt opened 5 years ago

lockyt commented 5 years ago

Hi @the1laz - great work with cgateweb. It's a great little script.

I am however having an issue with CBUS which may or may not be the cgateweb script causing the problem which I am hoping you can help with.

To explain the situation. We have a Qnap NAS running docker containers in the following configuration. CBUS(CNI)<->Cgate<->Cgateweb<->mosquito eclipse<->Home Assistant<->Homekit. All works just fine for turning single lights on/off and works super fast. However, when we send more complex commands such as turning off the kitchen lights or downstairs lights, it has problems where it seems some lights fail to report back to HA that they turned on/off. So for example, you tell Siri to turn off all kitchen lights and any lights that are on turn off. However, in HA it shows that some kitchen lights are still on but they have actually turned off. Looking at the cgateweb logs, it shows the command being sent to turn the lights off and they work (lights go out) but the responses don't come back for the lights (which are showing as still on).

Looking at cgate logs, I can see the commands come in to turn on the lights and see the responses go back for all the lights. So it seems that somewhere between cgate and cgateweb the responses to update what happened are getting lost.

I have isolated it down to one of these two (cgate/cgateweb) as everything else seems to be working as expected.

Here are the screenshots of both cgate and cgateweb during the kitchen lights on/off scenario as above, as well as what is happening in Home Assistant:

Turn off all kitchen lights either through HA or SIRI:

Screen Shot 2019-07-24 at 7 51 30 pm

all physically turn off as requested but HA doesn't get a response for 28 and 30 so it thinks they are still on

Screen Shot 2019-07-24 at 7 51 24 pm

Turn on all kitchen lights through either HA or SIRI:

Screen Shot 2019-07-24 at 7 51 54 pm

only turns on those in HA/CBUS that it thinks are off (ie, everything but 28/30)

Screen Shot 2019-07-24 at 7 51 48 pm

If you go from the bottom up on the logs you can find where all lights were turned on or off. You can see the device ids being turned on/off and the responses from cgate confirming them. But cgateweb only shows some of the responses coming in, not all and the ones that don't are the lights that stay on/off. Now it could be cgate logging something it isn't sending or perhaps cgateweb not processing and accepting the responses. We have a very large cbus installation and it could be related to that.. a total of ~190 lights and blinds in the place. However this issue happens even on the smallest group - in this instance only 6 lights.

Any help would be greatly appreciated. I will need to confirm what version of cgate we are using and whether there is anything special with the cbus installation (need to speak with the CBUS guy) but thought I'd check with you first.

Thanks for your help in advance.

Locky.

lockyt commented 5 years ago

Hi @the1laz - I spoke with the CBus guy today and he suggested that it could just be the CBus network not being able to process a bunch of commands being sent at once. Although I think we are seeing a response from cgate in the console, it doesn't mean it is actually being sent so I guess it is possible. He is suggesting we send commands with a microsecond or two delay to ensure it doesn't get hit with too many commands all at once. Not sure if that is possible with the script.

He also suggested we use levels instead of on/off commands which I will test. Some of the lights aren't dimmable so it's not a perfect solution and not sure how it would work.

He is also going to setup a scene or two for me to use instead but it kinda defeats the purpose of using Home Assistant/HomeKit. I want to be able to move lights in or out of a room on HomeKit and ask for all lights in that group to be turned on/off. Keep it dynamic. The CBus guy here in Canberra is impossible to get hold of at the best of times, so would prefer not to rely on groups being defined in CBus.

Thanks.

Locky.

lockyt commented 5 years ago

Sorry one other question - does cgateweb use a quality of service parameter with MQTT? If so, is it possible to change it to guaranteed delivery?

Thanks!

the1laz commented 5 years ago

Sorry Locky, I haven't had a chance to have a good look at this but I'm sure it won't be too hard to solve. I didn't really consider sending multiple commands at once when I put this together, so I'm not surprised that it's not working. I'll have a look this weekend.

lockyt commented 5 years ago

Awesome - that'd be great thanks mate. You may want to check my logic is right first as I'd hate you to do a bunch of work to find it is something I am doing stupid.

lockyt commented 5 years ago

Hey Stephen - any luck over the weekend? Get a chance to have a look? Sorry no pressure - just glad you are happy to help.

the1laz commented 5 years ago

No luck last weekend due to my pi crashing at some point then giving me grief when starting up. Got some time today though and stuck a couple of message queues in that'll hopefully help. Seems good at my end, let me know if it fixes it for you.

the1laz commented 5 years ago

Don't forget to grab a copy of your settings before downloading the latest commit. Sorry, the settings file (and all the rest of the code) could be better, but I've not really put much effort into this since I first got it working.

lockyt commented 5 years ago

Ok great thanks Stephen. Unfortunately I won't be able to test it until later in the week but I will let you know how I go!

lockyt commented 5 years ago

Hi Stephen - just tested the new script and definitely a major improvement. I noticed the events have slowed down (your flow control I presume) and whereas it worked properly 10% of the time before, it now works properly 90% of the time. Out of the 10 or so tests I did, we did have a command or two not respond on it's state (which in turn doesn't update HA on the status of the light).

I am hoping it is just a tuning thing now to get the sweet spot of speed vs commands not being responded to? Is that something I can change on the fly or am I over simplifying it? Please let me know.

But definitely a lot better so I think we are going down the right track.

Thanks again mate for all your help.

lockyt commented 5 years ago
Screen Shot 2019-08-17 at 9 19 12 am

This is a screen shot on one of the times it didn't work. The lights actually turned off (so the commands got to CBUS) but the responses from the CBUS through cgateweb don't all get through. This was only one test of 2 out of 10 that didn't work.

Thanks mate.

the1laz commented 5 years ago

Hi Locky, I'm hoping it's just a tuning thing. I've added an option to change the message interval in settings.js and bumped the default up to 200ms between messages. Hopefully that'll work for you, seems to be working for me.

lockyt commented 5 years ago

Great. I'll try again and let you know.

lockyt commented 5 years ago

Hi Stephen - just to let you know that it all now seems to be working a charm. Thanks so much for all your help! Really appreciated.

the1laz commented 5 years ago

No problem Locky, good to hear it's working.

GledholtHall commented 4 years ago

I'm kinda surprised at this - but obviously it works.

There is a TCP socket connection to C-Gate (no dropped info there) and C-Gate handles all the C-Bus communication and includes it's own buffering and flow control. It is true that C-Bus itself has very limited capacity for rapid commands but C-Gate manages this (or should).

The connection to and from MQTT is TCP based so again no dropped messages there so it's a fairly solid path apart from 'within C-Gate' over which you have no control. The fact the buffer solution works implies issues there.

... anyway as it works, this is not too important

lockyt commented 4 years ago

Thanks @GledholtHall - not wasted effort at all as it is good to discuss these things for better understanding. The changes that @the1laz made certainly worked as I think the issue was not at either the MQTT or CBUS end but within the NodeJS script he developed. He hadn't needed queuing within his script previously as his use case differed from mine but very helpfully added some queuing for me (which I still really appreciate thank you @the1laz!) which worked a charm. It did introduce a more serial on/off of bulk commands from what it was prior to controlling queued events (which was more synchronous), but that is as expected. If @the1laz had the desire to improve his script further, a more synchronous queueing would be happily received, but the way it works now is all a-ok. I will add that the client has been going solidly for nearly 12 months now with zero issues so fixed it well and good. Thanks.

haqeem18 commented 3 years ago

Hi @lockyt , I've just got cgateweb running on a pi4 and facing the same issues aswell. I was wondering what interval did you set for your use case? I've tried fine tuning with the timing which does improve, but there is still a good chance of not all the state update correctly.

Thanks.

lockyt commented 3 years ago

Hi @haqeem18. It was actually for a client so I don't fully remember what interval I ended up using and don't have access to his network atm. I think I went slightly over the default 200ms but you are best to go with trial and error. Set it and try a large on/off command - we used to test turning on/off his kitchen area which was about 20 CBUS devices over and over until we got a number that didn't have any failures.

Sorry I can't be more specific but just up the number slightly until it is no longer causing problems.

Thanks.

rbswift commented 2 years ago

I'm seeing the same behaviour. It seems to be caused by receiving message at high rate from cbus. I'm testing using 12 group scenes to test so not writing anything via cgateweb, just reading received messages. With telnet all 12 events are received fine but with mosquitto_sub via cgateweb several are missed

telnet pi4 20025
lighting on //HOME/254/56/1  #sourceunit=102 OID=2c4bd5c0-db1d-103a-9fba-c36d60777dc1
lighting on //HOME/254/56/2  #sourceunit=102 OID=2c4b1270-db1d-103a-9fbe-c36d60777dc1
lighting on //HOME/254/56/3  #sourceunit=102 OID=2c4bfcd0-db1d-103a-9fbb-c36d60777dc1
lighting on //HOME/254/56/4  #sourceunit=102 OID=2c4c23e0-db1d-103a-9fbc-c36d60777dc1
lighting on //HOME/254/56/5  #sourceunit=102 OID=2c4c4af0-db1d-103a-9fbd-c36d60777dc1
lighting on //HOME/254/56/6  #sourceunit=102 OID=2c4b3980-db1d-103a-9fbf-c36d60777dc1
lighting on //HOME/254/56/7  #sourceunit=102 OID=2c4b6090-db1d-103a-9fc0-c36d60777dc1
lighting on //HOME/254/56/8  #sourceunit=102 OID=2c4baeb0-db1d-103a-9fb9-c36d60777dc1
lighting on //HOME/254/56/9  #sourceunit=102 OID=2c4b87a0-db1d-103a-9fc1-c36d60777dc1
lighting on //HOME/254/56/10  #sourceunit=102 OID=2c4bfcd0-db1d-103a-9fc4-c36d60777dc1
lighting on //HOME/254/56/11  #sourceunit=102 OID=2c4baeb0-db1d-103a-9fc2-c36d60777dc1
lighting on //HOME/254/56/12  #sourceunit=102 OID=2c4bd5c0-db1d-103a-9fc3-c36d60777dc1

lighting off //HOME/254/56/1  #sourceunit=102 OID=2c4bd5c0-db1d-103a-9fba-c36d60777dc1
lighting off //HOME/254/56/2  #sourceunit=102 OID=2c4b1270-db1d-103a-9fbe-c36d60777dc1
lighting off //HOME/254/56/3  #sourceunit=102 OID=2c4bfcd0-db1d-103a-9fbb-c36d60777dc1
lighting off //HOME/254/56/4  #sourceunit=102 OID=2c4c23e0-db1d-103a-9fbc-c36d60777dc1
lighting off //HOME/254/56/5  #sourceunit=102 OID=2c4c4af0-db1d-103a-9fbd-c36d60777dc1
lighting off //HOME/254/56/6  #sourceunit=102 OID=2c4b3980-db1d-103a-9fbf-c36d60777dc1
lighting off //HOME/254/56/7  #sourceunit=102 OID=2c4b6090-db1d-103a-9fc0-c36d60777dc1
lighting off //HOME/254/56/8  #sourceunit=102 OID=2c4baeb0-db1d-103a-9fb9-c36d60777dc1
lighting off //HOME/254/56/9  #sourceunit=102 OID=2c4b87a0-db1d-103a-9fc1-c36d60777dc1
lighting off //HOME/254/56/10  #sourceunit=102 OID=2c4bfcd0-db1d-103a-9fc4-c36d60777dc1
lighting off //HOME/254/56/11  #sourceunit=102 OID=2c4baeb0-db1d-103a-9fc2-c36d60777dc1
lighting off //HOME/254/56/12  #sourceunit=102 OID=2c4bd5c0-db1d-103a-9fc3-c36d60777dc1
mosquitto_sub -t '#' -v | grep cbus

cbus/read/254/56/1/state ON
cbus/read/254/56/1/level 100
cbus/read/254/56/3/state ON
cbus/read/254/56/3/level 100
cbus/read/254/56/7/state ON
cbus/read/254/56/7/level 100
cbus/read/254/56/8/state ON
cbus/read/254/56/8/level 100
cbus/read/254/56/9/state ON
cbus/read/254/56/9/level 100
cbus/read/254/56/10/state ON
cbus/read/254/56/10/level 100

cbus/read/254/56/1/state OFF
cbus/read/254/56/1/level 0
cbus/read/254/56/2/state OFF
cbus/read/254/56/2/level 0
cbus/read/254/56/4/state OFF
cbus/read/254/56/4/level 0
cbus/read/254/56/5/state OFF
cbus/read/254/56/5/level 0
cbus/read/254/56/7/state OFF
cbus/read/254/56/7/level 0
cbus/read/254/56/8/state OFF
cbus/read/254/56/8/level 0
cbus/read/254/56/9/state OFF
cbus/read/254/56/9/level 0
cbus/read/254/56/10/state OFF
cbus/read/254/56/10/level 0
cbus/read/254/56/11/state OFF
cbus/read/254/56/11/level 0
cbus/read/254/56/12/state OFF
cbus/read/254/56/12/level 0
rbswift commented 2 years ago

It seems like this line: event.on('data', function(data) will sometimes receive multiple lines representing multiple c-bus events from cgate in a single net.socket event.

The code as it stands will only process the first line and drop the rest

The additional events can be seen if we uncomment line 320 in index.js

Hopefully this is the cause for everyone. i.e. there's no need to slow down the rate of messages, we just need to change the code to process all the c-bus event lines instead of just the first one then we can return to using cgate at full speed.

I'm no expert but it looks like we should use a carrier rather than just expect one-to-one correlation between socket-on-data and c-bus events as has been used in homebridge-cbus

GledholtHall commented 2 years ago

Useful detective work - watching the progress with interest.

I haven't noticed this issue but I don't think I have many times when such a rapid number of changes happen coincidentally. But I would like my implementation to be 100% and operate at max speed so if there's a fix that would be reassuring.

rbswift commented 2 years ago

Thanks @GledholtHall you can easily reproduce the problem if you use toolkit to set up a cbus scene with 12 relay groups. You'd probably get dropped events most times with as few as 6 groups. Setting multiple groups in rapid succession from a cbus key unit scene key the existing message interval setting isn't relevant as that only applies to sending rather than receiving.

rbswift commented 2 years ago

This is fixed in my fork now and high rate events pretty close to instantaneous.

GledholtHall commented 2 years ago

That's great - can I merge your changes into the branch I use (which supports more c-bus applications) ?

rbswift commented 2 years ago

That's great - can I merge your changes into the branch I use (which supports more c-bus applications) ?

Absolutely please try it

bdnstn commented 2 years ago

Which branch/fork are you using? I'm about to make some changes also to get my c-bus system working with HA. There are already 17 forks, and nothing committed to the master since Aug last year. It's starting to look a bit chaotic. What is the best way to proceed?

@.***

From: GledholtHall @.> Sent: Monday, 11 July 2022 2:24 AM To: the1laz/cgateweb @.> Cc: Subscribed @.***> Subject: Re: [the1laz/cgateweb] Issues with multiple commands sent in quick succession? (#20)

That's great - can I merge your changes into the branch I use (which supports more c-bus applications) ?

- Reply to this email directly, view it on GitHubhttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fthe1laz%2Fcgateweb%2Fissues%2F20%23issuecomment-1179776073&data=05%7C01%7C%7C2431738e15bc4ac0b2bf08da62a1539a%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637930742296245522%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=tnsLYar4v7XJfyaYOXAG54V5xFGYo2IJCxTBDte%2BlDw%3D&reserved=0, or unsubscribehttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAF5SOLMFOJ6FWL63THP46ODVTMILDANCNFSM4IGPCD4A&data=05%7C01%7C%7C2431738e15bc4ac0b2bf08da62a1539a%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637930742296245522%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=h2fUHG1AvP6p6LeQQF7MHkZZNDoKs9IQCgHK%2FsKED%2BI%3D&reserved=0. You are receiving this because you are subscribed to this thread.Message ID: @.**@.>>