home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io

Haiku SenseMe integration is not compatible with new firmware #69370

Closed HADev2 closed 2 years ago

HADev2 commented 2 years ago

The problem

I have been running this integration with my current setup and it has been truly remarkable. However, BAF has updated their app as well as the fan firmware so that all of their fans can be controlled with a single app. In my home I have both a Haiku and an i6 fan, and the firmware upgrade from BAF broke the SenseMe integration with the Haiku fans. Granted, I now have a single app on my phone that can control both fans with the same interface. I do not know if this is something others have experienced or may run into. Apparently whatever BAF has done now supports both fan types, yet it broke this cool integration.

What version of Home Assistant Core has the issue?

core-2022.3.8

What was the last working version of Home Assistant Core?

core-2022.3.8

What type of installation are you running?

Home Assistant OS

Integration causing the issue

SenseMe

Link to integration documentation on our website

https://www.home-assistant.io/integrations/senseme/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

The issue started after the Haiku had its firmware updated so that the fan could be consolidated into the single BAF app (which runs on my iPhone). In essence, the SenseMe integration would still be working fine were it not for this firmware change. My point here is to notify others of what has happened and what the result may be should they update their firmware.

probot-home-assistant[bot] commented 2 years ago

senseme documentation senseme source (message by IssueLinks)

probot-home-assistant[bot] commented 2 years ago

Hey there @mikelawrence, @bdraco, mind taking a look at this issue as it has been labeled with an integration (senseme) you are listed as a code owner for? Thanks! (message by CodeOwnersMention)

bdraco commented 2 years ago

They might have changed everything over to the i6 protocol.

There is a homebridge plugin that works with that: https://www.npmjs.com/package/homebridge-i6-bigassfans

Can you give it a shot and see if it will find the fans?

bdraco commented 2 years ago

From reading the app description, it looks like the fans no longer speak the senseme protocol after the update, and now speak the i6 protocol.

Likely we need a new integration for the i6 protocol

bdraco commented 2 years ago

https://github.com/home-assistant/alerts.home-assistant.io/pull/510

WJKramer commented 2 years ago

I can confirm that all 4 of my fans and 5 standalone lights no longer work in Home Assistant after the firmware update and app migration from Haiku to BAF.

bdraco commented 2 years ago

Thanks for the heads up. We have published an alert: https://alerts.home-assistant.io/#senseme.markdown

I've reached out to BAF to see if they have any docs on the new protocol

gardiner4 commented 2 years ago

They might have changed everything over to the i6 protocol.

There is a homebridge plugin that works with that: https://www.npmjs.com/package/homebridge-i6-bigassfans

Can you give it a shot and see if it will find the fans?

I tried out the i6 plugin and it does work. It reports some errors but does function for turning the fan on/off and setting the speed.

bdraco commented 2 years ago

Great news, there is a way to downgrade the firmware.

In the app (Android: https://play.google.com/store/apps/details?id=com.baf.i6&hl=en_US&gl=US or iOS: https://apps.apple.com/us/app/big-ass-fans/id1472272574):

Choose App Settings on the left side bar

Long press on the app version for 5 seconds

Follow the instructions

oogje commented 2 years ago

Great news, there is a way to downgrade the firmware.

In the app (Android: https://play.google.com/store/apps/details?id=com.baf.i6&hl=en_US&gl=US or iOS: https://apps.apple.com/us/app/big-ass-fans/id1472272574):

Choose App Settings on the left side bar

Long press on the app version for 5 seconds

Follow the instructions

Hello. Although I'm working on homebridge support for Haiku fans that have been updated to use a protocol similar to the i6's, I've been avoiding updating the firmware on my i6 fan for fear it would break my i6 homebridge plugin. If I were sure there's a way to downgrade, I'd try the update.

The long press doesn't do anything on my (old) version of the BAF app (1.7.0), which isn't surprising but I'm concerned the downgrade feature might be limited to Haiku models.

My question is, did this downgrade feature come to you with any indication that it would work with i6 models?

Thank you.

bdraco commented 2 years ago

The downgrade is only for the Haiku models

oogje commented 2 years ago

The downgrade is only for the Haiku models

Thank you.

appark commented 2 years ago

Great news, there is a way to downgrade the firmware.

In the app (Android: https://play.google.com/store/apps/details?id=com.baf.i6&hl=en_US&gl=US or iOS: https://apps.apple.com/us/app/big-ass-fans/id1472272574):

Choose App Settings on the left side bar

Long press on the app version for 5 seconds

Follow the instructions

I cannot seem to find instructions on how to downgrade on iOS.

bdraco commented 2 years ago

I cannot seem to find instructions on how to downgrade on iOS.

BAF support can walk you through the process.

jfroy commented 2 years ago

Downgrading worked for me via the latest BAF app. I used the old Haiku app to set up, then used the built-in integration (not HACS) to add my fans.

The i6 protocol seems to expose the temperature sensor (at least the homebridge plugin supports that). This may be an opportunity to bring the i6 protocol to HASS and increase functionality.

bdraco commented 2 years ago

The i6 protocol seems to expose the temperature sensor (at least the homebridge plugin supports that). This may be an opportunity to bring the i6 protocol to HASS and increase functionality.

Is that something you are interested in working on? It looks like it's using protobuf.

jfroy commented 2 years ago

The i6 protocol seems to expose the temperature sensor (at least the homebridge plugin supports that). This may be an opportunity to bring the i6 protocol to HASS and increase functionality.

Is that something you are interested in working on? It looks like it's using protobuf.

Yeah, perhaps. I am wary of getting into that because I assume the existing maintainers of the senseme integration will actually take that on.

jfroy commented 2 years ago

I have started prototyping a Python AIO library for interfacing with BAF products using the i6 protocol, using the awesome homebridge implementation as a reference. It's public at https://github.com/jfroy/aiobafi6.

I only have a single Haiku w/o LED fan, so it's going to be janky and bad unless people chip in with network traces. I have very little modern Python experience, so please keep feedback civil. I also don't have a ton of time to dedicate to it, but a few more weekends should yield something usable.

bdraco commented 2 years ago

https://protobuf-decoder.netlify.app/ https://jamesdbrock.github.io/protobuf-decoder-explainer/

https://pypi.org/project/protobuf/

These might be helpful.

jfroy commented 2 years ago

https://protobuf-decoder.netlify.app/ https://jamesdbrock.github.io/protobuf-decoder-explainer/

https://pypi.org/project/protobuf/

These might be helpful.

Is it actually protobuf? The base-128 varint encoding is the same (but there aren't many ways to do varint); otherwise, when you look at the data, there are oddities that suggest it is not plain binary protobuf. I'll look at this again -- it sure as heck would be nice if it were, because then I'd just throw out the garbage custom parsing code and write a .proto.

jfroy commented 2 years ago

Sure looks like protobuf once you remove the 0xc0 bracketing and emulation prevention scheme.
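
For anyone following along, the unframing step can be sketched like this. It assumes the scheme mirrors SLIP (RFC 1055: 0xC0 frame delimiter, 0xDB escape, with 0xDC/0xDD substitutions), which matches the bytes seen on the wire here but may not be exactly what the firmware does:

END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD

def unframe(frame: bytes) -> bytes:
    # Strip the 0xC0 brackets, then undo the byte stuffing so the payload
    # can be fed to a protobuf parser. Sketch only, not aiobafi6's code.
    payload = frame.strip(bytes([END]))
    out = bytearray()
    i = 0
    while i < len(payload):
        b = payload[i]
        if b == ESC and i + 1 < len(payload):
            i += 1
            out.append({ESC_END: END, ESC_ESC: ESC}.get(payload[i], payload[i]))
        else:
            out.append(b)
        i += 1
    return bytes(out)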

jfroy commented 2 years ago

Working on a basic proto file. For a fan query, the first 2 messages parse OK, but the 3rd seems corrupt. Every standard proto tool just barfs at it.

12692267 1203A804 011203B0 04011203 A0040112 04B8048C 151203F0 04001203 F8040012 03DBDC04
                                                                                 ^^
011204C8 04D80412 03D00400 1204D804 A0381206 E204036E 6F771203 E8040012 03900500 12039A05
^^
001203E0 0A011206 F00A80DD DBDD0112 07E80AFF FFFFFF0F 1203F80A 00
                               ^^

Up to the first ^^ the message is well formed, but then things go off the rails. The first ^^ is a sub-message length field (3 bytes), followed by 3 bytes, and then 0x01 (second ^^). This is very different from the usual 0x12 (length-delimited wire type, field 2) structure in most messages, and it indicates a double value. If you keep interpreting the proto that way, you run out of bytes and nothing makes sense. If you skip the 0x01 bytes, the message makes more sense, but you still run out of bytes.

This payload is also a bit different because the top-level 0x12 message (length-delimited, field 2) does not span the entire payload. Its length field (0x69) stops at the 0xF8 byte near the end, which means 0A 00 is another field in the top-level message. If you interpret it, it's a length-delimited field (wire type 2) with tag 1 and a length of 0, which makes no sense.
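
For reference, protobuf tags are varints that decode as (field_number << 3) | wire_type, which is where those readings come from. A quick sketch of the decoding:

def read_varint(data: bytes, i: int) -> tuple[int, int]:
    # Decode a base-128 varint starting at index i; return (value, next index).
    shift = value = 0
    while True:
        b = data[i]
        i += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, i
        shift += 7

# 0x12 -> field 2, wire type 2 (length-delimited); 0x0A -> field 1, wire type 2.
for tag_byte in (0x12, 0x0A):
    tag, _ = read_varint(bytes([tag_byte]), 0)
    print(f"field {tag >> 3}, wire type {tag & 0x7}")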

jfroy commented 2 years ago

Sigh, never mind. I had a bug in my message framing emulation prevention byte code. This is what I get for not writing tests.

bdraco commented 2 years ago

Once you have the library working I'll upgrade a couple of my fans and see about what I can do to help.

We will need a new integration since it's a new protocol, but much of the Senseme integration can be used as a template for that. I can help with that as well once we get to that point

jfroy commented 2 years ago

I was thinking we could use the same integration, bundle the 2 protocol libraries and have smarts about picking the right one.

bdraco commented 2 years ago

It would be cleaner to do a new one, since the old one is going to go away soon anyway (rumor is that unless you firewall it off, everything gets upgraded to the new protocol in the fall... not sure if it will actually happen though). That way we won't have the complexity of maintaining the legacy and the new code in the same integration. I'm happy to do the bulk of the work on the Home Assistant side.

jfroy commented 2 years ago

Pushed my proto file and reworked my prototype to use the normal Python protobuf library. Seems to be working well, but I have limited test hardware.

https://github.com/jfroy/aiobafi6/blob/main/proto/aiobafi6.proto

bdraco commented 2 years ago

Can someone with an i6 fan try the above repo?

appark commented 2 years ago

Is a Haiku L an i6 fan?

jfroy commented 2 years ago

I guess I picked a bad name -- i6 refers to the new protobuf-based protocol, originally deployed with the i6 line of fans but now being deployed to Haiku fans with the latest firmware and the new app. So if you upgrade your firmware, your Haiku fan should speak that new protocol.

I'm working on finishing the mapping of commands into the proto file. I'll push a few updates today with more test commands for people to try, as well as a keep-alive mode where the prototype keeps listening for fan property updates (it ran fine overnight on my network, computer, and fan).

appark commented 2 years ago

Thank you

jfroy commented 2 years ago

I’ve just pushed another round of patches. The prototype command can now set any property by name. For example, to set the speed, you can now do python3 aiobafi6/main.py -i <ip> speed 4. The name must be one of the field names in the Property proto message. So for example, you could also enable whoosh mode by doing whoosh_enable 1.

At this point, fan control seems pretty good. It would be helpful if other folks could use the prototype to set light settings and report if anything is broken or missing. There is a --dump flag to enable binary proto dumping for the query mode and the generic property setting mode.

To install or update, clone or pull the repo and run pip3 install -e . to recompile the proto. You will need protoc in your PATH.
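
Collected in one place (the <ip> placeholder is your fan's address):

git clone https://github.com/jfroy/aiobafi6 && cd aiobafi6   # or git pull to update
pip3 install -e .                                 # recompiles the proto; needs protoc in PATH
python3 aiobafi6/main.py -i <ip> speed 4          # set fan speed
python3 aiobafi6/main.py -i <ip> whoosh_enable 1  # enable whoosh mode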

jfroy commented 2 years ago

I am not seeing presence updates with the current code, and given that I'm dumping the entire response proto, I don't think the fan is sending the data. The protocol may require additional commands or different options to send presence updates. It would be a pretty big functional regression to lose that.

jfroy commented 2 years ago

Pushed an update that implements device discovery. The new firmware uses DNS service discovery instead of custom broadcast packets.
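
A minimal sketch of browsing for fans with the python-zeroconf package; the service type below is a guess, so check the aiobafi6 repo for the real one:

from zeroconf import ServiceBrowser, ServiceListener, Zeroconf

SERVICE_TYPE = "_api._tcp.local."  # assumed; see aiobafi6 for the actual type

class BAFListener(ServiceListener):
    def add_service(self, zc, type_, name):
        # Resolve the advertised service to addresses and a port.
        info = zc.get_service_info(type_, name)
        if info:
            print(name, info.parsed_addresses(), info.port)

    def update_service(self, zc, type_, name):
        pass

    def remove_service(self, zc, type_, name):
        pass

zc = Zeroconf()
browser = ServiceBrowser(zc, SERVICE_TYPE, BAFListener())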

seajack0 commented 2 years ago

I'd be happy to help test. Any chance you can quickly make your main branch HACS-compliant for easy maintenance? Thanks!

Structure link for compliance

jfroy commented 2 years ago

I'm working on turning the prototype into a proper library at the moment. I am completely unfamiliar with Home Assistant's APIs and conventions, but I'm hoping to just copy-pasta the current integration and swap the underlying library. I'm trying to make them API-compatible-ish.

bdraco commented 2 years ago

After 2022.5 ships this week I should have some time this coming weekend

After that, and once you are happy with it, I can send PRs to adjust it for Home Assistant standards/conventions.

oogje commented 2 years ago

The i6 protocol seems to expose the temperature sensor (at least the homebridge plugin supports that). This may be an opportunity to bring the i6 protocol to HASS and increase functionality.

Is that something you are interested in working on? It looks like it's using protobuf.

Thank you @bdraco for suggesting protobufs and @jfroy for building a working implementation. I've now rewritten my janky homebridge plugin message parser with protobufs in mind.

bdraco commented 2 years ago

2022.5 is looking good so far. Looks like I'll have time to build an integration using @jfroy's library this weekend.

jfroy commented 2 years ago

I'm almost done with the library version; going to push soon. It seems to mostly be working OK, but using it in a HASS integration is going to be the real test.

I can probably take a stab at starting an integration after that, if you want.

jfroy commented 2 years ago

Version 0.1.0 of the library has been published.

https://pypi.org/project/aiobafi6/

bdraco commented 2 years ago

I ran the code a bit.

I might have missed something, but it looks like it's polling the device instead of getting push updates like the original firmware.

Have you found a way for it to push the state change as soon as it happens on the fan?

jfroy commented 2 years ago

The new firmware does send pushes, but not for every property, so the library includes configurable polling. The default is 60 seconds; the test command line overrides that to 15 seconds (matching the BAF application's behavior). Also, callbacks have a 0.2-second coalescing delay, so rapid changes will only yield one dispatch, 0.2 seconds after the last update.
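
The coalescing behavior can be pictured like this (a sketch of the idea, not the library's actual code):

import asyncio
import time

class Coalescer:
    # Dispatch the callback once, `delay` seconds after the most recent
    # update, collapsing a burst of updates into a single dispatch.
    def __init__(self, callback, delay: float = 0.2) -> None:
        self._callback = callback
        self._delay = delay
        self._last = 0.0
        self._armed = False

    def notify(self) -> None:
        self._last = time.monotonic()
        if not self._armed:
            self._armed = True
            asyncio.get_running_loop().call_later(self._delay, self._fire)

    def _fire(self) -> None:
        remaining = self._last + self._delay - time.monotonic()
        if remaining > 0:
            # A newer update arrived while waiting; re-arm for the remainder.
            asyncio.get_running_loop().call_later(remaining, self._fire)
            return
        self._armed = False
        self._callback()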

But you should be able to see from the debug logs that some properties update immediately, such as speed, if you change them (using another instance of the command line, or using the BAF app).

Are you seeing a different behavior?

jfroy commented 2 years ago

I also haven't entirely explored what the property query enum does. The BAF app sends an initial ALL query, which returns every property "page" or "section", then sends a regular FIRMWARE_MORE_DATETIME_API query every 15 seconds. The library right now always sends an ALL query on the interval. The enum does change the query results returned by the firmware, but it does not seem to affect which properties are pushed. The firmware just sends "push query results" for certain properties to all open connections, as best I can tell.

jfroy commented 2 years ago

I've just pushed a revision that adds a command line flag to make the query interval configurable. You can also set it to 0 to disable background queries (don't do that with the direct mode[^1]; that will in fact spam the device, which will cause it to reboot).

[^1]: The direct mode (--direct) does not use the library; instead, the command line handles networking on its own. It offers a more powerful query loop that diffs the protobuf and handles unknown fields.

bdraco commented 2 years ago

I won't be home until Saturday to test, but I'll dig in once I am.

I was planning on converting the original lib to use an asyncio.Protocol before the firmware change happened because all the wait_fors slowed everything down but never had a chance to make it happen.

jfroy commented 2 years ago

I won't be home until Saturday to test, but I'll dig in once I am.

I was planning on converting the original lib to use an asyncio.Protocol before the firmware change happened because all the wait_fors slowed everything down but never had a chance to make it happen.

I'm using asyncio StreamReader and StreamWriter because it was easy and Reader's readuntil() is really convenient for the wire protocol. I skimmed the implementation and it uses a Protocol under the covers.

I use a wait_for to wrap the open_connection call, which should not be frequent or in a critical path and is kind of the way to time out quickly on a connection that isn't happening, and another to wrap the readuntil call in the main read loop to deal with broken TCP connections. Indeed, without that wait_for to apply a custom timeout, it can take a long time (hours) for the OS and asyncio to notice the other end is gone (I tested this by killing power to my fan).

I'm new to Python and asyncio (I do Go, Rust, and C++ for a living), so I'm not familiar with the history of asyncio, best practices for performance, etc. Is there a particular problem with wait_for? I haven't tested with a lot of devices (I only have a single real fan, but I could build a quick simulator), or run the library for a long time (my longest test is about 24 hours), so I don't know if there are problems there. But it certainly reacts quickly to property changes I make from the BAF app or my test program (I've tested most of the fan settings visible in the BAF app, but no light settings, as I don't have that hardware).

Most of my testing has been on macOS 12 with Python 3.8, although I've also run tests on Arch running in WSL2. I haven't tested on a Pi yet (probably should do that...).

No rush on the testing!

bdraco commented 2 years ago

I'm using asyncio StreamReader and StreamWriter because it was easy and Reader's readuntil() is really convenient for the wire protocol. I skimmed the implementation and it uses a Protocol under the covers.

I use a wait_for to wrap the open_connection call, which should not be frequent or in a critical path and is kind of the way to time out quickly on a connection that isn't happening,

That's usually fine as long as you don't reconnect/retry too frequently as it can cause this issue: https://github.com/python-kasa/python-kasa/pull/340

and another to wrap the readuntil call in the main read loop to deal with broken TCP connections. Indeed, without that wait_for to apply a custom timeout, it can take a long time (hours) for the OS and asyncio to notice the other end is gone (I tested this by killing power to my fan).

You may find that the asyncio.Protocol does a good job of calling connection_lost when it disconnects (depends on the device)

For cases where the device abruptly goes offline and never comes back (no TCP RST), we usually set up some type of periodic loop.call_later chain to declare the connection dead if we haven't seen data on the wire in a period of time in which it is expected. Also, we generally avoid the pattern where you cancel the timer when you get data, since cancellation and pushing a new timer onto the event loop queue is much more expensive than comparing the time since the last data. For flux_led we have a simple counter to declare the device unavailable when we reach the maximum number of polled updates without a response: https://github.com/Danielhiversen/flux_led/blob/master/flux_led/aiodevice.py#L662
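
A minimal sketch of that call_later chain (names are hypothetical, not flux_led's actual code):

import asyncio
import time

class IdleWatchdog:
    # Declare the connection dead when no data has been seen for `timeout`
    # seconds, re-arming one timer instead of canceling it per packet.
    def __init__(self, timeout: float, on_dead) -> None:
        self._timeout = timeout
        self._on_dead = on_dead
        self._last_data = time.monotonic()

    def start(self) -> None:
        asyncio.get_running_loop().call_later(self._timeout, self._check)

    def feed(self) -> None:
        # Called whenever data arrives: just record the time, no timer churn.
        self._last_data = time.monotonic()

    def _check(self) -> None:
        idle = time.monotonic() - self._last_data
        if idle >= self._timeout:
            self._on_dead()
            return
        # Re-arm for the remaining window.
        asyncio.get_running_loop().call_later(self._timeout - idle, self._check)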

I'm new to Python and asyncio (I do Go, Rust, and C++ for a living), so I'm not familiar with the history of asyncio, best practices for performance, etc. Is there a particular problem with wait_for? I haven't tested with a lot of devices (I only have a single real fan, but I could build a quick simulator), or run the library for a long time (my longest test is about 24 hours), so I don't know if there are problems there. But it certainly reacts quickly to property changes I make from the BAF app or my test program (I've tested most of the fan settings visible in the BAF app, but no light settings, as I don't have that hardware).

wait_for does quite a bit under the hood https://github.com/python/cpython/blob/main/Lib/asyncio/tasks.py#L435 which really starts to add up when you have a lot of them running. I try hard to avoid using them in loops where we end up reading small frames as the setup/tear down of each iteration when reading frequently is quite noticeable when you have a lot of devices (worse when it frequently times out since you have to pay for the exception overhead as well).

If the payloads are large and the reads are infrequent, and only sent when polled, it likely doesn't make too much difference as the number of wait_fors is low relative to reading the data (still better to not have them at all if they can be avoided). At least with the old firmware, it was quite chatty and sent lots of unsolicited (push) small packets, so this was noticeable, as we did lots of short reads; that was also very apparent when you did an strace, since this pattern generated quite a lot of system call overhead as well.

Most of my testing has been on macOS 12 with Python 3.8, although I've also run tests on Arch running in WSL2. I haven't tested on a Pi yet (probably should do that...).

Any time you have an asyncio.sleep, if you can replace it with a loop.call_later, it's generally going to be more performant since you have fewer running coroutines that the event loop has to deal with. It also generally means you can avoid the complexity of canceling a task, since canceling the timer handle that loop.call_later returns is much cheaper and you don't have to think about what the task may be doing when you cancel it.

I remember we had a user report they had 20 or so fans, so I expect the wait_for pattern would start to be noticeable if their HA instance was otherwise loaded. I have 6 of these in production right now at home, and the current SenseME integration has the longest startup time of all the integrations I'm using, which is noticeable when using the built-in profiler integration. That's the motivation I had for wanting to convert it to use an asyncio.Protocol directly.
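
For illustration, here is a minimal asyncio.Protocol that splits a 0xC0-bracketed stream with no wait_for at all (a sketch of the approach, not the planned integration code):

import asyncio

class FrameProtocol(asyncio.Protocol):
    # Buffer incoming bytes and emit each 0xC0-bracketed frame via a callback.
    def __init__(self, on_frame) -> None:
        self._buf = bytearray()
        self._on_frame = on_frame

    def connection_made(self, transport) -> None:
        self._transport = transport

    def data_received(self, data: bytes) -> None:
        self._buf += data
        while True:
            start = self._buf.find(0xC0)
            if start < 0:
                return
            end = self._buf.find(0xC0, start + 1)
            if end < 0:
                return
            frame = bytes(self._buf[start + 1:end])
            del self._buf[:end + 1]
            if frame:
                self._on_frame(frame)

    def connection_lost(self, exc) -> None:
        # A good place to schedule a reconnect or mark the device unavailable.
        pass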

jfroy commented 2 years ago

Thanks for the feedback, that's very informative. I'll try to rework the code accordingly.

I'm using asyncio StreamReader and StreamWriter because it was easy and Reader's readuntil() is really convenient for the wire protocol. I skimmed the implementation and it uses a Protocol under the covers. I use a wait_for to wrap the open_connection call, which should not be frequent or in a critical path and is kind of the way to time out quickly on a connection that isn't happening,

That's usually fine as long as you don't reconnect/retry too frequently as it can cause this issue: python-kasa/python-kasa#340

The library tries to reconnect at most every 5 seconds, and it never gives up. I'm not clear on what the expectation is for a HASS integration -- do you give up after some time (how long?), or do you always keep trying to recover? I've seen a couple of bugs for other integrations that seem not to recover and require a HASS restart or integration reload, which is a pretty bad user experience. If for whatever reason the owner is doing electrical or network work, making the devices unreachable for hours or days, the integration should just keep trying to recover. Or does HASS deal with that at a higher level?

I implement this rate limiting with asyncio.sleep:

while True:
    try:
        # Connect to the device, potentially after a delay to avoid banging
        # on an unresponsive or invalid address.
        delay = next_connect_ts - time.monotonic()
        if delay > 0:
            await asyncio.sleep(delay)
        next_connect_ts = time.monotonic() + _DELAY_BETWEEN_CONNECT_ATTEMPTS
        self._connection = await asyncio.wait_for(
            asyncio.open_connection(...),
            timeout=_DELAY_BETWEEN_CONNECT_ATTEMPTS,
        )
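    # (Excerpt ends here -- in the library an except clause follows that
    # catches connection errors so the loop retries, never giving up.)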

For cases where the device abruptly goes offline and never comes back (no TCP RST), we usually we setup some type of periodic loop.call_later chain to declare the connection dead if we haven't seen data on the wire in a period of time that it is expected. Also we generally avoid the patten where you cancel it when you get data since cancelation and pushing a new timer on to the event loop queue is much more expensive than comparing time since last data. For flux_led we have a simple counter to declare the device unavailable when we reach the maximum number of polled updates without a response https://github.com/Danielhiversen/flux_led/blob/master/flux_led/aiodevice.py#L662

I was testing the no-RST case, indeed (power loss, yank the cable, take the interface down). When a proper reset can be sent on the wire, the high-level stream abstractions respond immediately, since they are implemented with a Protocol and, as you said, that reacts quickly. I did look into esoteric socket options, like TCP user timeout and TCP keep-alive, but none of that is really battle-tested, standard, or easy to set up. The ultimate goal was to provide a good user experience where the entities become unavailable in HASS relatively quickly (on the order of minutes, not hours).
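
For the record, those options look like this in Python (Linux constant names, guarded with hasattr precisely because they are not portable):

import socket

def enable_keepalive(sock: socket.socket, idle: int = 60, interval: int = 10, count: int = 3) -> None:
    # Ask the kernel to probe an idle connection and fail it after
    # `count` unanswered probes. The TCP_KEEP* constants are Linux-specific.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)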

And got it – wait_for bad, call_later good 😅.

wait_for does quite a bit under the hood https://github.com/python/cpython/blob/main/Lib/asyncio/tasks.py#L435 which really starts to add up when you have a lot of them running. I try hard to avoid using them in loops where we end up reading small frames as the setup/tear down of each iteration when reading frequently is quite noticeable when you have a lot of devices (worse when it frequently times out since you have to pay for the exception overhead as well).

If the payloads are large and the reads are infrequent, and only sent when polled, it likely doesn't make too much difference as the number of wait_fors is low relative to reading the data (still better to not have them at all if they can be avoided). At least with the old firmware, it was quite chatty and sent lots of unsolicited (push) small packets, so this was noticeable, as we did lots of short reads; that was also very apparent when you did an strace, since this pattern generated quite a lot of system call overhead as well.

The new firmware is not too chatty: you don't get anything on the wire unless a property changes or you send a query. But the firmware does not gather its responses into large packets; it just sends a bunch of small packets (100-200 bytes at most, usually under 50 bytes), which is not ideal.

Any time you have an asyncio.sleep, if you can replace it with a loop.call_later, it's generally going to be more performant since you have fewer running coroutines that the event loop has to deal with. It also generally means you can avoid the complexity of canceling a task, since canceling the timer handle that loop.call_later returns is much cheaper and you don't have to think about what the task may be doing when you cancel it.

I remember we had a user report they had 20 or so fans, so I expect the wait_for pattern would start to be noticeable if their HA instance was otherwise loaded. I have 6 of these in production right now at home, and the current SenseME integration has the longest startup time of all the integrations I'm using, which is noticeable when using the built-in profiler integration. That's the motivation I had for wanting to convert it to use an asyncio.Protocol directly.

👍

bdraco commented 2 years ago

I've started working on the integration. I'll share a branch when I have something useful.