rospogrigio / airbnk_mqtt

MQTT control of Airbnk locks.
GNU General Public License v3.0
27 stars 6 forks source link

ESPHome #24

Open formatBCE opened 2 years ago

formatBCE commented 2 years ago

Hi @rospogrigio ! Long time :)

I recently developed custom BLE presence for HA, based on ESP32, and meanwhile realized, how to get it working as custom ESPHome component. Do you think it's worth to move my custom gateway firmware to similar component? It would require 1 file and 6 lines of code to add it.

I actually think, that it would be useful for new users. Tasmota is good enough, but ESPHome is widely used with HA. E.g. someone could install this to SONOFF NS Panel. :)

rospogrigio commented 1 year ago

Wow, I was about to help you out but you got right to the point very quickly. I see you already found the function where you find all the error conditions. It's strange that the bad state is detected only on the second attempt to operate it but at least it's possible to detect it, now you have to inspect the code to understand how the app is able to recover from this state, you said it does it via Bluetooth so it must be feasible, keep digging. And also, I share your opinion on the badly written code, that's why it took days to understand all the encryption part 😄

formatBCE commented 1 year ago

OMG, I'm trying to understand this code for 2 days already. No luck yet. I was thinking of creating some BLE man-in-the-middle device to put it between lock and vendor gateway and log all communication - but it seems to be impossible (at least without advanced skills and separate test lock).

Continue digging.

formatBCE commented 1 year ago

FFF3 codes (valid ones) are increasing in 14 and 20 bytes each time I lock/unlock device, including incorrect command.

I believe, this open/close number is somehow used in next code generation, right? Probably, somewhere we use incorrect number on event like this? I will check corresponding adv payload and compare to FFF3 state. Also I will investigate, if adv and FFF3 are changing in same manner, when operating from vendor app.

Will also try to decompile with different tool - seems like I stuck a bit.

formatBCE commented 1 year ago

Ok, "06" means "invalid signature". Digging.

rospogrigio commented 1 year ago

Could it be that in the "stuck condition" the ADV is not reporting the exact eventsCounter ? That would explain, maybe. Have you experienced moments in which the value in the ADV and the one in the FFF3 are different? It should be in the bytes 12 and 13 of the FFF3 characteristic...

formatBCE commented 1 year ago

That I will check today - comparing adv and FFF3 in different states. Will ping on result.

formatBCE commented 1 year ago

Ok, now it's superclear.

Initial state:

baba0303010400363030303539380000014a00000-d9f-9a000000
aa0011020400f001040000000-d9f-014a9a00008c

After unlock/lock (HA):

baba0303010400363030303539380000014a00000-da1-9a000000
aa0011020400f001040000000-da1-014a9a00008e

After unlock/lock (vendor app):

baba0303010400363030303539380000014b00000-da3-9a000000
aa0011020400f001040000000-da3-014b9a000091

After 1st lock on locked (HA):

baba0303010400363030303539380000014b00000-da3-9a000000
aa0011020400f001040000000-da4-014b9a000092

So adv payload doesn't reflect actual number of open/closed count in event of repeating command. I understand, why my fork does have this problem, while main branch doesn't. I removed checking that FFF3, because it makes ESP WiFi connection unstable.

So looks like official app does take that FFF3 value every time. Which is OK for powerful Android/iOS device with separate antennas, but is undesirable for ESP32. What 'bout vendor's gateway, i doubt it requires constant WiFi connection, and also it has definitely higher lags on open/close, than both custom gateway and vendor app via Bluetooth.

Also, AFAIK, @nourmehdi mentioned, that incorrect lock setting could lead to unavailability of lock. I guess it was same thing.

3 things to investigate:

  1. Check, if we can somehow read that FFF3 data anyways.
  2. Check, if we can send some command to synchronize adv data to FFF3.
  3. Check, how vendor app does recover lock from that 06 state (in that case we don't have correct number in either of sources).
formatBCE commented 1 year ago

Well, i kinda made it working, but i doubt it's viable solution. I introduces variable expected_lockEvents, and each time i'm sending command, that value is increased, and dropped back, if received lockEvents in adv after command was completed is greater or equals.

Just checked it easily, and it works.

Will investigate better solution in meantime.

rospogrigio commented 1 year ago

OK perfectly clear and this brings back to memory why I was skeptical about removing reading the FFF3 status. My lock does not show this behavior but also my lock does not detect manual opening/closing so I guess it's a matter of lock FW/characteristics. It is not super clear the last fix you explained, where did you add that variable? In the Integration code I suppose. But still I don't understand in which case you drop it back. However, I do remember that I had implemented that sort of "lockEvents prediction after command" but I seem to remember that there was some corner case that made it fail, need to remember how.

formatBCE commented 1 year ago

I drop that back, if lockEvents from adv are bigger or equal to prediction. If not - then I use predicted value for generating new command. And keep increasing that prediction, if adv is old.

Won't work for your case though. Frankly saying, only now I understand your problem.

Anyway, it's not good solution. We need to have data from FFF3, or somehow force updating lockEvents in adv.

rospogrigio commented 1 year ago

Can you refresh my memory? Why does reading FFF3 compromise wifi stability and why reading the ADV doesn't?

formatBCE commented 1 year ago

Sure.

So as I said, radio in ESP32 sits on same antenna. And when we're writing to GATT server characteristic, or reading from characteristic, we should create BLE client instance, and explicitly connect to lock GATT server (where characteristics leave). This connection takes whole antenna time. But advertising catching is different thing. It is made to be intrusive, and scanner in gateway has options to set, which allow using antenna for 256 out of each 512 milliseconds, leaving half of antenna time to WiFi. ESP32 supports it on core level, and there's no itching even with continuous scan (which I do now, and it works great).

Oh, and to the FFF3: basically, sending 2 commands to FFF2 takes not much time, but after that we need to read from FFF3, and it takes long time in connected state, which leads to dropping WiFi connection, thus eliminating possibility to respond with command result via MQTT.

rospogrigio commented 1 year ago

I see. How about closing connection after writing FFF2 commands so to let wifi take its time, wait some time like 0.5 or 1 sec and then reconnect to read the FFF3? Does this make sense?

formatBCE commented 1 year ago

Hmm, then we will have result, but very late. Is it something that integration will allow? I mean, for old code, it will be in operating state for all this time, and for my fork, it will allow sending multiple [incorrect] commands, which will hang the lock (those zeroes in fff3). And then we're doomed (no correct payload in any of sources).

I found this function in app code:

public void getAdvertisingInfo() {
        Log.i(this.TAG, "Send GetAdvCode ----------------");
        lockManager.sendDirective(Methods.GetAdvCode, this.writeCallBack);
    }

It writes specific payload to lock right after connecting each time. Payload is ff00aa000e0000000000000000000000020406

IDK if worth trying, but looks like something that does synchronize adv message with current lock state.

rospogrigio commented 1 year ago

Well that looks quite a sync request, I'd try sending it very quickly and check if it solves the issue, it would be awesome.

formatBCE commented 1 year ago

Ok, i made required changes in gateway code and uploaded to ESP. If it works, i will move this logic to integration (all-in-all, it's integration's work to form commands). Can't test now - everybody in my place is sleeping yet. Will ping on result later this day.

formatBCE commented 1 year ago

Unfortunately, it doesn't help - still old adv message.

formatBCE commented 1 year ago

Well, i returned to reading status. And actually, with my late changes to the firmware, it's not so bad. Also, i reduced time delay between trying to read FFF3. And in integration, now lockEvents is biggest number of one received in adv, and read from lockStatus. I still didn't add back code for failing, if that lockStatus is incorrect. Tested in different conditions, and it works.

I don't know, if it'll work for you @rospogrigio , because there's no way still to get that FFF3, if command was not sent - and if it's wrongly sent, FFF3 goes to error state.

I have no way to test it in your conditions though. Maybe, we can periodically read that FFF3... Or maybe, some command will bring it back to life after error (like vendor app does) - i will keep digging.

My problem is solved though, at least in current scope.

rospogrigio commented 1 year ago

Well if you want to share what you did maybe I can try it in my environment and give feedback, even if currently everything is working fine and here we say "never change a winning team" 😉 but if you want some feedback I can test it (I also have a spare ESP32 board since I bought a mini one to setup the fingerprint). Let me know in case. Edit: that's strange that that command does not request the sync as we supposed... I'd be really curious to understand what it does...

formatBCE commented 1 year ago

Ok, updated PR, and use it with latest https://github.com/formatBCE/Airbnk-MQTTOpenGateway/blob/main/src/esphome/airbnk-gateway.h

formatBCE commented 1 year ago

@rospogrigio i just thought on one thing. Gateway can connect and read that fff3 BEFORE creating command. First i thought, that HA integration could generate everything, except lockEvents, and then gateway could put correct lockEvents number right before writing command to the lock. But checking the code, lockEvents is deeply inside of command generation, and i wouldn't want to move that logic to ESP.

Second thought: we can make it in two steps:

  1. Integration requests fff3 payload from gateway, receives it in response and generates command based on that lockEvents.
  2. Integration sends command to operate.

It's a bit more sophisticated, but would eliminate your problem too (when adv doesn't correspond to manual operations).

What do you think, is it something worth trying?

rospogrigio commented 1 year ago

Well it certainly is an interesting approach, but probably will introduce a bit of delay so you'll have more stability but less responsivity, but why not trying. Moving the logic into the ESP might be another option, but I wonder how much time we would gain for this. For the record though, I am currently not having any issue (neither before nor after your last update) since my lock does not detect manual operation, so all of this is not so crucial for me. And as a consequence, I am more interested in responsivity and wifi stability rather than this robustness, while other users that have your same lock might probably prefer this approach. Which is your model by the way? Mine is M510. If you need a tester, here I am...

formatBCE commented 1 year ago

Oh wait a sec.

I thought, your lock doesn't detect manual operations in adv message, but does it internally - which breaks integration work, if lock was operated manually.

But it seems that lockEvents doesn't increase at all, right? Then we're ok. My last changes eliminate the last problem we had.

Cool stuff. :)

rospogrigio commented 1 year ago

Exactly: no increase in lockEvents, no stuck situations in HA. The only weird thing is that sometimes its status turns to Jammed when operated manually (but not always), but it is always possible to send commands, it never enters your stuck situation.

formatBCE commented 1 year ago

Great then! Yeah, I also have "jammed" message, when opening it slowly. Doesn't bother me though. :)

So do we merge it now, or wait for you to test it a bit? Works for me.

rospogrigio commented 1 year ago

Let me check how it goes for a bit. Also, I am still in the middle of finalizing the fingerprint thing (still waiting for a power converter) so let me setup everything and if it works ok let's merge everything ;-)

rospogrigio commented 1 year ago

Mmm I'm already having issues with the new setup. I noticed that the lock had become unavailable so I opened the logs. Then I tried to launch a close command and I don't know if it worked because I'm at my office, but the following error came out:

[09:37:05][D][airbnk_mqtt:105]: BLE scan heartbeat
[09:37:10][D][airbnk_mqtt:105]: BLE scan heartbeat
[09:37:14][D][airbnk_mqtt:123]: Sending adv
[09:37:15][D][airbnk_mqtt:105]: BLE scan heartbeat
[09:37:20][D][airbnk_mqtt:105]: BLE scan heartbeat
[09:37:25][D][airbnk_mqtt:241]: Got command
[09:37:25][D][airbnk_mqtt:169]: Sending to:
[09:37:25][D][airbnk_mqtt:170]: 58:93:xx:xx:xx:xx
INFO airbnk-gateway.local: Error while reading incoming messages: Error while reading data: [Errno 104] Connection reset by peer
INFO airbnk-gateway.local: Ping Failed: Error while reading data: [Errno 104] Connection reset by peer
INFO Disconnected from ESPHome API for airbnk-gateway.local
WARNING Disconnected from API
INFO Successfully connected to airbnk-gateway.local
[09:37:41][D][airbnk_mqtt:105]: BLE scan heartbeat
[09:37:42][D][api:102]: Accepted ::FFFF:192.168.1.12
[09:37:42][D][api.connection:918]: Home Assistant 2023.1.2 (::FFFF:192.168.1.12): Connected successfully
[09:37:44][D][airbnk_mqtt:123]: Sending adv
[09:37:46][D][airbnk_mqtt:105]: BLE scan heartbeat
[09:37:51][D][airbnk_mqtt:105]: BLE scan heartbeat
[09:37:56][D][airbnk_mqtt:105]: BLE scan heartbeat

Not it is still unavailable and I am not receiving any more ADV messages. I didn't move the ESP so I can't explain why but it looks less stable than before. Ideas?

formatBCE commented 1 year ago

Well, this Is good old WiFi connection problem - but why didn't it recover... Probably, makes sense to flash ESP with serial port. I heard, that some Bluetooth related changes require wired flashing (because of different storage layout). But this is just my guess.

It's freaking strange, that things work for me, but don't work for you - and vice-versa. :)

formatBCE commented 1 year ago

BTW, even before command - you have like 1 adv in 10 seconds. It's pretty bad. For me it's ~ 1 per second. Curious, if it's lock, or gateway problem...

rospogrigio commented 1 year ago

Well it comes and goes. I seem to remember that before the update I had an adv every 3-4 secs, now this is the situation:

[14:51:56][D][airbnk_mqtt:123]: Sending adv
[14:51:57][D][airbnk_mqtt:123]: Sending adv
[14:51:59][D][airbnk_mqtt:105]: BLE scan heartbeat
[14:52:04][D][airbnk_mqtt:105]: BLE scan heartbeat
[14:52:05][D][airbnk_mqtt:123]: Sending adv
[14:52:09][D][airbnk_mqtt:105]: BLE scan heartbeat
[14:52:10][D][airbnk_mqtt:123]: Sending adv
[14:52:14][D][airbnk_mqtt:105]: BLE scan heartbeat
[14:52:19][D][airbnk_mqtt:105]: BLE scan heartbeat
[14:52:20][D][airbnk_mqtt:123]: Sending adv
[14:52:24][D][airbnk_mqtt:105]: BLE scan heartbeat
[14:52:29][D][airbnk_mqtt:105]: BLE scan heartbeat
[14:52:30][D][airbnk_mqtt:123]: Sending adv
[14:52:34][D][airbnk_mqtt:105]: BLE scan heartbeat

I will try to flash it via serial if you say it's worth a try, will let you know...

formatBCE commented 1 year ago

That's definitely NOT a software change. Nothing changed in scan code... Yup, command sending got longer a bit - but nothing in adv listening.

rospogrigio commented 1 year ago

Just reflashed via USB, let's see how it goes....

BTW, even before command - you have like 1 adv in 10 seconds. It's pretty bad. For me it's ~ 1 per second. Curious, if it's lock, or gateway problem...

Maybe it's a matter of distance. Let's see what happens when I setup the fingerprint part, I will put it much closer to the lock.

rospogrigio commented 1 year ago

Well, I made another attempt to close it a couple of times and I triggered a kernel panic I believe! Logs:

[15:07:56][D][airbnk_mqtt:169]: Sending to:
[15:07:56][D][airbnk_mqtt:170]: 58:93:d8:42:4e:3a
[15:07:58][D][airbnk_mqtt:178]: Connected to lock.
[15:07:58][D][airbnk_mqtt:189]: Writing command.
[15:07:59][D][airbnk_mqtt:211]: Successfully sent command.
[15:08:00][D][airbnk_mqtt:211]: Successfully sent command.
[15:08:00][D][airbnk_mqtt:211]: Successfully sent command.
[15:08:01][D][airbnk_mqtt:211]: Successfully sent command.
[15:08:01][D][airbnk_mqtt:211]: Successfully sent command.
[15:08:01]E (752490) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
[15:08:01]E (752490) task_wdt:  - async_tcp (CPU 0/1)
[15:08:01]E (752490) task_wdt: Tasks currently running:
[15:08:01]E (752490) task_wdt: CPU 0: IDLE
[15:08:01]E (752490) task_wdt: CPU 1: IDLE
[15:08:01]E (752490) task_wdt: Aborting.
[15:08:01]
[15:08:01]abort() was called at PC 0x40109b84 on core 0
[15:08:01]
[15:08:01]
[15:08:01]Backtrace:0x400839f1:0x3ffbeaec |<-CORRUPTED
WARNING Found stack trace! Trying to decode it
WARNING Decoded 0x400839f1: panic_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/panic.c:402
[15:08:02]
[15:08:02]
[15:08:02]
[15:08:02]
[15:08:02]ELF file SHA256: 0000000000000000
[15:08:02]
[15:08:02]Rebooting...
[15:08:02]ets Jun  8 2016 00:22:57
[15:08:02]
[15:08:02]rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
[15:08:02]configsip: 0, SPIWP:0xee
[15:08:02]clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[15:08:02]mode:DIO, clock div:2
[15:08:02]load:0x3fff0030,len:1184
[15:08:02]load:0x40078000,len:13132
[15:08:02]load:0x40080400,len:3036
[15:08:02]entry 0x400805e4
[15:08:02][I][logger:258]: Log initialized
[15:08:02][C][ota:469]: There have been 0 suspected unsuccessful boot attempts.
[15:08:02][D][esp32.preferences:113]: Saving 1 preferences to flash...
[15:08:02][D][esp32.preferences:142]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
[15:08:02][I][app:029]: Running through setup()...
[15:08:02][C][wifi:037]: Setting up WiFi...
[15:08:02][C][wifi:038]:   Local MAC: 08:3A:F2:52:B0:94
[15:08:02][D][airbnk_mqtt:105]: BLE scan heartbeat
[15:08:02][D][wifi:386]: Starting scan...
[15:08:02][D][airbnk_mqtt:123]: Sending adv
[15:08:06][D][wifi:401]: Found networks:
...
formatBCE commented 1 year ago

Can't think of anything that could do it. But I guess FFF3 on your lock gives troubles.

Well, looks like there's no great solution, which will satisfy everyone. We can't also recommend Ethernet version, because it will have kernel panics too :)

I will use my PR, but don't think we can merge it to main. :(

rospogrigio commented 1 year ago

Well I would like to try the other ESP too, maybe the chipset plays a role? One more strange thing: in ESPhome, I now see the device offline: image

but I can read the logs via USB (of course) and it is sending the adv data, and in fact I receive the data via MQTT and the device appears as available. And this is quite strange. If I try to get the logs wirelessly I read:

INFO Reading configuration /config/esphome/airbnk-gateway.yaml...
INFO Starting log output from airbnk-gateway.local using esphome API
WARNING Can't connect to ESPHome API for airbnk-gateway.local: Error resolving IP address: [Errno -5] No address associated with hostname
INFO Trying to reconnect to airbnk-gateway.local in the background

Can't think of anything that could do it. But I guess FFF3 on your lock gives troubles.

It gives trouble on my lock or on my gateway??

formatBCE commented 1 year ago

I meant, your Lock's FFF3 characteristic is somehow hanging the gateway. Probably, long response, or some exceptions on reading. Can't say more - mine is working flawlessly...

On unavailability: it's interesting, looks like mDNS is breaking. As IP is there (it's posting to MQTT), I guess it's mDNS.

Don't know what to say. I guess you will have to revert back to previous version... :(

ofirsnb commented 1 year ago

Hey guys! I recently got into the smart-locks rabbit-hole and ended up here. I have some experience with ESPhome and BLE, and willing to help with the tests (once the M531 will arrive in few weeks, just ordered it).

Jumping in to your recent conversation if I may, @rospogrigio , regarding your problem in ESPhome, @formatBCE is right it's an issue within mDNS. This is an issue regarding ESPhome and not your components. You have to do the following in order to make the devices available after reboot:

wifi: networks:

formatBCE commented 1 year ago

We definitely would need some help :)

I'm Java/Kotlin guy, and my C code is far from excellence. :)

Feel free to make any changes you can. Also, if you have experience of creating custom ESPHome components, I could use your hand (now all code is just in .h file, because at the time I thought it's the only way to include custom code into ESPHome...)

ofirsnb commented 1 year ago

Great to hear!! While my C skills are also somewhat slim, definitely can try help converting it into an ESPhome custom-component 🤞

Anyway when I'll get the lock, I hope I'll be able to assist even further.

BTW, sorry didn't mention it before, you both did AMAZING job so far, thanks for your efforts! 👏