rospogrigio / airbnk_mqtt

MQTT control of Airbnk locks.
GNU General Public License v3.0

Stability improvements #1

Open rospogrigio opened 2 years ago

rospogrigio commented 2 years ago

Let's use this post to discuss ways to improve the stability of the connection: use a different firmware, rebuild tasmota, etc.

Adrian-at-CrimsonAzure commented 2 years ago

Reposting this here as I think it better fits this repo than the other one:

Has anyone identified the chip on the physical device? I have the M300 Bluetooth version and the chip has a similar pinout to an ESP8266, but I'm not good enough at reverse engineering to be sure. The top of the chip is etched away just enough to make it impossible to make out the original number; at the right angles you can make out some lines, but not enough to form full letters.

formatBCE commented 2 years ago

I don't believe anyone here has torn the device down. I have the same lock. However, the ESP8266 doesn't have BT, afaik, only WiFi. Do you think there's an ESP32 inside as well?

Adrian-at-CrimsonAzure commented 2 years ago

Main board of M300: [images: top, bottom]

There is a sub-board I was too lazy to get to that all the connectors go to (except bottom right, which goes to the DC motor) but that board seems to just connect the batteries and provide some pre-regulation for the mainboard. The little unpopulated header seems to connect to power and three other pins, probably RX/TX/GPIO0 like most Tuya products. Haven't hooked up anything to it yet, might finally be my excuse to get a logic analyzer...

Completely slipped my mind that the ESP8266 doesn't have Bluetooth, I've been working with the ESP32 too much lately. This has an 8x8 layout whereas the ESP32 has a 16x16, so no chance of it being an ESP.

formatBCE commented 2 years ago

@rospogrigio today I had spare time to dig into the ESP32 BLE stack. I managed to post advertisements to MQTT, and it works very well. Also, the topics to use will be more obvious, with no need to create rules, and I believe it will let the integration send both messages at once, instead of waiting for the first chunk to be written. It will make your code much shorter and easier to maintain.

However, now I'm stuck with sending data to the lock: for some reason, it gets stuck when the ESP32 tries to initiate a BLE client for connection to the characteristic. Hopefully, I will find a solution for this. Other things are working already. (Well, I will also have to find a way to configure the ESP on the fly, but that's just a matter of writing, I believe.)

formatBCE commented 2 years ago

@rospogrigio So for now, I found a way to send commands and scan at the same time. However, sending (basically, connecting to the lock) is giving me a hard time.

I'm getting this error: lld_pdu_get_tx_flush_nb HCI packet count mismatch (0, 1)

It happens on connection, and what's worse, after that the ESP becomes unresponsive; I cannot even reconnect with the serial monitor. It could be that the lock is too far away (it actually is far), or that power is insufficient (it's a PC USB port, after all), but then it should at least reboot; instead it just hangs.

Didn't find any reliable information on this yet. Gonna keep digging. Any help appreciated.

rospogrigio commented 2 years ago

Sorry but I really don't have any expertise on this, wish I could help... As a side note, I'm working on adding a configurable option to set a desired number of automatic retries in case of a FAILCONNECT event, should be ready very soon. Keep on digging, I'm sure you can make it!

formatBCE commented 2 years ago

So what I have now:

Here I'm stuck. It says "true", basically reporting success from the write operation. But the lock doesn't respond to commands. Either the data format is incorrect (I'm sending from *ptr, so that might be the problem, or maybe I have to convert it from a string before sending, I don't know), or the write operation lies to me (unlikely, 'cause the characteristic value seems to be changing). If someone has any clues about what to do or how to debug, you're welcome.

Next steps (after getting it working, of course):

To the last point: the gateway is REALLY powerful and stable. It reboots itself in a matter of seconds in case of failure, and can connect to the lock (almost) without failures from like 7-8 meters and 1 wall. That's what I was expecting.

formatBCE commented 2 years ago

Yes, I believe writing the string itself was a strange solution :)

Will try to convert it back to bytes (I believe there are 20 bytes in each of the two commands), and send that.

Also, I have some doubts about the integration's current status-determining logic. Although the MQTT success is received, it's still in operating status. That's fine for now, as it doesn't work, but I guess there should be a way to integrate more closely, instead of waiting for a changed adv message.

rospogrigio commented 2 years ago

I can help you with this: the FFF3 characteristic (you can read it, right? or at least you can use nRF Connect for this) has some status bytes (byte 5, in particular) that provide an error code in case of failure. This is how I understood that I was sending the wrong payloads, too. Look at this function:

    public static String resultString(byte[] bArr2) {
        String str = "";
        if (bArr2[0] == -86) {
            switch (bArr2[5]) {
                case 0:
                    str = "Success";
                    if (bArr2[3] == 2) {
                        if (bArr2[4] != 1) {
                            if (bArr2[4] == 2) {
                                str = str + ("  Device time:" + ((long) PackMaker.byte4ToInt(bArr2, 6)));
                                break;
                            } else {
                                byte b = bArr2[4];
                                break;
                            }
                        } else {
                            str = str + ("  Service time:" + ((long) PackMaker.byte4ToInt(bArr2, 6)));
                            break;
                        }
                    }
                    break;
                case 1:
                    str = "Fail";
                    break;
                case 2:
                    str = "Invalid role type";
                    break;
                case 3:
                    str = "Invalid operation type";
                    break;
                case 4:
                    str = "Invalid opcode";
                    break;
                case 5:
                    str = "No operation authority";
                    break;
                case 6:
                    str = "Invalid signature";
                    break;
                case 7:
                    str = "Serial number expired";
                    break;
                case 8:
                    str = "Out of check-in time";
                    break;
                case 9:
                    str = "Service time expired";
                    break;
                case 10:
                    str = "Locked, cannot open the door";
                    break;
                case 11:
                    str = "Not initialized";
                    break;
                default:
                    str = "";
                    break;
            }
        }
        String str2 = "" + ((int) bArr2[5]);
        return str;
    }

Hope this might help, bye...
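For the Python side of the integration, the same lookup might be sketched as below. This is a minimal sketch that mirrors only the code-to-text mapping of the Java function above; the names `FFF3_RESULT_TEXT` and `fff3_result_text` are hypothetical, and the "Device time"/"Service time" suffixes are left out because the byte order used by `PackMaker.byte4ToInt` is not shown in this thread.

```python
# Hypothetical Python counterpart of the decompiled Java lookup above.
FFF3_RESULT_TEXT = {
    0: "Success",
    1: "Fail",
    2: "Invalid role type",
    3: "Invalid operation type",
    4: "Invalid opcode",
    5: "No operation authority",
    6: "Invalid signature",
    7: "Serial number expired",
    8: "Out of check-in time",
    9: "Service time expired",
    10: "Locked, cannot open the door",
    11: "Not initialized",
}


def fff3_result_text(payload: bytes) -> str:
    """Return the result text for an FFF3 payload, or "" if unrecognized."""
    # Java bytes are signed, so -86 in the Java code corresponds to 0xAA here.
    if len(payload) < 6 or payload[0] != 0xAA:
        return ""
    # Byte 5 carries the result code; unknown codes map to "" as in the Java default.
    return FFF3_RESULT_TEXT.get(payload[5], "")
```

For example, `fff3_result_text(bytes([0xAA, 0, 0, 0, 0, 6]))` returns `"Invalid signature"`, while a payload that does not start with 0xAA returns an empty string.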

rospogrigio commented 2 years ago

"To the last point: the gateway is REALLY powerful and stable. It reboots itself in a matter of seconds in case of failure, and can connect to the lock (almost) without failures from like 7-8 meters and 1 wall. That's what I was expecting."

This is REALLY impressive!! Can't wait to see it in action!! 😮

formatBCE commented 2 years ago

I MADE IT!

Opening/closing works. Status updates don't work so well yet; there's a gap to fill. @rospogrigio here's the situation, please give me advice:

So what i want you to think:

Meanwhile, i will focus on initial gateway configuration.

formatBCE commented 2 years ago

New updates: I managed to make scanning stop right on command received. Now it responds to commands almost instantly.

rospogrigio commented 2 years ago

Cool, congratulations! So, my few first thoughts after reading this:

rospogrigio commented 2 years ago

Oops, I just read your last message... Great!! When can you share your code and firmware so I can try it?

formatBCE commented 2 years ago

I guess it's about the time. Don't want to get interrupted by something and leave you without anything in hands :)

Let me create a PR for your integration, and a repo for the gateway. Actually, you won't be able to run it without the Arduino IDE: all the stuff like topics, WiFi parameters and MQTT parameters is hard-coded for now. I'm trying to move them into some initial setup. But at least we will have some code in the cloud.

rospogrigio commented 2 years ago

Cool! Regarding whether to get the values from advert or FFF3, I seem to remember that nourmehdi wrote that FFF3 was more power consuming, so we'd probably better use the advert since you found a way to interrupt it on command. Can't wait to see the code! 😉

formatBCE commented 2 years ago

Here's PR: https://github.com/rospogrigio/airbnk_mqtt/pull/7 If you prefer another branch, just let me know.

formatBCE commented 2 years ago

And here https://github.com/formatBCE/Airbnk-MQTTOpenGateway is the repository. Change things in Settings.h, build and upload. I use VS Code with the PlatformIO plugin.

rospogrigio commented 2 years ago

Super super cool! Tomorrow I'll try to find some time to play with it. Could it be possible to set the options in some other way instead of rebuilding it maybe?

formatBCE commented 2 years ago

"Could it be possible to set the options in some other way instead of rebuilding it maybe?"

Yes, I'm thinking of making a WiFi access point with a simple web page for initial setup; when submitted, it will save prefs, reboot and connect to the actual WiFi. I've never done it before, so it will take some time. But it's doable.

formatBCE commented 2 years ago

@rospogrigio Ok, i uploaded new version to https://github.com/formatBCE/Airbnk-MQTTOpenGateway You may find built binary for esp32 in repo root (firmware.bin).

Flash it, it will start access point AirbnkOpenGateway. Connect to that AP, and go to 192.168.4.1 in browser.

Fill-in the data there. It will create configuration, and reboot ESP32 to connect to your WiFi. Chip will indicate (most probably) with blue LED, when done. Also you may find messages in MQTT explorer. IP will be there, in "tele" subtopic.

Check it out, and tell me what's there.

If you screw up the config and it won't connect, after some attempts it will reset the config and re-deploy the AP. You can perform the same config reset from the web UI, by navigating to the ESP IP address given by your router.

Cheers.

formatBCE commented 2 years ago

Ok, I guess tomorrow I will change the behavior a bit, because rebooting my HA host (with the MQTT add-on) was enough to make the gateway reset :) I will make it more tolerant of a missing MQTT connection.

formatBCE commented 2 years ago

Did it, now it will erase prefs only on wifi disconnect.

rospogrigio commented 2 years ago

Ok I have some ideas for the code merging, I'll take care of that so you can concentrate in improving the firmware if you believe there's more work to do there. Good job! 👍

rospogrigio commented 2 years ago

@formatBCE how do I flash the firmware? Can I use the Tasmota interface, as far as you know?

Edit: I also merged your code, creating a separate class for the new gateway while still keeping the older Tasmota device. I also moved the code generation into a dedicated class. Please see PR #9; I'd suggest working from there from now on. Let me know!

rospogrigio commented 2 years ago

OK, flashed, configured and launched HA, but all entities are unavailable and I see nothing happening... I might have broken something; how can I debug the communication?

Edit: OK, fixed almost everything. I have also managed to operate the lock once, even though it took more than 5 secs to operate. Moreover, after I operate it all entities become unavailable and I no longer receive messages, so I have to reset the ESP. Is that normal?

formatBCE commented 2 years ago

Hi!

Nope, it's not normal. For me it works all the time: yesterday I tried opening and closing about 50 times and got no errors. I implemented a retry inside the code, so it tries 4 times to connect and send. It helped reduce failures almost to zero.

You can debug in different ways. First, what do you see in the logs of your integration? They should have some info. You can also download MQTT Explorer and watch the messages arriving in your root topic. Finally, you can connect the ESP to your PC over USB and use PuTTY to open the serial port the ESP is on (baud rate 115200) and check the logs from the ESP itself.

I just woke, and realized that my ESP disconnected from WiFi 1 hour ago. I think i will make some remote logging to trace the problem. WiFi/Mqtt part for this project i took from another sketch, so maybe worth optimizing.

Thank you for merge, gonna try that today.

rospogrigio commented 2 years ago

Ok, I did some tests and I confirm it works like 100% of the time, and from a bigger distance (like 3-4m, inside a wooden cabinet). With Tasmota I was having 50% failures from 1m in open air... Awesome job!!! I still see some strange status changes, so I agree it can be improved. Also, I had set everything up with Tasmota to allow a gateway to connect to multiple locks (OK, I know it's a very unlikely scenario, so I guess we can keep it like this). I'll test it for some days and will provide suggestions. First one: maybe we can pass the MAC address and the number of retries as parameters from the integration? Think about it and let me know your opinion. Thank you so much, I'm so proud of our achievement!!

formatBCE commented 2 years ago

Thank you for kind words :)

Yes it's doable (both multiple locks and sending parameters over mqtt). If we really need this, i will implement :)

Downside to having multiple locks will be following:

formatBCE commented 2 years ago

I improved stability and reconnection flow for gateway, and optimized usage of resources a bit. Check out my newest commit.

Today my main gateway worked for whole day with current stability changes, no drops. Also, on test gateway i tried to drop it forcefully, it was remaining stable.

We'll see in long run, but i'd consider it as RC.

rospogrigio commented 2 years ago

Ok how can I flash it? I used the tasmota interface the first time but now it's no longer available...

formatBCE commented 2 years ago

Use ESPFlasher tool. It's perfect. I will think on OTA.

Sylvania2 commented 2 years ago

[image: IMG_20211210_092922]

Chip in M5xx lock is a CC2642

Best regards

Sylvania2 commented 2 years ago

Hi

ESP32 flashed with firmware.bin from formatBCE and set up with MQTT and WiFi. MQTTLens subscribes to airbnk_lock/# and I get mac/rssi/data/ip/hostname/scan_dur/wait_dur messages. I copied components/airbnk_mqtt to the HASS.IO config and added the integration. After entering the email and verification code, I get "Success! Created configuration for Airbnk", but there is no serial or MQTT setup step, and I don't get any entities. What to do?

Best regards

rospogrigio commented 2 years ago

Mmm, you should enable logging and post here what you get in the logs because otherwise it's not easy to understand where the problem is...

Sylvania2 commented 2 years ago

Hi..

Does this help ? I deleted a lot of email and keys. Log ends with: Device 'fordør' is filtered out

    2021-12-13 08:24:56 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration airbnk_mqtt which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
    2021-12-13 08:25:02 DEBUG (MainThread) [custom_components.airbnk_mqtt] DEVICES ARE {}
    2021-12-13 08:25:40 INFO (MainThread) [custom_components.airbnk_mqtt.airbnk_api] Verification code request succeeded.
    2021-12-13 08:25:59 INFO (MainThread) [custom_components.airbnk_mqtt.airbnk_api] Retrieving new Token...
    2021-12-13 08:25:59 INFO (MainThread) [custom_components.airbnk_mqtt.airbnk_api] Retrieving: https://wehereapi.seamooncloud.com/api/lock/loginByAuthcode?loginAcct=kim****
    2021-12-13 08:26:04 INFO (MainThread) [custom_components.airbnk_mqtt.airbnk_api] Token retrieval succeeded.
    2021-12-13 08:26:04 DEBUG (MainThread) [custom_components.airbnk_mqtt.config_flow] Done!: ****
    2021-12-13 08:26:04 INFO (MainThread) [custom_components.airbnk_mqtt.airbnk_api] getCloudDevices.
    2021-12-13 08:26:04 INFO (MainThread) [custom_components.airbnk_mqtt.airbnk_api] Retrieving: https://wehereapi.seamooncloud.com/api/v2/lock/getAllDevicesNew?language=2&userId=**
    2021-12-13 08:26:07 DEBUG (MainThread) [custom_components.airbnk_mqtt.airbnk_api] GetCloudDevices succeeded (200): { "code":200, "data":[ { "sn":"1200566", "deviceName":"fordør",****
        "info":"OK",
        "totalNum":0,
        "totalPage":0
    }
    2021-12-13 08:26:07 INFO (MainThread) [custom_components.airbnk_mqtt.airbnk_api] Device 'fordør' is filtered out
    2021-12-13 08:26:07 DEBUG (MainThread) [custom_components.airbnk_mqtt] DEVICES ARE {}

Best regards

Sylvania2 commented 2 years ago

Hi

    if dev_data["gateway"] == "":
        _LOGGER.info("Device '%s' is filtered out", dev_data["deviceName"])

Gateway ?

Best regards

rospogrigio commented 2 years ago

OK, my fault. The fact is that when we call the getAllDevicesNew cloud API we also receive information about the gateways and fingerprint devices, and I wanted to filter them out. The best way I could think of was to filter out devices that have the "gateway" attribute empty, but I did not consider that some users might not have bought the W100 gateway. I will fix this; in the meantime you could just remove the if block, or replace == with !=, and you should be able to proceed. Let me know, bye!
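The temporary `!=` workaround could be sketched roughly as follows. This is only an illustration under assumptions: the function name `select_locks` and keying the result by `sn` are hypothetical, the dictionary keys are the ones visible in the log and snippet above, and the fix that was eventually committed may look different.

```python
import logging

_LOGGER = logging.getLogger(__name__)


def select_locks(cloud_devices):
    """Keep devices whose "gateway" attribute is empty (the `!=` workaround)."""
    locks = {}
    for dev_data in cloud_devices:
        # Workaround from this thread: the comparison is flipped from == to !=,
        # so a lock that is not paired with a W100 gateway is no longer dropped.
        if dev_data["gateway"] != "":
            _LOGGER.info("Device '%s' is filtered out", dev_data["deviceName"])
            continue
        locks[dev_data["sn"]] = dev_data
    return locks
```

Note that this flips the filter rather than fixing it: devices that do report a gateway are now the ones skipped, so it only suits setups without a W100.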

Sylvania2 commented 2 years ago

Yep .. != works

Best regards

rospogrigio commented 2 years ago

I have fixed it and committed it in the PR, then merged to master and published a new release.

rospogrigio commented 2 years ago

So @formatBCE, I've been trying the integration with the new FW for some time and it works quite OK. I see some room for improvement though, specifically in advert reception. I noticed that the telemetry message is received very punctually (every 10 sec.), while the advert message timing is very variable (from 1-2 s to more than 30 s, in my case). This delays the status update considerably, and as a consequence, anyone who needs to send double open commands might have to wait quite some time before sending the second command. So, we should consider updating status (and lockEvents) from the FFF3 characteristic, but only right after sending a command, since nourmehdi says that reading it consumes much more power than receiving advert messages. So, here are the questions for you:

  1. Is there a way to force the request of an advert message, or to have it sent more punctually? I believe not, but you know more than me on the matter.
  2. As an alternative, can you implement the possibility to retrieve the data from the FFF3 characteristic? I can implement the message parsing in the integration, to save you time. I'd be curious to see if it is received more quickly than the advert.
  3. Another option would be to trust the "success" result received, and forcefully update the status and increment the lockEvents.

Let me know your thoughts, thank you!

formatBCE commented 2 years ago

Hi!

Yes, adverts are sent as they are received, which is sometimes an issue with BLE. I noticed that for this lock I get fewer advertisements than for my BLE tags (I use them for presence tracking, and the scanning code is pretty much the same).

To questions:

  1. No, unfortunately, there isn't. There is the so-called BLE active scan, which requires an answer from the server (the lock), but it is a lot harder on the BT stack of both the ESP and the lock, which means lag for the gateway and battery drain on the lock. Moreover, I tried active scan in my research, and it didn't give better results with the lock.
  2. Yes, I can do that. It shouldn't be so hard to read FFF3 after writing and send the state in the operation results. Since we will be doing that only once, there won't be any big battery consumption. I found though that the lock starts to show the new state in adverts only after some time, so we might need to set up some delay to avoid confusion in the integration logic.
  3. It would be cool, but if we (unintentionally) write something wrong to the characteristic, I don't know if we will get an error in all cases, so I believe it's not 100% trustworthy.

Let's stick with FFF3 reading for now, I will do my best to make that change ASAP. Cheers.

formatBCE commented 2 years ago

@rospogrigio I just committed version 1.0.1 https://github.com/formatBCE/Airbnk-MQTTOpenGateway

formatBCE commented 2 years ago

Rewrote OTA natively, excluded the third-party library. Takes less space, no annoying ads or confusing interface. Keeping it simple.

rospogrigio commented 2 years ago

Great! Would it be possible to keep the configuration when flashing? It is annoying to re-configure it every time...

Edit: I've been trying the new fw but it's not working as expected: the FFF3 data does not contain the info that should be there (lockEvents and battery status...), so I believe that the characteristic is updated later. So, either we introduce a delay before we read it, or we introduce a call to request its value and call it after some time (like 0.5s or 1s)... what do you think? The immediate value can be useful because at first it should contain the error code, so you can actually verify whether the write was successful, so the second option might be the best. Let me have your thoughts, bye!

formatBCE commented 2 years ago

Ok, I got little bit confused here.

  1. My configuration doesn't drop on flashing. Only on reset. Maybe, it's the matter of device I use, idk...
  2. For me, FFF3 reading returns exactly that hex data, that I read from it with nRF app. Always updated.
  3. However, I found a way to break the process. Nourmehdi mentioned a lock state in which it becomes unresponsive. So if I send several commands to the lock without waiting for adv in between, it hangs the lock. I believe that's completely normal: the integration doesn't know about the new state yet, thus the payload is wrong. What's interesting is that FFF3 starts to send a payload of mostly zeroes after that. And sometimes, even though the lock is opening/closing as expected, this payload is still corrupted. Manual manipulation cures it for me. And if I wait between commands, the problem doesn't appear. Could you try to reproduce this? Meanwhile, I will try to introduce some easier way to restore config for you.

formatBCE commented 2 years ago

@rospogrigio here https://www.esp32.com/viewtopic.php?t=10487 someone has the same problem. Quote from that thread: "NVS is not typically erased during a firmware flash".

Do we have someone else to confirm this behaviour?

rospogrigio commented 2 years ago

"For me, FFF3 reading returns exactly that hex data, that I read from it with nRF app. Always updated."

Well, this is not what I am experiencing. This is what I read in the logs after my last closing:

    2021-12-14 20:39:09 DEBUG (MainThread) [custom_components.airbnk_mqtt.custom_device] Received operation result {"success":true,"error":"","sign":1639510746,"mac":"58:xx:xx:xx:xx:3A","lockStatus":"AA00040312001500000000000000000000000000"}

But this is what I see in nRF Connect (here you can see the lockEvent and battery data we need): [image: IMG_20211214_205325]

I don't understand what's going on and why my lock behaves differently from yours... Can you? And how can we overcome this? Maybe introducing a request for the FFF3 data?
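To compare such results more easily, the operation result message could be unpacked along these lines. This is a rough sketch: the function name `parse_operation_result` and the returned keys are hypothetical, the example message only imitates the shape of the one in the log above (same leading bytes, padded with zeros), and where lockEvents and battery data sit in the payload is not assumed here.

```python
import json


def parse_operation_result(message: str) -> dict:
    """Split an operation-result JSON message into a few fields worth logging."""
    result = json.loads(message)
    status = bytes.fromhex(result["lockStatus"])
    return {
        "success": result["success"],
        "error": result["error"],
        # Byte 0 should be 0xAA, as checked by the decompiled app code.
        "header_ok": len(status) > 0 and status[0] == 0xAA,
        # Byte 5 is the result code (0 means success in the app code).
        "result_code": status[5] if len(status) > 5 else None,
        # True when every byte after the first seven is zero.
        "tail_all_zero": not any(status[7:]),
    }


# Example message shaped like the one in the log (values are illustrative).
example = json.dumps({
    "success": True,
    "error": "",
    "sign": 1639510746,
    "mac": "58:xx:xx:xx:xx:3A",
    "lockStatus": "AA000403120015" + "00" * 13,
})
parsed = parse_operation_result(example)
```

Here `parsed["result_code"]` is 0 and `parsed["tail_all_zero"]` is true, which matches the symptom described: the write is reported as successful, but the rest of the payload is empty.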

formatBCE commented 2 years ago

@rospogrigio OK, you're right, it happened to me several times today. I made 1.0.3. In it, I introduced reading FFF3 until it returns a correct value (but no more than 5 times), with a delay of 200 ms. Works for me every time.
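The retry logic described here can be illustrated in Python as below. This is not the gateway's firmware code, only a sketch of the idea: `read_fff3` and `is_valid` are hypothetical callables supplied by the caller, what counts as a "correct value" is left to `is_valid`, and placing the delay between attempts is an assumption.

```python
import time
from typing import Callable, Optional


def read_until_valid(
    read_fff3: Callable[[], bytes],
    is_valid: Callable[[bytes], bool],
    max_attempts: int = 5,
    delay_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Poll read_fff3 until is_valid accepts the value or attempts run out."""
    for attempt in range(max_attempts):
        value = read_fff3()
        if is_valid(value):
            return value
        # Wait before the next read, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(delay_s)
    # No acceptable value was read within max_attempts.
    return None
```

Passing `sleep` as a parameter keeps the function easy to exercise without real delays; a caller would normally leave it at the default.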

Since, i believe, integration doesn't count on that value yet, correct combination would be:

If i don't wait for adv, after second operation lock becomes unresponsive. I believe, it's because integration doesn't know new parameters yet, so command is incorrect. I hope it will be fixed with my new changes, as you can rely on fff3 result now.

If i wait for adv, it works flawlessly. Try it please.

P.S. introducing separate topic for fff3 reading command would be overkill, i think. If it's absolutely necessary, i can do it (probably, i will change command format instead). But i hope current state will be enough.

rospogrigio commented 2 years ago

I tried it but it still shows the same behavior 😢 But I updated it using OTA and it kept the configuration so no need to set it up again, good job 👍 Any other ideas for the FFF3 issue?