technyon / nuki_hub

Use an ESP32 as a Hub between a NUKI Lock and your smarthome.
MIT License

"Client nukihub already connected, closing old connection" with v7 #87

Closed mahikeulbody closed 1 year ago

mahikeulbody commented 1 year ago

Hi,

I upgraded my Nuki Hub to the latest firmware. Now I get many of these messages:

Client nukihub already connected, closing old connection
New client connected from 192.168.0.227:58397 as nukihub

With the previous version (v6.11) I had some reconnections because of malformed MQTT packets. I no longer seem to get those malformed packets, only these new messages.

Mincka commented 1 year ago

Good suggestion, I'll do it with my second Atom Lite.

Mincka commented 1 year ago

So I would exclude a hardware issue related to the M5 Atom Lite. Either the Atom with ESPHome crashes within a few minutes or hours and it's related to Wi-Fi, or it stays up and it's related to the firmware? Suspense...

alexdelprete commented 1 year ago

Julien, can you flash the ESPHome Bluetooth Proxy on the 3rd Atom Lite? This is officially supported and integrated in HA. I already use one of these, so we can compare results. Flashing is super simple from that webpage. :)

Mincka commented 1 year ago

It's already ESPHome with esp32_ble_tracker, so I could enable the BT proxy, but I'm not sure what value it would add for this specific troubleshooting. I use the BLE tracker for the ble_rssi module that tracks the beacons of the lock. Let's continue this discussion on Discord; this is another thread where we are no longer discussing the original issue. Completely my fault, sorry OP!

mahikeulbody commented 1 year ago

I still get 1 or 2 messages per hour like this one:

2023-01-26 15:57:30: New connection from 192.168.0.227:57921 on port 1883.
2023-01-26 15:57:30: Client nukihub already connected, closing old connection.
2023-01-26 15:57:30: New client connected from 192.168.0.227:57921 as nukihub (p2, c1, k60, u'xxx').

But this afternoon, for the first time, Nuki Hub seems to have crashed without being able to restart by itself. The ESP was no longer visible from my Wi-Fi AP and was unreachable from my browser. I had to unplug it and plug it back in to get it back on the network.

And then I get the "Client nukihub already connected, closing old connection" message as usual.

alexdelprete commented 1 year ago

I still get 1 or 2 messages per hour like this one:

Are you using the CS=false fw?

For the wifi, what brand of APs are you using?

mahikeulbody commented 1 year ago

I am using version 7.2. Where can I download the CS=false firmware? My AP is the one built into my internet box (Freebox).

mahikeulbody commented 1 year ago

Frankly, I am more worried about this first crash, where Nuki Hub could not restart by itself, than about the "Client nukihub already connected" message.

alexdelprete commented 1 year ago

You are right. But probably the two things are related, at least the hypothesis is that there could be a wifi issue, but it's just a feeling for now. :)

Do you have a lot of reconnects?

mahikeulbody commented 1 year ago

Do you have a lot of reconnects?

Not that many. For example, none during the last 90 minutes. But on average, one or two per hour.

that there could be a wifi issue.

A Wi-Fi issue should not crash the firmware; at most it should restart it (theoretically, of course).

alexdelprete commented 1 year ago

A Wi-Fi issue should not crash the firmware; at most it should restart it (theoretically, of course).

Key word is "should". :)

But I like searching for the root causes, more than the recovery stuff. A restart every 0.5-1h is not normal.

I had 2 restarts in 24h, and I'm not satisfied about this. Let alone every 30m...

mahikeulbody commented 1 year ago

Well, I think this is the first time since I started checking that I have gone 90 minutes without at least one reconnection (but I am not checking every hour). Coincidentally (or not...), Nuki Hub crashed 90 minutes ago, and since I restarted it manually there have been no more reconnections.

alexdelprete commented 1 year ago

I am using version 7.2. Where can I download the CS=false firmware?

If you want to try it, this is 7.3 with CS=false and QoS=1 that I compiled: [nuki_hub_7.3-adp-cs0-qos1.zip]

7.3 was just released, with the CSS fix for chrome/edge browsers. I just changed CS and QoS.

mahikeulbody commented 1 year ago

Yes, I will try it, but first I prefer to wait for a new reconnect, since I seem to have gone longer than usual without one.

alexdelprete commented 1 year ago

Yes, I will try it, but first I prefer to wait for a new reconnect, since I seem to have gone longer than usual without one.

I agree... the more data we have, the better.

alexdelprete commented 1 year ago

cc @mahikeulbody @Mincka @mundschenk-at

UPDATE: @technyon just released this test version (link below), it has an updated wifi manager that manages reconnections, hopefully it will also improve wifi connection. It has CS=false and QoS=1.

Share your findings. We're also on discord. 🤞🏼

nuki_hub_7.3-wm.zip

mahikeulbody commented 1 year ago

Just in time : I got a new reconnection :-) I will try this 7.3-wm fw.

technyon commented 1 year ago

@mahikeulbody Please recheck with release 8.0

mundschenk-at commented 1 year ago

For me, 8.0 has fixed all reconnection issues.

mahikeulbody commented 1 year ago

I feel I have fewer occurrences, but I still have some:

2023-01-29 12:09:29: New connection from 192.168.0.227:64973 on port 1883.
2023-01-29 12:09:29: Client nukihub already connected, closing old connection.
2023-01-29 12:09:29: New client connected from 192.168.0.227:64973 as nukihub (p2, c0, k15, u'xxx')

technyon commented 1 year ago

What's "less"? If it occurs only rarely, I think it's fine. It's more or less only a warning.

mundschenk-at commented 1 year ago

I've had a log trace running for the last two days (by accident) and the only MQTT reconnects were when the ESP rebooted, either due to changed settings or firmware updates (and once because no BLE beacons were detected for the set timeframe).

technyon commented 1 year ago

Yes, this can be normal behaviour. Maybe the ESP rebooted, or something interrupted the connection. The ESP couldn't close the connection, and reconnected, giving that warning.

mahikeulbody commented 1 year ago

I don't know why, but the Mosquitto log only shows me a couple of hours, so it is difficult to compile statistics. I checked last night and there was no reconnection during the two hours of the log; later, this morning, I had 3 reconnections in less than 30 minutes. So the issue is still there, no doubt. But as long as I don't need to restart Nuki Hub manually, it is not a real problem for me.
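For anyone who wants numbers rather than impressions, here is a minimal Python sketch that counts the "already connected" messages per hour from a saved Mosquitto log. It assumes the log lines use the timestamp format shown in this thread; `takeovers_per_hour` and the `client_id` default are names made up for this example, not part of Mosquitto or Nuki Hub.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line format (as pasted in this thread):
# "2023-01-29 12:09:29: Client nukihub already connected, closing old connection."
TAKEOVER = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"Client (\S+) already connected, closing old connection"
)


def takeovers_per_hour(lines, client_id="nukihub"):
    """Count 'already connected' messages for client_id, bucketed by hour."""
    counts = Counter()
    for line in lines:
        match = TAKEOVER.match(line.strip())
        if match and match.group(2) == client_id:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            # Bucket key is the start of the hour, e.g. "2023-01-29 12:00".
            counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return dict(counts)
```

Passing an open log file (or any iterable of lines) to `takeovers_per_hour` gives one count per hour, which is easier to compare between firmware versions than a log window of a couple of hours.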

mundschenk-at commented 1 year ago

Are you logging reboots of your ESP? If those reconnects are from reboots, there is no issue here.

technyon commented 1 year ago

... or let's rather say the issue is the reboots. The many reconnects before came most likely from the mqtt library which is now replaced. So if it happens rarely, the next step would be to investigate the reboots (which will be very challenging like all "sometimes" bugs).

mundschenk-at commented 1 year ago

Yes, that's what I meant by "here". I should have emphasized that more.

mahikeulbody commented 1 year ago

Are you logging reboots of your ESP? If those reconnects are from reboots, there is no issue here.

How can I do that?

mundschenk-at commented 1 year ago

I use an automation in HA to log nuki/maintenance/log to file, and via the uptime sensor I've created in HA (from the nuki/maintenance/uptime topic). If you have not enabled the latter, you could at least deduce the time of the last reboot from the current nuki/maintenance/uptime message.

technyon commented 1 year ago

Log the maintenance/uptime node. It holds the number of minutes the device is up. After reboot, it starts counting again at 0.
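As a rough illustration of that idea: if you already record the uptime values with timestamps, a reboot shows up as any sample where the counter is lower than the previous one. The sketch below is Python; `find_reboots` and the `(timestamp, uptime_minutes)` sample format are assumptions for this example, and collecting the samples from MQTT is not shown.

```python
def find_reboots(samples):
    """Return the timestamps at which the uptime counter dropped.

    samples: iterable of (timestamp, uptime_minutes) in chronological order.
    A drop in uptime means the device restarted between two samples.
    """
    reboots = []
    previous = None
    for timestamp, uptime in samples:
        if previous is not None and uptime < previous:
            reboots.append(timestamp)
        previous = uptime
    return reboots
```

Comparing against the previous value, rather than looking for an exact 0, still catches a reboot when the first message after the restart is missed.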

alexdelprete commented 1 year ago

Are you logging reboots of your ESP? If those reconnects are from reboots, there is no issue here.

How can I do that?

mqtt:
  sensor:
    - name: "Portoncino Log"
      state_topic: "nuki_hub/maintenance/log"
    - name: "Portoncino Uptime"
      state_topic: "nuki_hub/maintenance/uptime"
      device_class: duration
      unit_of_measurement: min

mahikeulbody commented 1 year ago

I had two "Client nukihub already connected" messages at the same time that the uptime restarted from 0. The last uptime before the restart was 143 minutes. Of course, I have no idea why it restarted. I will post more uptime values.
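This kind of check (does a reconnect message coincide with a reboot?) can be done mechanically once both lists of timestamps exist. A minimal Python sketch follows; `classify_reconnects` is a hypothetical helper, and the 2-minute window is an arbitrary assumption chosen because the uptime counter only has minute resolution.

```python
from datetime import datetime, timedelta


def classify_reconnects(reconnects, reboots, window=timedelta(minutes=2)):
    """Split reconnect times into those near a reboot and those that are not."""
    near_reboot, unexplained = [], []
    for reconnect in reconnects:
        # A reconnect counts as reboot-related if any reboot lies within the window.
        if any(abs(reconnect - reboot) <= window for reboot in reboots):
            near_reboot.append(reconnect)
        else:
            unexplained.append(reconnect)
    return near_reboot, unexplained
```

Reconnects that land in the second list are the ones worth investigating separately from the reboots.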

alexdelprete commented 1 year ago

the graph is enough, from the sensor:

image

mahikeulbody commented 1 year ago

I think it is much better than before v8 (at least for me).

Screenshot_20230130_094326

alexdelprete commented 1 year ago

Indeed, I've passed 36h since I upgraded to v8. Best version so far.

image

mahikeulbody commented 1 year ago

image

Some uptimes are very short. I have a question: is Nuki Hub able to restart on its own if it crashes? If not, which events or conditions make it decide to restart?

I have "Network Timeout until restart", "Restart timer" and "Restart if bluetooth beacons not received" set to -1, so it is none of these conditions. So what else?

Is it possible that it restarts because the USB charger sometimes fails? I will try another charger tonight.

technyon commented 1 year ago

You should try disabling restart timer, that will make it restart after the configured time, unconditionally.

The other two are basically about 1. network fails and 2. bluetooth fails.

Making sure to have a good power supply is a good idea and can have an effect on stability. Also make sure you use a good USB cable.

mahikeulbody commented 1 year ago

You should try disabling restart timer [...]

As I said in my previous comment, it is disabled already.

I tried another USB power supply, without success.

Mincka commented 1 year ago

@mahikeulbody what ESP hardware do you use? Which Wi-Fi brand and setup (mesh, APs, which bands enabled...) do you have?

mahikeulbody commented 1 year ago

I have this ESP. The Wi-Fi is provided by my ISP's internet box. It is a bit old, so it only supports Wi-Fi b/g/n (2.4 GHz). I set the channel width to 20 MHz. Signal strength is about -69 dB.

I have also unset "Restart on disconnect" and set "Network Timeout until restart" to -1. So, right now, no configurable condition is left that would make the firmware restart on purpose. Are there other conditions the firmware checks before restarting? If not, that would imply that Nuki Hub crashes and restarts by itself. But is the ESP really able to restart by itself after a crash?

At the moment, I am trying yet another USB power supply and another Wi-Fi channel.

Mincka commented 1 year ago

Yes, the ESP can restart "by itself". I have the same issue with the Atom M5 Lite. I even tried it on a phone Wi-Fi AP and got the same behavior: random reboots after 15 minutes to 5 hours. Nobody knows what the root cause is, because others don't have these reboots or just don't know about them because it reboots fast. The issue can be either software, hardware, BT radio or Wi-Fi radio, or something else...

If you can log serial output, you may see this: image

The reason in my case is "LoadProhibited", and it can be caused by a memory leak, memory corruption or corrupted flash. In that case a watchdog restarts the ESP and there is no logging in Nuki_Hub. I don't even think this can be caught with a giant try/catch.

mahikeulbody commented 1 year ago

random reboots after 15 minutes to 5 hours.

I have about the same max uptime: 4.8 hours.

or just don't know about them because it reboots fast.

Maybe, indeed. But some others here check the uptime and no longer have the problem since v8.

Mincka commented 1 year ago

In fact, they did not have the same issues. We troubleshot the reboot issues together on Discord and their ESP rebooted for other reasons, mainly because of issues with dependency libraries. To be sure about the type of reboot in your case, you need to log serial for a few hours and you may see this kind of message.
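If the serial output has been captured to a text file, the crash reasons can be tallied with a few lines of Python. This sketch assumes the firmware prints the usual ESP-IDF panic line ("Guru Meditation Error: Core N panic'ed (Reason)"); `panic_reasons` is a name invented for this example, and capturing the serial log itself is not shown.

```python
import re
from collections import Counter

# Assumed panic line, e.g.:
# "Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled."
PANIC = re.compile(r"Guru Meditation Error: Core\s+(\d+) panic'ed \(([^)]+)\)")


def panic_reasons(lines):
    """Count how often each panic reason appears in captured serial output."""
    counts = Counter()
    for line in lines:
        match = PANIC.search(line)
        if match:
            counts[match.group(2)] += 1
    return dict(counts)
```

A log dominated by a single reason such as LoadProhibited points at one recurring fault, while a mix of reasons suggests something less specific, such as power or memory corruption.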

Mincka commented 1 year ago

I have installed many versions over the past 10 days and here are the results: image The giant uptime of 1500 minutes happened only once.

mahikeulbody commented 1 year ago

In fact, they did not have the same issues.

I think we also had these MQTT library issues, which now seem to be fixed by the v8 firmware, logically for us as well. But in our case, there is at least one additional issue.

Mincka commented 1 year ago

Yes, I tried to find things in common with you but you don't have the same Wi-Fi brand (I have Netgear AP with 2.4/5Ghz), nor the same ESP (I use M5 Atom Lite). What's your lock? I have version 3.0 non pro. Are you French btw? "mahikeul" sounds a lot like something done by a French speaking person. :)

mahikeulbody commented 1 year ago

I have a Nuki 2.0 (purchased with the bridge). (and, yes, I am french ;-)

mahikeulbody commented 1 year ago

I quickly changed the Wi-Fi channel many times and then Nuki Hub crashed... without being able to restart by itself. My AP displays the Wi-Fi inactivity period of each connected device, and I can see that Nuki Hub's inactivity is in the range [0..14"]. But this afternoon I set my AP to 40 MHz wide (2 x 20 MHz); the inactivity of Nuki Hub was then in the range [0..3"] and I got an uptime of 200 minutes (which is good by my statistics). When Nuki Hub finally restarted, I checked the Wi-Fi on the AP: it was back to 20 MHz wide and the inactivity of Nuki Hub was back in the [0..14"] range. I don't know if this information is useful or relevant, but I feel it could be something to investigate.

Mincka commented 1 year ago

Interesting, 40 MHz is disabled at home on 2.4 GHz. I'll try to enable 20/40 coexistence to see if it makes any difference. It was disabled because I did not see any reason to let the AP switch between both on 2.4 GHz. However, even if I live in an area with few neighbor networks, the BT devices may cause issues on this band. The channel is set to a specific one but I'll also try to set it back to auto.

Yesterday I tried to disable HomeKit support (that was enabled on one of the locks) and authorized pairing. It did not make any difference.