zigpy / bellows

A Python 3 project to implement EZSP for EmberZNet devices
GNU General Public License v3.0

[bellows.uart] Lost serial connection: read failed: device reports readiness to read but returned no data (device disconnected or multiple access on port?) #357

Closed: smmoroz closed this issue 3 years ago

smmoroz commented 4 years ago

Hi,

Since a few days now I have the issue with ZHA with Elelabs ELR022:

2020-10-02 05:13:26 ERROR (bellows.thread_0) [bellows.uart] Lost serial connection: read failed: device reports readiness to read but returned no data (device disconnected or multiple access on port?)

The bug is 100% reproducible.

After such message all my Zigbee devices become Unavailable.

The bellows and zigpy log from 02.10.2020 is attached as home-assistant-2020-10-02-ezsp-shutdown-full.log.

I'm currently running HA 0.115.5.

I'm ready to help with the debug.

Adminiuga commented 4 years ago

Please provide more details about the environment, but it looks like a USB or driver issue.

smmoroz commented 4 years ago

I'm running HA 0.115.6 now on an RPi4:

arch armv7l
chassis embedded
dev false
docker true
docker_version 19.03.11
hassio true
host_os HassOS 4.13
installation_type Home Assistant OS
os_name Linux
os_version 4.19.127-v7l
python_version 3.8.5
supervisor 245
timezone Europe/Moscow
version 0.115.6
virtualenv false

I use Elelabs ELR022 shield:

bash-5.0# python3 Elelabs_EzspFwUtility.py probe -p /dev/ttyAMA0
2020/10/06 16:55:02 Elelabs_EzspFwUtility:   Elelabs adapter detected:
2020/10/06 16:55:02 Elelabs_EzspFwUtility:   Adapter: ELR022
2020/10/06 16:55:02 Elelabs_EzspFwUtility:   Firmware: 6.3.0-178
2020/10/06 16:55:02 Elelabs_EzspFwUtility:   EZSP v6

I've been using the shield for about 9 months. I had almost no problems except for these messages:

2020-10-06 12:21:17 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart

But the device didn't go completely offline. Since I added a few more Zigbee devices (Sonoff, Legrand), I have this issue.

Do you need more information? If so, what exactly?

Adminiuga commented 4 years ago

That's the information I was looking for. Regarding Lost serial connection: read failed: device reports readiness to read, it seems the NCP reset routine went sideways, but what is more concerning is that quite often you get:

2020-10-02 05:00:23 DEBUG (MainThread) [bellows.ezsp.protocol] Send command nop: ()
2020-10-02 05:00:23 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'53762157542fed357e'
2020-10-02 05:00:23 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'3676a157542f2c147e'
2020-10-02 05:00:23 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8430fc7e'
2020-10-02 05:00:23 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 5 (nop) received: b''
2020-10-02 05:00:36 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'4676b157546f15b658944a24ab5593499c0ef37aa9309874dbde3783fd817db8eaec6d648bde6fc8d0f34c8c4007abec763ba58e5c4749fd6d1d03bbc17038aa237e'
2020-10-02 05:00:36 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8520dd7e'
2020-10-02 05:00:36 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'4e76b157546f15b658944a24ab5593499c0ef37aa9309874dbde3783fd817db8eaec6d648bde6fc8d0f34c8c4007abec763ba58e5c4749fd6d1d03bbc1703835c37e'

...

2020-10-02 05:00:36 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8610be7e'
2020-10-02 05:00:36 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'6e76b157546f15b658904124ab1593499cae431eee8e9874f5ded183f77b16a7ebcb397e'
2020-10-02 05:00:36 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'87009f7e'
2020-10-02 05:00:36 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'
2020-10-02 05:00:36 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'
2020-10-02 05:00:36 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'
2020-10-02 05:00:36 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'
2020-10-02 05:00:36 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'

So the NCP resets are caused by these ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT errors, and you get them because, if you notice, there is some UART traffic at 2020-10-02 05:00:23, then no traffic for 13 s, and then a burst of traffic at 2020-10-02 05:00:36 including the MAX ACK timeout error. It should not be quiet for those 13 s, because bellows sends a nop command every 10 s when it is operating normally.
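
If you want to spot these silent stretches in your own log, a minimal sketch like the one below can help; it is not part of bellows, and the log file name and the 10 s threshold (mirroring the nop interval) are assumptions you may need to adjust. It simply scans a Home Assistant debug log for gaps between consecutive [bellows.uart] lines:

import re
from datetime import datetime

# Match the standard HA log timestamp on lines that mention bellows.uart.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*\[bellows\.uart\]")

def find_uart_gaps(path, threshold=10.0):
    """Return (start, end) pairs where no UART line appeared for > threshold s."""
    last = None
    gaps = []
    with open(path) as log:
        for line in log:
            match = TS.match(line)
            if not match:
                continue
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            if last is not None and (ts - last).total_seconds() > threshold:
                gaps.append((last, ts))
            last = ts
    return gaps

for start, end in find_uart_gaps("home-assistant.log"):
    print(f"UART silent from {start} to {end}")

Any gap it prints while the system is otherwise active is a candidate for the kind of 13 s silence shown above.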

The only reported issue with similar behavior -- bellows freezing -- was caused by a bad power supply. In that specific case it was always reproducible and triggered by exact events, but the fix was replacing the power supply. I'd start with that first, as having so many errors popping up in a relatively short amount of time is not typical unless the system is overloaded.

smmoroz commented 4 years ago

I'm using the official Raspberry Pi 4 power supply. I don't have any additional hardware (no USB, no SSD, nothing). Is that enough, or should I consider an even more powerful power supply?

As for the MAX ACK timeout error:

Indeed, I saw that strange behavior at 2020-10-02 05:00:23. But let's look at 2020-10-02 04:59:20. Between 04:59:20 and 04:59:51, 11 messages were received, with a maximum gap of 13 s between messages (04:59:25 to 04:59:38), but there was no MAX ACK timeout error.

It is also strange to me that only one message was processed by the ezsp.protocol component during this period. Processing took just 2 seconds (from 04:59:52 to 04:59:53).

I checked the log for this period. There is almost nothing interesting there, except this:

2020-10-02 04:59:53 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event timer_out_of_sync[L]: seconds=32.17796659399755>
2020-10-02 04:59:53 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.legrand_connected_outlet_electrical_measurement is taking over 10 seconds
2020-10-02 04:59:53 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.legrand_connected_outlet_dbce0b00_electrical_measurement is taking over 10 seconds
2020-10-02 04:59:53 WARNING (MainThread) [homeassistant.components.sensor] Updating zha sensor took longer than the scheduled update interval 0:00:30

What does timer_out_of_sync mean?

Also, there was almost no CPU load at that time.

2020-10-02 04:59:53 DEBUG (MainThread) [homeassistant.core] Bus:Handling <Event state_changed[L]: entity_id=sensor.processor_use, old_state=<state sensor.processor_use=15; unit_of_measurement=%, friendly_name=Processor use, icon=mdi:cpu-32-bit @ 2020-10-02T04:59:19.347021+03:00>, new_state=<state sensor.processor_use=25; unit_of_measurement=%, friendly_name=Processor use, icon=mdi:cpu-32-bit @ 2020-10-02T04:59:53.609635+03:00>>

What would you recommend to do now?

Adminiuga commented 4 years ago

The recommendation is still the same: replace the power supply. Make sure home-assistant_v2.db stays well below 400 MB. How old is the SD card? A timer_out_of_sync of 32 s is way too much, especially as you say the CPU is not busy, but that's another pointer to a possible problem with the power supply.

smmoroz commented 4 years ago

SD card is about 6 months old.

home-assistant_v2.db is 804.0M 😄 I will purge it now.

I will also try a more powerful power supply.

I'll report back.

smmoroz commented 4 years ago

@Adminiuga I purged the home-assistant_v2.db file yesterday at 9 PM. Now it is about 310 MB instead of 804 MB.

Since that time I had only one EZSP reset. The log is below.

I will keep monitoring the situation. But so far it seems, as you suggested, that the problem is related to the high load on my RPi4. Maybe it is worth mentioning in the documentation somehow.

I will also try a new power supply later this week.

2020-10-08 09:36:55 DEBUG (MainThread) [bellows.ezsp.protocol] Send command nop: ()
2020-10-08 09:36:55 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'55962157542f5bd87e'
2020-10-08 09:36:55 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'5696a157542f48007e'
2020-10-08 09:36:55 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8610be7e'
2020-10-08 09:36:55 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 5 (nop) received: b''
2020-10-08 09:37:06 DEBUG (MainThread) [bellows.ezsp.protocol] Send command sendUnicast: (<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 0xBD12, EmberApsFrame(profileId=260, clusterId=2820, sourceEndpoint=1, destinationEndpoint=1, options=<EmberApsOption.APS_OPTION_ENABLE_ROUTE_DISCOVERY|APS_OPTION_RETRY: 320>, groupId=0, sequence=165), 166, b'\x00\xa5\x00\x0b\x05')
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'66972157541e15a0e4904b21a15493099d4e270e4bcb672efdcd66e2747e'
2020-10-08 09:37:06 DEBUG (MainThread) [bellows.ezsp.protocol] Send command nop: ()
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'6797a157541e1559f8c47e'
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'87009f7e'
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'77942157542fa17d337e'
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'7094a157542fb46a7e'
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8070787e'
2020-10-08 09:37:06 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 52 (sendUnicast) received: b'00eb'
2020-10-08 09:37:06 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 5 (nop) received: b''
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'0094b157541515a0e4904b21a15493099d4e27404bce677bb07e'
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8160597e'
2020-10-08 09:37:06 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 63 (messageSentHandler) received: b'0012bd0401040b010140010000eba60000'
2020-10-08 09:37:06 DEBUG (MainThread) [bellows.ezsp.protocol] Send command sendUnicast: (<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 0x4003, EmberApsFrame(profileId=260, clusterId=2820, sourceEndpoint=1, destinationEndpoint=1, options=<EmberApsOption.APS_OPTION_ENABLE_ROUTE_DISCOVERY|APS_OPTION_RETRY: 320>, groupId=0, sequence=167), 168, b'\x00\xa7\x00\x0b\x05')
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'1094b157546f15b658904124ab1593499cec8b6cff739874f4dec688f77b3f8eebcd67c77e'
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'82503a7e'
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'02952157541e15b119904b21a15493099d4e270c45cb672cfdcd66716a7e'
2020-10-08 09:37:06 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 69 (incomingMessageHandler) received: b'000401040b010140010000a2acc712bdffff0918a5010b0500290000'
2020-10-08 09:37:06 DEBUG (MainThread) [zigpy.zcl] [0xbd12:1:0x0b04] ZCL deserialize: <ZCLHeader frame_control=<FrameControl frame_type=GLOBAL_COMMAND manufacturer_specific=False is_reply=True disable_default_response=True> manufacturer=None tsn=165 command_id=Command.Read_Attributes_rsp>
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'2195a157541e155e9edb7e'
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'83401b7e'
2020-10-08 09:37:06 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 52 (sendUnicast) received: b'00ec'
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'3195b157541515b119904b21a15493099d4e274745ce675d757e'
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8430fc7e'
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'4195b157546f15b658904124ab1593499c427b18ee8e9874f4dec488f77b3f8eebcd7fbe7e'
2020-10-08 09:37:06 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8520dd7e'
2020-10-08 09:37:07 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 63 (messageSentHandler) received: b'0003400401040b010140010000eca80000'
2020-10-08 09:37:07 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 69 (incomingMessageHandler) received: b'000401040b0101400100000c5cb30340ffff0918a7010b0500290000'
2020-10-08 09:37:07 DEBUG (MainThread) [zigpy.zcl] [0x4003:1:0x0b04] ZCL deserialize: <ZCLHeader frame_control=<FrameControl frame_type=GLOBAL_COMMAND manufacturer_specific=False is_reply=True disable_default_response=True> manufacturer=None tsn=167 command_id=Command.Read_Attributes_rsp>
2020-10-08 09:37:16 DEBUG (MainThread) [bellows.ezsp.protocol] Send command nop: ()
2020-10-08 09:37:16 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'159a2157542fbae37e'
2020-10-08 09:37:16 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'529aa157542fc58a7e'
2020-10-08 09:37:16 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8610be7e'
2020-10-08 09:37:16 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 5 (nop) received: b''
2020-10-08 09:37:32 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'629ab157546f15b658924a24ab1593499c437b18ee8e9874fadeeb83fc7e2fa6f6857e'
2020-10-08 09:37:32 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'87009f7e'
2020-10-08 09:37:32 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'6a9ab157546f15b658924a24ab1593499c437b18ee8e9874fadeeb83fc7e2fa61cf77e'
2020-10-08 09:37:32 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'87009f7e'
2020-10-08 09:37:32 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'6a9ab157546f15b658924a24ab1593499c437b18ee8e9874fadeeb83fc7e2fa61cf77e'
2020-10-08 09:37:32 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'87009f7e'
2020-10-08 09:37:32 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'6a9ab157546f15b658924a24ab1593499c437b18ee8e9874fadeeb83fc7e2fa61cf77e'
2020-10-08 09:37:32 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'87009f7e'
2020-10-08 09:37:32 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'
2020-10-08 09:37:32 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'
2020-10-08 09:37:32 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'
2020-10-08 09:37:32 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'
2020-10-08 09:37:32 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'
2020-10-08 09:37:32 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 69 (incomingMessageHandler) received: b'00040106000101400100000d5cb30340ffff0718880a00001001'
2020-10-08 09:37:32 DEBUG (MainThread) [zigpy.zcl] [0x4003:1:0x0006] ZCL deserialize: <ZCLHeader frame_control=<FrameControl frame_type=GLOBAL_COMMAND manufacturer_specific=False is_reply=False disable_default_response=True> manufacturer=None tsn=136 command_id=Command.Report_Attributes>
2020-10-08 09:37:32 DEBUG (MainThread) [zigpy.zcl] [0x4003:1:0x0006] ZCL request 0x000a: [[Attribute(attrid=0, value=<TypeValue type=Bool, value=Bool.true>)]]
2020-10-08 09:37:32 DEBUG (MainThread) [zigpy.zcl] [0x4003:1:0x0006] Attribute report received: on_off=1
2020-10-08 09:37:32 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 69 (incomingMessageHandler) received: b'00040106000101400100000d5cb30340ffff0718880a00001001'
2020-10-08 09:37:32 DEBUG (MainThread) [zigpy.zcl] [0x4003:1:0x0006] ZCL deserialize: <ZCLHeader frame_control=<FrameControl frame_type=GLOBAL_COMMAND manufacturer_specific=False is_reply=False disable_default_response=True> manufacturer=None tsn=136 command_id=Command.Report_Attributes>
2020-10-08 09:37:32 DEBUG (MainThread) [zigpy.zcl] [0x4003:1:0x0006] ZCL request 0x000a: [[Attribute(attrid=0, value=<TypeValue type=Bool, value=Bool.true>)]]
2020-10-08 09:37:32 DEBUG (MainThread) [zigpy.zcl] [0x4003:1:0x0006] Attribute report received: on_off=1
2020-10-08 09:37:32 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 69 (incomingMessageHandler) received: b'00040106000101400100000d5cb30340ffff0718880a00001001'
2020-10-08 09:37:32 DEBUG (MainThread) [zigpy.zcl] [0x4003:1:0x0006] ZCL deserialize: <ZCLHeader frame_control=<FrameControl frame_type=GLOBAL_COMMAND manufacturer_specific=False is_reply=False disable_default_response=True> manufacturer=None tsn=136 command_id=Command.Report_Attributes>
2020-10-08 09:37:32 DEBUG (MainThread) [zigpy.zcl] [0x4003:1:0x0006] ZCL request 0x000a: [[Attribute(attrid=0, value=<TypeValue type=Bool, value=Bool.true>)]]
2020-10-08 09:37:32 DEBUG (MainThread) [zigpy.zcl] [0x4003:1:0x0006] Attribute report received: on_off=1
2020-10-08 09:37:32 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 69 (incomingMessageHandler) received: b'00040106000101400100000d5cb30340ffff0718880a00001001'
2020-10-08 09:37:32 DEBUG (MainThread) [zigpy.zcl] [0x4003:1:0x0006] ZCL deserialize: <ZCLHeader frame_control=<FrameControl frame_type=GLOBAL_COMMAND manufacturer_specific=False is_reply=False disable_default_response=True> manufacturer=None tsn=136 command_id=Command.Report_Attributes>
2020-10-08 09:37:32 DEBUG (MainThread) [zigpy.zcl] [0x4003:1:0x0006] ZCL request 0x000a: [[Attribute(attrid=0, value=<TypeValue type=Bool, value=Bool.true>)]]
2020-10-08 09:37:32 DEBUG (MainThread) [zigpy.zcl] [0x4003:1:0x0006] Attribute report received: on_off=1
2020-10-08 09:37:32 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2020-10-08 09:37:32 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2020-10-08 09:37:32 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2020-10-08 09:37:32 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2020-10-08 09:37:32 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2020-10-08 09:37:32 DEBUG (bellows.thread_0) [bellows.uart] Closed serial connection
2020-10-08 09:37:33 DEBUG (MainThread) [bellows.ezsp] Resetting EZSP

smmoroz commented 4 years ago

@Adminiuga, let me update you on this issue.

I experimented a lot with home-assistant_v2.db. I can definitely confirm that while the file is below approximately 400 MB, there are almost no NCP resets. As the size of the file approaches 400 MB, the resets start to appear in the log.

Unfortunately, I haven't yet found a way to reliably keep the size of the file below 400 MB. I have already excluded most of my sensors from the recorder and configured automatic purging to keep only the last 7 days. Still, I have to manually repack the DB from time to time.
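
For reference, a manual repack is essentially an SQLite VACUUM (if I recall correctly, the recorder's purge service with the repack option does the same from inside HA). Below is a minimal sketch, assuming Home Assistant is stopped first (VACUUM needs exclusive access and rewrites the whole file) and that the database sits at the default /config path; adjust the path for your installation:

import os
import sqlite3

# Hypothetical offline repack of the recorder database. Run only while
# Home Assistant is stopped, since VACUUM rewrites the entire file.
DB = "/config/home-assistant_v2.db"  # assumed default HassOS path

size_before = os.path.getsize(DB) / 1024 / 1024
conn = sqlite3.connect(DB)
conn.execute("VACUUM")               # reclaim space left behind by purged rows
conn.close()
size_after = os.path.getsize(DB) / 1024 / 1024
print(f"{size_before:.0f} MB -> {size_after:.0f} MB")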

Another observation: my logger is still set to zha: debug, which is why the log file sometimes becomes huge. If its size exceeds 200-300 MB, the whole system slows down and the NCP often resets.

Does that mean the RPi4 is not powerful enough for my setup, or is it still the power supply issue?

Finally, the new power supply has just arrived, so now I will test it.

Adminiuga commented 4 years ago

Try 0.117.2 to see if it gets better, as it has some changes in zigpy to not stall the loop.

Regarding the Pi4, I think the SD card is the real bottleneck. I run my production setup on an Odroid XU4 with eMMC and have no resets with full ZHA debug logging and a DB file of, I think, about 500 MB (I don't remember exactly).

There are some SQL queries you can run on the Home Assistant DB to see which entities and domains have the most entries and decide whether you need to keep them or not.

select count(state_id) as count, entity_id from states group by entity_id order by count asc;

domains:

select count(state_id) as count, domain from states group by domain order by count asc;
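
If it is easier to run these from a script, here is a minimal sketch using Python's built-in sqlite3 module against a copy of home-assistant_v2.db; working on a copy avoids locking the file the recorder is writing to, and the path is an assumption:

import sqlite3

# Run the two queries above against a (copied) recorder database and print
# the row counts; the largest offenders appear last because of "order by asc".
QUERIES = {
    "per entity": "select count(state_id) as count, entity_id "
                  "from states group by entity_id order by count asc;",
    "per domain": "select count(state_id) as count, domain "
                  "from states group by domain order by count asc;",
}

conn = sqlite3.connect("home-assistant_v2.db")
for label, sql in QUERIES.items():
    print(f"-- rows {label} (largest last)")
    for count, name in conn.execute(sql):
        print(f"{count:8d}  {name}")
conn.close()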

Adminiuga commented 4 years ago

But try 0.117.2 first

Adminiuga commented 4 years ago

BTW, right now my home-assistant_v2.db has stabilized around 450 MB with 14 days of data kept.

smmoroz commented 4 years ago

I tested 0.117.4. Currently home-assistant_v2.db is about 420MB.

The error is still here.

2020-11-05 13:27:55 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'0b6bb157546f15b6589c4a24ab1593499c7c1b0029c39874fade2083ed7e1f2792be7e'
2020-11-05 13:27:55 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8160597e'
2020-11-05 13:27:55 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'1b6bb157546f15b658924a24ab1593499c7d1b0029c39874fade2783fc7e2fa79fe87e'
2020-11-05 13:27:55 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'82503a7e'
2020-11-05 13:27:55 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 69 (incomingMessageHandler) received: b'0004010600010140010000333cabc40dffff0718440a00001000'
2020-11-05 13:27:55 DEBUG (MainThread) [bellows.zigbee.application] Received incomingMessageHandler frame with [<EmberIncomingMessageType.INCOMING_UNICAST: 0>, EmberApsFrame(profileId=260, clusterId=6, sourceEndpoint=1, destinationEndpoint=1, options=<EmberApsOption.APS_OPTION_ENABLE_ROUTE_DISCOVERY|APS_OPTION_RETRY: 320>, groupId=0, sequence=51), 60, -85, 0x0dc4, 255, 255, b'\x18D\n\x00\x00\x10\x00']
2020-11-05 13:27:55 DEBUG (MainThread) [zigpy.zcl] [0x0dc4:1:0x0006] ZCL deserialize: <ZCLHeader frame_control=<FrameControl frame_type=GLOBAL_COMMAND manufacturer_specific=False is_reply=False disable_default_response=True> manufacturer=None tsn=68 command_id=Command.Report_Attributes>
2020-11-05 13:27:55 DEBUG (MainThread) [zigpy.zcl] [0x0dc4:1:0x0006] ZCL request 0x000a: [[Attribute(attrid=0, value=<TypeValue type=Bool, value=Bool.false>)]]
2020-11-05 13:27:55 DEBUG (MainThread) [zigpy.zcl] [0x0dc4:1:0x0006] Attribute report received: on_off=0
2020-11-05 13:27:55 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 69 (incomingMessageHandler) received: b'0004010800010140010000323cabc40dffff0718430a11002080'
2020-11-05 13:27:55 DEBUG (MainThread) [bellows.zigbee.application] Received incomingMessageHandler frame with [<EmberIncomingMessageType.INCOMING_UNICAST: 0>, EmberApsFrame(profileId=260, clusterId=8, sourceEndpoint=1, destinationEndpoint=1, options=<EmberApsOption.APS_OPTION_ENABLE_ROUTE_DISCOVERY|APS_OPTION_RETRY: 320>, groupId=0, sequence=50), 60, -85, 0x0dc4, 255, 255, b'\x18C\n\x11\x00 \x80']
2020-11-05 13:27:55 DEBUG (MainThread) [zigpy.zcl] [0x0dc4:1:0x0008] ZCL deserialize: <ZCLHeader frame_control=<FrameControl frame_type=GLOBAL_COMMAND manufacturer_specific=False is_reply=False disable_default_response=True> manufacturer=None tsn=67 command_id=Command.Report_Attributes>
2020-11-05 13:27:55 DEBUG (MainThread) [zigpy.zcl] [0x0dc4:1:0x0008] ZCL request 0x000a: [[Attribute(attrid=17, value=<TypeValue type=uint8_t, value=128>)]]
2020-11-05 13:27:55 DEBUG (MainThread) [zigpy.zcl] [0x0dc4:1:0x0008] Attribute report received: on_level=128
2020-11-05 13:27:55 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'
2020-11-05 13:27:55 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 69 (incomingMessageHandler) received: b'0004010600010140010000333cabc40dffff0718440a00001000'
2020-11-05 13:27:55 DEBUG (MainThread) [bellows.zigbee.application] Received incomingMessageHandler frame with [<EmberIncomingMessageType.INCOMING_UNICAST: 0>, EmberApsFrame(profileId=260, clusterId=6, sourceEndpoint=1, destinationEndpoint=1, options=<EmberApsOption.APS_OPTION_ENABLE_ROUTE_DISCOVERY|APS_OPTION_RETRY: 320>, groupId=0, sequence=51), 60, -85, 0x0dc4, 255, 255, b'\x18D\n\x00\x00\x10\x00']
2020-11-05 13:27:55 DEBUG (MainThread) [zigpy.zcl] [0x0dc4:1:0x0006] ZCL deserialize: <ZCLHeader frame_control=<FrameControl frame_type=GLOBAL_COMMAND manufacturer_specific=False is_reply=False disable_default_response=True> manufacturer=None tsn=68 command_id=Command.Report_Attributes>
2020-11-05 13:27:55 DEBUG (MainThread) [zigpy.zcl] [0x0dc4:1:0x0006] ZCL request 0x000a: [[Attribute(attrid=0, value=<TypeValue type=Bool, value=Bool.false>)]]
2020-11-05 13:27:55 DEBUG (MainThread) [zigpy.zcl] [0x0dc4:1:0x0006] Attribute report received: on_off=0
2020-11-05 13:27:55 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2020-11-05 13:27:55 DEBUG (MainThread) [bellows.zigbee.application] Received _reset_controller_application frame with (<NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT: 81>,)
2020-11-05 13:27:55 DEBUG (MainThread) [bellows.zigbee.application] Resetting ControllerApplication. Cause: 'NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT'
2020-11-05 13:39:31 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'
2020-11-05 13:39:31 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'
2020-11-05 13:39:31 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'
2020-11-05 13:39:31 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'
2020-11-05 13:39:31 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 69 (incomingMessageHandler) received: b'0004010504010100010000663cab4372ffff08182b0a000021b012'
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.zigbee.application] Received incomingMessageHandler frame with [<EmberIncomingMessageType.INCOMING_UNICAST: 0>, EmberApsFrame(profileId=260, clusterId=1029, sourceEndpoint=1, destinationEndpoint=1, options=<EmberApsOption.APS_OPTION_ENABLE_ROUTE_DISCOVERY: 256>, groupId=0, sequence=102), 60, -85, 0x7243, 255, 255, b'\x18+\n\x00\x00!\xb0\x12']
2020-11-05 13:39:31 DEBUG (MainThread) [zigpy.zcl] [0x7243:1:0x0405] ZCL deserialize: <ZCLHeader frame_control=<FrameControl frame_type=GLOBAL_COMMAND manufacturer_specific=False is_reply=False disable_default_response=True> manufacturer=None tsn=43 command_id=Command.Report_Attributes>
2020-11-05 13:39:31 DEBUG (MainThread) [zigpy.zcl] [0x7243:1:0x0405] ZCL request 0x000a: [[Attribute(attrid=0, value=<TypeValue type=uint16_t, value=4784>)]]
2020-11-05 13:39:31 DEBUG (MainThread) [zigpy.zcl] [0x7243:1:0x0405] Attribute report received: measured_value=4784
2020-11-05 13:39:31 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.zigbee.application] Received _reset_controller_application frame with (<NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT: 81>,)
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.zigbee.application] Resetting ControllerApplication. Cause: 'NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT'
2020-11-05 13:39:31 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.zigbee.application] Received _reset_controller_application frame with (<NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT: 81>,)
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.zigbee.application] Resetting ControllerApplication. Cause: 'NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT'
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.zigbee.application] Preempting ControllerApplication reset
2020-11-05 13:39:31 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.zigbee.application] Received _reset_controller_application frame with (<NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT: 81>,)
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.zigbee.application] Resetting ControllerApplication. Cause: 'NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT'
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.zigbee.application] Preempting ControllerApplication reset
2020-11-05 13:39:31 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.zigbee.application] Received _reset_controller_application frame with (<NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT: 81>,)
2020-11-05 13:39:31 DEBUG (bellows.thread_0) [bellows.uart] Closed serial connection
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.zigbee.application] Resetting ControllerApplication. Cause: 'NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT'
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.zigbee.application] Preempting ControllerApplication reset
2020-11-05 13:39:31 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.zigbee.application] Received _reset_controller_application frame with (<NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT: 81>,)
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.zigbee.application] Resetting ControllerApplication. Cause: 'NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT'
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.zigbee.application] Preempting ControllerApplication reset
2020-11-05 13:39:31 DEBUG (MainThread) [bellows.ezsp] Couldn't send command nop(()). EZSP is not running

And after that, endless RESET_ASSERT errors:

2020-11-05 16:05:59 DEBUG (bellows.thread_0) [bellows.uart] Error code: RESET_ASSERT, Version: 2, frame: b'c2020682af7e'
2020-11-05 16:05:59 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart

Like this:

2020-11-05 16:06:20 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 63 (messageSentHandler) received: b'0003400401040b010140000000f5a46600'
2020-11-05 16:06:20 DEBUG (MainThread) [bellows.zigbee.application] Received messageSentHandler frame with [<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 16387, EmberApsFrame(profileId=260, clusterId=2820, sourceEndpoint=1, destinationEndpoint=1, options=<EmberApsOption.APS_OPTION_RETRY: 64>, groupId=0, sequence=245), 164, <EmberStatus.DELIVERY_FAILED: 102>, b'']
2020-11-05 16:06:20 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 63 (messageSentHandler) received: b'0003400401040b010140000000f5a46600'
2020-11-05 16:06:20 DEBUG (MainThread) [bellows.zigbee.application] Received messageSentHandler frame with [<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 16387, EmberApsFrame(profileId=260, clusterId=2820, sourceEndpoint=1, destinationEndpoint=1, options=<EmberApsOption.APS_OPTION_RETRY: 64>, groupId=0, sequence=245), 164, <EmberStatus.DELIVERY_FAILED: 102>, b'']
2020-11-05 16:06:20 DEBUG (MainThread) [bellows.zigbee.application] Invalid state on future for message tag 164 - probably duplicate response: invalid state
2020-11-05 16:06:20 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 63 (messageSentHandler) received: b'0003400401040b010140000000f5a46600'
2020-11-05 16:06:20 DEBUG (MainThread) [bellows.zigbee.application] Received messageSentHandler frame with [<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 16387, EmberApsFrame(profileId=260, clusterId=2820, sourceEndpoint=1, destinationEndpoint=1, options=<EmberApsOption.APS_OPTION_RETRY: 64>, groupId=0, sequence=245), 164, <EmberStatus.DELIVERY_FAILED: 102>, b'']
2020-11-05 16:06:20 DEBUG (MainThread) [bellows.zigbee.application] Invalid state on future for message tag 164 - probably duplicate response: invalid state
2020-11-05 16:06:20 DEBUG (MainThread) [zigpy.device] [0x4003] Delivery error for seq # 0xa3, on endpoint id 1 cluster 0x0b04: message send failure
2020-11-05 16:06:20 DEBUG (MainThread) [bellows.ezsp.protocol] Send command nop: ()
2020-11-05 16:06:20 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'046d2157542fbd457e'
2020-11-05 16:06:20 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'416da157542f496c7e'
2020-11-05 16:06:20 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8520dd7e'
2020-11-05 16:06:20 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 5 (nop) received: b''
2020-11-05 16:06:21 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'516db157546f10b2598b4a25aa5592499c4ed8abedce9874e747607a657e3da766d8de36217cc3db58c769938c0fadecfb2ea5371c7e'
2020-11-05 16:06:21 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8610be7e'
2020-11-05 16:06:21 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 69 (incomingMessageHandler) received: b'0500001f0000000000000000ff000000ffff1a8103f3990002008d150059ae8304008d15001fca2c04008d1500'
2020-11-05 16:06:21 DEBUG (MainThread) [bellows.zigbee.application] Received incomingMessageHandler frame with [<EmberIncomingMessageType.INCOMING_BROADCAST_LOOPBACK: 5>, EmberApsFrame(profileId=0, clusterId=31, sourceEndpoint=0, destinationEndpoint=0, options=<EmberApsOption.APS_OPTION_NONE: 0>, groupId=0, sequence=0), 255, 0, 0x0000, 255, 255, b'\x81\x03\xf3\x99\x00\x02\x00\x8d\x15\x00Y\xae\x83\x04\x00\x8d\x15\x00\x1f\xca,\x04\x00\x8d\x15\x00']
2020-11-05 16:06:21 DEBUG (MainThread) [zigpy.zdo] [0x0000:zdo] ZDO request ZDOCmd.Parent_annce: [[00:15:8d:00:02:00:99:f3, 00:15:8d:00:04:83:ae:59, 00:15:8d:00:04:2c:ca:1f]]
2020-11-05 16:06:21 DEBUG (MainThread) [zigpy.zdo] [0x0000:zdo] Unsupported ZDO request:ZDOCmd.Parent_annce
2020-11-05 16:06:21 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'616db1575415134ea6944a3aaa5592499c4e27abedce6736f47e'
2020-11-05 16:06:21 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'87009f7e'
2020-11-05 16:06:21 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 63 (messageSentHandler) received: b'06fcff00001f0000000000000000000000'
2020-11-05 16:06:22 DEBUG (MainThread) [bellows.zigbee.application] Received messageSentHandler frame with [<EmberOutgoingMessageType.OUTGOING_BROADCAST: 6>, 65532, EmberApsFrame(profileId=0, clusterId=31, sourceEndpoint=0, destinationEndpoint=0, options=<EmberApsOption.APS_OPTION_NONE: 0>, groupId=0, sequence=0), 0, <EmberStatus.SUCCESS: 0>, b'']
2020-11-05 16:06:22 DEBUG (MainThread) [bellows.zigbee.application] Unexpected message send notification tag: 0
2020-11-05 16:06:23 DEBUG (bellows.thread_0) [bellows.uart] Error code: RESET_ASSERT, Version: 2, frame: b'c2020682af7e'
2020-11-05 16:06:23 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart

Adminiuga commented 4 years ago

2020-11-05 16:06:23 DEBUG (bellows.thread_0) [bellows.uart] Error code: RESET_ASSERT, Version: 2, frame: b'c2020682af7e'
2020-11-05 16:06:23 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart

The error code RESET_ASSERT means the NCP firmware is crashing on an assert condition. Other than that, the system is behaving as designed.

As for why the NCP throws

2020-11-05 13:27:55 DEBUG (bellows.thread_0) [bellows.uart] Error code: ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT, Version: 2, frame: b'c20251a8bd7e'

That remains a mystery to this day. Bellows ACKs every incoming message immediately, in the same method that handles the incoming message, and that runs in its own thread. I've mostly seen this happen on Pi 2, 3 and now 4. I've always attributed it to the slow SD card, as the problem was always exacerbated during heavy disk IO. If you have anything else besides the recorder doing disk IO (like the crazy idea of running InfluxDB on an SD card), that would cause the same issues.
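
To check whether disk IO stalls actually line up with the ACK timeout errors, a small probe like the sketch below can run on the host while the problem occurs. The probe path, the 64 KiB write size and the 0.5 s threshold are arbitrary assumptions, and the file should live on the partition you suspect (the SD card):

import os
import time

# Hypothetical I/O stall probe: time small fsync'ed writes so long pauses on
# a slow SD card show up as outliers while the ACK timeout errors occur.
PATH = "/tmp/io_probe.bin"   # move this onto the SD card partition you suspect

while True:
    start = time.monotonic()
    with open(PATH, "wb") as f:
        f.write(os.urandom(64 * 1024))   # 64 KiB write
        f.flush()
        os.fsync(f.fileno())             # force it to hit the storage device
    elapsed = time.monotonic() - start
    if elapsed > 0.5:                    # anything past ~0.5 s counts as a stall
        print(f"{time.strftime('%H:%M:%S')}  fsync took {elapsed:.2f}s")
    time.sleep(1)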

R5fan commented 3 years ago

I have the same problem. I can't believe it has anything to do with the database size or power supply; I'm running HAOS on a NUC with an SSD. My Zigbee hub is a Tasmota-flashed Eachen hub. It worked fine for a few days, then when I rebooted the hub, HA would no longer reconnect and I get this error:

Logger: bellows.uart
Source: /usr/local/lib/python3.9/site-packages/bellows/uart.py:199
First occurred: 10:21:50 AM (4 occurrences)
Last logged: 10:24:25 AM

Lost serial connection: read failed: socket disconnected

In my Tasmota console, I do see HA connecting to the TCP port.

MattWestb commented 3 years ago

Can you open the Tasmota GUI from your NUC? This is a different issue, since you are using a network connection: if the Tasmota console does not show ZHA trying to connect, it is a network problem and not a "local hardware" one like the original issue.

The easiest test is to use the bellows CLI like this (update your IP and port; this example is for a Tuya ZBGW):

bellows -d socket://192.168.2.60:8888 info
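
If the bellows CLI is not at hand, a plain TCP reachability check tells you whether the serial server is even accepting connections; the host and port below are the same placeholders as in the command above:

import socket

# Hypothetical reachability check for the Tasmota serial-to-TCP bridge before
# trying the bellows CLI. Replace HOST and PORT with your own bridge's values.
HOST, PORT = "192.168.2.60", 8888

try:
    with socket.create_connection((HOST, PORT), timeout=5):
        print(f"TCP port {PORT} on {HOST} is accepting connections")
except OSError as exc:
    print(f"Cannot reach {HOST}:{PORT}: {exc}")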

R5fan commented 3 years ago

Yeah, it wasn't a network connection issue (I'm using Ethernet as well). It does appear to be a Tasmota issue though, and I "solved" it. I changed the Tasmota template to run Zigbee2Tasmota to see if my hub itself was still working (it was). When I switched back to the template that allows a serial connection, ZHA could connect to it. So I'm guessing the problem is with Tasmota.

MattWestb commented 3 years ago

Tasmota normally works OK with ZHA, but it can be tricky to get it working all the way, so I have "converted" my ESP Zigbee coordinators to ESPHome; it's easier to set up than finding the right settings for the serial server, but it's a matter of "taste" :-))

Great that you got it working!!

R5fan commented 3 years ago

Well, the Tasmota version that supports this hub is (I think?) a nightly build (9.5.0.9), or perhaps more likely it's an issue with the EmberZNet firmware (I'm using ncp-uart-nsw_6.7.9_115200.ota)? Anyway, I'll keep relying on my IKEA hub for now, and keep this hub for testing with just a few Zigbee devices until I understand the problem better.

MattWestb commented 3 years ago

If the Tasmota console was not showing anything when ZHA tried to connect, it's not the Zigbee module firmware causing the problem, and 6.7.9 is very stable (though the Sonoff hardware is not), so Tasmota was not set up all the way. The Tasmota version or Tasmota settings is/was the problem.

It's always better to move a few non-critical lights first and see that they work OK, then move a few more; don't move everything at once if you are having problems!!

R5fan commented 3 years ago

Well, I had been running this hub for a few days with 2 or 3 Zigbee devices and it worked fine. Then I added 20-ish devices, and it still worked fine, until I unplugged it and plugged it in again; then I got this problem. I guess I need to put that on my list of things to test before I trust it :) BTW, it's not the Sonoff bridge, it's the Eachen eWeLink bridge, the one with Ethernet.

MattWestb commented 3 years ago

Perhaps Tasmota is not saving the settings correctly on the ESP32? Try rebooting a few times and also powering everything off once, to be sure everything works OK before you start moving everything to the new coordinator.

The Eachen looks very nice, but it still has a not-so-well-made Zigbee module; perhaps they have improved the quality in the newer devices, since the original had a lot of hardware problems.

pipiche38 commented 7 months ago

I have a similar issue, reported at https://github.com/zigpy/zigpy/discussions/1378