tube0013 / tube_gateways

Information and Documentation on Tube's Zigbee Gateways

Failed to read from client 192.168.10.153 with error 128! #172

Closed pugmandan closed 9 months ago

pugmandan commented 9 months ago

12:45:01 | [E] | [stream_server:145] | Failed to write to client 192.168.10.153 with error 128!
12:45:01 | [W] | [stream_server:168] | Failed to read from client 192.168.10.153 with error 128!

Getting the above error in my tubeszb-cc2652p7-poe-2023 after it had been running for a number of hours. I confirmed that the client (zigbee2mqtt) was still operational (and showing no issues on its end or in its logs). I also operated a Zigbee device (turned light on and off) and it worked without delay.

The tubeszb log was producing tens of new lines per second. I suspect this is what has been crashing the stick: it logs endlessly until, presumably, it runs out of memory.

Is there any information as to what error 128 means, please? I have tried Google with no success. Furthermore, the stick and zigbee2mqtt were both accessible, so there should not have been any errors.

pugmandan commented 9 months ago

Has happened again following a restart of the stick. Screenshot attached.

image

pugmandan commented 9 months ago

There was an infrequent message which was hard to capture (due to the scrolling nature of the log), but I have captured it:

14:54:14 | [E] | [stream_server:145] | Failed to write to client 192.168.10.153 with error 128!
14:54:14 | [W] | [stream_server:168] | Failed to read from client 192.168.10.153 with error 128!
14:54:14 | [E] | [stream_server:104] | Incoming bytes available, but outgoing buffer is full: stream will be corrupted!

tube0013 commented 9 months ago

128 seems to be the buffer size. I just pushed up the P7 ESPHome config.

Can you try compiling a version with a larger buffer size under the stream component? You can uncomment line 109: https://github.com/tube0013/tube_gateways/blob/8abec1cb5145d5b881602a0a5a9f16f0c2db8f44/models/current/tubeszb-cc2652-P7-poe-2023/firmware/esphome/tubeszb-cc2652p7-poe-2023.yaml#L109

If you add this config to ESPHome you should be able to push the firmware to the device over the network; otherwise you will need to download and manually flash the Legacy binary with esphomeflasher over serial/USB (with no PoE connected).

I've had 2 testers of the P7 using it for several months with no reported issues, and this is the first I'm hearing of this one, so I appreciate your patience in sorting it out.

The P7 uses an ESPHome binary built with the esp-idf framework for lower overhead, and I've seen faster performance: resetting NVRAM, for example, takes about 50% less time. It currently does not support Web-OTA firmware installs. I also moved back to the current Oxan Stream Server, where you can read more about the buffer size config: https://github.com/oxan/esphome-stream-server/tree/master#advanced
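For reference, a minimal sketch of what raising the buffer looks like in the stream server block. The `buffer_size` option is the one described in the README linked above, and 2048 is just an example value; the `uart_id` name and the port are assumptions and should match whatever your existing config already uses.

```yaml
# Sketch only: leave the rest of the config (uart, external_components, etc.) unchanged.
stream_server:
  uart_id: uart_bus   # assumption: the id of the UART defined elsewhere in your config
  port: 6638          # assumption: the TCP port your Zigbee2MQTT adapter setting points at
  buffer_size: 2048   # example value, larger than the 128 seen in the error above
```

I believe the README asks for the buffer size to be a power of two, so values such as 512, 1024 or 2048 are the ones to try.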

Thanks

pugmandan commented 9 months ago

Thanks and no apology needed at all - I want the best and I know this is it. If anything a bit of bug fixing along the way makes the end result all the more rewarding.

I've reflashed as suggested with line 109 uncommented. I'll leave this issue open for a day; if it hasn't crashed in that period, I'll consider the issue closed.

pugmandan commented 9 months ago

@tube0013 - I'm sorry to say the same error has occurred, despite reflashing ESPHome with the buffer_size of 2048.

The error message has not changed:

18:20:38 | [E] | [stream_server:145] | Failed to write to client 192.168.10.153 with error 128!
18:20:38 | [W] | [stream_server:168] | Failed to read from client 192.168.10.153 with error 128!

I've attached the ESPHome code (directly copied from my ESPHome instance) to evidence I am not fat fingering anything

config.yaml.txt

tube0013 commented 9 months ago

Is the ip in the error the z2m host or the coordinator ip?

Thanks

pugmandan commented 9 months ago

The ip is the z2m host - the coordinator ip is 192.168.10.156

Interestingly, there are no signs of anything going wrong in the z2m host logs, and while the coordinator is logging the errors it is still possible to operate the Zigbee devices via z2m.

pugmandan commented 9 months ago

Some additional error lines which I had not been able to capture before:

image

10:58:53 | [E] | [stream_server:104] | Incoming bytes available, but outgoing buffer is full: stream will be corrupted!
10:58:53 | [W] | [stream_server:109] | Dropped 19 pending bytes for client 192.168.10.153

This is the relevant part from the Z2M logs - no other errors or warnings are in the Z2M logs:

error 2023-11-24 10:55:49: Adapter disconnected, stopping
info 2023-11-24 10:55:49: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"offline"}'
info 2023-11-24 10:55:49: Disconnecting from MQTT server
info 2023-11-24 10:55:49: Stopping zigbee-herdsman...
error 2023-11-24 10:55:49: Failed to stop Zigbee2MQTT

pugmandan commented 9 months ago

@tube0013 - something I noticed when reviewing the ESPHome yaml is that you are currently referencing (lines 18-19):

external_components:
  - source: github://oxan/esphome-stream-server

I noticed on an issue thread in that repository that you remarked it is very unreliable with ESPHome > 2021.9.

Is it possible to adjust your code to use the fork https://github.com/tube0013/esphome-stream-server-v2?

I have attempted this but am getting compile errors so far.

tube0013 commented 9 months ago

Yeah, so what happened was the Oxan component was originally used, then it became a bit unreliable with an ESPHome release after 2021.9. Oxan went quiet with no updates for a year, maybe longer. I hired a developer to help me fork it and get it reliable again. A few months ago Oxan came back with a big update, so I've been tracking that, as I honestly don't want to be maintaining the fork if I don't have to. If you want to try my fork I'll send a yaml in a bit, tomorrow at the latest, or you could look at the cc2652p2 2023 PoE yaml for how to configure it yourself.
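As a rough sketch, pointing the config at the fork means changing the `external_components` source quoted earlier in this thread. This only shows the source swap; the options under the stream server block may differ between the fork and upstream, so the cc2652p2 2023 PoE yaml remains the reference for the rest.

```yaml
external_components:
  # Fork from this thread, used in place of github://oxan/esphome-stream-server
  - source: github://tube0013/esphome-stream-server-v2
```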

pugmandan commented 9 months ago

If you could look at it tomorrow I'd appreciate it. I've attempted to swap in the relevant parts of the code but am getting a compile error due to the dashboard_import: element not being happy.

Thanks again for your support with this

tube0013 commented 9 months ago

tubeszb-cc2652p7-poe-2023_esp-idf.zip tubeszb-cc2652p7-poe-2023_arduino.zip

Attached are 2 versions using the esphome-streamer from my repo: one is built on the esp-idf framework and the other on arduino. Please let me know if these work without any errors like you were seeing before and are more stable. Included in each zip is the .yaml and a binary compiled on 2023.11.3.

pugmandan commented 9 months ago

Great, thank you @tube0013 - I've loaded the arduino version first.

Something I have noticed from the data collected in Home Assistant is that the error seems to happen every 2 hours like clockwork. I've used the 'TubesZB Serial Connected' sensor as the source of data, and I have an automation in place that reboots the stick when 'TubesZB Serial Connected' becomes unavailable.

TubesZB Serial Connected was connected - 14:15:46 - 2 hours ago
TubesZB Serial Connected was disconnected - 14:15:46 - 2 hours ago
TubesZB Serial Connected became unavailable - 14:15:38 - 2 hours ago
TubesZB Serial Connected was connected - 12:15:43 - 4 hours ago
TubesZB Serial Connected became unavailable - 12:15:33 - 4 hours ago
TubesZB Serial Connected was connected - 10:15:37 - 6 hours ago
TubesZB Serial Connected became unavailable - 10:15:28 - 6 hours ago
TubesZB Serial Connected was connected - 08:15:34 - 8 hours ago
TubesZB Serial Connected became unavailable - 08:15:23 - 8 hours ago
TubesZB Serial Connected was connected - 06:15:28 - 10 hours ago
TubesZB Serial Connected became unavailable - 06:15:18 - 10 hours ago
TubesZB Serial Connected was connected - 04:15:24 - 12 hours ago
TubesZB Serial Connected became unavailable - 04:15:13 - 12 hours ago
TubesZB Serial Connected was connected - 02:15:18 - 14 hours ago
TubesZB Serial Connected was disconnected - 02:15:14 - 14 hours ago
TubesZB Serial Connected became unavailable - 02:15:09 - 14 hours ago
TubesZB Serial Connected was connected - 00:15:14 - 16 hours ago
TubesZB Serial Connected became unavailable - 00:15:04 - 16 hours ago

I'll give feedback on the two builds you've sent over tomorrow, as I'll need to run each one for at least 2 hours to see whether the same behaviour occurs.

tube0013 commented 9 months ago

What kind of network router are you using? Failing every 2 hours like clockwork looks like a DHCP lease issue.

I had an email support issue similar to this and it was solved by uploading a binary with a static IP:

I figured out my issue - DHCP. Totally odd: z2m was losing the connection every 2 hrs, which was the DHCP lease time on my IoT VLAN. Looking at the logs in pfSense, there was a series of DHCP request / DHCP ack messages for the static address I had set, repeating 20+ times, and in that time z2m lost the connection before the address finally settled down.

I set a fixed address in the config in ESP Home and pushed a new build and it's been rock solid since. No idea why it might have been behaving like that but figure I'd let you know in case it shows up again.
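A minimal sketch of what a fixed address looks like in the ESPHome config for an Ethernet (PoE) coordinator, using the `manual_ip` option of the `ethernet` component. The coordinator address is the one mentioned in this thread; the gateway and subnet are assumptions and need to match your own network.

```yaml
ethernet:
  # ...keep the existing type / pin settings from your current config here...
  manual_ip:
    static_ip: 192.168.10.156   # coordinator address mentioned in this thread
    gateway: 192.168.10.1       # assumption: your router's address
    subnet: 255.255.255.0       # assumption: a /24 network
```

With this in place the device no longer depends on DHCP lease renewals, even if the router also reserves the same address for it.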

pugmandan commented 9 months ago

@tube0013 I previously had configured a static IP address from within pfsense but had not adjusted the esphome config to reference this.

I can confirm making this change resolved the issue. My apologies for wasting your time on this.

I've had 16 hours uninterrupted connection now so feel very confident the issue has been resolved.

tube0013 commented 9 months ago

No worries! Glad it's sorted. I think I'm going to add a note for pfSense users to set a static IP in the ESPHome firmware.

OBoudreaux commented 7 months ago

I've got the same issue but I'm running a Unifi Edgerouter X. Coordinator and host both have static IPs. Seeing the same warning and error. Is setting the static IP and flashing this still the preferred fix?

tube0013 commented 7 months ago

@OBoudreaux if you have already flashed a firmware with a static IP you should be good. If you are still experiencing issues, please open another issue. Thanks!