Closed puddly closed 3 weeks ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 99.72%. Comparing base (`09cf7ce`) to head (`e000846`).
Two hours in, this is working well: no more NCP failures.
Root cause of https://github.com/home-assistant/core/issues/119424.
During device joining, zigpy cancels a scheduled initialization task if the device re-joins during initialization (this is pretty common). Unfortunately, this cancellation propagates all the way down to the ASH sending task, causing the TX sequence number to increment without waiting for an acknowledgement. EmberZNet currently has a firmware bug: while ASH can in theory support multiple pending frames at a time, in reality the stack crashes if the number is greater than one 😄.
This fix prevents an ASH send from being cancelled by using `asyncio.shield`, which schedules it in a task. Incrementing the TX sequence number after a frame has been sent will have similar issues because our last send may not have been ACKed. An alternative to this approach would be to avoid using coroutines entirely for sending and make the ASH protocol implementation rely on event loop callbacks for timeouts.
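For illustration, here is a minimal sketch of the shielding pattern. It is not the code in this PR: `ToySender`, `_send_and_wait_for_ack`, and the sleep standing in for the ACK wait are all hypothetical, and the lock that keeps only one frame in flight is an assumption made to mirror the firmware limitation described above.

```python
import asyncio


class ToySender:
    """Hypothetical stand-in for an ASH-like sender (not the real bellows class)."""

    def __init__(self) -> None:
        self.tx_seq = 0
        self.acked: list[int] = []
        self._lock = asyncio.Lock()  # assumption: only one frame in flight
        self._pending: set[asyncio.Task] = set()

    async def _send_and_wait_for_ack(self) -> None:
        async with self._lock:
            seq = self.tx_seq
            await asyncio.sleep(0.05)  # pretend to transmit and wait for the ACK
            self.acked.append(seq)
            # Advance the sequence number only once the frame is acknowledged.
            self.tx_seq = (seq + 1) % 8

    async def send(self) -> None:
        # Run the send in its own task and keep a strong reference to it.
        task = asyncio.ensure_future(self._send_and_wait_for_ack())
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)
        # Cancelling the caller cancels this await, but not the inner task.
        await asyncio.shield(task)


async def demo() -> tuple[list[int], int]:
    sender = ToySender()
    caller = asyncio.create_task(sender.send())
    await asyncio.sleep(0.01)  # let the send start
    caller.cancel()  # simulate zigpy cancelling device initialization
    try:
        await caller
    except asyncio.CancelledError:
        pass
    await asyncio.sleep(0.1)  # give the shielded send time to finish
    return sender.acked, sender.tx_seq


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the caller still sees `CancelledError`, but the shielded send runs to completion, so the sequence number and the acknowledgement state stay consistent.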