Open allenporter opened 2 months ago
I have also seen this issue but have not found a good way to reproduce or debug it. I would be open to any suggestions, or even some debugging improvements, to figure out the cause.
My honest take is this code is quite complex and needs to be simplified. I would suggest:
- Make breaking changes to drop all the sync APIs and focus on simpler async APIs.
- Reduce the class hierarchy so there are not multiple base classes involved, and use composition rather than inheritance.
- Add logging for each outgoing request seq id and received response seq id, which could help show the issue happening in more detail.
- Add guards to not send a message with a request id that has already been allocated. One example of a potential bug is that choosing a request id can pick an id that already exists, since it's just picking random numbers.
- Add an explicit guard here when the future is already completed.
- Add a guard in `_async_response` so that it doesn't overwrite an existing future if the request id is already in use.
- There are cases where a response is received and it checks against the queue and pulls out the message, but then if it doesn't already exist it continues anyway. Why would this ever be allowed to happen? This seems like a bug or a malformed response, so it also needs logging rather than being ignored, since it could hide concurrency bugs.
- Remove the multiple levels of futures happening between `asyncio.ensure_future`, then a sync method `_async_response`, which calls an async method `_wait_response`, which directly awaits on a future. I don't think all these levels of indirection are needed.
- Don't pass around tuples of (response, exception). Just use the future's built-in exceptions.
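To make a few of these suggestions concrete, here is a rough sketch of what request tracking with guards might look like. It is not the library's actual code: `PendingRequests`, `allocate`, `resolve`, `discard`, `send_and_wait`, the `send` callback, and the `max_id` range are all hypothetical names and values chosen for illustration. It only assumes that requests are matched to responses by a numeric id.

```python
from __future__ import annotations

import asyncio
import logging
import random

_LOGGER = logging.getLogger(__name__)


class PendingRequests:
    """Hypothetical tracker mapping in-flight request ids to their futures."""

    def __init__(self, max_id: int = 32767) -> None:
        self._max_id = max_id  # assumed id range, not taken from the real protocol
        self._futures: dict[int, asyncio.Future] = {}

    def allocate(self) -> tuple[int, asyncio.Future]:
        """Pick a random id that is not already in flight and register a future."""
        if len(self._futures) >= self._max_id:
            raise RuntimeError("no free request ids")
        while True:
            request_id = random.randint(1, self._max_id)
            if request_id not in self._futures:  # never overwrite an existing future
                break
        future = asyncio.get_running_loop().create_future()
        self._futures[request_id] = future
        _LOGGER.debug("Allocated request id %s", request_id)
        return request_id, future

    def resolve(self, request_id: int, response=None, error: Exception | None = None) -> bool:
        """Complete the matching future; return False if nothing was delivered."""
        future = self._futures.pop(request_id, None)
        if future is None:
            # Log instead of silently continuing, since this may hide a concurrency bug.
            _LOGGER.warning("Response for unknown request id %s", request_id)
            return False
        if future.done():
            _LOGGER.warning("Future for request id %s already completed", request_id)
            return False
        if error is not None:
            future.set_exception(error)  # use the future's own exception channel
        else:
            future.set_result(response)
        return True

    def discard(self, request_id: int) -> None:
        """Forget a request id, for example after a timeout."""
        self._futures.pop(request_id, None)


async def send_and_wait(pending: PendingRequests, send, payload, timeout: float = 4.0):
    """Send one request and await its future directly, with no extra indirection."""
    request_id, future = pending.allocate()
    try:
        await send(request_id, payload)
        return await asyncio.wait_for(future, timeout)
    finally:
        pending.discard(request_id)
```

With something like this, the code that parses incoming messages would call `pending.resolve(request_id, response=...)` or `pending.resolve(request_id, error=...)`, and callers would just `await send_and_wait(...)` and handle exceptions normally instead of unpacking (response, exception) tuples.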
That's probably all fair. It's just been the nature of constantly reverse engineering and learning about the underlying API, with most improvements being iterative as opposed to a clean rewrite.
For instance, at first we didn't know how to do the local API, then we figured that out. The same goes for the drastically different A01 API versus the V1 API. Through changes like these, made slowly over time, the code got more and more complex.
I don't know the exact steps to reproduce this, but I noticed it while changing a device setting in HomeAssistant. It seems like this is an internal consistency issue and not something that the caller should be able to introduce, but I'm not positive.
I believe this happened when I adjusted the same property twice in quick succession (with a short pause in between).