jamesmunns / postcard-rpc

An RPC layer for postcard-based protocols
Apache License 2.0

Handle MCU Disconnections #39

Closed: maxgr0 closed this issue 4 months ago

maxgr0 commented 4 months ago

Currently, the application pretty much has to be restarted if the MCU disconnects while the app is running:

2024-07-07T18:27:56.683774Z  INFO nusb::platform::macos_iokit::transfer: Completion callback for transfer 0x14e4043f0, status=e00002eb, len=0    
2024-07-07T18:27:56.683785Z  INFO nusb::platform::macos_iokit::transfer: Cancelled all transfers on endpoint 81. status=0    
2024-07-07T18:27:56.683798Z  INFO postcard_rpc::host_client::raw_nusb: Cancelled all in-flight requests
2024-07-07T18:27:56.683814Z  INFO postcard_rpc::host_client::raw_nusb: Drain state: Err(Cancelled)
2024-07-07T18:27:56.683827Z  INFO postcard_rpc::host_client::raw_nusb: Drain state: Err(Cancelled)
2024-07-07T18:27:56.683842Z  INFO postcard_rpc::host_client::raw_nusb: Drain state: Err(Cancelled)
2024-07-07T18:27:56.685641Z ERROR postcard_rpc::host_client::raw_nusb: Failed to clear stall: Os { code: -536850432, kind: Uncategorized, message: "Unknown error: -536850432" }, Fatal.
2024-07-07T18:27:56.685703Z ERROR postcard_rpc::host_client::raw_nusb: Fatal Error, exiting
2024-07-07T18:27:56.685714Z  WARN postcard_rpc::host_client::util: in_worker: wire receive error, exiting
2024-07-07T18:28:00.161085Z  INFO nusb::platform::macos_iokit::transfer: Submitted OUT transfer 0x14e704580 on endpoint 01    
2024-07-07T18:28:00.161105Z ERROR nusb::platform::macos_iokit::transfer: Failed to submit transfer on endpoint 1: e00002c0    
2024-07-07T18:28:00.161124Z ERROR postcard_rpc::host_client::raw_nusb: Output Queue Error: Disconnected
2024-07-07T18:28:00.161137Z ERROR postcard_rpc::host_client::util: Output Queue Error: Transfer(Disconnected), exiting
2024-07-07T18:28:00.161255Z  INFO nusb::platform::macos_iokit::events: event loop thread exited

If a request is still being sent to the MCU, e.g. with client.send_resp, it does not fail immediately once the MCU is disconnected; instead, it waits forever.
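Until the library surfaces the failure itself, a caller can at least bound how long it waits. The following is a std-only sketch under stated assumptions: `recv_reply`, `RequestError`, and the per-request reply channel are hypothetical stand-ins, not postcard-rpc's API. In async code, `tokio::time::timeout` plays the same role as `recv_timeout` here.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical error type for a request whose reply never arrived.
#[derive(Debug, PartialEq)]
enum RequestError {
    TimedOut,
    Disconnected,
}

/// Wait for a reply on a (hypothetical) per-request channel,
/// but never longer than `timeout`.
fn recv_reply(rx: &Receiver<Vec<u8>>, timeout: Duration) -> Result<Vec<u8>, RequestError> {
    rx.recv_timeout(timeout).map_err(|e| match e {
        // Sender still alive but silent: the "waits forever" case.
        RecvTimeoutError::Timeout => RequestError::TimedOut,
        // Every sender was dropped: whoever would reply is gone.
        RecvTimeoutError::Disconnected => RequestError::Disconnected,
    })
}

fn main() {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    // Nothing is ever sent, so the wait is cut off by the timeout.
    assert_eq!(recv_reply(&rx, Duration::from_millis(50)), Err(RequestError::TimedOut));
    // Once the sending side is dropped, the waiter gets an error right away.
    drop(tx);
    assert_eq!(recv_reply(&rx, Duration::from_millis(50)), Err(RequestError::Disconnected));
}
```

The second case hints at a cleaner fix: if the worker that owns the sending side drops it (or explicitly fails it) on a fatal wire error, pending callers get an error instead of blocking.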

Are there any (recommended) ways to recover from this case? Some ideas I've had:

jamesmunns commented 4 months ago

Yep, open to PRs for this! We could add something like a CancellationToken that could be used to halt the whole HostClient machinery if the wire interface fails.
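The CancellationToken idea above could look roughly like this. It is a std-only sketch, not postcard-rpc's API: `HaltToken`, `wait_response`, and `ClientError` are hypothetical names, and an `Arc<AtomicBool>` polled alongside `recv_timeout` stands in for `tokio_util::sync::CancellationToken`, whose `cancelled()` future would normally be raced against the response with `tokio::select!`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical token shared by the client handle and its worker threads.
#[derive(Clone, Default)]
struct HaltToken(Arc<AtomicBool>);

impl HaltToken {
    fn halt(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_halted(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

#[derive(Debug, PartialEq)]
enum ClientError {
    Halted,
}

/// Wait for a response, but give up as soon as the token is halted.
/// The channel is polled in 10 ms slices so a halt is noticed promptly.
fn wait_response(rx: &Receiver<Vec<u8>>, token: &HaltToken) -> Result<Vec<u8>, ClientError> {
    loop {
        if token.is_halted() {
            return Err(ClientError::Halted);
        }
        match rx.recv_timeout(Duration::from_millis(10)) {
            Ok(resp) => return Ok(resp),
            Err(RecvTimeoutError::Timeout) => continue,
            // All senders dropped: the worker is gone, treat it like a halt.
            Err(RecvTimeoutError::Disconnected) => return Err(ClientError::Halted),
        }
    }
}

fn main() {
    let token = HaltToken::default();
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let worker_token = token.clone();
    let worker = thread::spawn(move || {
        // Pretend the wire worker hits a fatal error ("Failed to clear stall").
        thread::sleep(Duration::from_millis(50));
        worker_token.halt();
        // `tx` is still alive, so without the token the caller would block forever.
        tx
    });
    let result = wait_response(&rx, &token);
    let _tx = worker.join().unwrap();
    assert_eq!(result, Err(ClientError::Halted));
}
```

Because every clone shares one flag, a single `halt()` from whichever worker fails first stops all pending and future requests, which matches halting "the whole HostClient machinery".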

maxgr0 commented 4 months ago

> Yep, open to PRs for this! We could add something like a CancellationToken that could be used to halt the whole HostClient machinery if the wire interface fails.

Good idea! Another fairly simple solution could be to not stop the workers when an error occurs, but instead wait for x seconds and then try to receive/send again. From what I've read in the code, timeouts are up to the user anyway, so this would keep the behavior consistent, and we wouldn't even need a CancellationToken. WDYT?
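The "wait x seconds and try again" proposal amounts to a retry loop around the worker's fallible step. A minimal std-only sketch, assuming a hypothetical `WireError` and a caller-supplied `attempt` closure (neither is postcard-rpc's API); a real worker would also need to reopen the USB device before retrying.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical wire error (not postcard-rpc's type).
#[derive(Debug)]
enum WireError {
    Disconnected,
}

/// Instead of exiting on the first wire error, sleep `delay` and retry,
/// for at most `max_attempts` attempts (at least one attempt is always made).
/// Returns the last error if every attempt fails.
fn retry_with_delay<T, F>(mut attempt: F, delay: Duration, max_attempts: u32) -> Result<T, WireError>
where
    F: FnMut(u32) -> Result<T, WireError>,
{
    let mut n = 0;
    loop {
        n += 1;
        match attempt(n) {
            Ok(v) => return Ok(v),
            Err(e) if n >= max_attempts => return Err(e),
            Err(e) => {
                eprintln!("attempt {n} failed: {e:?}, retrying in {delay:?}");
                thread::sleep(delay);
            }
        }
    }
}

fn main() {
    // Simulated device that comes back on the third attempt.
    let result = retry_with_delay(
        |n| if n < 3 { Err(WireError::Disconnected) } else { Ok("reconnected") },
        Duration::from_millis(10),
        5,
    );
    println!("{result:?}");
}
```

Bounding the attempts keeps a permanently unplugged device from spinning forever. Unlike a cancellation token, though, requests issued during the retry window still need a caller-side timeout to avoid hanging.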