wasmCloud / wasmcloud-otp

wasmCloud host runtime that leverages Elixir/OTP and Rust to provide simple, secure, distributed application development using the actor model
Apache License 2.0

[BUG] Host supervision tree can crash if NATS connection drops in the middle of a control request #624

Closed: brooksmtownsend closed this issue 1 year ago

brooksmtownsend commented 1 year ago

Describe the bug

When we receive control interface requests, we reply by returning the built-in {:reply, msg} tuple and letting Gnat publish the reply for us. If the NATS server goes down while a request is being handled, the host supervision tree crashes when Gnat attempts the Gnat.pub for that reply.

We should use the HostCore.Nats.safe_pub function to publish the reply on the control connection. Each usage of the {:reply, msg} tuple in host_server and lattice_server should call safe_pub instead, and the final expression of the function should be :ok so that Gnat doesn't send a reply on its own.
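A minimal sketch of the proposed change, for illustration only. The real signature of HostCore.Nats.safe_pub is not shown in this issue, so the safe_pub below is a hypothetical stand-in that takes a publish function as its first argument. The module name, the handle/1 helper, and the message shape (a map with body and reply_to keys) are likewise assumptions. A dropped connection is simulated with exit(:noproc).

```elixir
defmodule SafeReplySketch do
  # Hypothetical stand-in for HostCore.Nats.safe_pub (real signature unknown).
  # `publish` is any 2-arity function that sends `payload` to `topic`,
  # for example a wrapper around Gnat.pub/3.
  def safe_pub(publish, topic, payload) when is_function(publish, 2) do
    try do
      publish.(topic, payload)
      :ok
    rescue
      error -> {:error, error}
    catch
      # Calling a dead process exits the caller; trap it here so the
      # failure does not propagate up the supervision tree.
      :exit, reason -> {:error, {:exit, reason}}
    end
  end

  # Before: the {:reply, msg} tuple leaves the publish to Gnat.
  def request_before(%{body: body}), do: {:reply, handle(body)}

  # After: publish the reply ourselves and return :ok so that no
  # automatic reply is sent. A failed publish is deliberately ignored.
  def request_after(%{reply_to: reply_to, body: body}, publish) do
    _ = safe_pub(publish, reply_to, handle(body))
    :ok
  end

  # Hypothetical request handler, used only to produce a reply payload.
  defp handle(body), do: "ack:" <> body
end

# Simulate the connection process being gone when the reply is published.
dead_connection = fn _topic, _payload -> exit(:noproc) end
msg = %{reply_to: "_INBOX.example", body: "ping"}

# The handler still returns :ok instead of crashing.
:ok = SafeReplySketch.request_after(msg, dead_connection)
```

With this shape, a failed publish surfaces as an {:error, reason} tuple inside the handler, where it can be logged or dropped, rather than as a crash.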

To Reproduce

Steps to reproduce the behavior:

  1. Run a host with a separate NATS server
  2. Send a control interface request
  3. Kill the NATS server quickly

Expected behavior

I expect the host to tolerate intermittent NATS connectivity. If NATS goes down in the middle of handling a request, the host should stay up and simply fail to publish the reply.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this has been closed too eagerly, please feel free to tag a maintainer so we can keep working on the issue. Thank you for contributing to wasmCloud!