LiveView causes infinite reload loop in browser when an Ecto error is raised in connected handle_params

probably-not commented 3 months ago

Environment

Elixir version (elixir -v): 1.16.3
Phoenix version (mix deps): 1.7.14
Phoenix LiveView version (mix deps): 0.20.17
Operating system: macOS 14.5
Browsers you attempted to reproduce this bug on (the more the merrier): Chrome
Does the problem persist after removing "assets/node_modules" and trying again? Yes/no: Yes (no node_modules installed)

Actual behavior

When an Ecto exception is raised within a connected?(socket) block in handle_params, LiveView goes into an infinite loop of full page browser reloads. The exception is not logged at all, and the reload loop is constant, not using any sort of backoff or retries, the entire browser page reloads over and over in an endless loop, with no logs.

This seems to happen specifically on Ecto errors, I assume due to how Ecto errors may be handled within Phoenix and LiveView?

This is a reproduction using PhoenixPlayground, you can change the different modules to see different behaviors.

Mix.install([
  {:ecto_psql_extras, "~> 0.6"},
  {:ecto_sql, "~> 3.10"},
  {:phoenix_ecto, "~> 4.4"},
  {:phoenix_playground, "~> 0.1.0"}
], force: true)

defmodule Example do
  use Ecto.Schema

  @derive Jason.Encoder
  @primary_key {:id, Ecto.UUID, autogenerate: false}
  schema "example" do
    timestamps()
  end
end

# The InfiniteReloadLoop module will cause the browser to go into an uncontrollable
# loop of reloads, all while hiding logs of the actual exception. Nowhere in the logs
# does the exception appear, however the browser gets stuck in a page reload loop.
defmodule InfiniteReloadLoop do
  use Phoenix.LiveView

  def mount(_params, _session, socket) do
    {:ok, assign(socket, count: 0)}
  end

  def handle_params(_params, _uri, socket) do
    if connected?(socket) do
      raise Ecto.NoResultsError, queryable: Example
    end
    {:noreply, socket}
  end

  def render(assigns) do
    ~H"""
    """
  end
end

# The NotFoundErrorShown module will show Phoenix Ecto's error page for the "Not Found" exception.
# The exception will be logged in the apps logs as a debug log.
defmodule NotFoundErrorShown do
  use Phoenix.LiveView

  def mount(_params, _session, socket) do
    {:ok, assign(socket, count: 0)}
  end

  def handle_params(_params, _uri, socket) do
    raise Ecto.NoResultsError, queryable: Example
    {:noreply, socket}
  end

  def render(assigns) do
    ~H"""
    """
  end
end

# The NormalExceptionShowsCrash module will show the first render (from the initial dead view call),
# and then subsequently show a crash in the logs. It will periodically try to reconnect, and each time
# the crash in the logs will be shown.
defmodule NormalExceptionShowsCrash do
  use Phoenix.LiveView

  def mount(_params, _session, socket) do
    {:ok, assign(socket, count: 0)}
  end

  def handle_params(_params, _uri, socket) do
    if connected?(socket) do
      raise "Whoops"
    end
    {:noreply, socket}
  end

  def render(assigns) do
    ~H"""
    """
  end
end

# Set PhoenixPlayground to start a specific LV module based on which behavior you want to check.
PhoenixPlayground.start(live: InfiniteReloadLoop)
# PhoenixPlayground.start(live: NotFoundErrorShown)
# PhoenixPlayground.start(live: NormalExceptionShowsCrash)

Expected behavior

Other exception types don't cause this infinite full page reload loop (as can be seen if you try using the NormalExceptionShowsCrash module in the playground script above), they instead seem to retry mounting on a retry with some sort of timer (console logs show an error that the join failed and eventually there is a retry). Additionally, app logs show the crash and error reason.

I would expect all exceptions to behave similarly, no full page reloads in an infinite loop, and crash logs being shown so that we can know why the app is in an infinite reload loop.

SteffenDE commented 3 months ago

The code that causes the reload is here:

https://github.com/phoenixframework/phoenix_live_view/blob/a86683a66f882ab21c8bd654a2d5e41eb805786d/lib/phoenix_live_view/channel.ex#L1181-L1188

LV rescues the exception and sends a reload to trigger a new dead mount. This is consistent to what is described in

https://hexdocs.pm/phoenix_live_view/error-handling.html#exceptions-during-connected-mount

We should probably log the exception and not swallow it. I think we could also add a counter + jitter failsafe mechanism to https://github.com/phoenixframework/phoenix_live_view/blob/2bb4564ec872e6e544b997a0f38151c58b696827/assets/js/phoenix_live_view/view.js#L790-L794

probably-not commented 3 months ago

@SteffenDE The docs that you pointed to do not seem to be consistent with the behavior shown here. There seems to be a difference between when an Ecto.NoResultsError is raised vs when a non-Ecto error is raised.

For example, when I use PhoenixPlayground.start(live: NormalExceptionShowsCrash) from the PhoenixPlayground example provided, the browser does not issue a reload request, and does not go into an infinite loop of reloading. Instead, the following is logged:

phx-F-gyvvrvWsZH-M9m error: unable to join -  {reason: 'join crashed'}

which looks like it's from the live socket debugging functions, and the page waits several seconds (around 30 from my count, which is consistent with what I remember about LiveView's heartbeats) before a full page refresh. On the server, the crash is logged appropriately.

When using the PhoenixPlayground.start(live: InfiniteReloadLoop) example, the browser immediately without stopping goes into a full page reload infinite loop, and all logs are swallowed up.

You can see that the exception is raised in the same place (within the connected handle_params call), however, they are treated differently and not consistently with what the docs say.

From the code that you linked in channel.ex, I assume the difference here is due to the fact that Plug.Exception.status(exception) will fallback to any, and return a 500 if the exception is not a plug exception, thus skipping the reload command being sent to the caller (since the if checks for less than 500), and simply re-raising the exception (which crashes the GenServer).

Again, this behavior isn't consistent nor is it properly documented - the docs simply say that the LiveView will reload the page, but they fail to mention that an error underneath 500 will force the page into an infinite reloading loop, while an error above 500 will reload the page after around 30 seconds.

SteffenDE commented 3 months ago

There is a difference, yes, but in both cases LiveView triggers a redirect, hoping that the issue will happen again on the dead mount, causing the regular plug exception handling. Your analysis is right, "regular" exception trigger a "join crashed" and exceptions implementing Plug.Exception with an error code >= 400 < 500 are handled differently by sending a "reload" message.

This behavior was introduced in https://github.com/phoenixframework/phoenix_live_view/commit/5bfe422c6e5903a19396765877b66ddf48c0fb63, so @chrismccord may chime in to explain the reasons behind it. I assume it's to have quicker feedback on common errors like an Ecto.NoResultsError. The code just doesn't expect this to only happen on connected mount.

So don't get me wrong. I'm not arguing that the current behavior is good, we should definitely prevent this from happening :)

dvic commented 2 months ago

Not sure if this has been discussed before: but wouldn't error boundaries be a solution to this problem? In this way we could also isolate the errors better, instead of letting the whole page crash.

phoenixframework / phoenix_live_view