phoenixframework / phoenix_pubsub_redis

The Redis PubSub adapter for the Phoenix framework

Presence issues #26

Closed hamiltop closed 5 years ago

hamiltop commented 7 years ago

Discussed a bit on IRC, opening an issue so we can better document the issues.

Presence using Phoenix.PubSub.Redis has a few issues not present when using Phoenix.PubSub.PG2.

  1. Join and leave events are sent N times, where N is the number of servers.
  2. Zombie presences happen: a presence stays listed after its user has disconnected.

Is it possible the first is causing the second?

To see the N join and leave events, set up N=2 nodes (simple chat app with presence):

PORT=4000 elixir --sname=n0 -S mix phoenix.server
PORT=4001 elixir --sname=n1 -S mix phoenix.server

Go to a browser and trigger a join event. You should see 2 identical join messages.
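One way to confirm the duplication while debugging is to count observed join events by their phx_ref. The following is only a minimal sketch: JoinDedup is a hypothetical helper, not part of Phoenix or phoenix_pubsub_redis, and it assumes the joins have already been collected as {key, meta} tuples shaped like the presence metas shown later in this thread.

```elixir
defmodule JoinDedup do
  # Hypothetical debugging helper (assumption, not a Phoenix API).
  # `joins` is a list of {key, meta} tuples, where meta carries a :phx_ref.

  # Returns a map of phx_ref => number of times that join was observed.
  def counts_by_ref(joins) do
    Enum.frequencies_by(joins, fn {_key, meta} -> meta.phx_ref end)
  end

  # Keeps only the first event seen for each phx_ref.
  def dedup(joins) do
    Enum.uniq_by(joins, fn {_key, meta} -> meta.phx_ref end)
  end
end

# Two identical join messages, as described for N=2 nodes.
joins = [
  {"Peter", %{online_at: "1475712812", phx_ref: "TiolORFe+vk="}},
  {"Peter", %{online_at: "1475712812", phx_ref: "TiolORFe+vk="}}
]

IO.inspect(JoinDedup.counts_by_ref(joins))
IO.inspect(JoinDedup.dedup(joins))
```

A count above 1 for a single phx_ref would indicate that the same join was delivered more than once. Deduplicating like this only hides the symptom during diagnosis; it does not address why the events are sent N times.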

I think this first issue might be causing other issues, so I'm hesitant to weigh in on why zombie presences are still happening (even with named nodes and homogeneous hardware).

hamiltop commented 7 years ago

For zombie presences, here's the Presence State:

%{broadcast_period: 1500, clock_sample_periods: 2, current_sample_count: 1,
  deltas: [%Phoenix.Tracker.State{cloud: #MapSet<[{{:channel@eced99b58fe5,
       1475712441601646}, 1}]>, context: %{}, delta: :unset, mode: :delta,
    pids: nil,
    range: {%{{:channel@eced99b58fe5, 1475712441601646} => 1},
     %{{:channel@a922f17b5f9c, 1475712442528618} => 0,
       {:channel@ac4b42b2277a, 1475712441525883} => 0,
       {:channel@eced99b58fe5, 1475712441601646} => 1,
       {:investigate@e47d34e9b784, 1475711012624954} => 0}},
    replica: {:investigate@e47d34e9b784, 1475711012624954}, replicas: %{},
    values: %{{{:channel@eced99b58fe5, 1475712441601646},
       1} => {#PID<15527.515.0>, "users:hack_demo", "Peter",
       %{online_at: "1475712812", phx_ref: "TiolORFe+vk="}}}},
   %Phoenix.Tracker.State{cloud: #MapSet<[{{:channel@eced99b58fe5,
       1475712441601646}, 1}]>, context: %{}, delta: :unset, mode: :delta,
    pids: nil,
    range: {%{{:channel@eced99b58fe5, 1475712441601646} => 1},
     %{{:channel@a922f17b5f9c, 1475712442528618} => 0,
       {:channel@ac4b42b2277a, 1475712441525883} => 0,
       {:channel@eced99b58fe5, 1475712441601646} => 1,
       {:investigate@e47d34e9b784, 1475711012624954} => 0}},
    replica: {:investigate@e47d34e9b784, 1475711012624954}, replicas: %{},
    values: %{{{:channel@eced99b58fe5, 1475712441601646},
       1} => {#PID<15527.515.0>, "users:hack_demo", "Peter",
       %{online_at: "1475712812", phx_ref: "TiolORFe+vk="}}}},
   %Phoenix.Tracker.State{cloud: #MapSet<[{{:channel@eced99b58fe5,
       1475712441601646}, 1}]>, context: %{}, delta: :unset, mode: :delta,
    pids: nil,
    range: {%{{:channel@eced99b58fe5, 1475712441601646} => 1},
     %{{:channel@a922f17b5f9c, 1475712442528618} => 0,
       {:channel@ac4b42b2277a, 1475712441525883} => 0,
       {:channel@eced99b58fe5, 1475712441601646} => 1,
       {:investigate@e47d34e9b784, 1475711012624954} => 0}},
    replica: {:investigate@e47d34e9b784, 1475711012624954}, replicas: %{},
    values: %{{{:channel@eced99b58fe5, 1475712441601646},
       1} => {#PID<15527.515.0>, "users:hack_demo", "Peter",
       %{online_at: "1475712812", phx_ref: "TiolORFe+vk="}}}}],
  down_period: 30000, log_level: false, max_delta_sizes: [100, 1000, 10000],
  max_silent_periods: 10,
  namespaced_topic: "phx_presence:Elixir.R101Channel.Presence",
  pending_clockset: [], permdown_period: 60000,
  presences: %Phoenix.Tracker.State{cloud: #MapSet<[{{:channel@f25f7d246ff5,
      1475710664658338}, 2187},
    {{:channel@90279305e724, 1475710676205113}, 842},
    {{:channel@f25f7d246ff5, 1475710664658338}, 7904},
    {{:channel@90279305e724, 1475710676205113}, 8847},
    {{:channel@f25f7d246ff5, 1475710664658338}, 5201},
    {{:channel@f25f7d246ff5, 1475710664658338}, 6383},
    {{:channel@f25f7d246ff5, 1475710664658338}, 2713},
    {{:channel@f25f7d246ff5, 1475710664658338}, 7019},
    {{:channel@f25f7d246ff5, 1475710664658338}, 3473},
    {{:channel@90279305e724, 1475710676205113}, 9132},
    {{:channel@90279305e724, 1475710676205113}, 9045},
    {{:channel@f25f7d246ff5, 1475710664658338}, 4593},
    {{:channel@90279305e724, 1475710676205113}, 929},
    {{:channel@f25f7d246ff5, 1475710664658338}, 4915},
    {{:channel@f25f7d246ff5, 1475710664658338}, 6333},
    {{:channel@f25f7d246ff5, 1475710664658338}, 7240},
    {{:channel@f25f7d246ff5, 1475710664658338}, 5938},
    {{:channel@f25f7d246ff5, 1475710664658338}, 3460},
    {{:channel@f25f7d246ff5, 1475710664658338}, 6564},
    {{:channel@f25f7d246ff5, 1475710664658338}, 4027},
    {{:channel@f25f7d246ff5, 1475710664658338}, 2329},
    {{:channel@90279305e724, 1475710676205113}, 940},
    {{:channel@90279305e724, 1475710676205113}, 9106},
    {{:channel@f25f7d246ff5, 1475710664658338}, 4561},
    {{:channel@f25f7d246ff5, 1475710664658338}, 4552},
    {{:channel@f25f7d246ff5, 1475710664658338}, 4161},
    {{:channel@f25f7d246ff5, 1475710664658338}, 2865},
    {{:channel@f25f7d246ff5, 1475710664658338}, 5098},
    {{:channel@f25f7d246ff5, 1475710664658338}, 3272},
    {{:channel@90279305e724, 1475710676205113}, 7900},
    {{:channel@f25f7d246ff5, 1475710664658338}, 7829},
    {{:channel@90279305e724, 1475710676205113}, 7987},
    {{:channel@90279305e724, 1475710676205113}, 700},
    {{:channel@f25f7d246ff5, 1475710664658338}, 5344},
    {{:channel@f25f7d246ff5, ...}, 6179}, {{...}, ...}, {...}, ...]>,
   context: %{{:channel@1230710ca876, 1475709456736068} => 636,
     {:channel@1d67fed32749, 1475709858391608} => 513,
     {:channel@2193f1e81223, 1475709458421283} => 749,
     {:channel@2193f1e81223, 1475709830534507} => 1269,
     {:channel@312cd1ab5871, 1475710199953019} => 211,
     {:channel@44a4265186d9, 1475710424384607} => 199,
     {:channel@4e1f755db374, 1475710339126267} => 198,
     {:channel@4f32a7167c98, 1475709456744647} => 1,
     {:channel@4f32a7167c98, 1475709793434868} => 1403,
     {:channel@4f32a7167c98, 1475710396881144} => 14,
     {:channel@7ea3df244f3a, 1475709876664304} => 582,
     {:channel@90279305e724, 1475711709297191} => 185,
     {:channel@ad8896bae006, 1475710374476862} => 218,
     {:channel@b3683ad2658f, 1475709856857074} => 445,
     {:channel@b3683ad2658f, 1475710179361681} => 3,
     {:channel@c4db3f690254, 1475710375161698} => 233,
     {:channel@c58a4fdba29c, 1475710508256317} => 153,
     {:channel@c60a125a16a0, 1475711674305976} => 20,
     {:channel@c60a125a16a0, 1475711704719175} => 913,
     {:channel@c7c21a1d21a0, 1475710473246353} => 440,
     {:channel@e947e1c3bbd3, 1475710210663183} => 162,
     {:channel@eced99b58fe5, 1475712441601646} => 1,
     {:channel@f9087a39a66c, 1475710258113774} => 153},
   delta: %Phoenix.Tracker.State{cloud: #MapSet<[]>, context: %{},
    delta: :unset, mode: :delta, pids: nil,
    range: {%{}, %{{:investigate@e47d34e9b784, 1475711012624954} => 0}},
    replica: {:investigate@e47d34e9b784, 1475711012624954}, replicas: %{},
    values: %{}}, mode: :normal, pids: 114735, range: {%{}, %{}},
   replica: {:investigate@e47d34e9b784, 1475711012624954},
   replicas: %{{:channel@25464af1e95d, 1475710592789017} => :down,
     {:channel@90279305e724, 1475710676205113} => :down,
     {:channel@90279305e724, 1475711709297191} => :down,
     {:channel@90279305e724, 1475711721205996} => :down,
     {:channel@a922f17b5f9c, 1475712442528618} => :up,
     {:channel@ac4b42b2277a, 1475712441525883} => :up,
     {:channel@c5eb6b7ddf3a, 1475710544610229} => :down,
     {:channel@c60a125a16a0, 1475710628965946} => :down,
     {:channel@c60a125a16a0, 1475711674305976} => :down,
     {:channel@c60a125a16a0, 1475711685946217} => :down,
     {:channel@c60a125a16a0, 1475711704719175} => :down,
     {:channel@c60a125a16a0, 1475711721272809} => :down,
     {:channel@eced99b58fe5, 1475712441601646} => :up,
     {:channel@f25f7d246ff5, 1475710664658338} => :down,
     {:channel@f25f7d246ff5, 1475711694314423} => :down,
     {:investigate@e47d34e9b784, 1475711012624954} => :up}, values: 110638},
  pubsub_server: R101Channel.PubSub,
  replica: %Phoenix.Tracker.Replica{last_heartbeat_at: nil,
   name: :investigate@e47d34e9b784, status: :up, vsn: 1475711012624954},
  replicas: %{channel@a922f17b5f9c: %Phoenix.Tracker.Replica{last_heartbeat_at: 1475712954519,
     name: :channel@a922f17b5f9c, status: :up, vsn: 1475712442528618},
    channel@ac4b42b2277a: %Phoenix.Tracker.Replica{last_heartbeat_at: 1475712953576,
     name: :channel@ac4b42b2277a, status: :up, vsn: 1475712441525883},
    channel@eced99b58fe5: %Phoenix.Tracker.Replica{last_heartbeat_at: 1475712961211,
     name: :channel@eced99b58fe5, status: :up, vsn: 1475712441601646}},
  server_name: R101Channel.Presence, silent_periods: 1,
  tracker: R101Channel.Presence,
  tracker_state: %{node_name: :investigate@e47d34e9b784,
    pubsub_server: R101Channel.PubSub,
    task_sup: R101Channel.Presence.TaskSupervisor}}

iex(investigate@e47d34e9b784)139> R101Channel.Presence.list("users:hack_demo") |> Map.get("5201")
%{metas: [%{online_at: "1475710419", phx_ref: "i4Hm4hinP0g="}]}

There are a bunch of presences from {:channel@f25f7d246ff5, 1475710664658338}, a replica that no longer exists. User "5201" has been disconnected, but a presence still exists.
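The same check can be done mechanically against a copy of the dumped state. This is a rough sketch on plain data only: ZombieCheck is a hypothetical helper that does not call into Phoenix.Tracker, and it assumes the cloud entries and the replicas map have been copied out of the dump as a list of {{node, vsn}, clock} tuples and a map of {node, vsn} => :up | :down.

```elixir
defmodule ZombieCheck do
  # Hypothetical helper (assumption, not a Phoenix.Tracker API).
  # Returns the cloud entries whose originating replica is marked :down.
  def entries_from_down_replicas(cloud, replicas) do
    Enum.filter(cloud, fn {replica, _clock} ->
      Map.get(replicas, replica) == :down
    end)
  end
end

# A small excerpt of the values from the state dump above.
replicas = %{
  {:channel@f25f7d246ff5, 1475710664658338} => :down,
  {:channel@eced99b58fe5, 1475712441601646} => :up
}

cloud = [
  {{:channel@f25f7d246ff5, 1475710664658338}, 2187},
  {{:channel@eced99b58fe5, 1475712441601646}, 1}
]

IO.inspect(ZombieCheck.entries_from_down_replicas(cloud, replicas))
```

On this excerpt, only the entry from {:channel@f25f7d246ff5, 1475710664658338} is returned, which matches the replica named above as the source of the stale presences.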

hamiltop commented 7 years ago

The zombie presences seem to only occur when a process hits OOM. Going to try to reproduce with PG2.

Also, is there a way to manually clear out zombie presences? Stopping all hosts is less than ideal. We build all our infrastructure tools to do rolling restarts, and that just propagates the bad state.

mitchellhenke commented 6 years ago

@hamiltop a bit late on my part, but is this still an issue for you?

hamiltop commented 6 years ago

I haven't tried since a year ago. I'll give it a shot again and see.

mitchellhenke commented 6 years ago

@hamiltop that would be fantastic 🙂

KamilLelonek commented 5 years ago

Any changes here?

mitchellhenke commented 5 years ago

@KamilLelonek Are you seeing this issue?

KamilLelonek commented 5 years ago

I used to when I was using it. Now I'm planning to do that again and I wonder whether it will happen.

mitchellhenke commented 5 years ago

#28 mentions that they didn't see any zombie presences, so it may be fixed.

I unfortunately won't have time to try to reproduce this week, so I can't confirm whether or not the issue persists. If you are able to confirm, I can take a closer look at a fix 🙂

mitchellhenke commented 5 years ago

Without a way to reproduce, I'm going to close this issue.