pow-auth / pow

Robust, modular, and extendable user authentication system
https://powauth.com
MIT License

Load balancer and sign-in issues (a Pow issue? HAProxy issue?) #219

Closed: sensiblearts closed this issue 5 years ago

sensiblearts commented 5 years ago

I'm learning HAProxy to route to two backend Phoenix servers, round-robin on ports 4000 and 4002, and after signing in I get randomly redirected back to the sign-in page.

In fact, it's the same issue described here 3 years ago: "For some reason I am having some trouble with my application after I login and it sets the session, sometimes I have to refresh to be sent to the correct page, not sure why yet and I believe it's something in the HAProxy configuration."

I'm guessing it's cookie-related and I'm wondering whether the answer is on the front end (HAProxy config) or on the back end (clustering so Pow in the backend servers works as one).

Any thoughts or experiences in this area?

danschultzer commented 5 years ago

Are you using EtsCache as the cache store (this is the default, but it shouldn't be used in production)? If so, switch to Mnesia or Redis. If you go with Mnesia, remember to connect the nodes so you have a distributed setup.
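
For reference, switching the backend is just a config change, something like the following (the my_app/MyApp names are placeholders for your app):

    config :my_app, :pow,
      user: MyApp.Users.User,
      repo: MyApp.Repo,
      cache_store_backend: Pow.Store.Backend.MnesiaCache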

Coherence only worked for a single instance, because it only used ETS.

sensiblearts commented 5 years ago

Thanks, I am using the defaults, so that's probably the issue.

I'll go with Mnesia to start.

I've never connected Elixir nodes, but I guess that's the whole point of the BEAM, so time to learn :-)

danschultzer commented 5 years ago

Cool 😄 I haven't set up multi-node Mnesia myself, so I would love to hear about your experience with it, and help out if possible. There may be some details to note that could improve the docs.

From my understanding, you would just need to update the :nodes config in the above readme example with the nodes you have running (e.g. [:"a@127.0.0.1", :"b@127.0.0.1"]), and start the nodes with the --name argument.
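
For example (the node names here are just placeholders), the readme's child spec would become something like:

    worker(Pow.Store.Backend.MnesiaCache, [[nodes: [:"a@127.0.0.1", :"b@127.0.0.1"]]])

with each instance started along the lines of elixir --name a@127.0.0.1 -S mix phx.server.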

sensiblearts commented 5 years ago

Banging my head. Various attempts and errors.

Pow is having trouble in MnesiaCache.table_init/1:

    topologies = [
      webservers: [
        strategy: Cluster.Strategy.Epmd,
        config: [hosts: [:"a@David-HP-i7", :"b@David-HP-i7"]]
      ]
    ]

    # ...

    children = [
      Gjwapp.Repo,
      {Cluster.Supervisor, [topologies, [name: Gjwapp.ClusterSupervisor]]},
      supervisor(GjwappWeb.Endpoint, []),
      worker(Pow.Store.Backend.MnesiaCache, [[nodes: [:"a@David-HP-i7", :"b@David-HP-i7"]]])
    ]

Latest error:

iex --sname a -S mix phx.server gives:

** (Mix) Could not start application gjwapp: Gjwapp.Application.start(:normal, []) returned an error: shutdown: failed to start child: Pow.Store.Backend.MnesiaCache
    ** (EXIT) an exception was raised:
        ** (CaseClauseError) no case clause matching: {:aborted, {:not_active, Pow.Store.Backend.MnesiaCache, :"b@David-HP-i7"}}
            (pow) lib/pow/store/backend/mnesia_cache.ex:179: Pow.Store.Backend.MnesiaCache.table_init/1
            (pow) lib/pow/store/backend/mnesia_cache.ex:66: Pow.Store.Backend.MnesiaCache.init/1
            (stdlib) gen_server.erl:374: :gen_server.init_it/2
            (stdlib) gen_server.erl:342: :gen_server.init_it/6
            (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

Before that I tried without explicit naming:

    topologies = [
      webservers: [
        strategy: Cluster.Strategy.Gossip
      ]
    ]

but got other errors, with :nonode@nohost.

Thanks for any tips. DA

sensiblearts commented 5 years ago

In your readme, is node() in worker(Pow.Store.Backend.MnesiaCache, [[nodes: [node()]]]) defined? Or do I have to implement it or provide the list? I don't see Kernel.node/0, so I think I know...

sensiblearts commented 5 years ago

Also, for now I just set

    config :mnesia, dir: './_build/prod/rel/beta1/tmp/mnesia'

because it has write permissions (I'm building dev, but I just use that path hard-coded for now).
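
At some point I'll probably read that path from an env var instead of hard-coding it, maybe something like this (MNESIA_DIR is just a name I made up; mnesia wants a charlist):

    config :mnesia, dir: to_charlist(System.get_env("MNESIA_DIR") || "./priv/mnesia")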

danschultzer commented 5 years ago

node() is a Kernel function (Kernel.node/0, auto-imported) and just returns the current node.
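
For example, in an iex session started with --sname a:

    iex> node()
    :"a@David-HP-i7"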

You can rewrite to use the newer syntax:

    children = [
      Gjwapp.Repo,
      {Cluster.Supervisor, [topologies, [name: Gjwapp.ClusterSupervisor]]},
      supervisor(GjwappWeb.Endpoint, []),
      {Pow.Store.Backend.MnesiaCache, nodes: [:"a@David-HP-i7", :"b@David-HP-i7"]}
    ]

The error you get is because one of the nodes is not running. It looks like you have to connect the two nodes before starting :mnesia. I'm working through it and will let you know when I get it running. Unfortunately, documentation is lacking for mnesia and for setting up a distributed cluster, so I'll just have to go through it step by step to get it running 😄
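
For reference, you can verify that the nodes see each other in iex before mnesia comes into play (both nodes need the same cookie):

    iex> Node.connect(:"b@David-HP-i7")
    true
    iex> Node.list()
    [:"b@David-HP-i7"]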

danschultzer commented 5 years ago

Ok, this is how I got it working. I read through this article and this one about libcluster to understand how the cluster would work.

libcluster may make it easier to work with the cluster, but I just went ahead and manually connected the nodes. First let's start out with the readme example:

defmodule PowDemo.Application do
  @moduledoc false

  use Application

  def start(_type, _args) do
    children = [
      PowDemo.Repo,
      PowDemoWeb.Endpoint,
      {Pow.Store.Backend.MnesiaCache, nodes: [node()]}
    ]

    opts = [strategy: :one_for_one, name: PowDemo.Supervisor]
    Supervisor.start_link(children, opts)
  end

  # ...
end

The above just lets each node run its own mnesia instance, and is the default setup.

We'll now add an initialization function that copies the data and shares it between the nodes, and use Node.list() to pull all the connected nodes:

defmodule PowDemo.Application do
  @moduledoc false

  use Application

  def start(_type, _args) do
    init_mnesia_cluster(node())

    children = [
      PowDemo.Repo,
      PowDemoWeb.Endpoint,
      {Pow.Store.Backend.MnesiaCache, nodes: Node.list()}
    ]

    opts = [strategy: :one_for_one, name: PowDemo.Supervisor]
    Supervisor.start_link(children, opts)
  end

  # ...

  defp init_mnesia_cluster(node) do
    connect_nodes()

    # Start mnesia on this node, hook it up to the mnesia instances on the
    # other connected nodes, persist the schema to disk, and copy the cache
    # table to this node
    :mnesia.start()
    :mnesia.change_config(:extra_db_nodes, Node.list())
    :mnesia.change_table_copy_type(:schema, node, :disc_copies)
    :mnesia.add_table_copy(Pow.Store.Backend.MnesiaCache, node, :disc_copies)
  end

  defp connect_nodes(), do: Enum.each(nodes(), &Node.connect/1)

  defp nodes() do
    {:ok, hostname} = :inet.gethostname()

    for sname <- ["a", "b"], do: :"#{sname}@#{hostname}"
  end
end

As you can see, I've updated :nodes to use the list of connected nodes. This seems to be working fine, but I've only tested it in my dev environment.
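
To test it locally, you can start two named instances, e.g. (assuming the endpoint port is read from a PORT env var, adjust to your config):

    PORT=4000 iex --sname a -S mix phx.server
    PORT=4002 iex --sname b -S mix phx.server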

It's also worth noting that the first article does this initialization in the release task. That may make more sense.

sensiblearts commented 5 years ago

That worked like a charm, thank you! HAProxy is now honoring the cookie, and requests alternate between backends a and b.

I'm going to do some reading on Erlang clustering and nail down my requirements. I don't need auto-scaling. The first thing that occurred to me was to keep a node list in Postgres: I would just have an admin panel to do CRUD on snames and hosts. But that's probably a hack, and I'll have to read up on what normal practice is.

danschultzer commented 5 years ago

Great! Postgres could work fine.

I think I would just use an env var (and update it on all instances each time a new node spins up). Here's a post about self-discovery on a single machine using the file system that may help with thinking about alternatives: https://www.spectory.com/blog/Elixir%20Self%20Discovering%20Cluster
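
As a sketch (CLUSTER_NODES is a made-up variable name, holding something like "a@host1,b@host2"), the nodes/0 helper from above could then read from the environment instead of hard-coding the snames:

    defp nodes() do
      (System.get_env("CLUSTER_NODES") || "")
      |> String.split(",", trim: true)
      |> Enum.map(&String.to_atom/1)
    end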

However, there was an interesting tidbit in the previous blog post I posted:

The last issue with this is... it’s manual. Yep, in an era where everything needs to be automated, our system depends on manually connecting both servers and running this function. As we don’t really like things that require human interaction to work, we needed to automate this workflow.

For this, we needed two things: the current server IP and at least one IP from the cluster.

That sounds like you only have to connect to one other host for the node to be connected to the whole cluster! If that's how it works, then that makes it super easy to spin up new nodes and restart old ones :smile:
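
For what it's worth, BEAM distribution is fully meshed by default, so connecting to any one node should transitively connect you to the rest. A quick check in iex (node names made up):

    iex(c@host)> Node.connect(:"a@host")
    true
    iex(c@host)> Node.list()
    [:"a@host", :"b@host"]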