pow-auth / pow_site

Website for Pow
https://powauth.com
MIT License

Guide on cluster setup with the MnesiaCache #10

Open danschultzer opened 4 years ago

danschultzer commented 4 years ago

@sensiblearts has been writing a guide on cluster distribution strategies with Pow: https://github.com/danschultzer/pow/issues/220

I believe it fits better here, as I like to keep third-party library references in the Pow docs to a minimum (I'm also contemplating moving the swoosh/bamboo integration guide over here).

The guide @sensiblearts has been working on is already extensive and it's a good read to understand different strategies: https://github.com/sensiblearts/pow/blob/master/guides/mnesia_cache_store_backend.md

sensiblearts commented 4 years ago

Sorry for the delay. I've been asked to help start a tree planting organization and I had to set this aside.

I'm hesitant to just paste the current mnesia guide that I wrote into a new powauth site file, because I think we should first resolve the initialization logic to work with libcluster.

My understanding of the mnesia initialization logic: The first node needs to know that it is alone so that it can create (rather than replicate) its mnesia data table. If it is not alone, then it replicates the table schema rather than creating it.

However, when using libcluster, I cannot think of a way to know for sure that the first-booting node is alone. The only way to know if there are any connected nodes is after the current node is connected, i.e., after the :connect callback -- which will never happen if the node is alone.

Another way to say it: There is no event that says, "you are alone." The only event we have is libcluster's :connect, which means you're not alone. But the first node needs to know that it is alone so that it can create its mnesia data table as a basis for the others to replicate.

The only way out of this that I can think of is simply to wait a bit (10 seconds) before calling Node.list(); if it returns [], then your Pow code creates the table rather than replicating it.
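
Something like the following is what I have in mind -- just a sketch, untested, with supervision details simplified and names made up:

# Wait for libcluster to (possibly) connect us to other nodes, then decide
# whether to create or replicate the mnesia table based on Node.list/0.
Process.sleep(10_000)

children = [
  # If Node.list() is [], MnesiaCache creates the table itself;
  # otherwise it replicates from the listed nodes.
  {Pow.Store.Backend.MnesiaCache, extra_db_nodes: Node.list()}
]

Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.CacheSupervisor)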

Am I misunderstanding mnesia copying?

danschultzer commented 4 years ago

Yeah, that's how the MnesiaCache works; however, you can just provide a hardcoded list of nodes for it to connect to automatically. If none of the provided hosts are connected, it'll start up by itself; otherwise it'll connect to the existing cluster.
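
For illustration, a minimal sketch of the hardcoded approach (the node names are just placeholders):

children = [
  # If neither of these nodes is reachable, the cache starts up on its own;
  # otherwise it joins the existing Mnesia cluster and replicates the table.
  {Pow.Store.Backend.MnesiaCache,
   extra_db_nodes: [:"a@127.0.0.1", :"b@127.0.0.1"]}
]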

Using Node.list() you have to make sure that the nodes are already connected, but in the supervision tree this happens before libcluster initializes, so there are no connected nodes yet. The way to deal with this is to ensure that libcluster has initialized first, maybe by waiting for a message to be received before calling Node.list() and starting the MnesiaCache.

I'll look into libcluster and see how you can deal with this.

danschultzer commented 4 years ago

Edit: This is what works:

defmodule MyApp.MnesiaClusterSupervisor do
  use Supervisor

  def start_link(init_arg) do
    Supervisor.start_link(__MODULE__, init_arg, name: __MODULE__)
  end

  @impl true
  def init(_init_arg) do
    children = [
      {Pow.Store.Backend.MnesiaCache, extra_db_nodes: Node.list()},
      Pow.Store.Backend.MnesiaCache.Unsplit
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end

And in application.ex:

  def start(_type, _args) do
    topologies = [
      example: [
        strategy: Cluster.Strategy.Kubernetes,
        config: [
          # ...
        ]
      ]
    ]

    # List all child processes to be supervised
    children = [
      {Cluster.Supervisor, [topologies, [name: MyApp.ClusterSupervisor]]},
      MyApp.MnesiaClusterSupervisor,
      # Start the Ecto repository
      MyApp.Repo,
      # Start the endpoint when the application starts
      MyAppWeb.Endpoint
      # Starts a worker by calling: MyApp.Worker.start_link(arg)
      # {MyApp.Worker, arg},
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
danschultzer commented 4 years ago

Ok, found a solution, updated the example above 😄

sensiblearts commented 4 years ago

UPDATE: This is working for a single server for me. I will try load balancing and add/drop in the next few days; when confirmed, I will update the guide and add it to the docs.

sensiblearts commented 4 years ago

@danschultzer , Since it's rough I thought I would just paste the whole thing here; please let me know if you don't want me doing this. Search for the phrase: "please check the logic in this narrative":

Mnesia cache store backend

Pow.Store.Backend.MnesiaCache implements the behaviour declared in Pow.Store.Base to use Erlang's Mnesia database as the session cache.

The reason you might use Mnesia is that, in a clustered setup, you need to enable distributed checking of Pow User sessions, regardless of which backend server is chosen (e.g., by a load balancer) to service a given HTTP request. This stateless load balancing is possible because Mnesia, being a distributed database, can replicate the cache across all connected nodes. Hence, there is no need for stateful routing at the load balancer (a.k.a. no need for "sticky sessions"). However, there is a need to connect the nodes!

Connecting Elixir nodes is straightforward, but the details can get complicated depending on the infrastructure strategy being used. For example, there is much more to learn if autoscaling or dynamic node discovery is required. However, that is outside the scope of the current guide.

This guide will first describe installation and configuration of Pow to use the Mnesia cache store. Then, a few use cases are described, along with considerations relative to Pow User sessions:

  1. Use of the libcluster Cluster.Strategy.Gossip where all nodes broadcast UDP messages over the same port to find one another; hence, no need to specify host IPs or the erlang short names
  2. Use of the libcluster Cluster.Strategy.Epmd strategy to connect a pre-configured set of server nodes that are specified (in the elixir code) by fully qualified hostnames (e.g., "mynode@127.0.0.1")
  3. Use of the libcluster Cluster.Strategy.ErlangHosts strategy to read the list of server IPs or hostnames from the .hosts.erlang file and connect them dynamically; then, the servers automatically discover the erlang short names for each host
  4. Instead of using libcluster, iterate over an elixir list to connect a pre-configured set of servers (a minimal sketch of this approach follows the list)
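
For the fourth option, a minimal sketch of connecting a pre-configured list of nodes without libcluster (node names are placeholders, error handling omitted):

# Run early in your application start-up, before the MnesiaCache starts:
nodes = [:"a@127.0.0.1", :"b@127.0.0.1"]

Enum.each(nodes, fn node ->
  # Node.connect/1 returns true, false, or :ignored; a failure here is fine,
  # since the other node may simply not be up yet.
  Node.connect(node)
end)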

Configuring Pow to use Mnesia: Compile-time Configuration

Mnesia is part of OTP so there are no additional dependencies to add to mix.exs.

Depending on whether you are working in development or production mode, be sure that (in either /config/dev.exs or /config/prod.exs) you have specified an Mnesia storage location, such as:

config :mnesia, dir: to_charlist(File.cwd!) ++ '/priv/mnesia'

and that you have configured Pow to use Mnesia:

config :my_app, :pow,
  user: MyApp.Users.User,
  repo: MyApp.Repo,
  # ...
  cache_store_backend: Pow.Store.Backend.MnesiaCache

Also, you need to add :mnesia to :extra_applications in mix.exs to ensure that it's included in the release:

  def application do
    [
      mod: {MyApp.Application, []},
      extra_applications: [:mnesia, ... ],
    ]
  end

Installing libcluster

libcluster offers:
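
To add libcluster, include it in your dependencies in mix.exs; a minimal sketch (the version requirement below is illustrative):

  defp deps do
    [
      # ...
      {:libcluster, "~> 3.2"}
    ]
  end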

Dan: please check the logic in this narrative:

Because we are using libcluster, we are faced with an apparent conundrum: if an Elixir node boots and its libcluster (e.g., broadcast) calls fail to detect any other nodes, then the booting node is apparently by itself and should create the mnesia database. However, libcluster will continue to broadcast to check for new nodes, and it might discover that there is in fact another node -- and that node could be newer or older than the node that just booted. This uncertainty arises from the asynchronous nature of the RPC calls to the Erlang Port Mapper Daemon (EPMD) and to other nodes while booting the tree of GenServers -- your app, libcluster, mnesia, and so on. Consequently, you can imagine two such nodes booting and, for a few seconds, two copies of the mnesia database existing, both of which "think they are their own master," so to speak. Let's call this the "boot problem." The solution to the boot problem turns out to be the solution to a related problem: netsplits.

Mnesia is a distributed, masterless database; masterless, in that the intent is that all nodes will eventually be consistent, and the communication protocol treats all peers equally: any new data added to mnesia on one node will get replicated to all other nodes. However, nodes can go down! And when they come back up, there is an inconsistency if data were added during the outage. In order to heal this inconsistency, we need to figure out which node should temporarily serve as a master from which the recovering nodes can replicate and return to a consistent state. In other words, we need to pick a node -- such as the node that booted least recently -- and copy the data from that node to all others. I.e., we need to "unsplit" the netsplit.

As it turns out, unsplitting is also the solution to the "boot problem" (above), in which a newly booted node creates a new database because it "thinks" it's alone, only to find out seconds later that other nodes were up first and that it should have replicated rather than created the database. Simply put, the solution is that every node, when it boots, may create the database on the assumption that it's alone, and then let the unsplitting logic detect otherwise; if so, it is treated as a node that needs to be healed.

Mnesia has notifications for splitting and healing events (e.g., :inconsistent_database system events), but it is up to mnesia clients, such as Pow, to monitor these events and heal. For this, Pow provides an Unsplit GenServer that you can add to your supervision tree. Specifically, Unsplit is a GenServer that listens for Mnesia's :inconsistent_database events. A good way to add Unsplit to your application is to create a new module to supervise it; for example, in the same folder as your application.ex file, add a mnesia_cluster_supervisor.ex file:

defmodule MyApp.MnesiaClusterSupervisor do
  use Supervisor

  def start_link(init_arg) do
    Supervisor.start_link(__MODULE__, init_arg, name: __MODULE__)
  end

  @impl true
  def init(_init_arg) do
    children = [
      {Pow.Store.Backend.MnesiaCache, extra_db_nodes: Node.list()},
      Pow.Store.Backend.MnesiaCache.Unsplit
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end

Then, add your new MnesiaClusterSupervisor (along with libcluster's Cluster.Supervisor) to your own app's supervision tree in application.ex:

  def start(_type, _args) do
    topologies = [
      myapp: [
        strategy: Cluster.Strategy.Gossip
      ]
    ]
    # By default, the libcluster gossip occurs on port 45892, 
    # using the multicast address 230.1.1.251 to find other nodes

    children = [
      {Cluster.Supervisor, [topologies, [name: MyApp.ClusterSupervisor]]},
      MyApp.MnesiaClusterSupervisor,
      MyApp.Repo,
      MyAppWeb.Endpoint
    ] 

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

Now, when your app boots, it will have a supervisor to keep your node cluster intact and a supervisor to keep mnesia consistent, the latter relying on the Unsplit GenServer to automatically do any consistency healing involved.

Specifically, when your application starts it will initialize the MnesiaCache GenServer, which starts the node's instance of mnesia and then connects to the other nodes in the cluster (internally, by calling :mnesia.change_config/2). Once connected to the other nodes (and their mnesia instances), the connecting node will make a remote procedure call to one of the other nodes (e.g., the one at the head of the list of cluster nodes) to retrieve the mnesia table information. This table information is then used as a parameter when internally calling :mnesia.add_table_copy/3, which creates a local replica of the cache table, or does nothing if the table already exists. If, after this, an :inconsistent_database event is received, the relevant MnesiaCache.Unsplit functions will be called upon to heal the inconsistency.
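
Roughly, the join sequence corresponds to something like this (a simplified sketch, not the actual Pow implementation; it assumes the cache table is named after the MnesiaCache module):

# Start the local mnesia instance
:ok = :mnesia.start()

# Tell it about nodes already in the cluster (what :extra_db_nodes configures)
{:ok, _connected} = :mnesia.change_config(:extra_db_nodes, Node.list())

# Replicate the cache table locally; if a copy already exists this returns
# {:aborted, {:already_exists, ...}}, which is treated as success
:mnesia.add_table_copy(Pow.Store.Backend.MnesiaCache, node(), :disc_copies)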

Adding or removing nodes at run time

As long as at least one node in the :extra_db_nodes list is connected to the cluster, the MnesiaCache instances for all other nodes will automatically be connected and replicated. This makes it very easy to join clusters, since you won't have to update the config on the old nodes at all. And as long as the old nodes connect to at least one node that's in the cluster when restarting, it'll automatically connect to the new node as well, even without updating the :extra_db_nodes setting.

Using other libraries that also use the Mnesia instance

It's strongly recommended to take into account any libraries that will be using Mnesia for storage before using the Unsplit module.

A common example would be a job queue, where a potential solution to prevent data loss is to simply keep the job queue table on only one server instead of replicating it among all nodes. If you do this, then when a network partition occurs, the job queue table can be excluded from the tables to be flushed (and restored) during healing by setting :flush_tables to false (the default). This way, the Unsplit module can self-heal without affecting the job queue table.
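
For example (a sketch; :flush_tables is the documented Unsplit option, the rest mirrors the supervisor above):

children = [
  {Pow.Store.Backend.MnesiaCache, extra_db_nodes: Node.list()},
  # With :flush_tables set to false (the default), only the Pow cache table is
  # flushed and restored during healing; other tables, such as a job queue kept
  # on a single node, are left untouched.
  {Pow.Store.Backend.MnesiaCache.Unsplit, flush_tables: false}
]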

Example: Using libcluster's Cluster.Strategy.Gossip strategy to discover 2 servers dynamically

We will illustrate this with two servers but you can expand this logic to any reasonable number that an erlang cluster can handle.

First, as described above, make sure that you have configured libcluster for the Gossip strategy in application.ex:

    topologies = [
      myapp: [
        strategy: Cluster.Strategy.Gossip
      ]
    ]
    # By default, the libcluster gossip occurs on port 45892, 
    # using the multicast address 230.1.1.251 to find other nodes

NOTE: If you are going to run both Elixir nodes on the same physical machine, you will either have to copy your application code to two locations or change the mnesia data location before you launch the second node. However, if you are running the nodes on separate virtual machines, you can ignore the "edit" instructions in this section:

First, edit the database files location for your first node instance:

config :mnesia, dir: '/tmp/mnesia'

Then start node a:

MIX_ENV=dev PORT=4000 elixir --sname a -S mix phx.server

Next, edit the database files location for your second node instance:

config :mnesia, dir: '/tmp/mnesia2'

Then start node b:

MIX_ENV=dev PORT=4002 elixir --sname b -S mix phx.server
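
A quick sanity check (not part of the original guide): if you start one of the nodes with an IEx shell instead, e.g. MIX_ENV=dev PORT=4000 iex --sname a -S mix phx.server, you can confirm the cluster and the mnesia replication once both nodes are up (host names will vary):

# Nodes that libcluster has connected to this node
Node.list()
#=> [:"b@myhost"]

# Nodes the local mnesia instance is running with
:mnesia.system_info(:running_db_nodes)
#=> [:"b@myhost", :"a@myhost"]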

Now, if you run the above servers as backends with, e.g., HAProxy or nginx as a round-robin load balancer, you will see that node b accepts the session credentials that were created if sign-in occurred on node a, and vice versa.

Example: Using libcluster's Cluster.Strategy.Epmd strategy to connect a pre-configured set of servers

libcluster has a number of ways to detect nodes to connect to. Gossip mode (above) is probably the easiest (and perhaps most flexible and reliable) approach. However, you can also use Epmd:

    topologies = [
      myapp: [
        strategy: Cluster.Strategy.Epmd,
        config: [hosts: [:"a@127.0.0.1", :"b@127.0.0.1"]]
      ]
    ]

If you edit your application.ex as above and try to start nodes a and b as described previously, you will get

[error] ** System NOT running to use fully qualified hostnames **

This tells us that we have to start our servers with --name rather than --sname.

So, as before, edit the database files location for your first node instance:

config :mnesia, dir: '/tmp/mnesia'

But this time, start node a with the full name:

MIX_ENV=dev PORT=4000 elixir --name a@127.0.0.1 -S mix phx.server

Next, edit the database files location for your second node instance:

config :mnesia, dir: '/tmp/mnesia2'

Then start node b:

MIX_ENV=dev PORT=4002 elixir --name b@127.0.0.1 -S mix phx.server

Again, if run behind a load balancer such as HAProxy, node b accepts the session credentials that were created if sign-in occurred on node a, and vice versa, and you can kill and restart a or b without any downtime from the perspective of the browser session.

Example: Using libcluster's Cluster.Strategy.ErlangHosts strategy to connect servers specified in the .hosts.erlang file

To use libcluster's ErlangHosts strategy, we first need to create a file in the root working directory of our server(s) to specify the hosts (or IP addresses); because in this example we are testing both nodes on a single dev mode computer, this .hosts.erlang file has a single line:

'127.0.0.1'.

Then, in application.ex we configure libcluster as:

    topologies = [
      myapp: [
        strategy: Cluster.Strategy.ErlangHosts
      ]
    ]

Then, simply start the servers as in the Epmd example above (remembering to change the write directory for the mnesia files; see above):

MIX_ENV=dev PORT=4000 elixir --name a@127.0.0.1 -S mix phx.server

and

MIX_ENV=dev PORT=4002 elixir --name b@127.0.0.1 -S mix phx.server

Because the .hosts.erlang file contains the IP to query for nodes, libcluster will find both node a and node b.

Once again, if run behind a load balancer such as HAProxy, node b accepts the session credentials that were created if sign-in occurred on node a, and vice versa, and you can kill and restart a or b without any downtime from the perspective of the browser session.

Further reading

libcluster

sensiblearts commented 4 years ago

@danschultzer , Actually, the above works for 1 node but does not yet work when I add the second: When I open the Gossip port and it tries to connect, both servers crash. More info later.

jwietelmann commented 4 years ago

@danschultzer @sensiblearts Any new info about whether or not this works?

sensiblearts commented 4 years ago

@jwietelmann @danschultzer , I've been away from the code for a few months but am returning next week and will try this again. I'll be upgrading from Pow v 1.0.13 to v 1.0.19 and then I'll run some experiments with Digital Ocean VPSs.

jacobwarren commented 4 years ago

@sensiblearts - is this working for you in prod yet? I'm using Gigalixir which has rolling deploys, so distributing the Mnesia cache is the only persistence option I have.

danschultzer commented 4 years ago

@jacobwarren an alternative is Nebulex or Redis. Though Mnesia has worked fine for me in a production environment, I've heard that distribution issues have been pretty difficult to debug. It makes me reconsider my recommendation of Mnesia, even though it's part of the standard Erlang distribution. I feel it's very underutilized in the Elixir community unfortunately, with limited documentation. Really wish Mnesia was used more 😞

sensiblearts commented 4 years ago

@jacobwarren , @danschultzer , I intended to test that but it's been a strange year, to say the least.

Mnesia is of course working for a single instance, but I have not had (and do not anticipate anytime soon) a need for more than one webserver behind my load balancer. But there is always hope :-)

Would either of you like me to test it with 2 or 3 webservers? Or, @danschultzer , do you think it would be a waste of a day for me? (Also, my test would be crude, as I have no automated client testing in place. I would just use multiple browsers, my phone etc., and see if there are any dropped or missed sessions.)

I'm happy to help (I feel that I owe @danschultzer more, for Pow), but I don't want to waste time. If you have a series of steps or protocol to test this, I welcome the suggestions.

Cheers.

jacobwarren commented 4 years ago

@danschultzer and @sensiblearts - I may be looking for the wrong solution. My issue is that I'm on Gigalixir which has rolling deploys. Whenever I push an update it logs all users out. I'm currently utilizing the Mnesia cache, but because the cache is destroyed every time the container goes down for a new one to go up, it's wiped out. Is there a solution to preserve the cache aside from distributing the application out?

Ps. I have a whole lot of guides to submit for using Pow with Absinthe! :)

montebrown commented 3 years ago

👏 I found @sensiblearts' new docs and got libcluster working but got stuck trying to get mnesia to connect correctly. I was starting to untangle it when I finally found this thread. @danschultzer's MyApp.MnesiaClusterSupervisor solution above fixed my mnesia connection problems perfectly (thanks @danschultzer!) and now I have rolling deploys without logging my users out - @jacobwarren I think it would fix your problem as well.

Should it be added to the docs as well? This seems like a pretty common situation that others are going to run into.

posilva commented 2 years ago

How do you ensure that, when MnesiaClusterSupervisor starts its children, the libcluster strategy has already set up the cluster?

I was trying with the Gossip strategy (on localhost) and it is not deterministic in any way 😄. Maybe I am not getting it right.