danschultzer commented 4 years ago

@sensiblearts has been writing a guide on cluster distribution strategies with Pow: https://github.com/danschultzer/pow/issues/220

I believe it fits better here, as I like to keep third-party library references in the docs in the Pow to a minimum (I'm also contemplating moving the swoosh/bamboo integration guide to here).

The guide @sensiblearts has been working on is already extensive and it's a good read to understand different strategies: https://github.com/sensiblearts/pow/blob/master/guides/mnesia_cache_store_backend.md

sensiblearts commented 4 years ago

Sorry for the delay. I've been asked to help start a tree planting organization and I had to set this aside.

I'm hesitant to just paste the current mnesia guide that I wrote into a new powauth site file, because I think we should first resolve the initialization logic to work with libcluster.

My understanding of the mnesia initialization logic: The first node needs know that it is alone so that it can create (rather than replicate) its mnesia data table. If it is not alone, then it replicates the table schema rather than creating it.

However, when using libcluster, I cannot think of a way to know for sure that the first-booting node is alone. The only way to know if there are any connected nodes is after the current node is connected, i.e., after the :connect callback -- which will never happen if the node is alone.

Another way to say it: There is no event that says, "you are alone." The only event we have is libcluster :connect which means you're not alone. But the first node needs know that it is alone so that it can create its mnesia data table as a basis for the others to replicate.

The only way out of this that I can think of simply to wait a bit (10 seconds) before calling Node.list(); and if it's [] then your pow code creates the table rather than replicate it.

Am I misunderstanding mnesia copying?

danschultzer commented 4 years ago

Yeah, it's how the MnesiaCache works, however you can just provide a hardcoded list of nodes for it to connect automatically. If none of the provided hosts are connected it'll start up by itself, otherwise it'll connect to the existing cluster.

Using Node.list() you have to make sure that the nodes are already connected, but with the supervisor this happens before libcluster initializes so no connected nodes yet. The way to deal with this is to ensure that libcluster has initialized first, maybe waiting for a message to be received before calling Node.list() and starting the MnesiaCache.

I'll look into libcluster and see how you can deal with this.

danschultzer commented 4 years ago

Edit: This is what works:

defmodule MyApp.MnesiaClusterSupervisor do
  use Supervisor

  def start_link(init_arg) do
    Supervisor.start_link(__MODULE__, init_arg, name: __MODULE__)
  end

  @impl true
  def init(_init_arg) do
    children = [
      {Pow.Store.Backend.MnesiaCache, extra_db_nodes: Node.list()},
      Pow.Store.Backend.MnesiaCache.Unsplit
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end

  def start(_type, _args) do
    topologies = [
      example: [
        strategy: Cluster.Strategy.Kubernetes,
        config: [
          # ...
        ]
      ]
    ]

    # List all child processes to be supervised
    children = [
      {Cluster.Supervisor, [topologies, [name: MyApp.ClusterSupervisor]]},
      MyApp.MnesiaClusterSupervisor,
      # Start the Ecto repository
      MyApp.Repo,
      # Start the endpoint when the application starts
      MyAppWeb.Endpoint
      # Starts a worker by calling: MyApp.Worker.start_link(arg)
      # {MyApp.Worker, arg},
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

danschultzer commented 4 years ago

Ok, found a solution, updated above example 😄

sensiblearts commented 4 years ago

UPDATE: This is working for a single server for me. Will try load balancing and add/drop in the next few days; when confirmed, will update the guide and add it docs.

sensiblearts commented 4 years ago

@danschultzer , Since it's rough I thought I would just paste the whole thing here; please let me know if you don't want me doing this. Search for the phrase: "please check the logic in this narrative":

Mnesia cache store backend

Pow.Store.Backend.MnesiaCache implements the behaviour declared in Pow.Store.Base to use Erlang's Mnesia database as the session cache.

The reason you might use Mnesia is that, in a clustered situation, you need to enable distributed checking of Pow User sessions, regardless of which backend server is chosen (e.g., by a load balancer) to service a given http request. This stateless load balancing is possible because Mnesia, being a distributed database, can replicate the cache across all connected nodes. Hence, there is no need for stateful routing at the load balancer (a.k.a., no need for "sticky sessions"). However, there is a need to connect the nodes!

Connecting elixir nodes is straight forward, but the details can be complicated depending on the infrastructure strategy being used. For example, there is much more to learn if autoscaling or dynamic node discovery are requried. However, this is outside the scope of the current guide.

This guide will first describe installation and configuration of Pow to use the Mnesia cache store. Then, a few use cases are described, along with considerations relative to Pow User sessions:

Use of the libcluster Cluster.Strategy.Gossip where all nodes broadcast UDP messages over the same port to find one another; hence, no need to specify host IPs or the erlang short names
Use of the libcluster Cluster.Strategy.Epmd strategy to connect a pre-configured set of server nodes that are specified (in the elixir code) by fully qualified hostnames (e.g., "mynode@127.0.0.1")
Use of the libcluster Cluster.Strategy.ErlangHosts strategy to read the list of server IPs or hostnames from the .hosts.erlang file and connect then dynamically; then, the servers automatically discover the erlang short names for each host
Instead of using libcluster, iterate over an elixir list to connect a pre-configured set of servers

Configuring Pow to use Mnesia: Compile-time Configuration

Mnesia is part of OTP so there are no additional dependencies to add to mix.exs.

Depending on whether you are working in development or production mode, be sure that (in either /config/dev.exs or /config/prod.exs) you have specified an Mnesia storage location, such as:

config :mnesia, dir: to_charlist(File.cwd!) ++ '/priv/mnesia'

and that you have configured Pow to use Mnesia:

config :my_app, :pow,
  user: MyApp.Users.User,
  repo: MyApp.Repo,
  # ...
  cache_store_backend: Pow.Store.Backend.MnesiaCache

Also, you need to add :mnesia to :extra_applications in mix.exs to ensure that it's also included in the release; in mix.exs, specify:

  def application do
    [
      mod: {MyApp.Application, []},
      extra_applications: [:mnesia, ... ],
    ]
  end

Installing libcluster

libcluster offers:

Automatic cluster formation/healing
Choice of multiple clustering strategies out of the box:
- Standard Distributed Erlang facilities (e.g. epmd, .hosts.erlang), which supports IP-based or DNS-based names
- Multicast UDP gossip, using a configurable port/multicast address,
- Kubernetes via its metadata API using via a configurable label selector and node basename; or alternatively, using DNS.
- Rancher, via its [metadata API][rancher-api]
Easy to provide your own custom clustering strategies for your specific environment.
Easy to use provide your own distribution plumbing (i.e. something other than Distributed Erlang), by implementing a small set of callbacks. This allows libcluster to support projects like Partisan.

Installation:
```
defp deps do
[{:libcluster, "~> MAJ.MIN"}]
end
```
You can determine the latest version by running mix hex.info libcluster in your shell, or by going to the libcluster page on Hex.pm.

See: https://github.com/bitwalker/libcluster

Dan: please check the logic in this narrative:

Because we are using libcluster we are faced with an apparent conundrum: If an elixir node boots and then its libcluster (e.g., broadcast) calls fail to detect any other nodes, then the booting node is apparently by itself and should create the mnesia database. However, libcluster will continue to broadcast to check for any new nodes, and it might discover that there is in fact another node -- and that node could be newer or older than the node that just booted. This uncertainty occurs because of the asynchronous nature of the RPC calls to the erlang port mapping daemon (EPMD) and to other nodes while booting the tree of genservers -- your app, libcluster, mnesia, and so on. Consequently, you could imagine that two such nodes could boot, and for few seconds, there might be two copies of the mnesia database, both of which "think they are their own master," so to speak. Let's call this the "boot problem." The solution to this boot problem turns out to be the solution to a related problem: Netsplits.

Mnesia is a distributed, masterless database; masterless, in that the intent is that all nodes will eventually be consistent, and the communication protocol treats all peers equally: Any new data added to mnesia in one node will get replicated to all other nodes. However, nodes can go down! And when they come back up, there is an inconsistency if data were added during the outage. In order to heal this inconsistency, we need to figure out which node should temporarilly serve as a master from which the recovering nodes can replicate and return to a consistent state. In other words, we need to pick a node -- such as the node that booted least recently -- and copy the data from that node to all others. I.e., we need to "unsplit" the netsplit.

As it turns out, unsplitting is also the solution to the "boot problem" (above) of a newly booted node creating a new database because it "thinks" it's alone but finds out seconds later that other nodes were up first and that it should have replicated rather than created the database. Simply, the solution is that every node, when it boots, can create the database assuming that it's alone, and then let the unsplitting logic detect if otherwise; and if so, treat it as a node that needs to be healed.

Mnesia has notifications for splitting and healing events (e.g., :inconsistent_database system events), but it is up to mnesia clients, such as Pow, to monitor these events and heal. For this, Pow provides an Unsplit genserver that you can add to your supervision tree. Specifically, Unsplit is a Genserver that listens to Mnesia for :inconsistent_database events. A good way to add Unsplit to your application is to create a new module to supervise the unsplit module; for example, in the same folder as your application.ex file, add a mnesia_cluster_supervisor.ex file:

defmodule MyApp.MnesiaClusterSupervisor do
  use Supervisor

  def start_link(init_arg) do
    Supervisor.start_link(__MODULE__, init_arg, name: __MODULE__)
  end

  @impl true
  def init(_init_arg) do
    children = [
      {Pow.Store.Backend.MnesiaCache, extra_db_nodes: Node.list()},
      Pow.Store.Backend.MnesiaCache.Unsplit
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end

Then, add your new ClusterSupervisor to your own app's supervision tree in application.ex:

def start(_type, _args) do

    topologies = [
      myapp: [
        strategy: Cluster.Strategy.Gossip
      ]
    ]
    # By default, the libcluster gossip occurs on port 45892, 
    # using the multicast address 230.1.1.251 to find other nodes

    children = [
      {Cluster.Supervisor, [topologies, [name: MyApp.ClusterSupervisor]]},
      MyApp.MnesiaClusterSupervisor,
      MyApp.Repo,
      MyAppWeb.Endpoint
    ] 

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

Now, when your app boots, it will have a supervisor to keep your node cluster intact and a supervisor to keep mnesia consistent, which relies on an unsplit genserver to automatically do any consistency healing involved.

Specifically, when your application starts it will initialize the MnesiaCache Generver, which starts the node's instance of mnesia and then connects to the other nodes in the cluster (internally, by calling :mnesia.change_config/2). Once connected to the other nodes (and their mnesia instances), the connecting node will make a remote procedure call to one of the other nodes (e.g., the one at the head of the list of the cluster nodes) to retrieve the mnesia table information. This table information is then used as a parameter when calling internally :mnesia.add_table_copy/3, which creates a local replicate of the cache table; or, does nothing if the table already exists. If, after this, an :inconsistent_database event is received, the relevant MnesiaCache.Unsplit methods will be called upon to heal the inconsistency.

Adding or removing nodes at run time

As long as at least one node in the :extra_db_nodes list is connected to the cluster, the MnesiaCache instances for all other nodes will automatically be connected and replicated. This makes it very easy to join clusters, since you won't have to update the config on the old nodes at all. And as long as the old nodes connect to at least one node that's in the cluster when restarting, it'll automatically connect to the new node as well, even without updating the :extra_db_nodes setting.

Using other libraries that also use the Mnesia instance

It's strongly recommended to take into account any libraries that will be using Mnesia for storage before using the Unsplit module.

A common example would be a job queue, where a potential solution to prevent data loss is to simply keep the job queue table on only one server instead of replicating it among all nodes. If you do this, then when a network partition occurs, the job queue table can be excluded from the tables to be flushed (and restored) during healing by setting :flush_tables to false (the default). This way, the Unsplit module can self-heal without affecting the job queue table.

Example: Using libcluster's `Cluster.Strategy.Gossip` strategy to discover 2 servers dynamically

We will illustrate this with two servers but you can expand this logic to any reasonable number that an erlang cluster can handle.

First, as described above, make sure that you have configured libcluster for the Gossip strategy in application.ex:

    topologies = [
      myapp: [
        strategy: Cluster.Strategy.Gossip
      ]
    ]
    # By default, the libcluster gossip occurs on port 45892, 
    # using the multicast address 230.1.1.251 to find other nodes

NOTE: If you are going to run both elixir nodes on the same physical machine, you will either have to copy your application code in two locations or change the mnesia data location before you launch the second node. However, if you are running the nodes on separate virtual machines then you can ignore the "edit" instructions in this section:

First, edit the database files location for your first node instance:

config :mnesia, dir: '/tmp/mnesia'

Then start node a:

MIX_ENV=dev PORT=4000 elixir --sname a -S mix phx.server

Next, edit the database files location for your second node instance:

config :mnesia, dir: '/tmp/mnesia2'

Then start node b:

MIX_ENV=dev PORT=4002 elixir --sname b -S mix phx.server

Now, if you run the above servers as backends with e.g., HAProxy or nginx as a round-robin load balancer, you will see that:

Http requests are routed in a balanced way to the two backends.
Node b accepts the Pow session credentials that were created if sign-in occurred on node a, and vice versa
you can kill and restart a or b without any downtime from the perspective of the browser session: When you restart a killed server Mnesia will heal (unsplit) the cached schema so that each backend node again has a copy of the current cache.

Example: Using libcluster's `Cluster.Strategy.Epmd` strategy to connect a pre-configured set of servers

Libcluster has a number of ways to detect nodes to connect. Gossip mode (above) is probably the easiest (and perhaps most flexible and reliable) approach. However you can also use Epmd:

topologies = [ myapp: [ strategy: Cluster.Strategy.Epmd, config: [hosts: [:"a@127.0.0.1", :"b@127.0.0.1"]], ] ]

If you edit your application.ex as above and try to start nodes a and b as described previously, you will get

[error] ** System NOT running to use fully qualified hostnames **

This tells us that we have to start our servers with --name rather than --sname.

So, as before, edit the database files location for your first node instance:

config :mnesia, dir: '/tmp/mnesia'

But this time, start node a with the full name:

MIX_ENV=dev PORT=4000 elixir --name a@127.0.0.1 -S mix phx.server

Next, edit the database files location for your second node instance:

config :mnesia, dir: '/tmp/mnesia2'

Then start node b:

MIX_ENV=dev PORT=4002 elixir --name b@127.0.0.1 -S mix phx.server

Again, if run behind a load balancer such as HAProxy, node b accepts the session credentials that were created if sign-in occurred on node a, and vice versa, and you can kill and restart a or b without any downtime from the perspective of the browser session.

Example: Use of the libcluster's `Cluster.Strategy.ErlangHosts` to connect servers specified in the .hosts.erlang file

To use libcluster's ErlangHosts strategy, we first need to create a file in the root working directory of our server(s) to specify the hosts (or IP addresses); because in this example we are testing both nodes on a single dev mode computer, this .hosts.erlang file has a single line:

'127.0.0.1'.

Then, in application.ex we configure libcluster as:

    topologies = [
      myapp: [
        strategy: Cluster.Strategy.ErlangHosts
      ]
    ]

Then, simply start the servers as in the Epmd example above (and remembering to change the write directory for the mnesia files; see above), i.e.,:

MIX_ENV=dev PORT=4000 elixir --name a@127.0.0.1 -S mix phx.server

and

MIX_ENV=dev PORT=4002 elixir --name b@127.0.0.1 -S mix phx.server

Because the .hosts.erlang file contains the IP to query for nodes, libcluster will find both node a and node b.

Once again, if run behind a load balancer such as HAProxy, node b accepts the session credentials that were created if sign-in occurred on node a, and vice versa, and you can kill and restart a or b without any downtime from the perspective of the browser session.

pow-auth / pow_site

Guide on cluster setup with the MnesiaCache #10

Mnesia cache store backend

Configuring Pow to use Mnesia: Compile-time Configuration

Installing libcluster

Adding or removing nodes at run time

Using other libraries that also use the Mnesia instance

Example: Using libcluster's `Cluster.Strategy.Gossip` strategy to discover 2 servers dynamically

Example: Using libcluster's `Cluster.Strategy.Epmd` strategy to connect a pre-configured set of servers

Example: Use of the libcluster's `Cluster.Strategy.ErlangHosts` to connect servers specified in the .hosts.erlang file

Further reading

pow-auth / pow_site

Guide on cluster setup with the MnesiaCache #10

Mnesia cache store backend

Configuring Pow to use Mnesia: Compile-time Configuration

Installing libcluster

Adding or removing nodes at run time

Using other libraries that also use the Mnesia instance

Example: Using libcluster's Cluster.Strategy.Gossip strategy to discover 2 servers dynamically

Example: Using libcluster's Cluster.Strategy.Epmd strategy to connect a pre-configured set of servers

Example: Use of the libcluster's Cluster.Strategy.ErlangHosts to connect servers specified in the .hosts.erlang file

Further reading

Example: Using libcluster's `Cluster.Strategy.Gossip` strategy to discover 2 servers dynamically

Example: Using libcluster's `Cluster.Strategy.Epmd` strategy to connect a pre-configured set of servers

Example: Use of the libcluster's `Cluster.Strategy.ErlangHosts` to connect servers specified in the .hosts.erlang file