RFC: Tribe - subtribes, policy, message encryption and calling remote plugins

jcooklin commented 8 years ago

This spec proposes the following features and enhancements to tribe.

Group members of the global tribe into named subtribes replacing what today is called agreements
Tribe policies to dynamically add nodes to subtribes
Enable calling remote plugins through sharing plugin meta via tribe
Encryption of tribe messages

Subtribes

To improve the operational experience, what is currently known as an ‘agreement’ will be replaced with the term ‘tribe’, and actions that affect a tribe will be made explicit. Let’s start with an example.

Instead of creating a named agreement, we will create a named tribe and join all the members to it.

i1> snapctl tribe create core
i1> snapctl tribe join core i1
i1> snapctl tribe join core i2
i1> snapctl tribe join core i3
i1> snapctl tribe join core i4

Let’s imagine that i1 and i2 are somehow special and should have additional plugins loaded and tasks running beyond what is defined by the core tribe.

i1> snapctl tribe create storage
i1> snapctl tribe join storage i1
i1> snapctl tribe join storage i2

On our core tribe we will load the influxdb and psutil plugins and start a task capturing basic OS utilization details. On our storage tribe we will load the smart plugin and start a task capturing disk IO.

i4> snapctl plugin load influxdb --tribe core
i4> snapctl plugin load psutil --tribe core
i4> snapctl task create -t psutil-influx.json --tribe core
i4> snapctl plugin load smart --tribe storage
i4> snapctl task create -t disk-io.json --tribe storage

Explicitly referring to the tribe when loading plugins or tasks reduces the risk that a user accidentally affects the entire tribe when they perform actions that are intended for an individual node. It also more effectively supports multiple potentially overlapping tribes.
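
The overlapping membership described above can be sketched with a small in-memory model. This is only an illustration of the idea, not snap's implementation: the `TribeRegistry` class and its method names are hypothetical, and the node and plugin names are taken from the example commands.

```python
class TribeRegistry:
    """Hypothetical model of named tribes with overlapping membership."""

    def __init__(self):
        self._members = {}  # tribe name -> set of node names
        self._plugins = {}  # tribe name -> set of plugin names

    def create(self, tribe):
        # Creating an existing tribe is a no-op.
        self._members.setdefault(tribe, set())
        self._plugins.setdefault(tribe, set())

    def _require(self, tribe):
        if tribe not in self._members:
            raise KeyError(f"unknown tribe: {tribe}")

    def join(self, tribe, node):
        self._require(tribe)
        self._members[tribe].add(node)

    def load_plugin(self, tribe, plugin):
        # The tribe is always named explicitly, mirroring --tribe.
        self._require(tribe)
        self._plugins[tribe].add(plugin)

    def plugins_for(self, node):
        """Union of the plugins of every tribe the node belongs to."""
        loaded = set()
        for tribe, members in self._members.items():
            if node in members:
                loaded |= self._plugins[tribe]
        return loaded


registry = TribeRegistry()
registry.create("core")
for node in ("i1", "i2", "i3", "i4"):
    registry.join("core", node)
registry.create("storage")
for node in ("i1", "i2"):
    registry.join("storage", node)

registry.load_plugin("core", "influxdb")
registry.load_plugin("core", "psutil")
registry.load_plugin("storage", "smart")

print(sorted(registry.plugins_for("i1")))
print(sorted(registry.plugins_for("i3")))
```

In this model a node in both tribes ends up with the union of both plugin sets, while a node that is only in core is unaffected by anything loaded on storage.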

Dynamically adding nodes to tribes through policies

When started in tribe mode, snap will establish a list of facts collected from the node it is running on, together with arbitrary key/value pairs provided on startup. These facts will then be used to evaluate tribe policies. When a policy evaluates to true, the node is added to the tribe that the policy belongs to.

Example facts: architecture, default_ipv4, default_ipv6, devices, os_dist, os_dist_release, os_dist_version, processor_type, processor_features, memtotal,…
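
As a rough sketch of how collected facts and startup-supplied key/value pairs might be merged, consider the following. The `collect_facts` function and the `key=value` input format are assumptions made for illustration; only the fact names `architecture` and `processor_type` come from the list above, and letting user-supplied pairs override collected ones is a design choice of this sketch, not something the RFC specifies.

```python
import platform


def collect_facts(extra=()):
    """Merge a few host-derived facts with user-supplied key=value pairs.

    Hypothetical helper: snap's real fact collection is not specified here.
    """
    facts = {
        "architecture": platform.machine(),
        "processor_type": platform.processor(),
    }
    for pair in extra:
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        # User-supplied facts override collected ones (assumption).
        facts[key.strip()] = value.strip()
    return facts


facts = collect_facts(["storage_tier=True"])
print(facts["storage_tier"])
```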

Adding a policy:

i1> snapctl tribe policy create ubuntu_policy "{{os_dist}} == Ubuntu" --tribe core
i1> snapctl tribe policy create storage_policy "{{os_dist}} == Ubuntu && {{storage_tier}} == True" --tribe storage

When snap is started in tribe mode on an Ubuntu host with the policy above configured, it will automatically join the core tribe. If it has the fact storage_tier=True it will also be added to the storage tribe.
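
The evaluation step can be sketched as follows. The RFC shows only the two example expressions, so the grammar here is an assumption: clauses of the form `{{fact}} == value` joined by `&&`, compared as strings. The function names `evaluate_policy` and `tribes_for` are hypothetical.

```python
import re

# One clause: {{fact_name}} == literal (assumed grammar, inferred from the examples).
_CLAUSE = re.compile(r"^\s*\{\{\s*(\w+)\s*\}\}\s*==\s*(.+?)\s*$")


def evaluate_policy(expression, facts):
    """Return True when every `{{fact}} == value` clause matches the facts."""
    for clause in expression.split("&&"):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        name, expected = match.groups()
        # Values are compared as strings; a missing fact never matches.
        if name not in facts or str(facts[name]) != expected:
            return False
    return True


def tribes_for(facts, policies):
    """Return the sorted tribe names whose policy matches the node's facts."""
    return sorted(
        tribe for tribe, expression in policies.items()
        if evaluate_policy(expression, facts)
    )


policies = {
    "core": "{{os_dist}} == Ubuntu",
    "storage": "{{os_dist}} == Ubuntu && {{storage_tier}} == True",
}
print(tribes_for({"os_dist": "Ubuntu", "storage_tier": "True"}, policies))
print(tribes_for({"os_dist": "Ubuntu"}, policies))
```

With these two policies, a node reporting `os_dist=Ubuntu` and `storage_tier=True` matches both tribes, while a node without the `storage_tier` fact matches only core.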

Other affected components

Global config will need to support arbitrary facts.

Process and publish through remote nodes (calling remote plugins)

Tribe makes it possible to reference a named tribe in the process and/or publish portion of a task definition. When a plugin is loaded on a node that is associated with a tribe, the node will share the plugin's connection details with the global tribe as part of its metadata. This enables each snap node in the tribe to call remote plugins.
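
One way to picture this is a catalog of shared plugin metadata that a node consults when a task names a tribe. Everything in the sketch is hypothetical: the `PluginMeta` fields, the `PluginCatalog` class, the addresses, and the shape of the task fragment are assumptions, since the RFC does not define the metadata or manifest format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginMeta:
    """Hypothetical plugin metadata gossiped through the tribe."""
    name: str
    plugin_type: str  # "processor" or "publisher"
    node: str
    address: str      # host:port where the plugin can be reached (assumed)


class PluginCatalog:
    """Per-node view of the plugin metadata announced by tribe members."""

    def __init__(self):
        self._entries = {}  # tribe name -> list of PluginMeta

    def announce(self, tribe, meta):
        self._entries.setdefault(tribe, []).append(meta)

    def resolve(self, tribe, plugin_type, name):
        """Return connection details for every matching plugin in the tribe."""
        return [
            meta for meta in self._entries.get(tribe, [])
            if meta.plugin_type == plugin_type and meta.name == name
        ]


catalog = PluginCatalog()
catalog.announce("core", PluginMeta("influxdb", "publisher", "i4", "i4:8183"))

# Hypothetical task fragment that names a tribe in its publish step.
publish_step = {"plugin": "influxdb", "tribe": "core"}
targets = catalog.resolve(publish_step["tribe"], "publisher", publish_step["plugin"])
print([meta.address for meta in targets])
```

An empty result from `resolve` would mean that no node in the named tribe currently offers the plugin, which is the case the scheduler and control changes listed below would need to handle.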

Other affected components

The scheduler will need to be extended to accept a task manifest with named tribe details
Control will need to deal with remote plugin subscriptions

Encrypt tribe messages

Protect tribe communication by supporting symmetric key encryption.

Tribe encryption will require the encryption key when starting
A helper for generating the encryption key will be provided (example: snapd keygen)
The global config will support the tribe encryption key
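
A key-generation helper along these lines could look like the sketch below. The RFC does not name a cipher or key format, so the AES-compatible key sizes (16, 24 or 32 bytes) and the base64 encoding are assumptions, and `generate_tribe_key` and `load_tribe_key` are hypothetical names rather than part of snapd.

```python
import base64
import secrets

# Key lengths accepted by AES-128/192/256 (assumed cipher family).
_VALID_KEY_SIZES = (16, 24, 32)


def generate_tribe_key(num_bytes=32):
    """Return a random symmetric key, base64-encoded for use in a config file."""
    if num_bytes not in _VALID_KEY_SIZES:
        raise ValueError(f"key size must be one of {_VALID_KEY_SIZES}")
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def load_tribe_key(encoded):
    """Decode and validate a key read from the command line or global config."""
    key = base64.b64decode(encoded, validate=True)
    if len(key) not in _VALID_KEY_SIZES:
        raise ValueError(f"decoded key has invalid length {len(key)}")
    return key


encoded = generate_tribe_key()
print(len(load_tribe_key(encoded)))
```

Validating the key length at load time lets a node fail at startup instead of when it first tries to decrypt a message from the tribe.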

pittma commented 8 years ago

This looks good, but I think we should break out the encryption piece. In my mind, there are varying degrees of complexity on how this should work. This variance is dependent on how we manage the remote communication between nodes.

For example, I don't believe it is safe to have a symmetric key sitting on the file system. If we are going to go the payload route, I think it will require some two-step handshaking, not unlike how snapd handles encryption between itself and plugins.

However, if these calls are going to be http based, we could just go the transport-route and use the existing https code in the API.

jcooklin commented 8 years ago

Tribe should/could also support high availability as suggested in #773.

obourdon commented 8 years ago

Very interesting. However some questions come to my mind:

How is the global tribe created? I guess by just calling the first snapd instance with --tribe and then the other instances with --tribe-seed?
Once a first subtribe has been created but no snap nodes have been assigned to it yet, would it still be possible to load plugins, tasks and workflows to it so that a node inherits them whenever it joins?
Will it be possible to load a plugin or create a task on a specific node and choose whether this gets propagated to all tribes or to none?
Will it be easy to extend policies (potentially even dynamically, like plugins)?

Thanks

elemoine commented 8 years ago

Good questions Olivier :)

Another question from me, related to the dynamic addition of nodes to tribes through policies. I am wondering if we could go one step further and actually create the tribe dynamically if it doesn't exist. E.g. snapd is started on a node, the tribe policy adds it to tribe "foo", tribe "foo" is not known so it is created (and gossip will take care of making the other nodes aware of that tribe). In this way, the initial step of pre-creating the tribes would be unnecessary. I think I have a use case where this functionality could be very useful. Would that make sense?

Thanks.

intelsdi-x / snap

RFC: Tribe - subtribes, policy, message encryption and calling remote plugins #640
