intelsdi-x / snap

The open telemetry framework
http://snap-telemetry.io
Apache License 2.0

RFC: Tribe - subtribes, policy, message encryption and calling remote plugins #640

Open jcooklin opened 8 years ago

jcooklin commented 8 years ago

This spec proposes the following features and enhancements to tribe.

To improve the operational experience, what is currently known as an ‘agreement’ will be renamed ‘tribe’, and actions that affect a tribe will be made explicit. Let’s start with an example.

Instead of creating a named agreement, we will create a named tribe and join all the members to it.

i1> snapctl tribe create core
i1> snapctl tribe join core i1
i1> snapctl tribe join core i2
i1> snapctl tribe join core i3
i1> snapctl tribe join core i4


Let’s imagine that i1 and i2 are somehow special and should have additional plugins loaded and tasks running beyond what is defined by the core tribe.

i1> snapctl tribe create storage
i1> snapctl tribe join storage i1
i1> snapctl tribe join storage i2

On our core tribe we will load an influxdb and psutil plugin and start a task capturing basic OS utilization details. On our storage tribe we will load the smart plugin and start a task capturing disk IO.

i4> snapctl plugin load influxdb --tribe core
i4> snapctl plugin load psutil --tribe core
i4> snapctl task create -t psutil-influx.json --tribe core
i4> snapctl plugin load smart --tribe storage
i4> snapctl task create -t disk-io.json --tribe storage

Explicitly referring to the tribe when loading plugins or tasks reduces the risk that a user accidentally affects the entire tribe when they perform actions that are intended for an individual node. It also more effectively supports multiple potentially overlapping tribes.
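To illustrate how overlapping tribes could combine, here is a minimal sketch of an in-memory model. The `TribeRegistry` class and its method names are hypothetical and are not taken from the snap code base; the sketch only assumes that a node ends up with the union of whatever is loaded on each tribe it belongs to.

```python
from collections import defaultdict


class TribeRegistry:
    """Hypothetical in-memory model of named tribes (not snap's real types)."""

    def __init__(self):
        self.members = defaultdict(set)  # tribe name -> node names
        self.plugins = defaultdict(set)  # tribe name -> plugin names

    def join(self, tribe, node):
        """Add a node to a named tribe."""
        self.members[tribe].add(node)

    def load_plugin(self, tribe, plugin):
        """Record a plugin as loaded on a named tribe."""
        self.plugins[tribe].add(plugin)

    def tribes_of(self, node):
        """Return the names of every tribe the node belongs to."""
        return {t for t, nodes in self.members.items() if node in nodes}

    def effective_plugins(self, node):
        """Union of the plugins loaded on each tribe the node is a member of."""
        result = set()
        for tribe in self.tribes_of(node):
            result |= self.plugins[tribe]
        return result


# Mirror the example above: i1..i4 in core, i1 and i2 also in storage.
registry = TribeRegistry()
for node in ("i1", "i2", "i3", "i4"):
    registry.join("core", node)
for node in ("i1", "i2"):
    registry.join("storage", node)

registry.load_plugin("core", "influxdb")
registry.load_plugin("core", "psutil")
registry.load_plugin("storage", "smart")

print(sorted(registry.effective_plugins("i1")))
print(sorted(registry.effective_plugins("i3")))
```

Under these assumptions i1 would run the core plugins plus smart, while i3 would only run the core plugins.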


Dynamically adding nodes to tribes through policies

When started in tribe mode, snap will establish a list of facts collected from the node it is running on, together with arbitrary key/value pairs provided on startup. These facts will then be used to evaluate tribe policies. When a policy evaluates to true, the node is added to the tribe that the policy belongs to.

Example facts: architecture, default_ipv4, default_ipv6, devices, os_dist, os_dist_release, os_dist_version, processor_type, processor_features, memtotal,…
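As a rough illustration of fact collection, the sketch below merges a couple of facts read from the local node with `key=value` pairs supplied at startup. The function names and the `key=value` input format are assumptions made for this example; only `architecture`, `processor_type` and `storage_tier` come from this spec, and a real implementation would gather many more facts.

```python
import platform


def parse_pairs(pairs):
    """Parse 'key=value' strings into a dict (input format is an assumption)."""
    facts = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        facts[key.strip()] = value.strip()
    return facts


def collect_facts(startup_pairs=()):
    """Combine facts read from the node with user-supplied pairs.

    User-supplied pairs are applied last, so they override collected
    facts of the same name (an assumption, not something the spec states).
    """
    facts = {
        "architecture": platform.machine(),
        "processor_type": platform.processor(),
    }
    facts.update(parse_pairs(startup_pairs))
    return facts


facts = collect_facts(["storage_tier=True"])
print(facts["storage_tier"])
```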

Adding a policy:

i1> snapctl tribe policy create ubuntu_policy "{{os_dist}} == Ubuntu" --tribe core
i1> snapctl tribe policy create storage_policy "{{os_dist}} == Ubuntu && {{storage_tier}} == True" --tribe storage

When snap is started in tribe mode on an Ubuntu host with the policy above configured, it will automatically join the core tribe. If it has the fact storage_tier=True it will also be added to the storage tribe.
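The evaluation described above could look roughly like the following sketch. It only understands the two constructs used in the examples, `{{fact}} == value` and `&&`, and it treats a missing fact as a non-match; both are assumptions, since the spec does not define the policy grammar. The function names are hypothetical.

```python
import re

# One clause of the form "{{fact_name}} == value".
_CLAUSE = re.compile(r"^\s*\{\{\s*(\w+)\s*\}\}\s*==\s*(.+?)\s*$")


def evaluate_policy(expression, facts):
    """Return True if every '&&'-joined clause matches the given facts."""
    for clause in expression.split("&&"):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        name, expected = match.groups()
        # A fact that is absent never satisfies a clause (assumption).
        if name not in facts or str(facts[name]) != expected:
            return False
    return True


def tribes_for(facts, policies):
    """Return the tribes whose policy evaluates to true for these facts."""
    return {tribe for tribe, expr in policies if evaluate_policy(expr, facts)}


# The two policies from the example above, as (tribe, expression) pairs.
policies = [
    ("core", "{{os_dist}} == Ubuntu"),
    ("storage", "{{os_dist}} == Ubuntu && {{storage_tier}} == True"),
]

print(sorted(tribes_for({"os_dist": "Ubuntu"}, policies)))
print(sorted(tribes_for({"os_dist": "Ubuntu", "storage_tier": "True"}, policies)))
```

With these inputs, a plain Ubuntu node would match only the core policy, and an Ubuntu node that also carries `storage_tier=True` would match both.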

Other affected components

Tribe makes it possible to reference a named tribe in the process and/or publish portion of a task definition. When a plugin is loaded on a node that is associated with a tribe, it will share its connection details with the global tribe as part of its metadata. This enables each snap node in the tribe to call remote plugins.
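One way to picture the shared connection details is as a directory keyed by tribe, which a node consults when a task references a plugin on a named tribe. This is only a sketch: `PluginEndpoint`, `PluginDirectory`, the `host:port` address format and the sample addresses are all hypothetical, and the gossip layer that would actually distribute the metadata is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginEndpoint:
    """Connection details a node might share for a loaded plugin (hypothetical)."""
    node: str
    plugin: str
    address: str  # assumed "host:port" form


class PluginDirectory:
    """Per-tribe view of the plugin endpoints advertised by member nodes."""

    def __init__(self):
        self._by_tribe = {}

    def advertise(self, tribe, endpoint):
        """Record an endpoint shared by a node that belongs to the tribe."""
        self._by_tribe.setdefault(tribe, []).append(endpoint)

    def resolve(self, tribe, plugin):
        """Return every known endpoint for a plugin within a named tribe."""
        return [e for e in self._by_tribe.get(tribe, []) if e.plugin == plugin]


directory = PluginDirectory()
directory.advertise("core", PluginEndpoint("i1", "influxdb", "10.0.0.1:9000"))
directory.advertise("core", PluginEndpoint("i2", "influxdb", "10.0.0.2:9000"))
directory.advertise("storage", PluginEndpoint("i1", "smart", "10.0.0.1:9001"))

# A task publishing to influxdb on the core tribe could pick any of these.
for endpoint in directory.resolve("core", "influxdb"):
    print(endpoint.node, endpoint.address)
```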


Message encryption

Protect tribe communication by supporting symmetric key encryption.

pittma commented 8 years ago

This looks good, but I think we should break out the encryption piece. In my mind, there are varying degrees of complexity on how this should work. This variance is dependent on how we manage the remote communication between nodes.

For example, I don't believe it is safe to have a symmetric key sitting on the file system. If we are going to go the payload route, I think it will require some two-step handshaking, not unlike how snapd handles encryption between itself and plugins.

However, if these calls are going to be HTTP-based, we could just go the transport route and use the existing HTTPS code in the API.

jcooklin commented 8 years ago

Tribe should/could also support high availability as suggested in #773.

obourdon commented 8 years ago

Very interesting. However, some questions come to mind:

Thanks

elemoine commented 8 years ago

Good questions Olivier :)

Another question from me, related to the dynamic addition of nodes to tribes through policies. I am wondering if we could go one step further and dynamically create the tribe if it doesn't exist. E.g. snapd is started on a node, the tribe policy adds it to tribe "foo", tribe "foo" is not known so it is created (and gossip will take care of making the other nodes aware of that tribe). In this way, the initial step of pre-creating the tribes would be unnecessary. I think I have a use case where this functionality could be very useful. Would that make sense?
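To make the idea concrete, here is a tiny sketch of the difference between the two behaviours. The `join_tribe` function and its `create_missing` flag are hypothetical names for illustration only, and propagating the new tribe to other nodes through gossip is left out.

```python
def join_tribe(tribes, tribe, node, create_missing=False):
    """Add a node to a tribe, optionally creating the tribe on first use.

    `tribes` maps tribe name -> set of node names.
    """
    if tribe not in tribes:
        if not create_missing:
            raise KeyError(f"unknown tribe: {tribe}")
        tribes[tribe] = set()  # tribe is created on demand
    tribes[tribe].add(node)
    return tribes[tribe]


tribes = {"core": {"i1"}}
join_tribe(tribes, "foo", "i5", create_missing=True)
print(sorted(tribes))
```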

Thanks.