The Snabb side of a NETCONF-managed system #987

Open wingo opened 8 years ago

wingo commented 8 years ago

Greetings, Snabbfolk. As you know we don't really have a story yet for NETCONF, something that a lot of operators would like. We discussed this back in January on #696, but since then there have been a few developments on all sides. Below is what we in Igalia see as a good way forward; it's the result of talking with operators and NETCONF experts, seeing how we can provide what they need while also remaining responsible for data plane performance.

YANG data modelling for Snabb

Big picture

                                +--------------------+
                                | on-disk persistent |
                                | configuration      |
                                +--------------------+
                                          /\
       +---------------+ counters +-------\/------+          +---------------+
       | dataplane     | shm tree | snabb config  |          | sysrepod      |
       |               |~~~~~~~~~~>               |   get    +----------+    |
       |               |          |               |   set    | snabb    |    |
       |          ring | commands |               <--listen--> config   |    |
IPv4<->= NIC0   buffer O<----------               |  delete  | endpoint |    |
       |               |          |               |   add    +----------+    |
IPv6<->= NIC1          ---------->O               |          |               |
       +---------------+   ack    +---------------+          +---------------+

The three boxes in a row represent processes. The on-disk persistent configuration is a directory somewhere on the file system, not a process.

The NETCONF agent, sysrepod in this case, communicates with the Snabb data plane by running the snabb config program. snabb config can read "state data" from the data plane via SHM counters, just as now. Configuration queries are served by reading the state of the data plane from disk, without actually talking to the data plane. The data plane doesn't touch disk of course. snabb config can also send commands to the data plane to modify the data plane's configuration. When the data plane ACKs the change, the snabb config program updates a persistent configuration store on disk so that the next snabb config invocation can know the state of the data plane without asking. Access to snabb config is serialized, so that the disk state is in sync with the in-memory state of the data plane. The protocol between snabb config and the data plane is private to the particular network function in question.

snabb config will speak a data model that is based on YANG, for minimum impedance mismatch between NETCONF agents and Snabb applications.

Initially, and perhaps for an indefinite period of time, there will be an ad-hoc implementation for each YANG model supported by snabb config. In this design document we'll use the specific example of the lwAFTR, but I think these considerations apply to all Snabb network functions.

For the lwAFTR we plan to include support for the YANG module for lwAFTR systems in general, and also support for a custom YANG module covering all of the additional functionality of the Snabb lwAFTR that is not present in the standard model. We expect that this code can serve as a base for implementations of other YANG modules for other Snabb network functions such as the NFV virtual switch.

snabb config

snabb config is a Snabb command, which will be present in all Snabb applications. It has five sub-commands:

  1. snabb config get: read configuration or state data
  2. snabb config set: set configuration data
  3. snabb config add: add a YANG data node to a YANG container in the network function’s existing configuration; for example, add a new softwire to the binding table
  4. snabb config delete: delete an item from the network function’s existing configuration; for example, delete a binding table entry
  5. snabb config listen: provide an interface to the get/set/add/delete snabb config functionality over a persistent socket, to minimize per query cost

We expect the snabb config get et al. commands to become the normal way that Snabb users interact with Snabb applications in an ad-hoc fashion via the command line, and snabb config listen to become the standard way that a NETCONF agent like Sysrepo interacts with a Snabb network function.

The snabb config commands are invoked in a uniform way:

snabb config [get|set|add|delete|listen] -m MODULE ID PATH [VALUE]

The -m MODULE option allows the caller to indicate the YANG module that they want to use, and for the purposes of the lwAFTR might be -m ietf-softwire.

ID identifies the particular Snabb instance to talk to, and PATH specifies a subset of the configuration tree to operate on.

The ID denotes a unique name for the data plane in question. As part of this work, we will add support for Snabb programs to be invoked with names. No two Snabb programs on one machine can have the same name, and it will be possible to enumerate the named Snabb programs that are running on a machine, find their PIDs and their configuration, send them messages over a private channel to update their configuration, and persist their configuration in named directories in the file system.
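As a rough illustration of the uniform invocation described above, here is a minimal Python sketch of the argument handling. It is not the Snabb implementation (which is written in Lua); the `build_parser` name, the `aftr0` instance name, and the choice to default PATH to the root are all assumptions made for the example.

```python
import argparse

VERBS = ("get", "set", "add", "delete", "listen")

def build_parser():
    """Hypothetical parser for: snabb config VERB -m MODULE ID [PATH] [VALUE]."""
    parser = argparse.ArgumentParser(prog="snabb config")
    parser.add_argument("verb", choices=VERBS)
    parser.add_argument("-m", "--module", required=True,
                        help="YANG module to use, e.g. ietf-softwire")
    parser.add_argument("id", help="unique name of the Snabb instance")
    # Assumption: PATH defaults to the root, since `listen` takes no PATH.
    parser.add_argument("path", nargs="?", default="/")
    # VALUE is optional; set/add would read it from stdin when it is absent.
    parser.add_argument("value", nargs="?")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.verb, args.module, args.id, args.path, args.value)
```

Run as, for example, `python sketch.py set -m ietf-softwire aftr0 /some/path 1`.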

In the following descriptions and examples, we continue to focus on the ietf-softwire YANG module, defined in https://tools.ietf.org/html/draft-sun-softwire-yang-05. The tree of data that we need to export is defined in the draft-sun-softwire-yang-05 Internet Draft, and goes like this:

 +--rw softwire-config
 |  +--rw binding
 |     +--rw br
 |        +--rw br-instances
 |           +--rw br-instance* [id]
 |              +--rw binding-table-versioning
 |              |  +--rw binding-table-version?  uint64
 |              |  +--rw binding-table-date?     yang:date-and-time
 |              +--rw id                         uint32
 |              +--rw name?                      string
 |              +--rw softwire-num-threshold     uint32
 |              +--rw tunnel-payload-mtu         uint16
 |              +--rw tunnel-path-mru            uint16
 |              +--rw binding-table
 |                 +--rw binding-entry* [binding-ipv6info]
 |                    +--rw binding-ipv6info     union
 |                    +--rw binding-ipv4-addr    inet:ipv4-address
 |                    +--rw port-set
 |                    |  +--rw psid-offset       uint8
 |                    |  +--rw psid-len          uint8
 |                    |  +--rw psid              uint16
 |                    +--rw br-ipv6-addr         inet:ipv6-address
 |                    +--rw lifetime?            uint32
 +--ro softwire-state
    +--ro binding
       +--ro br
          +--ro br-instances
             +--ro br-instance* [id]
                +--ro id                         uint32
                +--ro name?                      string
                +--ro sentPacket?                yang:zero-based-counter64
                +--ro sentByte?                  yang:zero-based-counter64
                +--ro rcvdPacket?                yang:zero-based-counter64
                +--ro rcvdByte?                  yang:zero-based-counter64
                +--ro droppedPacket?             yang:zero-based-counter64
                +--ro droppedByte?               yang:zero-based-counter64
                +--ro active-softwire-num?       uint32
                +--ro binding-table
                   +--ro binding-entry* [binding-ipv6info]
                      +--ro binding-ipv6info     union
                      +--ro active?              boolean

snabb config will operate on exactly this tree of data.

For queries, here is an example. The lines prefixed by “$” are the commands and what follows is what is printed out on the console.

$ snabb config get -m ietf-softwire ID /softwire-state/binding/br/br-instances/0/
br-instance {
  id 0;
  name "foo";
  sentPacket 100;
  sentByte 100;
  // ...
  binding-table {
    binding-entry {
      binding-ipv6info 127:10:20:30:40:50:60:128;
      active true;
    }
    binding-entry {
      binding-ipv6info 127:24:35:46:57:68:79:128;
      active true;
    }
  }
}

snabb config will begin by locating and obtaining exclusive access to the persistent configuration directory. Once it has the lock, it can read in the state of that data-plane from that directory; the persistent configuration directory is managed by snabb config in a serial fashion, so the copy on disk will always reflect the data plane’s internal state. Any snabb config get operation can then be serviced entirely from the persistent state in that directory, without involving the data plane.

Changes to the data plane (via a snabb config set operation, for example) will be encoded as messages and sent over a shared memory channel to the data plane. When the data plane indicates that it has processed the change, snabb config will update the on-disk state. It is possible to have many changes in flight; the invariant is that the state on disk reflects the state of the data plane only when snabb config has acquired exclusive access.
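The lock, send, wait-for-ACK, persist sequence described above could be sketched roughly as follows. This is an illustration in Python under several assumptions, not the planned Lua code: the on-disk format is private to each network function, so the flat JSON file used here is a stand-in, `send_to_dataplane` is a hypothetical callable replacing the shared memory channel, and all function names are invented for the example.

```python
import fcntl
import json
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def exclusive(config_dir):
    """Hold an exclusive lock on the persistent configuration directory."""
    fd = os.open(os.path.join(config_dir, "lock"), os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until no one else holds it
        yield
    finally:
        os.close(fd)                     # closing the descriptor drops the lock

def load(config_dir):
    try:
        with open(os.path.join(config_dir, "config.json")) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def store(config_dir, config):
    # Write to a temporary file and rename, so readers never see a partial file.
    fd, tmp = tempfile.mkstemp(dir=config_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(config, f)
    os.replace(tmp, os.path.join(config_dir, "config.json"))

def set_value(config_dir, send_to_dataplane, path, value):
    with exclusive(config_dir):
        config = load(config_dir)
        # Only touch the disk once the data plane has acknowledged the change.
        if not send_to_dataplane({"verb": "set", "path": path, "value": value}):
            raise RuntimeError("data plane did not acknowledge the change")
        config[path] = value
        store(config_dir, config)

def get_value(config_dir, path):
    # Served entirely from disk, without involving the data plane.
    with exclusive(config_dir):
        return load(config_dir)[path]
```

Because the disk is only updated after the ACK, a rejected or lost command leaves the stored configuration unchanged.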

It would be possible to standardize a mapping of configuration and state data to persistent objects in the file system, and how that relates to YANG modules in general. However that approach would require a lot of discussion with upstream Snabb and other network function authors. Instead we plan to require code support for each YANG module that snabb config supports, including ietf-softwire, and have that Lua code have a private interface to the data plane. In that way we can move forward without the risk of premature standardization.

From the user’s perspective, configuration and state data will be represented as text with a simple grammar. The NETCONF agent will have a parser and a serializer for this grammar, which is specified by the following ABNF fragment:

Data = LeafData / ContainerData
LeafData = Key Value ";"
ContainerData = Key "{" *Data "}"
Key = yang:string
Value = yang:string

The interpretation of the value string depends on the type of the key. Snabb would know the key types and always make sure to serialize and parse integers, IPv6 addresses, IPv4 addresses, and strings in the normal way. Likewise the NETCONF agent knows these types because of the YANG schemas. The containers and keys map to paths in the obvious way. Trailing slashes are meaningless. Whitespace in Data productions is insignificant.
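To make the grammar concrete, here is a small Python sketch of a parser and serializer for it. It is only an illustration under stated assumptions: values are kept as uninterpreted strings (typing them is left to the schema-aware caller), `//` comments are skipped because the examples in this document use them, and the names `tokenize`, `parse_text` and `serialize` are invented for the example.

```python
import re

# A comment, a quoted string, a punctuation character, or a bare word.
TOKEN = re.compile(r'//[^\n]*|"(?:[^"\\]|\\.)*"|[{};]|[^\s{};"]+')

def tokenize(text):
    return [t for t in TOKEN.findall(text) if not t.startswith("//")]

def parse(tokens, pos=0, closing=None):
    """Parse *Data up to `closing`; return (nodes, position of `closing`)."""
    nodes = []
    while pos < len(tokens) and tokens[pos] != closing:
        key = tokens[pos]
        if key in ("{", "}", ";") or pos + 1 >= len(tokens):
            raise ValueError(f"unexpected token {key!r}")
        if tokens[pos + 1] == "{":                     # ContainerData
            children, pos = parse(tokens, pos + 2, "}")
            if pos >= len(tokens):
                raise ValueError(f"missing '}}' for {key!r}")
            nodes.append((key, children))
            pos += 1                                   # skip the "}"
        else:                                          # LeafData
            if pos + 2 >= len(tokens) or tokens[pos + 2] != ";":
                raise ValueError(f"missing ';' after {key!r}")
            nodes.append((key, tokens[pos + 1]))
            pos += 3
    return nodes, pos

def parse_text(text):
    nodes, _ = parse(tokenize(text))
    return nodes

def serialize(nodes, indent=0):
    pad = "  " * indent
    out = []
    for key, value in nodes:
        if isinstance(value, list):
            out.append(f"{pad}{key} {{\n{serialize(value, indent + 1)}{pad}}}\n")
        else:
            out.append(f"{pad}{key} {value};\n")
    return "".join(out)
```

Nodes are kept as an ordered list of `(key, value)` pairs rather than a dictionary, since a container such as binding-table can hold several binding-entry children with the same key name.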

For example, here we see that lists are keyed in the path by their key (the “binding-entry* [binding-ipv6info]” in the tree above indicates that the binding-ipv6info is the key):

$ snabb config get -m ietf-softwire ID /softwire-state/binding/br/br-instances/0/binding-table/127:10:20:30:40:50:60:128/active
active true;
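The path convention in this example, where a list entry is addressed by its key value directly under the enclosing container, can be sketched in Python as below. The nested-dictionary representation and the `lookup` name are assumptions made for illustration; the real data plane representation is private to each network function.

```python
# Hypothetical in-memory view of part of the state tree. Containers are dicts;
# a container holding a list maps each entry's key value to that entry.
STATE = {
    "softwire-state": {"binding": {"br": {"br-instances": {
        "0": {
            "id": "0",
            "name": "foo",
            "binding-table": {
                "127:10:20:30:40:50:60:128": {
                    "binding-ipv6info": "127:10:20:30:40:50:60:128",
                    "active": "true",
                },
            },
        },
    }}}},
}

def lookup(tree, path):
    """Walk `path` through nested containers; empty segments are ignored,
    so leading, trailing and doubled slashes make no difference."""
    node = tree
    for segment in (s for s in path.split("/") if s):
        if not isinstance(node, dict) or segment not in node:
            raise KeyError(f"no such node {segment!r} in {path!r}")
        node = node[segment]
    return node

active = lookup(
    STATE,
    "/softwire-state/binding/br/br-instances/0/binding-table/"
    "127:10:20:30:40:50:60:128/active",
)
print("active %s;" % active)
```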

Here we show a configuration query:

$ snabb config get -m ietf-softwire ID /softwire-config/binding/br/br-instances/0/binding-table/127:10:20:30:40:50:60:128
binding-entry {
  binding-ipv6info 127:10:20:30:40:50:60:128;
  binding-ipv4-addr 178.79.150.1;
  port-set { psid-offset 0; psid 0; psid-len 0; }
  br-ipv6-addr 8:9:a:b:c:d:e:f;
}

Finally, here we print everything, from the root:

$ snabb config get -m ietf-softwire ID /
softwire-config {
  // everything....
}

For configuration, we have:

$ snabb config set -m ietf-softwire ID /softwire-config/binding/br/br-instances/0/binding-table/127:10:20:30:40:50:60:128/port-set/psid 1

The value is given as the last argument. There is no output if the command succeeds. If no last argument is given, snabb config will read the value from stdin:

$ snabb config set -m ietf-softwire ID /softwire-config/binding/br/br-instances/0/binding-table/127:10:20:30:40:50:60:128
binding-entry {
  binding-ipv6info 127:10:20:30:40:50:60:128;
  binding-ipv4-addr 178.79.150.2;
  port-set { psid-offset 0; psid 0; psid-len 0; }
  br-ipv6-addr 8:9:a:b:c:d:e:f;
}

snabb config can also delete a key, but only on containers that can have a variable number of children:

$ snabb config delete -m ietf-softwire ID /softwire-config/binding/br/br-instances/0/binding-table/127:10:20:30:40:50:60:128

Or add a new one:

$ snabb config add -m ietf-softwire ID /softwire-config/binding/br/br-instances/0/binding-table
binding-entry {
  binding-ipv6info 127:14:25:36:47:58:69:128;
  binding-ipv4-addr 178.79.150.3;
  port-set { psid-offset 0; psid 4; psid-len 6; }
  br-ipv6-addr 1E:2:2:2:2:2:2:af;
}
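The rule that add and delete only apply to containers with a variable number of children could be enforced along these lines. This Python sketch is illustrative only: the `LIST_KEYS` table stands in for knowledge that would come from the YANG schema, and rejecting an add for a key that already exists is an assumption, since the behaviour in that case is not specified here.

```python
# Hypothetical schema knowledge: containers that hold a YANG list, mapped to
# the leaf that keys that list.
LIST_KEYS = {"binding-table": "binding-ipv6info", "br-instances": "id"}

def add(container_name, container, entry):
    """Add `entry` to a list-holding container, keyed by the list's key leaf."""
    key_leaf = LIST_KEYS.get(container_name)
    if key_leaf is None:
        raise ValueError(f"{container_name} does not hold a list")
    key = entry[key_leaf]
    if key in container:
        # Assumption: an existing entry is an error rather than an overwrite.
        raise ValueError(f"entry {key} already exists")
    container[key] = entry

def delete(container_name, container, key):
    """Delete the entry with the given key from a list-holding container."""
    if container_name not in LIST_KEYS:
        raise ValueError(f"{container_name} does not hold a list")
    if key not in container:
        raise KeyError(key)
    del container[key]

binding_table = {}
add("binding-table", binding_table, {
    "binding-ipv6info": "127:14:25:36:47:58:69:128",
    "binding-ipv4-addr": "178.79.150.3",
    "port-set": {"psid-offset": "0", "psid": "4", "psid-len": "6"},
    "br-ipv6-addr": "1E:2:2:2:2:2:2:af",
})
delete("binding-table", binding_table, "127:14:25:36:47:58:69:128")
```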

The listen interface will support all of these operations with a simple JSON protocol. Each request will be one JSON object with the following properties:

Each response from the server will also be one JSON object, with the following properties:

Error messages may have additional properties which can help diagnose the reason for the error. These properties will be defined in the future.

$ snabb config listen -m ietf-softwire ID
{ "counter": 0, "verb": "get", "path": "/softwire-state/binding/br/br-instances/0/binding-table/127:10:20:30:40:50:60:128/active" }
{ "counter": 1, "verb": "get", "path": "/softwire-state/binding/br/br-instances/0/binding-table/127:24:35:46:57:68:79:128/active" }
{ "counter": 0, "status": "ok", "value": "active true;" }
{ "counter": 1, "status": "ok", "value": "active true;" }

The above transcript indicates that requests may be pipelined: the client to snabb config listen may make multiple requests without waiting for responses. (For clarity, the first two JSON objects in the above transcript were entered by the user, in the console in this case; the second two are printed out by snabb config in response.)
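A request loop that supports this kind of pipelining might look like the Python sketch below. Several things here are assumptions rather than part of the design: one JSON object per line as the framing, the `"error"` status value and its `"message"` property, and a flat path-to-text dictionary standing in for the real configuration store. Only the `counter`, `verb`, `path`, `status` and `value` property names come from the transcript.

```python
import json

def serve(request_lines, store):
    """Answer each request in order, echoing its counter so that a client
    which pipelined several requests can match responses to them."""
    for line in request_lines:
        if not line.strip():
            continue
        response = {"counter": None}
        try:
            request = json.loads(line)
            response["counter"] = request.get("counter")
            if request["verb"] != "get":
                raise ValueError("only 'get' is handled in this sketch")
            # Trailing slashes are meaningless, so drop them before lookup.
            response["value"] = store[request["path"].rstrip("/")]
            response["status"] = "ok"
        except (KeyError, ValueError) as err:
            response["status"] = "error"      # assumed error convention
            response["message"] = str(err)    # hypothetical extra property
        yield json.dumps(response)

store = {"/softwire-state/example/active": "active true;"}
requests = [
    '{ "counter": 0, "verb": "get", "path": "/softwire-state/example/active" }',
    '{ "counter": 1, "verb": "get", "path": "/softwire-state/example/active/" }',
]
for reply in serve(requests, store):
    print(reply)
```

Since `serve` is a generator over an iterable of lines, it can be fed from a socket file object or from standard input without waiting for all requests to arrive first.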

Since the snabb config listen program will acquire exclusive access to the data plane, there is no need to provide for notifications of changes made by other configuration clients.

The snabb config integration for the lwAFTR will have special support for the binding table, to allow snabb config to read the compiled binding table in-place without requiring round-trips to the data plane. Likewise we will ensure that the binding table can be updated in-place, without requiring the compilation and reload of an entirely new binding table. This will minimize the cost of adding binding table entries. Having custom code for each YANG module allows us to make this optimization in a straightforward way, as appropriate to each network function.

Additionally, there are other configuration options that are currently in the lwAFTR config file which are not covered by the ietf-softwire module: ICMP policies, ingress/egress filters, fragmentation controls, and the like. We will also include a custom YANG schema for these controls, to expose the full Snabb lwAFTR configuration space to compatible NETCONF agents. We plan on switching the textual form of Snabb lwAFTR configuration over to this new format, defined by a YANG schema, so that we have the same configuration syntax for the lwAFTR as a whole as we do when using snabb config set and the like.

When running a lwAFTR instance by name, it will be possible to omit the configuration and instead use the stored persistent configuration with the given name, including any modifications that may have been made at runtime to that previous named lwAFTR instance via snabb config.

Status and next steps

Comments welcome! If this plan works for everyone, we'd be happy to maintain this upstream together with other interested Snabb contributors. Our timeline is to have all of this implemented in a performant, reliable way for the lwAFTR by about November or so. Let's talk about it :)

mwiget commented 8 years ago

Hi @wingo. Great work!

What is the exact content of the yang module specified via -m in snabb config add -m ietf-softwire? Do you parse an actual yang schema, extracted from the rfc?

Specifically on softwire, the draft doesn't have all the parameters to fully provision lwaftr. I ended up augmenting and deviating the schema (WIP):

https://github.com/mwiget/vmxlwaftr/blob/master/yang/jnx-softwire.yang https://github.com/mwiget/vmxlwaftr/blob/master/yang/jnx-softwire-dev.yang

The resulting config looks then like this, covering two snabb instances (WIP), with the second one using a global binding table read in from a file:

https://github.com/mwiget/vmxlwaftr/blob/master/tests/lwaftr1.txt#L158-L247

It's pretty much a match now to what lwaftr requires in the config file, augmented by MTUs, VLANs and IPv4 and IPv6 addresses plus cache refresh timers used by snabbvmx.

lukego commented 8 years ago

@wingo Awesome!

I am really looking forward to having all of the Snabb configuration and operational data modeled in YANG.

The first topic I would like to thrash out here is how to ensure reliability i.e. what classes of error are we potentially vulnerable to and which ones can we eliminate by design.

The risk I see is that snabb config is implementing a database, the database state is being managed imperatively by a series of set operations, and the precious master copy of the configuration only exists in an internal format that depends on the exact software version(s) that processed the commands. In this case we might have to deal with all the classic database issues... backup/restore procedures, schema evolution when upgrading and downgrading the software, recovery from bugs in application-specific encoding/decoding routines, etc.

Do those concerns make sense? If so then is there some way that we could take some of them off the table?

I see the nix model as a potential inspiration here. Nix has a stable master format (source code), a means of converting into an operational format (a compiler), and a cache of precompiled objects (the store). You can always recover your operational objects if they are corrupted somehow. You also know when it is safe to reuse operational objects (created from the same source and same compiler) vs when they need to be rebuilt. You don't have to worry about backing up operational objects (they are reproducible), about changes in the operational format (each software version can have a completely different one), or about trashing operational objects due to software bugs (switch to a new software version and the bad state is automatically discarded).

Should we consider doing something similar? Might snabb config maintain both a master configuration in a stable format and an (optional) store of "compiled" configurations that are treated as disposable and version-specific?
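One way to picture this suggestion is a store of compiled configurations keyed by a hash of the master source and the software version, so that a stale, foreign or corrupt compiled object is simply ignored and rebuilt. The Python sketch below is only an illustration of that idea, not a proposal for the actual implementation; `load_compiled`, `compile_fn` and the use of pickle as the "operational format" are all hypothetical.

```python
import hashlib
import os
import pickle

def cache_key(source_text, software_version):
    """Same source and same software version always give the same key."""
    h = hashlib.sha256()
    h.update(software_version.encode())
    h.update(b"\0")
    h.update(source_text.encode())
    return h.hexdigest()

def load_compiled(source_path, store_dir, software_version, compile_fn):
    """Return the operational form of the master configuration, reusing a
    cached object only if it was built from this source by this version."""
    with open(source_path) as f:
        source = f.read()
    cached = os.path.join(store_dir, cache_key(source, software_version))
    try:
        with open(cached, "rb") as f:
            return pickle.load(f)
    except (OSError, EOFError, pickle.UnpicklingError):
        pass  # missing or unreadable: fall through and rebuild from the master
    compiled = compile_fn(source)
    os.makedirs(store_dir, exist_ok=True)
    tmp = cached + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(compiled, f)
    os.replace(tmp, cached)  # publish atomically
    return compiled
```

Under this scheme the store can be deleted at any time without losing configuration, because everything in it is reproducible from the master file.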

Other thought to ponder: I wonder whether snabb config could also backend other operation-and-maintenance tools e.g. snabb top and so on. This might clash with the exclusivity requirement on snabb config listen?

lukego commented 8 years ago

Restated more concisely: Can we and should we create these two invariants: that the master configuration always exists in a well-defined format, and that when a process starts it always uses an operational configuration that is identical to what the running software would bootstrap from the master configuration file?

This way your configuration is never corrupted and you never have to rm -rf /var/run/snabb/foo as part of your routine troubleshooting activities.

(Related anecdote: I have recently struggled with an error from a network card for over a week. Have tried all kinds of things to make it work including PCI resets, machine reboots, etc. Turns out that what I needed to do is remove the power from the server. Presumably there was some bad operational state in the card firmware and none of the simple reset methods genuinely return to a pristine state. This is annoying to me as a user: now every time I have a hard-to-resolve error I will need to cold-start the machine because I don't have a simpler method that I can trust. Contrast this with the beloved 82599 where I can fully reset the card to a pristine state with a single MMIO register poke.)

mwiget commented 8 years ago

@lukego on your anecdote. I've also suffered from 82599 cards suddenly not working properly, where only a cold boot resolved it. In my case I typically got packets from the NIC shifted by 2 bytes, rendering them unusable. Strange that a full reset doesn't solve this. Probably experienced this 4 or 5 times so far, though it's a while since it last happened.

lukego commented 8 years ago

Process structure thought:

In this architecture we have a "configuration server" process that is interfacing between the dataplane (private interface) and configuration agents (public interface). The invariants we want to maintain are that there is at most one such process (exclusive access to configuration state) and the process must run exactly the same software version as the dataplane (interface between them is private).

Perhaps this would be easier if the configuration server process were a sibling of the dataplane rather than something that is invoked separately?

  1. Dataplane fork()s the configuration server during startup.
  2. snabb config is a thin client that issues commands to the server via a unix socket.

This way we ensure that there is always exactly one configuration server running and that its software version is identical to the dataplane that it is controlling.

Having a persistent singleton configuration server may also make it easier to support multiple simultaneous clients e.g. sysrepo and snabb top both accessing YANG-modelled data via the same interface at the same time (but in non-conflicting ways).

(Since #930 we are already forking a process to clean up the shm objects upon shutdown. Perhaps this process could evolve to handle most of the snabb config functionality?)

plajjan commented 8 years ago

@lukego FYI there was a discussion on slack around this. I think my viewpoint aligned rather well with yours, like not implementing a complete database. Avoid version specific binary data blobs and so forth. I didn't feel this resonated with everyone though.

wingo commented 7 years ago

I like the idea of the config process being a sibling to the dataplane. Incidentally the supervisor/worker/config process separation probably works even better with a --cpu argument instead of numactl et al, because you only want to have the data plane bound to the CPU and not the config/supervisor processes, I think.

lukego commented 7 years ago

@wingo I have implemented the "manager process spawns CPU-locked workers" model over on #1021 as a work in progress. Plan is to use this for supporting snabbnfv with many processes sharing a 100G Mellanox NIC.

lukego commented 7 years ago

@wingo (Quirk of the Mellanox driver is that it wants to run one app in the manager process to configure the NIC and then I/O apps in the worker processes to attach to tx/rx queues.)

plajjan commented 7 years ago

What are the data formats from/to snabb config? JSON is mentioned but some of the stats / config examples above are in another format.

I recently wrote a quick hack integration between sysrepo and SnabbAFTR and noticed that the current binding table config is in a proprietary format, which meant I had to write code to write my data in that format. What's the rationale for creating a new format when there are so many other formats readily available (and that already have libs for many many other languages)? Can you clarify what the design is here going forward?

lukego commented 7 years ago

One apparent difference between the model here vs on #1021 is that I imagine snabb config being a "thin client" i.e. a very simple and generic program that sends a request to a long-running Snabb "manager" process to do the real work like operating on configurations and communicating updates to workers. Dunno what all y'all think about that.

wingo commented 7 years ago

@lukego very much agreed, the version skew problems between a long-running data plane and a sometimes-updated-in-strange-ways "snabb" binary means that a long-running mostly-idle config process is guaranteed (modulo fork/exec race conditions at start!) to speak the same low-level language as the data plane, and that a simple protocol between the config client and the manager will last.

wingo commented 7 years ago

@kll not ignoring you, just that the answer is a bit long. to specify an entire configuration it will look similar to the vmx's configuration (https://raw.githubusercontent.com/mwiget/vmxlwaftr/igalia/tests/lwaftr3.txt). specifically for the lwaftr we have to have a custom yang module because our tunables aren't precisely the same as the ones in the IETF draft spec so it will look like http://sprunge.us/QUQJ we think. we have a yang schema, to publish shortly.

plajjan commented 7 years ago

Ok so that's "JUNOS style config" - I don't think there's a standard for that. Why do you want to use this over say JSON - which does have libs for a tonne of languages and there's a standard for how to use JSON for YANG modelled data?

This is an orthogonal issue to which YANG model you are using. I am interested in hearing what tuneables don't align that forces you to write your own model. Have you discussed this with Ian?

wingo commented 7 years ago

We will support both the standard and an extended yang model. I am not sure why you are asking these questions -- evidently you have generated lwaftr configurations and you are aware that there are settings that do not have corresponding knobs in the standard yang model. I have discussed this with Ian, yes, at length, over many months. After six months of discussion I am sure you can understand that I am happy to move beyond design choices and get into implementation :)

plajjan commented 7 years ago

There are two topics at hand;

For the first one, I am asking because I want to know why you want to use what is seemingly a not so common data format instead of using something that is very popular and widespread. Yes, I have generated lwaftr configuration and that's when I first noticed you had implemented your own data format. Before that I had assumed it was JSON but that was obviously an incorrect assumption on my part.

For the second one, I am asking because Ian, as I understand it, took part in writing that YANG model, so I imagine feedback would be important to him - like should an updated RFC be released... I implemented my own YANG model that closely mimics the binding table config file, so I never noticed discrepancies with the standard YANG model.

wingo commented 7 years ago

I think in the first question you are conflating syntax and data model. The lwaftr currently uses an ad-hoc configuration syntax, and an ad-hoc data model. It is ad-hoc because adhering to standards is costly and not the first thing you build. We are going to migrate the lwaftr to a YANG data model, and try to do so in a way that other Snabb programs will want to migrate too. Given the model, the syntax is the least important thing. I have an agreement with my users to deliver the syntax described above; but expressing it as XML or json is possible to build.

About data models. The set of configurables in the standard model is not the same as the set of configurables for the lwaftr; check the lwaftr documentation and compare to ietf-softwire. The lwaftr configurables are largely but not entirely a superset; there are schema-valid configurations in the ietf-softwire model that are not valid snabb lwaftr configurations.

We will deliver both a custom and the standard yang model. The custom one will be delivered first because it reflects how the lwaftr operates. The IETF chose to standardize a suboptimal data model and that will have to be shimmed on afterwards. I have raised these concerns with the softwire working group.

plajjan commented 7 years ago

I was trying my very best not to conflate, which is why I said "data format" and not "model". I meant a choice like JSON vs XML. I understand your reasoning for non-standard syntax/model and I have no problem with that nor did I try to complain about it. I was asking about the data format. Like why not do the equivalent of "require ("json"); json.decode(config)". It irked me a bit since I got to be producer of config in a, to me, unknown format.

Excellent on the YANG model topic. Exactly the response I was looking for. Thank you! :) I am not keeping up to date internally on all the details of the AFTR so this kind of information doesn't flow naturally to me, which is why I'm asking you about it. That means you might get seemingly weird questions - apologies for that.

wingo commented 7 years ago

New design document here, updated for developments over the last few months and put in the form of documentation: https://github.com/Igalia/snabb/tree/yang/src/program/config/README.md

That document links to the new internals document: https://github.com/Igalia/snabb/blob/yang/src/apps/config/README.md

This will make it to the lwaftr branch shortly and upstream for the next release. Some parts are still falling into place, but the document describes where we are going.