openconfig / gnmi

gRPC Network Management Interface
Apache License 2.0

Config Subscription gNMI Extension #169

Open hellt opened 6 months ago

hellt commented 6 months ago

Problem Statement

Configuration management and the handling of configuration drift are among the main features of a higher-level management system or orchestrator. Configuration management tasks are not concerned with state data; they focus on the efficient retrieval and push of configuration values.

Thus, having a synchronized configuration view between the management system and the network elements is key to enabling robust and near real-time configuration management.
To enable this synchronization of configuration data, gNMI Subscribe RPC can be used. The bidirectional streaming nature of this RPC enables fast and reliable sync between the management system and the devices it manages.

Unfortunately, unlike the Get RPC, the gNMI Subscribe RPC has no embedded mechanism to stream updates for configuration values only. This makes the Subscribe RPC rather ineffective on YANG schemas that do not separate config and state elements into distinct containers.

This proposal introduces the Config Subscription extension that allows clients to indicate to servers that they are interested in configuration values only.

Specification

A new ConfigSubscription extension is added to the extensions list and modeled as follows:

// ConfigSubscription extension allows clients to subscribe to configuration
// schema nodes only.
message ConfigSubscription {
  oneof action {
    // ConfigSubscriptionStart is sent by the client in the SubscribeRequest
    ConfigSubscriptionStart start = 1;
    // ConfigSubscriptionSyncDone is sent by the server in the SubscribeResponse
    ConfigSubscriptionSyncDone sync_done = 2;
  }
}

// ConfigSubscriptionStart is used to indicate to a target that for a given set
// of paths in the SubscribeRequest, the client wishes to receive updates
// for the configuration schema nodes only.
message ConfigSubscriptionStart {}

// ConfigSubscriptionSyncDone is sent by the server in the SubscribeResponse
// after all the updates for the configuration schema nodes have been sent.
message ConfigSubscriptionSyncDone {
  // ID of a commit confirm operation as assigned by the client
  // see Commit Confirm extension for more details.
  string commit_confirm_id = 1;
  // ID of a commit as might be assigned by the server
  // when registering a commit operation.
  string server_commit_id = 2;
}

The ConfigSubscription extension message is meant to be sent in a SubscribeRequest message with the ConfigSubscriptionStart action and in a SubscribeResponse message with the ConfigSubscriptionSyncDone action.
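
To make the request/response split concrete, here is a minimal sketch that mirrors the three messages as plain Python dataclasses. These classes are hypothetical stand-ins, not generated gNMI bindings, and the `which_action` helper and the `commit-42` value are made up for illustration. The sketch is also stricter than protobuf, which permits a oneof with no field set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfigSubscriptionStart:
    """Mirrors the empty ConfigSubscriptionStart message."""


@dataclass(frozen=True)
class ConfigSubscriptionSyncDone:
    """Mirrors ConfigSubscriptionSyncDone; both IDs default to empty strings."""

    commit_confirm_id: str = ""
    server_commit_id: str = ""


@dataclass(frozen=True)
class ConfigSubscription:
    """Mirrors the `oneof action` wrapper (hypothetical stand-in)."""

    start: Optional[ConfigSubscriptionStart] = None
    sync_done: Optional[ConfigSubscriptionSyncDone] = None

    def __post_init__(self) -> None:
        # Assumption of this sketch: exactly one action must be set.
        if (self.start is None) == (self.sync_done is None):
            raise ValueError("exactly one action must be set")

    def which_action(self) -> str:
        """Return the name of the action that is set."""
        return "start" if self.start is not None else "sync_done"


# Client side: this would travel in the SubscribeRequest extension list.
request_ext = ConfigSubscription(start=ConfigSubscriptionStart())

# Server side: this would travel in the SubscribeResponse extension list.
response_ext = ConfigSubscription(
    sync_done=ConfigSubscriptionSyncDone(server_commit_id="commit-42")
)

print(request_ext.which_action(), response_ext.which_action())
```

A receiver can branch on `which_action()` the same way generated protobuf code would branch on the oneof case.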

ConfigSubscriptionStart

The ConfigSubscription message has a oneof action field that is used to decouple request and response messages. When a client wants to initiate a config subscription, it sends a SubscribeRequest message with the ConfigSubscriptionStart action.

ConfigSubscriptionSyncDone

The server sends the ConfigSubscriptionSyncDone message in the SubscribeResponse message after all the updates for the configuration schema nodes have been sent. This message indicates to the client that the server has sent all the updates for the configuration schema nodes and the client can now start processing the updates knowing that it received the full configuration set.
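
A client could use the marker roughly as sketched below: buffer updates and only hand them on once ConfigSubscriptionSyncDone arrives. The tuple encoding of responses and the `split_by_sync_done` function are assumptions of this sketch, not part of the proposal or of any gNMI client library.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-ins for SubscribeResponse messages:
#   ("update", path, value)                              one config update
#   ("sync_done", commit_confirm_id, server_commit_id)   the sync marker
Response = Tuple[str, str, str]
Batch = Tuple[Tuple[str, str], Dict[str, str]]


def split_by_sync_done(responses: Iterable[Response]) -> List[Batch]:
    """Group updates into batches, each closed by a sync_done marker.

    Updates that arrive after the last marker are not returned, because
    the client cannot yet know that the set is complete.
    """
    batches: List[Batch] = []
    pending: Dict[str, str] = {}
    for kind, first, second in responses:
        if kind == "update":
            pending[first] = second  # first=path, second=value
        elif kind == "sync_done":
            batches.append(((first, second), pending))  # IDs + updates
            pending = {}
    return batches


MTU = "/interfaces/interface[name=eth0]/config/mtu"
DESC = "/interfaces/interface[name=eth0]/config/description"

stream = [
    ("update", MTU, "1500"),
    ("update", DESC, "uplink"),
    ("sync_done", "", "commit-1"),
    ("update", MTU, "9000"),
    ("sync_done", "", "commit-2"),
    ("update", DESC, "still in flight"),
]

batches = split_by_sync_done(stream)
for ids, updates in batches:
    print(ids, updates)
```

The trailing update is held back on purpose: without a closing marker the client has no guarantee that it has seen the whole change.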

With the commit_confirm_id and/or server_commit_id fields, the ConfigSubscriptionSyncDone clearly sets the boundary of the configuration changes of a given commit operation. This allows a management system to

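The ConfigSubscriptionSyncDone message has two fields: commit_confirm_id, the ID of a commit confirm operation as assigned by the client (see the Commit Confirm extension), and server_commit_id, the ID of a commit as might be assigned by the server when registering a commit operation.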

Workflow

Scenario 1. Configuration changes without Commit Confirm

In this scenario, the following sequence of events happens:

  1. The client subscribes to path P1 with the ConfigSubscription extension present with the action ConfigSubscriptionStart.
  2. The server processes the subscription request as usual but will only send updates for the configuration schema nodes under the path P1.
  3. The client sends a Set RPC with the configuration changes to the path P1 and without the CommitConfirm extension.
  4. The server processes the Set RPC as usual and sends the updates for the configuration schema nodes under the path P1.
  5. As all the configuration updates are sent, the server sends the ConfigSubscriptionSyncDone message to the client in a SubscribeResponse message.
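
The five steps above can be sketched with a toy in-memory target. Everything here is an assumption made for illustration: the `ToyTarget` class, the path-to-boolean schema map, and the tuple-shaped responses are not gNMI APIs. The scenario does not list a ConfigSubscriptionSyncDone after the initial updates, so the sketch emits the marker only after a Set.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, ...]


class ToyTarget:
    """Toy target that streams only config nodes under a subscribed path."""

    def __init__(self, is_config: Dict[str, bool], values: Dict[str, str]) -> None:
        self.is_config = is_config   # path -> True for config schema nodes
        self.values = dict(values)   # current value of every leaf
        self.prefix: Optional[str] = None
        self.sent: List[Event] = []  # stand-in for the SubscribeResponse stream

    def subscribe_config(self, prefix: str) -> None:
        # Steps 1-2: remember P1 and send the current config nodes under it.
        self.prefix = prefix
        self._send_updates(self.values)

    def set(self, changes: Dict[str, str]) -> None:
        # Steps 3-5: apply the Set, stream the changed config nodes,
        # then close the batch with a sync_done marker.
        self.values.update(changes)
        if self.prefix is not None:
            self._send_updates(changes)
            self.sent.append(("sync_done",))

    def _send_updates(self, leaves: Dict[str, str]) -> None:
        for path, value in leaves.items():
            under_p1 = path.startswith(self.prefix or "")
            if under_p1 and self.is_config.get(path, False):
                self.sent.append(("update", path, value))


MTU = "/interfaces/interface[name=eth0]/config/mtu"
OPER = "/interfaces/interface[name=eth0]/state/oper-status"
HOST = "/system/config/hostname"

target = ToyTarget(
    is_config={MTU: True, OPER: False, HOST: True},
    values={MTU: "1500", OPER: "UP", HOST: "r1"},
)
target.subscribe_config("/interfaces")  # P1
target.set({MTU: "9000"})

for event in target.sent:
    print(event)
```

The state leaf and the hostname never appear in the stream: the first is not a config node, the second is outside P1.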

Scenario 2. Configuration changes with Commit Confirm

  1. The client subscribes to the path P1 with the ConfigSubscription extension present with the action ConfigSubscriptionStart.
  2. The server processes the subscription request as usual but will only send updates for the configuration schema nodes under the path P1.
  3. The client sends a Set RPC with the configuration changes to the path P1 and with the CommitConfirm extension present.
  4. The server processes the Set RPC as usual and sends the updates for the configuration schema nodes under the path P1.
  5. As all the configuration updates are sent, the server sends the ConfigSubscriptionSyncDone message to the client in a SubscribeResponse message.
  6. When the client sends the commit confirm message and it is processed by the server, the server does not send any extra SubscribeResponse messages with the ConfigSubscriptionSyncDone message.

Scenario 3. Configuration changes with Commit Confirm and rollback/cancellation

  1. The client subscribes to path P1 with the ConfigSubscription extension present with the action ConfigSubscriptionStart.
  2. The server processes the subscription request as usual but will only send updates for the configuration schema nodes under the path P1.
  3. The client sends a Set RPC with the configuration changes to the path P1 and with the CommitConfirm extension present.
  4. The server processes the Set RPC as usual and sends the updates for the configuration schema nodes under the path P1.
  5. As all the configuration updates are sent, the server sends the ConfigSubscriptionSyncDone message to the client in a SubscribeResponse message.
  6. When the commit confirmed rollback timer expires or a commit cancel message is sent, the server
    1. rolls back the configuration changes as per the Commit Confirm extension specification
    2. sends the new configuration updates for the path P1 as the configuration has changed/reverted
    3. sends the ConfigSubscriptionSyncDone message to the client in a SubscribeResponse message.
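
The difference between Scenario 2 and Scenario 3 can be sketched with a toy target that tracks one pending commit-confirm operation. The `CommitConfirmTarget` class and its method names are hypothetical. Two simplifications are assumptions of this sketch: the marker sent after a rollback carries the same commit_confirm_id as the original commit, and a Set only modifies existing leaves, so deletions are not modeled.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, ...]


class CommitConfirmTarget:
    """Toy target with a single pending commit-confirm operation."""

    def __init__(self, config: Dict[str, str]) -> None:
        self.config = dict(config)
        self.events: List[Event] = []  # stand-in for SubscribeResponse stream
        self._snapshot: Optional[Dict[str, str]] = None
        self._pending_id = ""

    def set_with_commit_confirm(self, changes: Dict[str, str], commit_confirm_id: str) -> None:
        # Steps 3-5: snapshot, apply, stream the changes, close the batch.
        self._snapshot = dict(self.config)
        self._pending_id = commit_confirm_id
        self.config.update(changes)
        self._emit(changes, commit_confirm_id)

    def confirm(self) -> None:
        # Scenario 2, step 6: confirmation sends nothing extra.
        self._snapshot = None
        self._pending_id = ""

    def rollback(self) -> None:
        # Scenario 3, step 6: timer expiry or cancel reverts the change,
        # streams the reverted values, and closes the batch again.
        if self._snapshot is None:
            return
        reverted = {
            path: old
            for path, old in self._snapshot.items()
            if self.config.get(path) != old
        }
        self.config = self._snapshot
        commit_id, self._snapshot, self._pending_id = self._pending_id, None, ""
        self._emit(reverted, commit_id)

    def _emit(self, updates: Dict[str, str], commit_confirm_id: str) -> None:
        for path, value in updates.items():
            self.events.append(("update", path, value))
        self.events.append(("sync_done", commit_confirm_id))


HOST = "/system/config/hostname"

confirmed = CommitConfirmTarget({HOST: "r1"})
confirmed.set_with_commit_confirm({HOST: "r1-new"}, "cc-1")
confirmed.confirm()

rolled_back = CommitConfirmTarget({HOST: "r1"})
rolled_back.set_with_commit_confirm({HOST: "r1-new"}, "cc-2")
rolled_back.rollback()

print(len(confirmed.events), len(rolled_back.events))
```

A subscriber watching `rolled_back` sees two closed batches for the same commit-confirm ID, which is how it can tell that the change was reverted without diffing against its own database.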

robshakir commented 6 months ago

Thanks for the contribution.

It seems like there are two separate proposals here (as I see it, YMMV). The first is to allow there to be a way to filter to "config" nodes in Subscribe, the second is a way to show some "sync" state within a particular subscription. The latter is important because today we only have one idea of "sync" in Subscribe, which is about whether the target has actually updated all paths matching the subscription, and we are now in some steady state.

Based on these two, I have two sets of questions.

On the base "subscribe to config" idea:

On the "synchronisation idea":

It feels useful to me to understand a bit about which use cases a theoretical system using this approach would serve w.r.t. a network operation.

dplore commented 1 month ago

@hellt please also add a markdown doc with the complete documentation and use cases (as you have in the comments here). This should go into the gnmi reference repo (you can link that PR to this PR to make it easier for us to track).

dplore commented 1 month ago

This was reviewed in the July 16, 2024 OC operators meeting. No outstanding comments at the moment. Operators will review over the next week. Setting last call for July 30, 2024.

ubaumann commented 1 month ago

+1 for this extension

Configuration drift and reconciliation is a critical topic if we want to bring automated network operations to the next level. Nice work.

MrHamel commented 1 month ago

Why not have the ConfigSubscription report when a "commit confirm(ed)" is happening? An NMS would be completely blind to that specific kind of action taking place unless it specifies that "persist ID", and it wouldn't know that a change was rolled back other than by comparing the delta changes with its own database over time.

jarrodb commented 1 month ago

+1 for this! Thanks for submitting!

mcanatella commented 1 month ago

+1 I definitely need this

yunheL commented 1 month ago

+1 thanks for the detailed problem statement and use case, I think this is going to be useful

wendall-robinson commented 1 month ago

+1 I like having this kind of granularity

ezobn commented 1 month ago

+1. For example, Cisco NSO currently uses sync-from requests to initialize the sync from devices to its internal XML DB (CDB). This feature is very useful because you can subscribe to the needed xpath for changes. By itself it creates sync-event-based fidelity for a living XML DB holding the configs of the whole network, which gives you 100% integrity of the network as the source of truth. You can indeed subscribe to netconf-config-change events, but this requires handling a lot of TCP sessions because of the nature of NETCONF. If devices can push this information themselves, because you can subscribe to the whole config change, and can even give you an indicator that you got everything, that sounds very good. If everybody starts to support this, we can have 100% integrity of the network configuration DB, which is itself one of the building blocks of the whole automation journey...

ryanmerolle commented 1 month ago

Backing up the config and diffing it against the intended config is a popular workflow in the network operator community. It would make so much sense to implement this as @hellt proposed.

dplore commented 1 month ago

@robshakir all your comments have responses. Can you take another look?