openconfig / reference

This repository contains reference implementations, specifications and tooling related to OpenConfig-based network management.
Apache License 2.0

Is the sample or onchange mode, a gNMI client's prerogative? #163

Closed ashu-ciena closed 2 years ago

ashu-ciena commented 2 years ago

Hello gNMI Team,

After reading the gNMI specification for streaming modes, I have the following queries:

  1. Sample / On-change mode - Does the XPath to which the gNMI client subscribes have to be supported by the gNMI server for both sample and on-change streaming? Is it the client that decides whether updates are delivered on change or periodically as a sample subscription? Shouldn't the device that supports the XPath have the right to design it as either an on-change or a sample sensor? I had expected the device to say "No" to an on-change subscription if the sensor XPath is designed as a sample sensor in the device, and vice versa, and hence to return an error to the client. Let me know if my understanding is not aligned with the specification.

  2. target-defined mode - Is this the only option where the target has the right to decide the subscription mode based on the type of information the XPath refers to? I had expected the only use case to be a single subscription request carrying a bunch of XPath subscriptions in target-defined mode, so that the device can create each of them as either sample or on-change.

  3. Referring to the text for the on-change sensor: "For all ON_CHANGE subscriptions, the target MUST first generate updates for all paths that match the subscription path(s), and transmit them. Following this initial set of updates, updated values SHOULD only be transmitted when their value changes." If we have an XPath a/b/ with leaf nodes c and d under b, and only the value of c changes after the initial transmit, should the outgoing update response carry the values of both c and d (unchanged), or only the c leaf?

  4. Do you think a sample subscription with the suppress_redundant flag enabled has the same net effect as an on-change subscription for the same XPath? Conversely, does an on-change subscription with heartbeat_interval have the same net effect as a sample subscription?

Can you please clarify?

cc: @gcsl @aashaikh

gcsl commented 2 years ago

The following is copied from a mailing list thread that I did not realize had the exact same questions. Posting here for broader visibility.

  1. A client can request either mode for any path, but as you suggest, if a device doesn't support a particular mode for a path, it should return an error. The expectation is that all paths are supported as Sample, though the period may be limited by a given device. On-change refers to event-driven-telemetry, whereby an update should be delivered immediately upon change, e.g. an interface oper-status. The goal with streaming telemetry is to move toward event-driven-telemetry anywhere it can be supported as this enables the lowest-latency and lowest-throughput way to keep clients synchronized with device state.

  2. Target-defined is a shorthand for "as fast as possible" whereby all data supported as event-driven-telemetry is delivered on-change and the remaining sample data is delivered with the smallest period possible.
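The two answers above can be sketched roughly as follows. This is an illustration in Python, not code from any gNMI server: the `ON_CHANGE_CAPABLE` set and the paths in it are hypothetical, and the `UnsupportedMode` exception merely stands in for whatever error a real target would return over gRPC.

```python
from enum import Enum


class Mode(Enum):
    TARGET_DEFINED = 0
    ON_CHANGE = 1
    SAMPLE = 2


# Hypothetical: leaves this illustrative target can stream event-driven.
ON_CHANGE_CAPABLE = {
    "/interfaces/interface/state/oper-status",
    "/interfaces/interface/state/admin-status",
}


class UnsupportedMode(Exception):
    """Stands in for the error a target returns to the client."""


def effective_mode(path: str, requested: Mode) -> Mode:
    """Return the mode the target will actually use for `path`."""
    if requested is Mode.SAMPLE:
        # Every path is expected to be supported as SAMPLE.
        return Mode.SAMPLE
    if requested is Mode.ON_CHANGE:
        if path not in ON_CHANGE_CAPABLE:
            raise UnsupportedMode(f"{path} cannot be streamed on-change")
        return Mode.ON_CHANGE
    # TARGET_DEFINED: event-driven where possible, sampled otherwise.
    return Mode.ON_CHANGE if path in ON_CHANGE_CAPABLE else Mode.SAMPLE
```

With this sketch, a client asking for ON_CHANGE on a counter leaf gets an error, while the same leaf under TARGET_DEFINED silently falls back to SAMPLE.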

Follow Up Questions:

A. To ensure delivery in the smallest period possible, we have the option to specify sample_interval as 0. My understanding is that this option can be exercised with sample as well as target-defined mode. Is this understanding correct?

A sample interval of 0 is indistinguishable from sample_interval not being set, as 0 is the default value in proto. Thus, this would seem to run counter to the requirement that sample_interval be supplied for sample mode.

B. The spec also says that with sample mode, sample_interval must be specified. With target-defined mode, can this be discounted? Should we honour a subscription request without sample_interval in target-defined mode, or reject it with an error?

Target-defined was added specifically to handle both cases: sample as fast as possible for sampled data, but deliver as event-driven anything that can be. E.g. for the interfaces model, we would expect that counters are sampled but state changes in admin-status, oper-status, last-change, etc. could all be event-driven. For target-defined, the sample_interval is not applicable.

  1. Follow Up Question: I did not receive an answer to the above query on what to send for XPaths a/b/c and a/b/d when the subscription is for a/b only, in on-change mode, and the value of c changes. Should the "update" packet carry the information for all nodes under a/b, because that is the XPath the user has subscribed to? What should be done if suppress_redundant is specified with on-change mode in this use case? We had expected that suppress_redundant is meant to be used with sample subscriptions only.

The goal would be to support delivering only the changed leaves, regardless of the level of subscription, for any on-change subscription as well as for any sampled or target-defined subscription with suppress_redundant set. E.g. an on-change subscription to interface config should only return updates to the changed config leaves; thus, if only the description is changed, only the description should be sent. A sampled or target-defined subscription to "interfaces" with suppress_redundant set should deliver all counters initially, but if the error counters are not incrementing they can be elided from a given sample. The same would apply to QoS queues. Some queues may not be configured to be in use, and thus their zero-value counters would be sent during the sync but not updated afterwards (unless the queues were subsequently configured and their counters began incrementing). We have enabled this suppress_redundant feature by default in the collector code published in the gnmi repo, which is how we operate collection in our production environment, to simulate this behavior for all devices even if they don't yet support it natively. Glancing at a sample device, I see that we are suppressing more than 70% of its updates from downstream clients.

Suppress redundant should have no effect on on-change subscriptions as they should only be sent when they are changed and thus already have this behavior. Suppress redundant applies to anything sampled either in sample mode or in target defined mode.
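The "only the changed leaves" behavior described above can be sketched as a small cache comparison. This is an illustrative Python sketch, not the collector code from the gnmi repo; the function name `changed_leaves` and the dictionary-based cache are assumptions, and deletes are not modeled.

```python
def changed_leaves(last_sent: dict, current: dict) -> dict:
    """Return only the leaves whose value differs from what was last sent.

    `last_sent` is updated in place so the next call compares against
    the values the client already has.
    """
    updates = {
        path: value
        for path, value in current.items()
        if path not in last_sent or last_sent[path] != value
    }
    last_sent.update(updates)
    return updates


cache = {}
# Initial sync: nothing has been sent yet, so every leaf is delivered.
first = changed_leaves(cache, {"/a/b/c": 1, "/a/b/d": 2})
# Only c changed, so only c is delivered even though the subscription is to /a/b.
second = changed_leaves(cache, {"/a/b/c": 5, "/a/b/d": 2})
```

The same comparison covers both cases in the reply: for on-change it runs when a value changes, and for a sampled subscription with suppress_redundant it runs once per sample interval.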

hellt commented 2 years ago

@gcsl was there ever a wish for Network Elements to support suppress_redundant? I heard that it was designed primarily for SW collectors acting as a gnmi proxy where mem/cpu is not a (largely) constrained resource.

gcsl commented 2 years ago

Suppress redundant is not designed primarily for implementation outside a network element. It was designed generically to enable reduction in gNMI traffic for counters that rarely change. Several vendors had concerns about sending all counters all the time. There are many error counters that are rare and the periodic retransmission of unchanged counter values takes CPU/Memory and Network resources that could be better spent on conveying changing state.

vikas-ciena commented 1 year ago

@gcsl I'm a bit confused by the difference between your reply to @ashu-ciena's query:

A. To ensure the delivery in smallest period possible, we have an option to specify sample_interval as 0. My understanding is this option can be exercised with sample as well as target-defined mode. Is it correct understanding?

A sample interval of 0 is indistinguishable from the sample_interval not being set as 0 is the default value in proto. Thus, this would seem to be counter to the requirement that sample_interval be supplied for sample_mode.

and the gNMI spec (3.5.1.5.2 STREAM Subscriptions):

Sampled (SAMPLE) - a subscription that is defined to be sampled MUST be specified along with a sample_interval encoded as an unsigned 64-bit integer representing nanoseconds between samples. The value of the data item(s) MUST be sent once per sample interval to the client. If the target is unable to support the desired sample_interval it MUST reject the subscription by closing the Subscribe RPC specifying an InvalidArgument (3) error code. If the sample_interval is set to 0, the target MUST create the subscription and send the data with the lowest interval possible for the target.

If a sample interval of 0 is indistinguishable from sample_interval not being set, can you please clarify "If the sample_interval is set to 0, the target MUST create the subscription and send the data with the lowest interval possible for the target"? IMO not setting sample_interval and setting sample_interval to 0 are two different scenarios.
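For reference, the SAMPLE rules in the quoted spec text can be sketched as below. This is a hedged Python illustration only: `MIN_INTERVAL_NS` is a hypothetical device limit, "unable to support" is assumed here to mean "faster than the device minimum", and the `InvalidArgument` exception stands in for closing the Subscribe RPC with that error code.

```python
# Hypothetical: 1 second is the fastest this illustrative target can sample.
MIN_INTERVAL_NS = 1_000_000_000


class InvalidArgument(Exception):
    """Stands in for closing the Subscribe RPC with InvalidArgument (3)."""


def accepted_interval_ns(requested_ns: int) -> int:
    """Return the sample interval the target will use, in nanoseconds."""
    if requested_ns == 0:
        # 0 means "lowest interval possible for the target".
        return MIN_INTERVAL_NS
    if requested_ns < MIN_INTERVAL_NS:
        # Assumed meaning of "unable to support the desired sample_interval".
        raise InvalidArgument(
            f"sample_interval {requested_ns} ns is below the minimum "
            f"of {MIN_INTERVAL_NS} ns"
        )
    return requested_ns
```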

gcsl commented 1 year ago

The wording of the specification does not precisely match the details of the proto3 implementation in that there is over-specification for a condition that is not possible with proto3.

https://github.com/openconfig/gnmi/blob/master/proto/gnmi/gnmi.proto#L16

The English description, as you suggest, differentiates between unset and setting to 0. The protocol buffer libraries make no distinction when proto3 is specified and thus we can consider that the unset condition is not possible with this version.

proto2, on the other hand, does differentiate between a default value and an unset value. At the time of writing the specification there was a lot of debate as to whether to use proto2 or proto3 for various reasons, and the specification covers the possibility of an unset value generally, although potentially to the detriment of absolute clarity given the ultimate choice of proto3.
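To make the proto3 point concrete, here is a small hand-written sketch of how a proto3 scalar field without explicit presence is serialized. It deliberately avoids the real protobuf library; the helper names are illustrative, and the field number 3 for sample_interval is an assumption that should be checked against gnmi.proto.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_uint64_field(field_number: int, value: int) -> bytes:
    """Encode a proto3 uint64 field that has no explicit presence.

    proto3 serializers omit such a field when it holds the default (0),
    so the receiver cannot tell "unset" from "set to 0".
    """
    if value == 0:
        return b""
    tag = (field_number << 3) | 0  # wire type 0 = varint
    return encode_varint(tag) + encode_varint(value)


SAMPLE_INTERVAL_FIELD = 3  # assumed field number, for illustration only

unset_bytes = b""  # the field was never assigned
zero_bytes = encode_uint64_field(SAMPLE_INTERVAL_FIELD, 0)
one_bytes = encode_uint64_field(SAMPLE_INTERVAL_FIELD, 1)
```

In this sketch `zero_bytes` and `unset_bytes` are the same empty byte string, which is why the target has no way to distinguish the two cases on the wire.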

ashu-ciena commented 1 year ago

@gcsl Thanks for the clarification. So, are we concluding here that a sample interval of 0 is a special case where, whether it is unset or explicitly set to 0, we need to accept the request as valid and create the subscription with the minimum sample interval supported (i.e. as fast as possible)? If yes, we are imitating the behavior of target-defined mode on the server side.

vikas-ciena commented 1 year ago

In proto3, if an element's value is the default, that element won't be encoded, and while parsing at the server end, if the encoded message does not contain a particular singular element, the corresponding field in the parsed object is set to the default value for that field. So unset and the default value are the same from the server's perspective. We should go ahead and create a subscription with the minimum supported interval, as per the gNMI spec.
@ashu-ciena IMO target_defined serves a different purpose. As mentioned by @gcsl:
Target defined was added specifically to handle both the sample as fast as possible for sample data, but deliver anything as event-driven that can be. E.g. for the interfaces model, we would expect that counters are sampled but state changes in admin-status, oper-status, last-change, etc. could all be event-driven. For target-defined, the sample_interval is not applicable.

sample_interval is only applicable to sample subscriptions. So a sample subscription with sample_interval 0 on the interfaces model will send all leaves as sampled (whether they are counters or event-driven data).

root
+-- a
|   +-- b
|   +-- c
+-- d
    +-- e
    +-- f

A target_defined subscription on the root node (/) will send updates for /a/c as sample (at the minimum interval supported by the NE), while /a/b, /d/e, and /d/f will be event-driven. But a sample subscription with sample_interval 0 on root will send sampled updates for all child nodes. I think the behavior will only be imitated when a target_defined subscription or a sample subscription with sample_interval 0 is created at /a/c.
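The comparison can be sketched in a few lines of Python. The `LEAVES` table below is a hypothetical classification that matches the example (/a/c is sample-only, the rest are event-driven), and `delivery_plan` is an illustrative helper, not part of any gNMI implementation.

```python
# Hypothetical classification matching the example: /a/c is a counter that
# can only be sampled; the other leaves can be delivered event-driven.
LEAVES = {
    "/a/b": "on_change",
    "/a/c": "sample",
    "/d/e": "on_change",
    "/d/f": "on_change",
}


def delivery_plan(subscribed_path: str, mode: str) -> dict:
    """Map each leaf under `subscribed_path` to how it is delivered.

    `mode` is "target_defined" or "sample" (the latter standing for a
    SAMPLE subscription with sample_interval 0).
    """
    if mode not in ("target_defined", "sample"):
        raise ValueError(f"unknown mode: {mode}")
    prefix = subscribed_path.rstrip("/")  # "/" becomes "" and matches all
    plan = {}
    for leaf, native in LEAVES.items():
        if leaf == prefix or leaf.startswith(prefix + "/"):
            plan[leaf] = native if mode == "target_defined" else "sample"
    return plan


root_td = delivery_plan("/", "target_defined")
root_sample = delivery_plan("/", "sample")
leaf_td = delivery_plan("/a/c", "target_defined")
leaf_sample = delivery_plan("/a/c", "sample")
```

Under these assumptions the two modes produce different plans at the root and identical plans only when the subscription is placed directly on /a/c.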