openconfig / gnmi

gRPC Network Management Interface

Depth gNMI Extension #166

Closed hellt closed 4 months ago

hellt commented 7 months ago

Hi all,

This PR proposes to add a new gNMI Extension called Depth to the list of well-known gNMI extensions.

/cc @dplore @robshakir

1 Rationale

The gNMI specification has never had a subtree "filtering" feature because one of its foundational principles is "to keep a server implementation simple".

While maintaining a simple server implementation is good, one particular feature of gNMI - implicit recursiveness of the requested data - may be considered a limitation for a number of gNMI users and systems.

This proposal is to add an Extension to the gNMI that would allow a client to control the depth of the recursion when the server evaluates a group of paths in the Subscribe or Get RPC.

By virtue of being an extension, implementing this feature is optional for the vendors and won't affect compliance with the specification.

Orchestration, Network Management and Monitoring Systems can benefit from this extension because it

  1. reduces the load on the server, which has less data to fetch from the Network OS during the recursive data extraction
  2. reduces the number of bytes on the wire, since less data is sent

2 Demo model

To help us explain the concept of the depth-based filtering, consider the following model that we will use when showing implementation examples:

    container basket {
      leaf name {
        type string;
      }

      leaf-list contents {
        type string;
      }

      list fruits {
        key "name";

        leaf name {
          type string;
        }

        leaf-list colors {
          type string;
        }

        leaf size {
          type string;
        }

        container origin {
          leaf country {
            type string;
          }

          leaf city {
            type string;
          }
        }
      }

      container description {
        leaf fabric {
          type string;
        }

      }
      container broken {
        presence "This container is broken";
        leaf reason {
          type string;
        }
      }
    }

Its tree representation:

module: app
  +--rw basket
     +--rw name?          string
     +--rw contents*      string
     +--rw fruits* [name]
     |  +--rw name      string
     |  +--rw colors*   string
     |  +--rw size?     string
     |  +--rw origin
     |     +--rw country?   string
     |     +--rw city?      string
     +--rw description
     |  +--rw fabric?   string
     +--rw broken!
        +--rw reason?   string

We populate this data schema with the following values:

    basket {
        contents [
            fruits
            vegetables
        ]
        fruits apples {
            size XL
            colors [
                red
                yellow
            ]
            origin {
                country NL
                city Amsterdam
            }
        }
        fruits orange {
            size M
        }
        description {
            fabric cotton
        }
        broken {
            reason "too heavy"
        }
    }

3 Concepts

The Depth extension allows clients to specify the depth of the subtree to be returned in the response. The depth is specified as the number of levels below the specified path.

The extension itself has a single field that controls the depth level:

message Depth {
  uint32 level = 1;
}

3.1 Depth level values

3.1.1 Value 0

Depth value of 0 means no depth limit and behaves the same as if the extension was not specified at all.

3.1.2 Value 1

A value of 1 means that only the node at the specified path and its direct children are returned. See Section 3.2 (Children nodes) for what counts as a direct child.

3.1.3 Value N > 1

A value of N, where N>1, means that all elements from the specified path down to level N are returned, with only the direct children (see Section 3.2) included at the N-th level.

3.2 Children nodes

The Depth extension relies on the notion of "direct children of a schema node". By direct children we mean:

  1. leafs
  2. leaf-lists

When the Depth extension is present with a non-zero level, only these elements are returned at the last requested depth level; containers and lists found at that level are omitted.
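The level semantics of Sections 3.1 and 3.2 can be illustrated with a minimal sketch. It is not the implementation from this PR: it assumes the instance data from Section 2 is held as nested Python values (a dict for a container or list entry, a list of dicts for a YANG list, anything else for a leaf or leaf-list), and the names `BASKET`, `is_leaf_like` and `limit_depth` are made up for the illustration. How an empty container left over after pruning should be reported is not covered by the proposal, so the sketch simply keeps it.

```python
from copy import deepcopy

# Instance data from Section 2. Encoding assumed by this sketch:
# dict = container or list entry, list of dicts = YANG list,
# any other value = leaf or leaf-list.
BASKET = {
    "contents": ["fruits", "vegetables"],
    "fruits": [
        {
            "name": "apples",
            "size": "XL",
            "colors": ["red", "yellow"],
            "origin": {"country": "NL", "city": "Amsterdam"},
        },
        {"name": "orange", "size": "M"},
    ],
    "description": {"fabric": "cotton"},
    "broken": {"reason": "too heavy"},
}


def is_leaf_like(value):
    """Return True for leaf and leaf-list values, False for containers and lists."""
    if isinstance(value, dict):
        return False
    if isinstance(value, list):
        return not any(isinstance(item, dict) for item in value)
    return True


def limit_depth(node, level):
    """Return a copy of node pruned to the given depth level (0 = no limit)."""
    if level == 0:
        return deepcopy(node)
    if isinstance(node, list):
        # A YANG list: every entry is pruned with the same level.
        return [limit_depth(entry, level) for entry in node]
    pruned = {}
    for key, value in node.items():
        if is_leaf_like(value):
            # Leafs and leaf-lists are always direct children.
            pruned[key] = deepcopy(value)
        elif level > 1:
            # Containers and lists are only followed while levels remain.
            pruned[key] = limit_depth(value, level - 1)
    return pruned
```

With this encoding, `limit_depth(BASKET, 1)` keeps only `contents`, and `limit_depth(BASKET, 2)` keeps everything except `origin` under the first `fruits` entry, which is the behaviour shown in the examples of Section 4.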

3.3 RPC support

The Depth extension applies to Get and Subscribe requests only. When it is used with the Capabilities or Set RPC, the server should return an error.
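A server-side check for this rule might look like the sketch below. The proposal does not say which error the server returns, so `DepthNotSupportedError` is a hypothetical placeholder, as is the function name `check_depth_extension`. The sketch also assumes that the mere presence of the extension on Set or Capabilities is an error, whatever its level.

```python
# RPC names of the gNMI service; Depth is accepted on Get and Subscribe only.
ALL_RPCS = frozenset({"Capabilities", "Get", "Set", "Subscribe"})
DEPTH_RPCS = frozenset({"Get", "Subscribe"})


class DepthNotSupportedError(Exception):
    """Hypothetical error type; the proposal does not name a status code."""


def check_depth_extension(rpc, depth_level):
    """Validate a request; depth_level is None when no Depth extension is present."""
    if rpc not in ALL_RPCS:
        raise ValueError(f"unknown RPC: {rpc}")
    if depth_level is None:
        return  # no extension, nothing to validate
    if rpc not in DEPTH_RPCS:
        # Assumption: presence alone is an error, even with level 0.
        raise DepthNotSupportedError(f"Depth extension is not valid for {rpc}")
```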

4 Examples

Using the data model from Section 2 we will run through a set of examples using a patched version of the openconfig/gnmic client with added Depth extension support. We can provide the patched gnmic binary for Linux x86_64 if you want to try it out.

4.1 depth 1, path /basket

The most common way to use the depth extension (as we see it) is to use it with level=1. This gets you the immediate child nodes of the schema node targeted by a path.

Consider the following gnmic command targeting /basket path:

$ gnmic -e json_ietf get --path /basket --depth 1
[
  {
    "contents": [
      "fruits",
      "vegetables"
    ]
  }
]

As per the design, only the leaf and leaf-list nodes are returned. Since the only populated direct child of /basket is the contents leaf-list (the name leaf is not set in our data), a single element, contents, is returned.

You can see how this makes it possible to reduce the amount of data extracted by the server and sent over the wire. Many applications might require fetching only leaf values of a certain container to make some informed decision without requiring any of the nested data.
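To give a rough feel for the saving, the sketch below compares the size of the full /basket data with the depth 1 result from this section. It serializes both as compact JSON, which is only a stand-in for the real payload: actual sizes on the wire depend on the gNMI encoding and the protobuf framing, so treat the numbers it prints as illustrative.

```python
import json

# Full /basket contents from Section 2.
full = {
    "contents": ["fruits", "vegetables"],
    "fruits": [
        {
            "name": "apples",
            "size": "XL",
            "colors": ["red", "yellow"],
            "origin": {"country": "NL", "city": "Amsterdam"},
        },
        {"name": "orange", "size": "M"},
    ],
    "description": {"fabric": "cotton"},
    "broken": {"reason": "too heavy"},
}

# Result of `get --path /basket --depth 1` shown above.
depth_1 = {"contents": ["fruits", "vegetables"]}


def json_size(value):
    """Size in bytes of the compact JSON serialization of value."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


full_size = json_size(full)
depth_1_size = json_size(depth_1)
print(f"full: {full_size} bytes, depth 1: {depth_1_size} bytes")
```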

4.2 depth 1, path /basket/fruits

When the path targets a list schema node, all elements of this list are returned with their children nodes:

$ gnmic -e json_ietf get --path /basket/fruits --depth 1
[
  {
    "fruits": [
      {
        "colors": [
          "red",
          "yellow"
        ],
        "name": "apples",
        "size": "XL"
      },
      {
        "name": "orange",
        "size": "M"
      }
    ]
  }
]

Again, please keep in mind that only leafs and leaf-lists are returned for every list element.

4.3 depth 2, path /basket

When the depth level is set to values >1, all elements from the path to the provided level value are returned in full with the last level including only leafs and leaf-lists.

$ gnmic -e json_ietf get --path /basket --depth 2
[
  {
    "broken": {
      "reason": "too heavy"
    },
    "contents": [
      "fruits",
      "vegetables"
    ],
    "description": {
      "fabric": "cotton"
    },
    "fruits": [
      {
        "colors": [
          "red",
          "yellow"
        ],
        "name": "apples",
        "size": "XL"
      },
      {
        "name": "orange",
        "size": "M"
      }
    ]
  }
]

Here is what happens:

Since the depth level is 2, the 1st-level elements are returned in full. On the 2nd level we return only leafs and leaf-lists, hence .fruits.origin is not present.

5 Prior art

NETCONF standardized max-depth in RFC 8526:

The "max-depth" parameter can be used by the client to limit the number of subtree levels that are returned in the reply.

The NETCONF use of max-depth differs in that depth=1 returns the element pointed to by the path, but not its children; depth=2 returns the children of that element.

I find this behavior strange, as I don't see an operational reason to return the element itself when depth is 1.

6 Summary

We believe that the Depth extension has generic applicability whilst not being a burden for the implementation (hence no subtree filtering with XPath or anything of the sort).

Yet it delivers important quality of life improvements for consuming systems that may get the required data nodes faster and with less processing time spent.

This is assuming that cumulative time of fetching only leaf/leaf-lists values by the server is smaller than the recursive data retrieval combined with payload unmarshalling on the client side.

ccole-juniper commented 7 months ago

"By virtue of being an extension, implementing this feature is optional for the vendors and won't affect compliance with the specification."

Why not make it part of the specification (not as an extension but as a new message field) but specify that it is optional? That would IMO improve discoverability.

ccole-juniper commented 7 months ago

For subscriptions (which can contain multiple paths in the subscription list), is the intention that depth be applied to each path? Or would it make sense to instead add it to the "Subscription" (per path) message as a new field?

hellt commented 7 months ago

> For subscriptions (which can contain multiple paths in the subscription list), is the intention that depth be applied to each path? Or would it make sense to instead add it to the "Subscription" (per path) message as a new field?

@ccole-juniper yes, since the extension is a per-RPC message, it applies to all paths in the request.

Making depth applicable at a per-path level would entail either

  1. creating path extensions (or changing the Path spec), or
  2. embedding a map of paths inside the depth extension that sets which depth level each path requests.

But we didn't consider this particular feature of a per-path depth level to be critical to warrant added complexity or spec changes.

This is, of course, up for a discussion.
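The difference between the per-RPC behaviour proposed here and the per-path alternative can be sketched as a small resolution function. This is purely illustrative: `effective_depths` and its `per_path` argument are hypothetical, and the per-path map corresponds to option 2 above, which is not part of this proposal.

```python
def effective_depths(paths, rpc_level=0, per_path=None):
    """Map every requested path to the depth level that would apply to it.

    rpc_level models the single level carried by the Depth extension.
    per_path models the hypothetical option 2: an optional map of
    path -> level that overrides the per-RPC level for that path.
    """
    overrides = per_path or {}
    return {path: overrides.get(path, rpc_level) for path in paths}


# As proposed: one level for every path in the request.
print(effective_depths(["/basket", "/basket/fruits"], rpc_level=1))

# Hypothetical per-path override for a single path.
print(effective_depths(
    ["/basket", "/basket/fruits"],
    rpc_level=1,
    per_path={"/basket/fruits": 2},
))
```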

dplore commented 7 months ago

I'd like to see comments from network operators on the operational use cases and business need for this feature.

In addition, it would be useful to know if there is any precedent such as existing NOS implementations (perhaps not using gNMI) that support a capability like this? (Note: existing implementation is not a hard requirement for gNMI extensions).

ashu-ciena commented 7 months ago

If we compare with NETCONF, the expectation is to return everything by default underneath the requested node hierarchy. Isn't it the same expectation with gNMI too ?

hellt commented 7 months ago

> In addition, it would be useful to know if there is any precedent such as existing NOS implementations (perhaps not using gNMI) that support a capability like this? (Note: existing implementation is not a hard requirement for gNMI extensions).

I have added section 5 with a reference to NETCONF RFC 8526 that standardizes max-depth parameter (as a netconf extension in the NMDA context).

Rob Wilton is one of the co-authors (sorry don't know his handle to mention him here), maybe he can add more clarity if that is adopted by many netconf servers.

earies commented 7 months ago

> In addition, it would be useful to know if there is any precedent such as existing NOS implementations (perhaps not using gNMI) that support a capability like this? (Note: existing implementation is not a hard requirement for gNMI extensions).

Speaking for JUNOS/EVO - we do not support NETCONF NMDA extension RPCs (e.g. get-data) nor do we support the "max-depth" concept of filtering across other public management APIs (as of today in latest shipping code).

As far as extensions go, while they are a great method to augment and provide a demarcation of compliance/support, the one drawback seen is that currently extension fields in gNMI messages are mostly defined at the very top level of a request/response message. This limits how an extension can be used w/o vastly over-complicating or replicating message structs in the extension proto (your 2nd point). Something like this could come back into the base spec/IDL as a backwards compatible addition or we could warrant distributing extension fields further down into various child messages as fit - all an option to not restrict this capability from day-one.

But, I think this raises a higher level topic towards where we should take gNMI "filtering" in general. Currently, we are lacking filtering techniques that also pose more complexity and potential resource usage onto the network-element. My thoughts around this (and I am curious about others' opinions) are that the initial design of gNMI was less filtering/burden on the element, but rather stream as much data off the box as possible and run your complex queries/filters against your DB/TSDBs (common in compute/application-land vs. ad-hoc clients only looking for slices of data off the ultimate producer)

That's not to say additional filtering capabilities are not useful. There are different classes of consumers that actually only do want certain slices (data trees will continue to grow infinitely both OC, native and any other 3rd party) and currently we are limited to precise key matches or wildcards, path wildcards (*, ...) and this would bring in "depth" - there are various other filters that could be useful.... tagging data nodes w/ metadata to provide metadata filters, "config vs. state" (which exists in get() but not subscribe()) - just making note it's probably worth a topic in an upcoming community meeting to discuss useful filtering capabilities.

> I have added section 5 with a reference to NETCONF RFC 8526 that standardizes max-depth parameter (as a netconf extension in the NMDA context).

And I see @jsterne is currently in the midst of clarifying some behaviors/expectations which would apply here as well:

https://mailarchive.ietf.org/arch/msg/netconf/zwCce8cDEMeVnl2W0JpgR9TilOk/

> Rob Wilton is one of the co-authors (sorry don't know his handle to mention him here), maybe he can add more clarity if that is adopted by many netconf servers.

@rgwilton

hellt commented 7 months ago

> But, I think this raises a higher level topic towards where we should take gNMI "filtering" in general. Currently, we are lacking filtering techniques that also pose more complexity and potential resource usage onto the network-element. My thoughts around this (and am curious of others opinions) are the initial design of gNMI was less filtering/burden on the element, but rather stream as much data off the box as possible and run your complex queries/filters against your DB/TSDBs (common in compute/application-land vs. ad-hoc clients only looking for slices of data off the ultimate producer)

Yes, keeping in mind the core gNMI design intention to be "simple", we thought a minimal Depth extension is still applicable to the protocol, as it doesn't overload the server with complicated logic.

But at the same time, it opens the door to exploring more elaborate filtering extensions if operators see them as valuable. Think subtree filtering, content match nodes, and other advanced filtering options coming from the netconf land.

We could go with a proprietary extension for the Depth feature, but it seemed to us that this extension is of generic value (as valuable as the two other current extensions, History and Master Arbitration).

dplore commented 7 months ago

So far at Google, we have recently encountered a use case for filtering at the device vs. in the network management system. The case is to filter notifications from a SUBSCRIBE to /network-instances/network-instance/afts/ for a subset of ipv4/ipv6 prefixes. The reasons to do this include scaling consideration on the devices and also scaling of transmit/storage of the data (since aft datasets can be O(1M) entries). Because the prefix filter list is O(10,000) entries and doesn't change often, it seems efficient to configure this filter versus sending it with each subscribe. I don't know of any Google operational use case to filter using 'depth'.

I'd like to hear about additional operator driven / operational use cases for filtering based on depth or other criteria.

ccole-juniper commented 7 months ago

> > For subscriptions (which can contain multiple paths in the subscription list), is the intention that depth be applied to each path? Or would it make sense to instead add it to the "Subscription" (per path) message as a new field?
>
> @ccole-juniper yes, since the extension is a per-RPC message, it applies to all paths in the request.
>
> To make depth applicable on a per path level it would entail creating either
>
>   1. path extensions (or Path spec change)
>   2. embed a map of paths inside the depth extension that would set which depth level each path requests.
>
> But we didn't consider this particular feature of a per-path depth level to be critical to warrant added complexity or spec changes.
>
> This is, of course, up for a discussion.

You wouldn't need to do either as far as I can tell and could instead extend the "Subscription" message.

message Subscription {
  Path path = 1;               // The data tree path.
  SubscriptionMode mode = 2;   // Subscription mode to be used.
  uint64 sample_interval = 3;  // ns between samples in SAMPLE mode.
  // Indicates whether values that have not changed should be sent in a SAMPLE
  // subscription.
  bool suppress_redundant = 4;
  // Specifies the maximum allowable silent period in nanoseconds when
  // suppress_redundant is in use. The target should send a value at least once
  // in the period specified.
  uint64 heartbeat_interval = 5;
  uint32 depth = 6;
}

hellt commented 6 months ago

Hi @dplore

What would you recommend as the next step to get this PR to its resolution? I remember you wanted to get some operators' input.

dplore commented 6 months ago

We reviewed in the OC Operators meeting Feb 13, 2024. Of those present we didn't have operational use cases for depth, but we also didn't see a reason to object to this either. Can you reference or call on any network operators (customers of yours) to speak out on their need or use cases for this feature?

Pegasust commented 6 months ago

I skimmed through the discussion, and I'm in favor of this getting merged.

We have a couple of netop tools that only care about some intermediate results, so they are quite chatty on gNMI requests, in light of trying to be as consistent on state as possible.

We do have a use case similar to @hellt's example: /basket under broken/ and description/, just like network interface's admin-state, description, name.

Without --depth, it would pull all /basket[name=*]/subinterface[index=*]/**/*, which can yield a pretty big payload.

It seems --depth strikes a good balance between niceties for gNMI client and the path designer for gNMI server.

protonjhow commented 6 months ago

👋🏼 Nokia and Juniper end user here.

Over the years we have made use of netconf in Juniper land to make scoped calls that target data specifically, to minimise bytes on the wire, cycles spent on SSH transport activity, and to reduce the amount of effort to parse the response in the data recipient tooling.

Switching to gNMI, we have observed a number of new efficiencies in the transport from the protobuf encoding, but we have also had a few situations where we end up with more data than we needed for the use case. The Nokia state endpoints can be quite chunky for example.

I would want to spend more time digging into the details for how and why, but off the top of my head there are a few ideas. I believe in some cases, Nokia would have to make changes that leverage this new extension, but assuming they did, we could make use of this to target certain areas of the tree for subscriptions that fire rapid reaction to specific state changes (e.g. fault conditions). We might do this using multiple smaller targeted subscriptions rather than one larger one too, for example.

Juniper gNMI support has been a little sketchy in our experience, although it's supposedly improving in newer releases. Assuming they picked this up too, if we could unify our transport, and not give away features, that would be great.

hellt commented 6 months ago

Hi @dplore A few references from the end customers provided above, thanks @protonjhow and @Pegasust

I have updated the generated go/py protos and am looking forward to having this PR unblocked.

hellt commented 6 months ago

Hi @dplore

I know this is probably a P10 for OC at this point, but it is a P1 for us at Nokia. Since it seems that even after a few customer references this PR ended up stalled, I wonder, maybe we should take it the opaque way and get a registered ExtensionID and do our magic there without exposing this to others?

robshakir commented 6 months ago

I think as an extension, this seems reasonable to add. There are operators here saying they need it, and two vendors who seem to indicate that they could support it.

Extensions are meant to be there for additional features that not all implementations necessarily support. This PR uses that in the right way IMHO, since it avoids the need to change core parts of the specification.

So, LGTM. This repo doesn't pull specific external PRs, so I'll need to merge this upstream and export it.

@hellt -- do you have a doc that we could merge upstream? The opening request from this PR looks pretty good for that. It'd be added over in github.com/openconfig/reference.

robshakir commented 6 months ago

@dplore - LMK if you have objections here. Otherwise, let's :shipit: :-)

hellt commented 6 months ago

@robshakir indeed, the idea was to put in the PR description what would become the reference entry.

robshakir commented 6 months ago

@hellt - great, thanks -- can you put it at rpc/gnmi/gnmi-depth.md? I created a change upstream for the proto, will merge this with the changes to the generated protos once I've merged that change.

hellt commented 4 months ago

@robshakir it seems this may be closed, since the changes were brought in from the internal repo?

dplore commented 4 months ago

Yes, closing this. Thanks @hellt