openconfig / reference

This repository contains reference implementations, specifications and tooling related to OpenConfig-based network management.
Apache License 2.0

Need a gRPC interface for dial-out streaming #42

Open tsuna opened 7 years ago

tsuna commented 7 years ago

gNMI currently only supports dial-in (i.e. a client coming from the outside and connecting to the network element where the gRPC server implementing the gNMI spec listens), but we see a need for a dial-out streaming model as well.

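Advantages of a dial-out approach:

  • No need to expose a service to the outside world (reduces attack surface, even if that can already be mitigated by using a management VRF and/or control-plane ACLs).
  • No need to have a system to manage the "shared responsibility" of collecting telemetry from each and every network element.
  • Instead of worrying which collector is responsible for collecting data from switch X and what to do when this collector dies, the switch is responsible for streaming its telemetry out to a preconfigured list of targets.
  • The pre-configured list could be a static list of ip/ports, a DNS name that resolves to multiple IP addresses (and is periodically re-resolved), or better, some name to lookup in a service discovery system backed by something like etcd/Zookeeper. The network element just needs to connect to one at random, doesn't matter which.
  • It's easier to have a stateless collector backend (just accept connections, optionally authenticate devices, and store the incoming update stream in a database or Kafka-like bus or whatever) as opposed to maintaining state regarding what targets to collect from and what paths to subscribe to on each one of them.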

This issue is about settling on a standard gRPC interface for gNMI dial-out streaming.

edit: issue #42! 🎉

robshakir commented 7 years ago

Hi Benoit,

[Congrats on issue #42 ;-), hopefully you know where your towel is]

We agree that this is something that gNMI should have (it is noted in the current specification as a TODO). There are a number of deployment cases whereby the target cannot actually be contacted by the client directly (e.g., it is behind NAT) and hence dial-out makes a lot of sense. It also seems that this approach would be a possible manner to mitigate scaling issues in the case that sending all Notification messages to a centralised entity in a system becomes a bottleneck. A linecard in such a system could dial out to the client if required.

My proposal is that we define a new service - for the reason that this would then be the specific stub that would be implemented at these dial-out clients. It should handle only Subscribe AFAICS, without any of the configuration manipulation RPCs - since it is not clear to me that such a use case is required at this point in time.

My proposal is therefore something like:

// gNMIDialOut defines a service which is used by a target system (typically a
// network element) to initiate connections to a client (collector). The server
// is implemented at the collector, such that the target can initiate connections
// to the collector, based on a configured set of telemetry subscriptions.
service gNMIDialOut {
    // Publish allows the target to send telemetry updates (in the form of
    // SubscribeResponse messages, which have the same semantics as in the
    // gNMI Subscribe RPC) to a client. The client may optionally return the
    // PublishResponse message in response to the dial-out connection from the
    // target. In this case, the client may modify the set of subscriptions
    // that are to be published by the target by:
    //   - Specifying a client_id within the PublishResponse message. In this
    //     case the target should match pre-configured subscriptions to the
    //     specified client_id, and send data only for the paths associated
    //     with the specified client_id.
    //   - Specifying a SubscribeRequest message within the subscriptions field of
    //     the PublishResponse message. This message has the same semantics as
    //     in the Subscribe gNMI RPC.
    // In the case that the client specifies neither option, a default set of
    // subscriptions (which should be configurable on the target) should be
    // published to the client (collector).
    //
    // The configuration of subscriptions associated with the publish RPC may
    // be through the OpenConfig telemetry configuration and operational state
    // model: 
    // https://github.com/openconfig/public/blob/master/release/models/telemetry/openconfig-telemetry.yang
    rpc Publish(stream SubscribeResponse) returns (stream PublishResponse);
}

// PublishResponse is the message sent within the Publish RPC of the gNMI
// dial-out service by the client (collector) to the target. It is used to
// modify the set of paths that are to be sent by the target to the collector.
message PublishResponse {
    oneof request {
        string client_id = 1;                // A string identifying the client to the target.
        SubscribeRequest subscriptions = 2;  // Optional specification of the subscriptions.
    }
}
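
The selection rules described in the Publish comment can be sketched in a few lines. This is only an illustration, not part of the proposal: the Python `PublishResponse` dataclass is a stand-in for the proposed protobuf message, paths are plain strings rather than a real SubscribeRequest, and `select_paths`, `configured` and `default_paths` are hypothetical names. Returning an empty set for an unknown client_id is also an assumption, since the proposal does not say what should happen in that case.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PublishResponse:
    """Stand-in for the proposed message; the oneof allows at most one field."""

    client_id: Optional[str] = None
    subscriptions: Optional[List[str]] = None  # stand-in for a SubscribeRequest

    def __post_init__(self) -> None:
        if self.client_id is not None and self.subscriptions is not None:
            raise ValueError("client_id and subscriptions are mutually exclusive")


def select_paths(
    response: Optional[PublishResponse],
    configured: Dict[str, List[str]],
    default_paths: List[str],
) -> List[str]:
    """Return the paths a target would publish for a given collector reply."""
    if response is None:
        # The collector sent nothing back: publish the default set.
        return list(default_paths)
    if response.client_id is not None:
        # Assumption: an unknown client_id matches no subscriptions.
        return list(configured.get(response.client_id, []))
    if response.subscriptions is not None:
        # The collector supplied its own subscription list.
        return list(response.subscriptions)
    # An empty PublishResponse selects neither option: use the default set.
    return list(default_paths)
```

For example, `select_paths(PublishResponse(client_id="noc"), {"noc": ["/interfaces"]}, ["/system"])` picks the paths configured for the hypothetical client "noc".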

One thing we probably have to be careful about here is the terminology - since we probably want to remain consistent with the rest of the gNMI specification.

How does this proposal sound? Happy to iterate on it, and try and add this to a future version of the gNMI spec. @gcsl, @hines, @aashaikh - any thoughts?

Cheers! r.

gcsl commented 7 years ago

On Wed, Mar 8, 2017 at 1:56 AM, Benoit Sigoure notifications@github.com wrote:

gNMI currently only supports dial-in (i.e. a client coming from the outside and connecting to the network element where the gRPC server implementing the gNMI spec listens), but we see a need for a dial-out streaming model as well.

Advantages of a dial-out approach:

  • No need to expose a service to the outside world (reduces attack surface, even if that can already be mitigated by using a management VRF and/or control-plane ACLs).

I argue this doesn't reduce the attack surface; it simply shifts it elsewhere in the network, to the collector. Security is still an important issue.

  • No need to have a system to manage the "shared responsibility" of collecting telemetry from each and every network element.

This has to be done regardless. Switching from dial-in to dial-out just shifts configuration from one system to N devices. There must still be some centralized management to know what N devices to configure and what M services to point them to, which themselves must also be configured. Granted, centralized management could be pen and paper but I'm not convinced that is advantage enough itself to justify dial-out.

  • Instead of worrying which collector is responsible for collecting data from switch X and what to do when this collector dies, the switch is responsible for streaming its telemetry out to a preconfigured list of targets.

A collector gathering from multiple devices would have no control over when connections to those devices are established, and the devices themselves should have no awareness of one another. Clearly the control over fan-in should be handled where the fan-in happens.

  • The pre-configured list could be a static list of ip/ports, a DNS name that resolves to multiple IP addresses (and is periodically re-resolved), or better, some name to lookup in a service discovery system backed by something like etcd/Zookeeper. The network element just needs to connect to one at random, doesn't matter which.

This splits system behavior between switches and collector which could lead to interesting failure scenarios. In the event of disconnections, monitoring could shift around between load-balanced collectors making tracking down problems more difficult. We have actually prototyped some solutions along these lines and have run across several pain points.
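
To make the "connect to one at random" behaviour concrete, here is a rough sketch of what a target might do with its pre-configured list. The function name `pick_collector`, the `is_reachable` callback and any addresses passed to it are hypothetical; a real target would attempt a gRPC connection rather than call a predicate.

```python
import random
from typing import Callable, Optional, Sequence


def pick_collector(
    candidates: Sequence[str],
    is_reachable: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Try the configured collectors in random order; return the first that answers."""
    rng = rng or random.Random()
    shuffled = list(candidates)
    rng.shuffle(shuffled)  # no collector is preferred over another
    for address in shuffled:
        if is_reachable(address):
            return address
    return None  # nothing reachable; the caller decides when to retry
```

Because the choice is random, a reconnect after a failure may land on a different collector, which is the kind of shifting around between load-balanced collectors described above.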

  • It's easier to have a stateless collector backend (just accept connections, optionally authenticate devices, and store the incoming update stream in a database or Kafka-like bus or whatever) as opposed to maintaining state regarding what targets to collect from and what paths to subscribe to on each one of them.

This is orthogonal to dial-out. I actually requested that subscriptions be part of configuration even for dial-in, as I agree there are some significant advantages.

My points above aside, I think there could be valid use cases for dial-out. I just don't believe that security or management is any easier from a system perspective, and in many cases could be more difficult.

gcsl commented 7 years ago

I would counter-propose a dial-out service that has exactly the same semantics as the original Subscribe; only the direction in which the connection is established changes. That is, a device would initiate the connection, but it would not begin streaming anything until it received a SubscribeRequest from the collector.

service Subscriber {
    rpc Subscriber(stream SubscribeResponse) returns (stream SubscribeRequest);
}

As I mentioned in my previous email, I believe having a subscription configured on a device is orthogonal to the direction in which the connection is established and it should work for both dial-in and dial-out. I would propose that the SubscribeRequest message be enhanced to request a default subscription. I think this topic needs additional discussion as I suspect we would want to handle multiple subscription sets that could be retrieved separately.
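
A toy model of these semantics, assuming nothing about the eventual protobuf definitions: the target opens the connection but stays silent until the collector sends a request. `DialOutTarget`, `dial` and `on_subscribe_request` are hypothetical names, and plain strings stand in for the real SubscribeRequest and SubscribeResponse messages.

```python
from typing import Dict, List, Tuple


class DialOutTarget:
    """Toy target: initiates the connection, streams only when asked."""

    def __init__(self, state: Dict[str, str]) -> None:
        self.state = state  # path -> current value on the device
        self.connected = False

    def dial(self) -> List[Tuple[str, str]]:
        """Open the connection to the collector; nothing is sent yet."""
        self.connected = True
        return []

    def on_subscribe_request(self, paths: List[str]) -> List[Tuple[str, str]]:
        """Answer a request from the collector with updates for known paths."""
        if not self.connected:
            raise RuntimeError("no connection to the collector")
        return [(p, self.state[p]) for p in paths if p in self.state]
```

Apart from who calls `dial`, the request and response handling is the same as it would be for dial-in, which is the property being argued for here.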

marcushines-zz commented 7 years ago

+1 on those semantics. The idea that a device can DoS the collector is worrisome.

pborman commented 7 years ago

The question becomes: in a dial-out service, who is responsible for determining what data is sent? It seems that for dial-out, the information to send would more than likely be part of the configuration, and when dialing out you simply want to start sending a stream of notifications.

I am less concerned about DOSing a collector than a switch. The collector can simply drop the connection and stop listening if it feels overburdened.

-Paul
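
As a rough illustration of the point about an overburdened collector, the sketch below refuses new streams once a limit is reached and can shed existing ones. The class `Collector`, the `max_streams` limit and the device names are hypothetical; a real collector would close the gRPC stream rather than return a boolean.

```python
from typing import Set


class Collector:
    """Toy collector that protects itself by limiting concurrent streams."""

    def __init__(self, max_streams: int) -> None:
        self.max_streams = max_streams
        self.active: Set[str] = set()

    def accept(self, device: str) -> bool:
        """Admit a dial-out stream unless the collector is already at its limit."""
        if device in self.active:
            return True
        if len(self.active) >= self.max_streams:
            return False  # overburdened: refuse the connection
        self.active.add(device)
        return True

    def drop(self, device: str) -> None:
        """Stop listening to a device to free capacity."""
        self.active.discard(device)
```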

robshakir commented 7 years ago

Per the example I proposed above, the way that this would be determined can be either:

The proposal above covers almost exactly what @gcsl mentioned (semantics of the Subscribe RPC just with a pre-specified set). One option to avoid the PublishResponse message would be to simply add this ability to specify pre-configured path sets to SubscribeRequest. Treating these issues as orthogonal would lead us towards this solution.

gcsl commented 7 years ago

On Wed, Mar 8, 2017 at 12:30 PM, pborman notifications@github.com wrote:

The question becomes, in a dial-out service, who is responsible for determining what data is sent. It seems that for dial-out, the information to send would more than likely be part of the configuration and when dialing out you simply want to start sending a stream of notifications.

You want to be sure the collector is actually ready and I suggest the start of the stream should still be triggered by the collector. The key use case where this becomes exceedingly important is when a proxy exists between the device and a collector. We are in active discussions regarding deployment scenarios where a proxy would be necessary. We would want to avoid the need for a proxy to buffer the stream while the connection is being established to the ultimate collector.

Having identical semantics for dial-in and dial-out makes it trivial for an implementation to support both.

pborman commented 7 years ago

A collector should not start listening on the port until it is ready, so the fact that the device is able to connect means it should be able to start streaming notifications.

It sounds like what is being said is "we like collector initiated, but due to other factors the device must initiate" which in turn sounds like we want a proxy service near the collector that the device connects to and the collector connects to the proxy. The proxy would not be limited to this service, but could be a more generalized sort of proxy, essentially a beachhead on the foreign network for the device.
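
A minimal sketch of such a proxy, under the assumption that it only pairs the two inbound connections and relays requests: `RendezvousProxy`, `register_device` and `subscribe` are hypothetical names, and a plain callable stands in for the device's side of the stream.

```python
from typing import Callable, Dict, List, Tuple

Update = Tuple[str, str]
Handler = Callable[[List[str]], List[Update]]


class RendezvousProxy:
    """Toy proxy: devices dial in and register, collectors dial in and subscribe."""

    def __init__(self) -> None:
        self._devices: Dict[str, Handler] = {}

    def register_device(self, name: str, handler: Handler) -> None:
        """Called when a device connects; nothing is streamed at this point."""
        self._devices[name] = handler

    def subscribe(self, name: str, paths: List[str]) -> List[Update]:
        """Called by a collector; relays the request to the named device."""
        handler = self._devices.get(name)
        if handler is None:
            raise LookupError(f"device {name!r} has not connected")
        return handler(paths)
```

Since nothing flows until a collector calls `subscribe`, this toy proxy never has to buffer updates on the device's behalf.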



melgendy-ciena commented 6 years ago

We also have been trying to find the right logistics to do this in our systems, and when we saw this thread, we felt that we have company and we are not left alone in the dark. Did you guys try out any of those above proposals yet? Any agreement/complexities/success with any of them? Thanks, -- Mohamed..

jipanyang commented 6 years ago

I think the original proposal from @robshakir looks good with respect to the new gNMIDialOut service and the reuse of SubscribeResponse. For PublishResponse, however, changing the pre-configured subscriptions should probably be avoided. If the configuration came from NETCONF or another configuration channel, that channel should be the source of truth for it. Having multiple channels modify the same object might cause confusion. How about making the optional PublishResponse message acknowledgement-only, containing information such as a timestamp and path?
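
To make the acknowledgement-only idea concrete, here is a minimal Python sketch. The `Update` and `PublishAck` types, their field names, and the `ack_for` helper are all hypothetical; none of them is part of gNMI or of the proposal above. The sketch only illustrates a collector reply that echoes a timestamp and path and has no way to alter the subscriptions configured on the target.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Update:
    """Hypothetical stand-in for one telemetry update sent by the target."""
    timestamp_ns: int
    path: Tuple[str, ...]
    value: object


@dataclass(frozen=True)
class PublishAck:
    """Hypothetical acknowledgement-only reply from the collector.

    It echoes what was received and deliberately has no field that could
    change the subscriptions configured on the target.
    """
    timestamp_ns: int
    path: Tuple[str, ...]


def ack_for(update: Update) -> PublishAck:
    """Build the acknowledgement for a received update."""
    return PublishAck(timestamp_ns=update.timestamp_ns, path=update.path)
```

With this shape, the configuration channel remains the only place where subscriptions are defined, and the reply serves purely as a delivery receipt.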

rodrigo-albuquerque commented 3 years ago

Are we moving forward with this or not? Dial-in has scaling issues. Cisco has already implemented a form of dial-out with gRPC and JSON/Protobuf support, but I'd love to see it as a vendor-neutral standard.

aashaikh commented 3 years ago

There is active discussion on dial-out, with vendor engagement being driven by a couple of operators in the group. Please see openconfig/grpctunnel for the current proposed direction for supporting dial-out for gNMI.

I'm not sure I understand the comment about dial-in having scaling issues -- we are using the dial-in model with gNMI at very large scale for monitoring a variety of devices.

mpatrick02702 commented 3 years ago

@aashaikh Would you explain how a "grpctunnel" would support "dial-out" gNMI? I don't see how it's applicable. How are out-of-band configured subscriptions set up? When does a telemetry publisher dial out? And to where? Does anyone have a message sequence chart for how gNMI dial-out would work?

In my industry (Cable), operators are very interested in dial-out telemetry by network elements to proxy-able collector IP addresses (or even an anycast IP), while vendors are very interested in implementing network elements with gNMI open-source libraries. I'm trying to figure out how to satisfy both sides :-)

gcsl commented 3 years ago

grpctunnel is the approach that has been proposed and discussed with multiple vendors and network operators as a general solution that addresses many different use cases, including dial-out telemetry. For the dial-out telemetry case I will try to answer the questions posed here.

On Mon, Mar 8, 2021 at 11:23 AM mpatrick02702 notifications@github.com wrote:

@aashaikh Would you explain how a "grpctunnel" would support "dial-out" gNMI? I don't see how it's applicable. How are out-of-band configured subscriptions set up?

With the tunnel mode, we are proposing that a target dials out to establish a tunnel, which can trigger a remote collector to issue a subscription over that tunnel. gNMI semantics are thus identical to what they are today: the device is the server and the collector is the client.

Orthogonal to the dial direction (dial-in without a tunnel, vs. dial-out tunnel triggering a dial-in over the tunnel), a subscription can be configured either in the collection system that issues the subscription, or on the device via an alias, whereby the collection system issues a subscription to a named alias (e.g. "default" or "interfaces" or "inventory", etc). This allows a network operator to define the source of truth for what is collected either centrally in the management system (collector side) or distributed on the device side.
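
The two sources of truth described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `DEVICE_ALIASES` table, the paths in it, and the `resolve_subscription` function are hypothetical and do not correspond to any gNMI or grpctunnel API.

```python
from typing import Dict, List, Optional

# Hypothetical device-side table mapping alias names to subscription paths.
# The alias names come from the examples above; the paths are illustrative.
DEVICE_ALIASES: Dict[str, List[str]] = {
    "default": ["/interfaces", "/components"],
    "interfaces": ["/interfaces"],
}


def resolve_subscription(paths: Optional[List[str]] = None,
                         alias: Optional[str] = None) -> List[str]:
    """Return the paths to stream for one subscription request.

    Exactly one of `paths` (source of truth on the collector side) or
    `alias` (source of truth on the device side) must be given.
    """
    if (paths is None) == (alias is None):
        raise ValueError("specify exactly one of paths or alias")
    if paths is not None:
        return list(paths)
    if alias not in DEVICE_ALIASES:
        raise KeyError(f"unknown alias: {alias}")
    return list(DEVICE_ALIASES[alias])
```

Either way the request travels from collector to device; only the place where the path list is maintained differs.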

Using a tunnel changes the direction of the initial dial that triggers the start of collection without altering gNMI semantics in any way. It also allows collection from a device that is not directly network-reachable in the other direction (collector to device) because of a firewall or NAT, and it enables telemetry collection to transit public networks without the devices being directly reachable from those networks.

When does a telemetry publisher dial-out? And to where?

A configuration on the device will hold a list of addresses to dial, with credentials, to establish one or more tunnels and register the device with one or more tunnel servers. Each configured tunnel server would be contacted upon initial configuration, loss of connection, or reboot, until it is removed from the configuration. A tunnel server may itself be the gNMI collector, or it may be an intermediate service to which a gNMI collector connects in order to listen for new device connections and dial through to initiate telemetry streams.
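
The redial behaviour can be sketched roughly as below. This is a toy Python model, not grpctunnel code: `servers_to_dial`, `redial_once`, and the `dial` callback are hypothetical names, and real dialing, credentials, and retry timing are left out entirely.

```python
from typing import Callable, Iterable, List, Sequence, Set


def servers_to_dial(configured: Sequence[str],
                    connected: Iterable[str]) -> List[str]:
    """Return the configured tunnel servers that currently have no tunnel."""
    up = set(connected)
    return [addr for addr in configured if addr not in up]


def redial_once(configured: Sequence[str],
                connected: Iterable[str],
                dial: Callable[[str], bool]) -> Set[str]:
    """Attempt one dial per missing server and return the new connected set.

    `dial` is a hypothetical callback that returns True when the tunnel to
    the given address was established.
    """
    # Tunnels to servers that were removed from configuration are dropped.
    now_up = set(connected) & set(configured)
    for addr in servers_to_dial(configured, now_up):
        if dial(addr):
            now_up.add(addr)
    return now_up
```

A device would run something like `redial_once` after initial configuration, after a reboot (with an empty connected set), and whenever a tunnel is lost, repeating until every configured server is reached.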

Does anyone have a message sequence chart for how gNMI dial-out would work?

The device dials the tunnel server and registers itself as a tunnel client with a unique ID configured on the device (operator-defined, hostname, IP, serial number, etc.). A collection system, either hosting the tunnel server itself or acting as a client of the tunnel server, subscribes for registrations of target devices. Upon a registration, a collector can issue either a full subscription or an alias subscription as mentioned above. Continuous telemetry streaming occurs until the connection breaks, at which point the above process is repeated.
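
In place of a message sequence chart, the sequence can be approximated in a few lines of Python. This is an in-memory toy under stated assumptions: `TunnelServer`, its methods, `device_subscribe_handler`, and the target ID `router1` are hypothetical and are not the grpctunnel or gNMI APIs. It only shows the ordering of registration, notification, and subscription.

```python
from typing import Callable, Dict, List

SubscribeHandler = Callable[[List[str]], List[str]]


class TunnelServer:
    """Toy in-memory stand-in for a tunnel server (not the grpctunnel API)."""

    def __init__(self) -> None:
        self._targets: Dict[str, SubscribeHandler] = {}
        self._watchers: List[Callable[[str], None]] = []

    def register(self, target_id: str, handler: SubscribeHandler) -> None:
        """Device side: dial out and register under a device-configured ID."""
        self._targets[target_id] = handler
        for notify in self._watchers:
            notify(target_id)

    def watch_registrations(self, callback: Callable[[str], None]) -> None:
        """Collector side: learn about current and future registrations."""
        self._watchers.append(callback)
        for target_id in list(self._targets):
            callback(target_id)

    def subscribe(self, target_id: str, paths: List[str]) -> List[str]:
        """Collector side: send a subscription back over the target's tunnel."""
        return self._targets[target_id](paths)


def device_subscribe_handler(paths: List[str]) -> List[str]:
    """The device remains the gNMI server and answers the subscription."""
    return [f"update for {p}" for p in paths]


def run_sequence() -> List[str]:
    """Replay the steps in order and return a log of what happened."""
    log: List[str] = []
    server = TunnelServer()

    def on_registration(target_id: str) -> None:
        log.append(f"registered {target_id}")
        log.extend(server.subscribe(target_id, ["/interfaces"]))

    server.register("router1", device_subscribe_handler)  # device dials out
    server.watch_registrations(on_registration)           # collector reacts
    return log
```

The point of the toy is that the subscription still flows from collector to device; the dial-out only creates the path and signals that the device is available.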


mpatrick02702 commented 3 years ago

Is anyone working on updating the gNMI specification for dial-out?

Are there any URLs to open-source code supporting dial-out?


gcsl commented 3 years ago

The gNMI specification is unchanged for the proposed dial-out mechanism using grpctunnel. The device dials a tunnel session to a collector, which the collector uses as a trigger to issue a gNMI.Subscribe back over that tunnel. There is an example implementation in github.com/openconfig/gnmi/cmd/gnmi_collector that uses github.com/openconfig/grpctunnel.

mpatrick02702 commented 3 years ago

Carl,

Thanks for the prompt response!

Mike


bitnovel commented 2 years ago

A gRPC tunnel seems like an interesting idea.

However, one of the advantages of dial-out without a tunnel is that the collector need not manage subscription details. Persistent or ephemeral configuration made on devices through NETCONF or other channels should suffice. With a gRPC tunnel, the collector has to initiate the subscription.

On the other hand, with the tunnel approach the collector stays in control, which reduces the possibility of a rogue device bombarding the collector with traffic.

balajei commented 2 years ago


Hi all, I am just getting started with gNMI and trying to understand the config + notification + PM model for a large-scale distributed controller system. As @tsuna said, if we go with the dial-out approach, the collector can be implemented statelessly for notifications and PM, so that network element (NE) data can be processed by multiple collectors behind a proxy/LB, and the collectors can scale horizontally as the number of NEs in the network grows. With dial-out, the NE sends the data to the collector and the channel or connection is then closed, right?

But @aashaikh mentioned that the dial-in model is used for very large-scale monitoring. It would be helpful if anyone could give me pointers or patterns on how to scale the collector horizontally in that case. With dial-in, the collector opens the channel with the NE and keeps the connection open, right?

My worry, as I understand it, is this: with dial-in for notifications/PM, if each NE holds a persistent connection with the collector, how many connections/channels can a collector hold in a large-scale network?