Clarifying Proto Encoding

pedroaston commented 1 year ago

Currently, section 2.3.3 (Protobuf) does not clarify which type of GPB encoding to use.
Cisco, for example, provides both gpb-self-describing and gpb-compact in dial-out mode.

Problem

Because the type of encoding is not specified, vendors are implementing different types when referring to PROTO encoding:

Cisco uses gpb-self-describing
Nokia uses gpb-compact

Proposal

There are mainly two options:

have 2 types of encodings (compact and self-described KV)
have the proto encoding specifically use gpb-compact, since gpb-self-describing performance is similar to using json_ietf encoding.

Thanks for the attention :)

gcsl commented 1 year ago

Section 2.3.3 was clarified in https://github.com/openconfig/reference/pull/151 and is explicit about the PROTOBUF encoding requiring the use of the TypedValue message on a per-leaf basis.

https://github.com/openconfig/gnmi/blob/master/proto/gnmi/gnmi.proto#L108

There is no reference to a compact encoding in gNMI.
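To make "TypedValue on a per-leaf basis" concrete, here is a minimal plain-Python sketch in which every leaf becomes its own Update carrying its path and exactly one populated TypedValue field. It is not the generated gnmi_pb2 API: paths are simplified to strings, and the YANG-type-to-field table is an assumption based on my reading of the spec (only a subset of types, no decimal64 or union handling), so check it against section 2.3.3 before relying on it.

```python
# Assumed subset of the YANG base type -> TypedValue oneof field mapping.
YANG_TO_TYPED_VALUE = {
    **{t: "int_val" for t in ("int8", "int16", "int32", "int64")},
    **{t: "uint_val" for t in ("uint8", "uint16", "uint32", "uint64")},
    "string": "string_val",
    "enumeration": "string_val",
    "boolean": "bool_val",
    "binary": "bytes_val",
}


def typed_value(yang_type: str, value, is_leaf_list: bool = False) -> dict:
    """Return a dict shaped like one TypedValue: exactly one field is set."""
    field = YANG_TO_TYPED_VALUE.get(yang_type)
    if field is None:
        raise ValueError(f"no mapping sketched for YANG type {yang_type!r}")
    if is_leaf_list:
        # Leaf-lists are carried as a ScalarArray of per-element TypedValues.
        return {"leaflist_val": {"element": [{field: v} for v in value]}}
    return {field: value}


def updates(leaves):
    """One Update per (path, yang_type, value) leaf; the path travels with the
    value, which is what makes the stream self-describing."""
    return [{"path": path, "val": typed_value(t, v)} for path, t, v in leaves]


if __name__ == "__main__":
    for u in updates([
        ("interfaces/interface[name=eth0]/state/counters/in-octets", "uint64", 1000),
        ("interfaces/interface[name=eth0]/state/enabled", "boolean", True),
    ]):
        print(u)
```

A collector can decode each value without knowing the model, because the field name (`uint_val`, `bool_val`, ...) and the path say what it is.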

hellt commented 1 year ago

@gcsl does this mean that when a gNMI client sets PROTO encoding in its Get/Set/Subscribe request, the target is not expected to use the any_val or proto_bytes fields of TypedValue, where the data is binary-encoded protobuf with an out-of-band (OOB) agreed schema?

I'd like to know whether any use cases remain for any_val and proto_bytes, and when they should or may be used.

gcsl commented 1 year ago

That is correct. The gNMI specification prescribes the formatting for PROTO explicitly. An OOB agreed schema would be unspecified and not standardized, in clear contention with the goal to standardize the API. While there was some early consideration for alternate encodings, these were never fleshed out and thus do not yet have a place in the formal specification. The early reference to OOB agreed schema was targeted to specific experimental data sets, namely AFT, which can be much larger than state from most OC models. It was never intended as an alternative for all OC modeled data.

This is not to say that further improvements cannot be made to the specification but given the number of participants on both sides of the API, an unspecified or underspecified API is quite problematic.

pedroaston commented 1 year ago

@gcsl Thanks for the remark! I now understand that using TypedValue means using the self-describing proto (the table in section 2.3 is clear about it).

That said, using compact mode with a *.proto file per YANG model would allow more efficient data exchange, at the cost of higher maintenance at the collector. That could still be worth considering, since the current protobuf TypedValue performance is similar to JSON.

In some benchmarking I did with the Nokia gNMI implementation, the volume of telemetry data on the wire decreased by 35% with the compact version.
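Where the size difference comes from can be illustrated with a toy protobuf wire-format encoder (varint and length-delimited fields only). The compact message uses a hypothetical OOB-agreed `Counters` schema; the self-describing variant wraps each leaf in an Update with its own path. The gnmi.proto field numbers below are as I recall them and should be verified against the linked file. This toy ignores the Notification prefix and timestamps and does not attempt to reproduce the 35% figure; it only shows that per-leaf path strings dominate the overhead.

```python
def varint(n: int) -> bytes:
    """Base-128 varint, least-significant 7-bit group first (n >= 0)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)
            return bytes(out)


def key(field: int, wire_type: int) -> bytes:
    return varint((field << 3) | wire_type)


def uint_field(field: int, n: int) -> bytes:
    return key(field, 0) + varint(n)  # wire type 0: varint


def bytes_field(field: int, payload: bytes) -> bytes:
    return key(field, 2) + varint(len(payload)) + payload  # wire type 2


def str_field(field: int, s: str) -> bytes:
    return bytes_field(field, s.encode())


# Compact, hypothetical schema agreed out of band:
#   message Counters { uint64 in_octets = 1; uint64 out_octets = 2; }
def compact(counters: dict) -> bytes:
    return uint_field(1, counters["in-octets"]) + uint_field(2, counters["out-octets"])


# Self-describing: one Update per leaf, each with its own relative path.
# Assumed gnmi.proto numbers: Notification.update=4, Update.path=1,
# Update.val=3, Path.elem=3, PathElem.name=1, TypedValue.uint_val=3.
def self_describing(counters: dict, base=("state", "counters")) -> bytes:
    out = b""
    for leaf, value in counters.items():
        path = b"".join(bytes_field(3, str_field(1, name)) for name in (*base, leaf))
        update = bytes_field(1, path) + bytes_field(3, uint_field(3, value))
        out += bytes_field(4, update)
    return out


if __name__ == "__main__":
    counters = {"in-octets": 1000, "out-octets": 2000}
    print("compact bytes:", len(compact(counters)))
    print("self-describing bytes:", len(self_describing(counters)))
```

The compact payload carries only field numbers and values, which is why it is small and also why it is unreadable without the agreed schema.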

gcsl commented 1 year ago

We had done similar benchmarking early on with gNMI and its precursors, and I fully agree that there are more efficient ways to encode the data. The primary issue is one of separating the collection infrastructure from full schema awareness. While end clients will always need full schema awareness to leverage the subsets of data they care about, the middleware collection layer and debugging tools do not when using a self-describing data format. With the relatively constant churn in OC modeling and the need to iterate rapidly in development cycles on both sides of the ecosystem, we opted to fall on the side of flexibility at the expense of encoding efficiency. Over the course of OC model development we have seen the occasional backward-incompatible change. From an operator's perspective, with devices from multiple vendors, each of which could be running multiple versions across its deployed devices, the self-describing schema avoids the need to support multiple incompatible proto versions.
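The point about middleware and schema versions can be seen by decoding a compact payload without its `.proto`: a schema-unaware decoder recovers only field numbers and raw values. The sketch below decodes the hypothetical compact `Counters` bytes from the benchmark discussion, then interprets them with two hypothetical schema versions in which field 2 was reused for a different leaf (a backward-incompatible change). Only wire types 0 and 2 are handled.

```python
def read_varint(buf: bytes, i: int):
    """Decode a varint starting at buf[i]; return (value, next_index)."""
    value, shift = 0, 0
    while True:
        b = buf[i]
        i += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:  # high bit clear: last group
            return value, i
        shift += 7


def raw_fields(buf: bytes):
    """Schema-unaware decode into (field_number, wire_type, value) triples."""
    i, out = 0, []
    while i < len(buf):
        k, i = read_varint(buf, i)
        field, wire_type = k >> 3, k & 0x7
        if wire_type == 0:
            value, i = read_varint(buf, i)
        elif wire_type == 2:
            length, i = read_varint(buf, i)
            value, i = buf[i:i + length], i + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        out.append((field, wire_type, value))
    return out


# Hypothetical schema versions: v2 reuses field 2 for a different leaf.
SCHEMA_V1 = {1: "in-octets", 2: "out-octets"}
SCHEMA_V2 = {1: "in-octets", 2: "in-discards"}


def interpret(buf: bytes, schema: dict) -> dict:
    return {schema.get(f, f"unknown-field-{f}"): v for f, _, v in raw_fields(buf)}


if __name__ == "__main__":
    payload = b"\x08\xe8\x07\x10\xd0\x0f"
    print(raw_fields(payload))             # [(1, 0, 1000), (2, 0, 2000)]
    print(interpret(payload, SCHEMA_V1))   # {'in-octets': 1000, 'out-octets': 2000}
    print(interpret(payload, SCHEMA_V2))   # {'in-octets': 1000, 'in-discards': 2000}
```

The same bytes mean different things depending on which schema version the collector holds, whereas a self-describing update carries its path alongside the value.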

In terms of telemetry volume, we focused our solutions to the problem on not sending unchanged data, using event-driven telemetry (ON_CHANGE) and duplicate suppression.
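The idea behind not sending unchanged data can be sketched as a per-path filter that forwards a sample only when its value differs from the last one sent, and counts what it suppressed. `DuplicateSuppressor` is a hypothetical helper, not part of any gNMI library, and paths are plain strings. gNMI's Update message does carry a `duplicates` counter; the per-path count here is only loosely analogous to it.

```python
from typing import Dict, List, Tuple

Sample = Tuple[str, object]  # (path, value)


class DuplicateSuppressor:
    """Forward a sample only if its value changed since the last one sent
    for the same path; count suppressed samples per path."""

    def __init__(self) -> None:
        self._last_sent: Dict[str, object] = {}
        self.suppressed: Dict[str, int] = {}

    def forward(self, samples: List[Sample]) -> List[Sample]:
        changed = []
        for path, value in samples:
            if path in self._last_sent and self._last_sent[path] == value:
                # Unchanged since last send: drop it and count the duplicate.
                self.suppressed[path] = self.suppressed.get(path, 0) + 1
                continue
            self._last_sent[path] = value
            changed.append((path, value))
        return changed
```

Usage: a first `forward([("oper-status", "UP"), ("in-octets", 1000)])` passes both samples; a second `forward([("oper-status", "UP"), ("in-octets", 1500)])` passes only `("in-octets", 1500)` and records one suppressed `oper-status` sample. Savings of this kind are independent of how each value is encoded.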

hellt commented 1 year ago

@gcsl should OC propose a way for clients to indicate whether they want the gNMI target to use binary-encoded protobuf, if the target supports it?

Let's take a vendor who implements both the schema-unaware proto encoding (the one using scalar types from the TypedValue oneof) and a schema-aware model with any_val (or proto_bytes). How, with gNMI 0.8.0, can a client force the target to use one method over the other?

In SR Linux we had to create proprietary encodings to provide custom encoding variations, but maybe there should be a more generic way. Maybe an extension if not a separate encoding value?

gcsl commented 1 year ago

gNMI would support proprietary encodings for a different origin (not "openconfig") that would be akin to having a proprietary schema. I think there would be value in having a more compact representation generally, but there are lots of semantic decisions that would need to be fully specified with consensus in order to be standardized.
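One way to read this answer is as a target-side rule keyed on origin: PROTO requests for "openconfig" data use the spec's scalar TypedValue fields, and only a vendor-defined origin might carry binary protobuf under a proprietary schema. The sketch below encodes that reading. `value_field`, the `vendor_schema_available` flag and the `"vendor-x"` origin are hypothetical; the choice of proto_bytes over any_val is arbitrary; and how an empty origin should be treated is deliberately left out. The `Encoding` values match the enum in gnmi.proto.

```python
from enum import IntEnum


class Encoding(IntEnum):
    # Values of the Encoding enum in gnmi.proto.
    JSON = 0
    BYTES = 1
    PROTO = 2
    ASCII = 3
    JSON_IETF = 4


def value_field(origin: str, encoding: Encoding, vendor_schema_available: bool) -> str:
    """Hypothetical policy for which TypedValue field(s) a target populates
    for a PROTO request."""
    if encoding != Encoding.PROTO:
        raise ValueError("this sketch only covers PROTO encoding")
    if origin == "openconfig" or not vendor_schema_available:
        # Spec-prescribed: string_val / int_val / uint_val / ... per leaf.
        return "scalar"
    # Proprietary origin with a vendor-defined schema: binary protobuf.
    return "proto_bytes"
```

Under this reading, `value_field("openconfig", Encoding.PROTO, True)` returns `"scalar"` and `value_field("vendor-x", Encoding.PROTO, True)` returns `"proto_bytes"`, so a client selects the compact form by addressing the proprietary origin rather than through a new encoding value.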

openconfig / reference

Clarifying Proto Encoding #173
