openconfig / ygot

A YANG-centric Go toolkit - Go/Protobuf Code Generation; Validation; Marshaling/Unmarshaling
Apache License 2.0
284 stars 106 forks source link

🐜: squash bug related to leaf-lists and shadow schemas. #980

Closed robshakir closed 1 month ago

robshakir commented 1 month ago
 * (M) util/reflect.go
   - Fix missing parameter to string output.
 * (M) ytypes/{leaf,node,node_test,schema_test,util_schema}.go
   - Two bugs fixed.
     1. In the case that one calls `SetNode` with something that was
        not a gNMI `TypedValue` and was not JSON, then we could panic
        when attempting to type cast it.
     2. If a schema was generated that uses path compression, and
        `SetNode` was called for a node that was compressed out, but
        `PreferShadowPaths` was not set AND this node was a leaf-list
        then rather than performing a no-op (the expected behaviour,
        since we asked to set a node that was not the 'preferred' thing
        to set), then we would bail with an error since the schema was
        not a leaf schema. Small fix, lots of testing to find the root
        cause here.
 * (M) ytypes/schema_test/set_test.go
   - Bug reproduction.
coveralls commented 1 month ago

Coverage Status

coverage: 88.807% (+0.004%) from 88.803% when pulling c37741e2e2e5b28bbff295edae962e98e07e4cf9 on debug-llshadow into 0ff740a7de89d03593e43f105f940953c24af082 on master.

robshakir commented 1 month ago

Quoted from an internal source for the background

OK, this was a fun one to track down -- especially without the actual input that the device was providing. Thanks for the example output in the linked bug.

The root cause was that we didn't handle leaf-list values specifically in the case that we got an update for a compressed out value.

Wait, what? What does that even mean?

Well. Recall that compression of an OpenConfig schema means that for a leaf /a/b/config/leaf and /a/b/state/leaf then we make things neater for users by being able to call this leaf A.B.Leaf() rather than A.B.Config.Leaf(). This makes sense when you're dealing with either config, or telemetry. In ONDATRA tests, ygnmi handily lets us deal with both -- so we have a neat trick where we say A.B.Leaf.Config() or A.B.Leaf.State() depending on which one you care about. Now -- ONDATRA users mostly care about getting telemetry, so the schema that we generate is one where we prefer state leaves over config ones -- that means that when we say A.B.Leaf we assume that the user really means /a/b/state/leaf. Cool - nice, we saved folks some typing.

Now -- when we do a Watch (or any kind of subscribe) -- the user tells us that they want A.B.Leaf.State() and so we go and subscribe, but in this case, we are really asking for A.B.State() -- which makes us subscribe to /a/b so that we get both the /a/b/config and /a/b/state values. The switch is doing the right thing and returning us both /a/b/config/leaf and /a/b/state/leaf -- excellent.

Unfortunately, when we get the update for /a/b/config/leaf -- we were saying "uh, this is actually a shadow path" - i.e., one that we removed. The right behaviour in this case is to say "well, the user just told us that they wanted state values, and the shadow path is config path, so this is a NOOP", which we successfully did if, but only if, the node that we were dealing with was a leaf. Hence, the bug was relatively easy to actually fix (@dloher actually identified this at the time we first looked at it.).

What was harder was to figure out exactly the trigger, and then create some test cases that repro'd the bug before fixing it. :-)

This PR fixes this bug (and another panic bug that I found whilst reproducing this one).

robshakir commented 1 month ago

Thanks for the review @DanG100.