Closed robshakir closed 4 months ago
Quoted from an internal source for the background
OK, this was a fun one to track down -- especially without the actual input that the device was providing. Thanks for the example output in the linked bug.
The root cause was that we didn't handle leaf-list values specifically in the case that we got an update for a compressed out value.
Wait, what? What does that even mean?
Well. Recall that compression of an OpenConfig schema means that for a leaf /a/b/config/leaf and /a/b/state/leaf then we make things neater for users by being able to call this leaf A.B.Leaf() rather than A.B.Config.Leaf(). This makes sense when you're dealing with either config, or telemetry. In ONDATRA tests, ygnmi handily lets us deal with both -- so we have a neat trick where we say A.B.Leaf.Config() or A.B.Leaf.State() depending on which one you care about. Now -- ONDATRA users mostly care about getting telemetry, so the schema that we generate is one where we prefer state leaves over config ones -- that means that when we say A.B.Leaf we assume that the user really means /a/b/state/leaf. Cool - nice, we saved folks some typing.
Now -- when we do a Watch (or any kind of subscribe) -- the user tells us that they want A.B.Leaf.State() and so we go and subscribe, but in this case, we are really asking for A.B.State() -- which makes us subscribe to /a/b so that we get both the /a/b/config and /a/b/state values. The switch is doing the right thing and returning us both /a/b/config/leaf and /a/b/state/leaf -- excellent.
Unfortunately, when we get the update for /a/b/config/leaf -- we were saying "uh, this is actually a shadow path" - i.e., one that we removed. The right behaviour in this case is to say "well, the user just told us that they wanted state values, and the shadow path is config path, so this is a NOOP", which we successfully did if, but only if, the node that we were dealing with was a leaf. Hence, the bug was relatively easy to actually fix (@dloher actually identified this at the time we first looked at it.).
What was harder was to figure out exactly the trigger, and then create some test cases that repro'd the bug before fixing it. :-)
This PR fixes this bug (and another panic bug that I found whilst reproducing this one).
Thanks for the review @DanG100.