openconfig / gnmic

gNMIc is a gNMI CLI client and collector
https://gnmic.openconfig.net
Apache License 2.0
192 stars 57 forks source link

gnmi-server: cache search stops immediately if error happens in any of the subscription-cache #286

Closed fahadnaeemkhan closed 1 year ago

fahadnaeemkhan commented 1 year ago

oc_cache stores data per subscriptions so targets can be spread across multiple subscriptions (depending on configuration) and when we try to subscribes, it looks for match in all the subscriptions and create one goroutine per subscription for finding/query:

https://github.com/openconfig/gnmic/blob/e1e58b86eb33f50038c2393a260cbbfa666b3099/pkg/cache/oc_cache.go#L186

func (gc *gnmiCache) handleSingleQuery(ctx context.Context, ro *ReadOpts, ch chan *Notification) {
    if gc.debug {
        gc.logger.Printf("running single query for target %q", ro.Target)
    }

    caches := gc.getCaches(ro.Subscription)

    if gc.debug {
        gc.logger.Printf("single query got %d caches", len(caches))
    }
    wg := new(sync.WaitGroup)
    wg.Add(len(caches))

    for name, c := range caches {
        go func(name string, c *subCache) {
            defer wg.Done()
            for _, p := range ro.Paths {
                fp, err := path.CompletePath(p, nil)
                if err != nil {
                    gc.logger.Printf("failed to generate CompletePath from %v", p)
                    ch <- &Notification{Name: name, Err: err}
                    return
                }
                err = c.c.Query(ro.Target, fp,
                    func(_ []string, l *ctree.Leaf, _ interface{}) error {
                        if err != nil {
                            return err
                        } 
                                                 ....

Issue here is that these query goroutines sends errors to channel which gnmi_server is reading from and incase of cache miss from any of the goroutines, error is send to gnmi_server which returns immediately after seeing error even though data might me present under another subscription. https://github.com/openconfig/gnmic/blob/e1e58b86eb33f50038c2393a260cbbfa666b3099/pkg/app/gnmi_server.go#L683

There are multiple solutions here:

Let me know if this is making sense or more info is needed

karimra commented 1 year ago

While your description of how the oc_cache works is correct, the subscription does not fail in case data does not exist in the cache. The subscribe function fails in these cases:

The only error that could count as a cache miss here is the 3rd one (The specified target does not exist in one of the subscriptions caches). All the other errors should result in a failure of the subscription.

The case of missing target for one of the subscription scaches can be solved by calling cache.HasTarget() before running the query. Let me know if you want to give this a shot in a PR.

fahadnaeemkhan commented 1 year ago

Yeah I was trying to refer to case 3.

Sure, would love to submit a PR for this.

fahadnaeemkhan commented 1 year ago

PR submitted: https://github.com/openconfig/gnmic/pull/288