karimra / gnmic

gNMIc is a gNMI CLI client and collector
https://gnmic.kmrd.dev
Apache License 2.0
218 stars 32 forks source link

gnmic never closes connection for Target.Client #513

Closed lukaszo closed 2 years ago

lukaszo commented 2 years ago

In our long-running process, we are using gnmic as a library. We are connecting to Arista EOS and we've noticed that connections are never closed:

sh-4.2# netstat -no |grep 6030
tcp6       0      0 192.168.178.10:6030     192.168.178.5:53896     ESTABLISHED keepalive (10.10/0/0)
tcp6       0      0 192.168.178.10:6030     192.168.178.5:53894     ESTABLISHED keepalive (9.84/0/0)
tcp6       0      0 192.168.178.10:6030     192.168.178.5:53898     ESTABLISHED keepalive (10.35/0/0)
tcp6       0      0 192.168.178.10:6030     192.168.178.5:53892     ESTABLISHED keepalive (9.33/0/0)
tcp6       0      0 192.168.178.10:6030     192.168.178.5:53890     ESTABLISHED keepalive (8.81/0/0)

All we do is basically this in a loop:

conf := config.New()
conf.GetTargets()
tgt := target.NewTarget(conf.Targets[socket])
tgt.CreateGNMIClient(ctx)
tgt.Get(ctx, req)
tgt.Stop()

It looks like conn.Close() is never executed for the selected connection.

karimra commented 2 years ago

Hi,

Thanks for taking the time to write this issue.

First let me say that using gnmic as lib needs some work and proper documentation.

The conn.Close() you are pointing at is intended to close the secondary connection ( when the primary one is successful ) if the target configuration has a comma separated list of addresses. Is it the case for your targets?

tgt.Stop() is meant to stop a gnmi connection used to run a subscribe RPC, if you need to close the connection the gnmi client uses for other RPCs, canceling the context ctx should do it.

lukaszo commented 2 years ago

First let me say that using gnmic as lib needs some work and proper documentation.

Yeah, we've noticed :D. It's a great CLI, though!

The conn.Close() you are pointing at is intended to close the secondary connection ( when the primary one is successful ) if the target configuration has a comma separated list of addresses. Is it the case for your targets?

No, we have only one address. Btw, In case of more than one address I believe there is a small chance for a race condition there. I mean, one more gorutine can send a connection through the channel before done is closed.

tgt.Stop() is meant to stop a gnmi connection used to run a subscribe RPC, if you need to close the connection the gnmi client uses for other RPCs, canceling the context ctx should do it.

According to grpc module documentation, when DialContext is used in a non blocking way(like here) ctx does not act against the connection. If it did, this cancel would make GNMIClient unusable.

karimra commented 2 years ago

Btw, In case of more than one address I believe there is a small chance for a race condition there. I mean, one more gorutine can send a connection through the channel before done is closed.

You are right, this was added with the assumption that only one of the connections will be successful. I can add a step that checks if the target has a gNMI client, if true close the "secondary connection"

According to grpc module documentation, when DialContext is used in a non blocking way(like here) ctx does not act against the connection. If it did, this cancel would make GNMIClient unusable.

Maybe I misunderstood your requirement, if this is a long running process, there is no need to recreate a new gNMI client for each RPC. tgt.Get() will reuse the existing gNMI client tgt.Client. You will just need to supply a new ctx to each tgt.Get()

I agree that there is no way to gracefully close the underlying gRPC connection right now, so what I can do is:

Do you think this will make it more useful for your use case ?

lukaszo commented 2 years ago
  • Adding a target.Close() method that closes the gRPC connection if it's not nil

This sounds good

karimra commented 2 years ago

Can you give the last release (0.21.0) a try? I added the target.Stop() method as well a target.ConnState() method that returns the connection state as a string.

here is a sample of how it can be used to run a Get and a Set RPCs:

package main

import (
    "context"
    "log"
    "time"

    "github.com/AlekSi/pointer"
    "github.com/karimra/gnmic/target"
    "github.com/karimra/gnmic/types"
    "github.com/karimra/gnmic/utils"
    "github.com/openconfig/gnmi/proto/gnmi"
)

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    // build the target configuration
    tc := &types.TargetConfig{
        Name:       "srl1",
        Address:    "srl1:57400",
        Username:   pointer.ToString("admin"),
        Password:   pointer.ToString("admin"),
        SkipVerify: pointer.ToBool(true),
        Timeout:    10 * time.Second,
    }

    // create the target
    t := target.NewTarget(tc)

    // parse path
    p, err := utils.ParsePath("/system/name/host-name")
    if err != nil {
        panic(err)
    }
    //
    for i := 0; i < 10; i++ {
        log.Printf("iteration %d", i)
        // create a gNMI client
        err := t.CreateGNMIClient(ctx)
        if err != nil {
            panic(err)
        }
        log.Printf("target %q connectivity state: %q", t.Config.Name, t.ConnState())

        // send gNMI GetRequest
        getRespone, err := t.Get(ctx, &gnmi.GetRequest{
            Path:     []*gnmi.Path{p},
            Encoding: gnmi.Encoding_JSON_IETF,
        })
        if err != nil {
            panic(err)
        }
        log.Printf("target %q connectivity state: %q", t.Config.Name, t.ConnState())
        log.Printf("gNMI GetResponse: %s", getRespone)

        // change the target hostname
        hostname := "\"srl1\""
        if i%2 == 0 {
            hostname = "\"srl2\""
        }
        setResponse, err := t.Set(ctx, &gnmi.SetRequest{
            Update: []*gnmi.Update{
                {
                    Path: p,
                    Val: &gnmi.TypedValue{
                        Value: &gnmi.TypedValue_JsonIetfVal{
                            JsonIetfVal: []byte(hostname),
                        },
                    },
                },
            },
        })
        if err != nil {
            panic(err)
        }
        log.Printf("target %q connectivity state: %q", t.Config.Name, t.ConnState())
        log.Printf("gNMI SetResponse: %s", setResponse)
        // close the target connection
        err = t.Close()
        if err != nil {
            panic(err)
        }
        log.Printf("target %q connectivity state: %q", t.Config.Name, t.ConnState())
        time.Sleep(100 * time.Millisecond)
    }
}
karimra commented 2 years ago

I believe this was solved, feel free to reopen if you see that this behavior is still there.