openconfig / gnmic

gNMIc is a gNMI CLI client and collector
https://gnmic.openconfig.net
Apache License 2.0
171 stars 55 forks source link

race condition when client disconnects from gnmic-server #304

Closed fahadnaeemkhan closed 9 months ago

fahadnaeemkhan commented 9 months ago

I was debugging a issue related to dead-lock in gNMI-server cache and built gnmic with -race flag. As soon as I closed the gnmic that is a client to gnmic-server I could see the following logs:

==================
WARNING: DATA RACE
Write at 0x00c00321a330 by goroutine 2508:
  github.com/openconfig/gnmic/app.(*App).handleStreamSubscriptionRequest()
      /home/equinix/development/gnmic/app/gnmi_server.go:703 +0xc33
  github.com/openconfig/gnmic/app.(*App).Subscribe.func8()
      /home/equinix/development/gnmic/app/gnmi_server.go:524 +0x44

Previous write at 0x00c00321a330 by goroutine 2509:
  github.com/openconfig/gnmic/app.(*App).handleStreamSubscriptionRequest.func2()
      /home/equinix/development/gnmic/app/gnmi_server.go:687 +0x1890
  github.com/openconfig/gnmic/app.(*App).handleStreamSubscriptionRequest.func5()
      /home/equinix/development/gnmic/app/gnmi_server.go:698 +0x41

Goroutine 2508 (running) created at:
  github.com/openconfig/gnmic/app.(*App).Subscribe()
      /home/equinix/development/gnmic/app/gnmi_server.go:524 +0x1195
  github.com/openconfig/gnmi/proto/gnmi._GNMI_Subscribe_Handler()
      /home/equinix/go/pkg/mod/github.com/openconfig/gnmi@v0.9.1/proto/gnmi/gnmi_grpc.pb.go:226 +0xb5
  github.com/openconfig/gnmic/app.(*App).gRPCServerOpts.(*ServerMetrics).StreamServerInterceptor.func1()
      /home/equinix/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_metrics.go:121 +0x15e
  google.golang.org/grpc.(*Server).processStreamingRPC()
      /home/equinix/go/pkg/mod/google.golang.org/grpc@v1.56.1/server.go:1638 +0x2010
  google.golang.org/grpc.(*Server).handleStream()
      /home/equinix/go/pkg/mod/google.golang.org/grpc@v1.56.1/server.go:1718 +0xf64
  google.golang.org/grpc.(*Server).serveStreams.func1.1()
      /home/equinix/go/pkg/mod/google.golang.org/grpc@v1.56.1/server.go:959 +0xe6

Goroutine 2509 (running) created at:
  github.com/openconfig/gnmic/app.(*App).handleStreamSubscriptionRequest()
      /home/equinix/development/gnmic/app/gnmi_server.go:625 +0x79b
  github.com/openconfig/gnmic/app.(*App).Subscribe.func8()
      /home/equinix/development/gnmic/app/gnmi_server.go:524 +0x44
==================

From the code it seams obvious that err variable is shared b/w two goroutines without protection.

One fix could be to use channel b/w the goroutines to communicate err.

Let me know if that makes sense, i can work on the patch.

GNMIC VESION: 0.32.0

karimra commented 9 months ago

Can you try with the latest version (0.34.2) to make sure the issue is still there ?

fahadnaeemkhan commented 9 months ago

Yeah issue exists on v0.34.2 too

==================
WARNING: DATA RACE
Write at 0x00c00260a640 by goroutine 5408:
  github.com/openconfig/gnmic/pkg/app.(*App).handleStreamSubscriptionRequest()
      /home/equinix/development/gnmic/pkg/app/gnmi_server.go:705 +0xc33
  github.com/openconfig/gnmic/pkg/app.(*App).Subscribe.func8()
      /home/equinix/development/gnmic/pkg/app/gnmi_server.go:526 +0x44

Previous write at 0x00c00260a640 by goroutine 5409:
  github.com/openconfig/gnmic/pkg/app.(*App).handleStreamSubscriptionRequest.func2()
      /home/equinix/development/gnmic/pkg/app/gnmi_server.go:689 +0x1890
  github.com/openconfig/gnmic/pkg/app.(*App).handleStreamSubscriptionRequest.func5()
      /home/equinix/development/gnmic/pkg/app/gnmi_server.go:700 +0x41

Goroutine 5408 (running) created at:
  github.com/openconfig/gnmic/pkg/app.(*App).Subscribe()
      /home/equinix/development/gnmic/pkg/app/gnmi_server.go:526 +0x1195
  github.com/openconfig/gnmi/proto/gnmi._GNMI_Subscribe_Handler()
      /home/equinix/go/pkg/mod/github.com/openconfig/gnmi@v0.10.0/proto/gnmi/gnmi_grpc.pb.go:225 +0xb5
  github.com/openconfig/gnmic/pkg/app.(*App).gRPCServerOpts.(*ServerMetrics).StreamServerInterceptor.func1()
      /home/equinix/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_metrics.go:121 +0x15e
  google.golang.org/grpc.(*Server).processStreamingRPC()
      /home/equinix/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1644 +0x2070
  google.golang.org/grpc.(*Server).handleStream()
      /home/equinix/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1741 +0x1292
  google.golang.org/grpc.(*Server).serveStreams.func1.1()
      /home/equinix/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:986 +0x11d

Goroutine 5409 (running) created at:
  github.com/openconfig/gnmic/pkg/app.(*App).handleStreamSubscriptionRequest()
      /home/equinix/development/gnmic/pkg/app/gnmi_server.go:627 +0x79b
  github.com/openconfig/gnmic/pkg/app.(*App).Subscribe.func8()
      /home/equinix/development/gnmic/pkg/app/gnmi_server.go:526 +0x44
==================
fahadnaeemkhan commented 9 months ago

FYI: there are other race warning too when we start the gnmic-sever . I think all needs to be addressed.

karimra commented 9 months ago

A simple fix would be to create a new error variable for each goroutine that gets started here

fahadnaeemkhan commented 9 months ago

Wouldn't it affect defer function? As then err variable inside the goroutines would be contained to that goroutine's scope?

    defer func() {
        if err == nil {
            a.Logger.Printf("subscription request from %q to target %q processed", peer.Addr, sc.target)
            return
        }
        if errors.Is(err, context.Canceled) {
            a.Logger.Printf("subscription to target %q canceled", sc.target)
            sc.errChan <- err
            return
        }
        if err != nil {
            a.Logger.Printf("error processing STREAM subscription to target %q: %v", sc.target, err)
            sc.errChan <- err
            return
        }
    }()
karimra commented 9 months ago

right, haven't looked at that part of the code in a long time... channel it is then. Thanks

fahadnaeemkhan commented 9 months ago

Just wondering, what is use of sc.errChan as this is send-only channel. The buffer size is 3 and no one is reading from it so this could block 4th sender.