jackc / pgx

PostgreSQL driver and toolkit for Go

Ever-increasing number of goroutines with pgx v5.5.5 and CockroachDB #1999

Open victor-ferrer-form3 opened 5 months ago

victor-ferrer-form3 commented 5 months ago

Describe the bug

Recently we upgraded our software to pgx v5.5.5 and immediately noticed that the number of goroutines our pods use keeps increasing. This is a screenshot from our monitoring tool, showing the go_goroutines metric:

[image: monitoring screenshot of the go_goroutines metric]

The point where it starts to climb matches our update from v5.5.2 to v5.5.5.

If we enable pprof after the pod has been running for some hours, we see this:

goroutine profile: total 318
[...]
74 @ 0x43e32e 0x4099ad 0x4095b2 0x9529ec 0x471501
#   0x9529eb    github.com/jackc/pgx/v5/pgconn/internal/ctxwatch.(*ContextWatcher).Watch.func1+0x8b /build/vendor/github.com/jackc/pgx/v5/pgconn/internal/ctxwatch/context_watcher.go:51
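For readers who want to capture a comparable dump without exposing the HTTP pprof endpoint, the following is a minimal sketch using only the standard library's runtime/pprof package. The helper name goroutineProfile is hypothetical and not part of pgx; debug level 1 is what produces the aggregated "count @ addresses" layout seen in the dump above.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// goroutineProfile (hypothetical helper) returns the goroutine profile as text.
// With debug=1, goroutines that share an identical stack are grouped and each
// group is prefixed with its count, like the "74 @ 0x..." line in the report.
func goroutineProfile() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := goroutineProfile()
	if err != nil {
		fmt.Println("profile error:", err)
		return
	}
	fmt.Print(out)
}
```

Taking two such dumps some hours apart and comparing the counts per stack is one way to confirm which call site is accumulating goroutines.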

To Reproduce

Steps to reproduce the behavior:

If possible, please provide runnable example such as:

package main

import (
    "context"
    "log"
    "os"

    "github.com/jackc/pgx/v5"
)

func main() {
    conn, err := pgx.Connect(context.Background(), os.Getenv("DATABASE_URL"))
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close(context.Background())

    // Your code here...
}

Please run your example with the race detector enabled. For example, go run -race main.go or go test -race.

Expected behavior

We would expect those watches to be cancelled/closed and the goroutines to end.

Actual behavior

Context watchers do not seem to be finished properly.

Version

cockroach version details:
Build Tag:        v23.1.12
Build Time:       2023/11/09 06:15:38
Distribution:     CCL
Platform:         linux amd64 (x86_64-pc-linux-gnu)
Go Version:       go1.19.13
C Compiler:       gcc 6.5.0
Build Commit ID:  d7e9824b4cd6ebf7a8548156f2a772ae6648257d
Build Type:       release
Enabled Assertions: false
(use 'cockroach version --build-tag' to display only the build tag)

victor-ferrer-form3 commented 4 months ago

Note: Reverting to v5.5.2 solves the issue.

drakkan commented 4 months ago

Please provide more details. I don't think this happens for every query/use case. For example, I can't replicate it using pgx in database/sql mode (tested both Postgres and CockroachDB).

Probably an Unwatch call is missing in some edge case in the code that changed between 5.5.2 and 5.5.5.

victor-ferrer-form3 commented 4 months ago

Hello @drakkan!

One thing we have noticed is that the only service in which we see this behavior is one that uses Batch statements. We also noticed this commit, introduced as part of release 5.5.4, which has several changes related to batches, although I am not sure whether it causes the issue.

To try to narrow the problem down, we are going to repeat our tests with pgx v5.5.3 and let you know the results.

Update: pgx v5.5.3 does not have this problem.

sean- commented 4 months ago

@victor-ferrer-form3: have you tried with pgx v5.5.4, or had any success bisecting the problem?

victor-ferrer-form3 commented 4 months ago

Hi @sean-, yes, v5.5.4 has the problem too. For the moment, our only solution has been to downgrade to v5.5.3 and add exceptions for the security vulnerabilities it has.

victor-ferrer-form3 commented 3 months ago

Hello @drakkan, do you have any update or ETA on the fix for this? Thanks

drakkan commented 3 months ago

> Hello @drakkan, do you have any update or ETA on the fix for this? Thanks

I'm not working on this, sorry. I don't use Batch statements, so I'm unable to replicate the issue in my use case.