hibiken / asynq

Simple, reliable, and efficient distributed task queue in Go
MIT License

[BUG] redis: discarding bad PubSub connection #801

Open BrandSnob opened 5 months ago

BrandSnob commented 5 months ago

Describe the bug Hello. This looks like the well-known bug in go-redis v8, but my project does not use go-redis v8 at all, only go-redis v9. The error happens randomly: rarely at startup, and sometimes after a few hours.

To Reproduce Steps to reproduce the behavior (code snippets if applicable): The server code is below. Note that the process is not only an asynq server; it also runs other jobs in goroutines. Server code:

    // allow 1 event per 5 minutes, with a burst of up to 2
    limiter = rate.NewLimiter(rate.Every(5*time.Minute), 2)
    if ctx == nil {
        ctx = context.Background()
    }
    // Build the worker server
    srv := asynq.NewServer(
        asynq.RedisClientOpt{
            Addr:     fmt.Sprintf("%s:%d", c.Config.Cache.Hostname, c.Config.Cache.Port),
            DB:       db,
            Password: c.Config.Cache.Password,
        },
        asynq.Config{
            Concurrency:    1,
            BaseContext:    func() context.Context { return ctx },
            IsFailure:      func(err error) bool { return !IsRateLimitError(err) },
            RetryDelayFunc: retryDelay,
            Queues: map[string]int{
                "critical": 6,
                "default":  3,
                "low":      1,
            },
            Logger:                   c.Web.Logger,
            ShutdownTimeout:          0,
            DelayedTaskCheckInterval: 0,
            GroupGracePeriod:         10 * time.Second,
            GroupMaxDelay:            10 * time.Second,
            GroupMaxSize:             0,
            GroupAggregator:          nil,
        },
    )

    // Map task types to the handlers
    mux := asynq.NewServeMux()
    mux.Handle(..., ...)
    if err := srv.Run(mux); err != nil {
        log.Fatalf("could not run worker server: %v", err)
    }

The context returned by BaseContext is created once in main and shared across several parts of the app.

Expected behavior It should just keep running as it usually does, or at least reconnect once this issue happens. Instead, it stays stuck like that the whole time until I restart the server container (not Redis or anything else).

Environment (please complete the following information):

Additional context Logs:

redis: 2024/01/09 13:26:38 pubsub.go:168: redis: discarding bad PubSub connection: write tcp 172.23.0.16:52106->172.23.0.8:6379: i/o timeout
{"time":"2024-01-09T13:26:47.795096432","level":"WARN","prefix":"echo","file":"log.go","line":"169","message":"recoverer: could not list lease expired tasks: INTERNAL_ERROR: redis eval error: dial tcp: lookup service_cache: i/o timeout"}
asynq: pid=15 2024/01/09 16:32:52.600476 WARN: Scheduler could not write heartbeat data: UNKNOWN: redis command error: ZADD failed: dial tcp: lookup service_cache: i/o timeout
asynq: pid=15 2024/01/09 16:32:58.352098 WARN: Scheduler could not write heartbeat data: UNKNOWN: redis command error: ZADD failed: dial tcp: lookup service_cache: i/o timeout
{"time":"2024-01-09T20:02:58.361862078","level":"WARN","prefix":"echo","file":"log.go","line":"169","message":"recoverer: could not list lease expired tasks: INTERNAL_ERROR: redis eval error: dial tcp: lookup service_cache: i/o timeout"}
{"time":"2024-01-09T20:05:26.076166895","level":"ERROR","prefix":"echo","file":"log.go","line":"176","message":"Failed to forward scheduled tasks: INTERNAL_ERROR: INTERNAL_ERROR: redis eval error: dial tcp: lookup service_cache: i/o timeout"}

And it goes on with similar logs.

kamikazechaser commented 5 months ago

Check if the issue persists when you use the following versions:

# asynq
go get github.com/hibiken/asynq@master
# asynq/x
go get github.com/hibiken/asynq/x@master
# asynq/tools
go get github.com/hibiken/asynq/tools@master

BrandSnob commented 3 weeks ago

Thanks. I made some changes and also updated the packages you mentioned, and I have not encountered this issue since.