riverqueue / river

Fast and reliable background jobs in Go
https://riverqueue.com
Mozilla Public License 2.0
3.51k stars 94 forks source link

Graceful shutdown - StopAndCancel panics #549

Closed krhubert closed 2 months ago

krhubert commented 2 months ago

I'm not sure why client.StopAndCancel panics while client.Stop returns without any error

package main

import (
    "context"
    "fmt"
    "os"
    "os/signal"
    "sync"
    "syscall"
    "time"

    "github.com/jackc/pgx/v5/pgxpool"
    "github.com/riverqueue/river"
    "riverqueue.com/riverpro/driver/riverpropgxv5"
)

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    pool, err := pgxpool.New(ctx, "postgres://postgres:password@localhost:5432/postgres")
    die(err)
    defer pool.Close()

    riverClient, err := river.NewClient(
        riverpropgxv5.New(pool, nil),
        &river.Config{
            Queues: map[string]river.QueueConfig{
                river.QueueDefault: {MaxWorkers: 100},
            },
            Workers: river.NewWorkers(),
        },
    )
    die(err)
    riverClient.Start(ctx)
    fmt.Println("started")

    quit := make(chan os.Signal, 1)
    signal.Notify(quit, os.Interrupt, syscall.SIGTERM)
    <-quit
    cancel()

    shutdownCtx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
    defer cancel()
    fmt.Println("shutting down")

    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        // die(riverClient.Stop(shutdownCtx)) // -> OK
        riverClient.StopAndCancel(shutdownCtx) // -> panic
        wg.Done()
    }()
    wg.Wait()
    fmt.Println("shutdown complete")
}

func die(err error) {
    if err == nil {
        return
    }
    panic(err)
}

stacktrace

started
^Cshutting down
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x829d0d]

goroutine 23 [running]:
github.com/riverqueue/river.(*Client[...]).StopAndCancel(0x0, {0x9ec8e8?, 0xc00017bf10})
        /home/user/pkg/mod/github.com/riverqueue/river@v0.11.4/client.go:838 +0xad
main.main.func1()
        /home/user/src/rift/archipelagoapp/backend/cmd/main.go:51 +0x2f
created by main.main in goroutine 1
        /home/user/src/rift/archipelagoapp/backend/cmd/main.go:49 +0x46e
exit status 2

go.mod

module main

go 1.23.0

require (
    github.com/jackc/pgx/v5 v5.6.0
    github.com/riverqueue/river v0.11.4
    riverqueue.com/riverpro/driver/riverpropgxv5 v0.2.1
)

require (
    github.com/davecgh/go-spew v1.1.1 // indirect
    github.com/jackc/pgpassfile v1.0.0 // indirect
    github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
    github.com/jackc/puddle/v2 v2.2.1 // indirect
    github.com/pmezard/go-difflib v1.0.0 // indirect
    github.com/riverqueue/river/riverdriver v0.11.4 // indirect
    github.com/riverqueue/river/riverdriver/riverpgxv5 v0.11.4 // indirect
    github.com/riverqueue/river/rivershared v0.11.4 // indirect
    github.com/riverqueue/river/rivertype v0.11.4 // indirect
    github.com/stretchr/testify v1.9.0 // indirect
    github.com/tidwall/gjson v1.17.1 // indirect
    github.com/tidwall/match v1.1.1 // indirect
    github.com/tidwall/pretty v1.2.1 // indirect
    go.uber.org/goleak v1.3.0 // indirect
    golang.org/x/crypto v0.25.0 // indirect
    golang.org/x/sync v0.8.0 // indirect
    golang.org/x/text v0.17.0 // indirect
    gopkg.in/yaml.v3 v3.0.1 // indirect
    riverqueue.com/riverpro v0.2.1 // indirect
    riverqueue.com/riverpro/driver v0.2.1 // indirect
)
bgentry commented 2 months ago

@krhubert based on the trace, I'm guessing you might be calling StopAndCancel on a client which was never actually started? Obviously this shouldn't panic regardless, but I just want to confirm that's what's happening before attempting a fix.

A Client only needs to be started if it is going to be working jobs (not just inserting them), and it only needs to be stopped if it has been started.

krhubert commented 2 months ago

@bgentry I'm not sure if we talk about the same start. In the code above there's a riverClient.Start call.

Btw, this is a full source code that can be used to reproduce this panic.

Edited:

I haven't checked the error returned by Start :|

at least one Worker must be added to the Workers bundle

So you're right the client wasn't started at all.

What I did was - I checked if Stop worked first and since the Stop call did not panic nor return any error for not started client I assumed the problem was in the StartAndClose

brandur commented 2 months ago

Opened #557 to fix the problem of a panic in case of a call to StopAndCancel before Start.