nats-io / nats.go

Golang client for NATS, the cloud native messaging system.
https://nats.io
Apache License 2.0
5.53k stars 696 forks source link

nil pointer dereference #1527

Open mvrhov opened 9 months ago

mvrhov commented 9 months ago

Observed behavior

We are getting the following tack trace on some of our servers when we try to unsubscribe. I'd guess that info.Config is nil, but I'm unable to confirm.

[signal SIGSEGV: segmentation violation code=0x1 addr=0xc8 pc=0xc8ccca]

goroutine 8038 [running]:
github.com/nats-io/nats%2ego.(*js).upsertConsumer(0xc0002a2000, {0xc02b6d4130, 0x9}, {0xc000883078, 0x8}, 0xc010325a40, {0x0, 0x0, 0x0})
    /root/go/pkg/mod/github.com/nats-io/nats.go@v1.31.0/jsm.go:439 +0x80a
github.com/nats-io/nats%2ego.(*js).UpdateConsumer(0xc0377869a0?, {0xc02b6d4130?, 0x0?}, 0x108086f?, {0x0?, 0x0?, 0x0?})
    /root/go/pkg/mod/github.com/nats-io/nats.go@v1.31.0/jsm.go:369 +0x85

The relevant code from our side is.

kv.updateConsumer("empty")

func (kv *KV[T]) updateConsumer(newKey string) error {
    subjects := make([]string, 1, len(kv.keys)+1)
    subjects[0] = kv.prefix + newKey

    kv.l.RLock()
    for id := range kv.keys {
        subjects = append(subjects, kv.prefix+id)
    }
    kv.l.RUnlock()

    // copy of config is needed, because we update the consumer from different go routines
    config := kv.consumerConfig
    config.FilterSubjects = subjects
    _, err := kv.js.UpdateConsumer(kvBucketNamePre+kv.bucketName, &config)
    if err != nil {
        return fmt.Errorf("update consumer: %w", err)
    }

    return nil
}

Expected behavior

no panic

Server and client version

The client is 1.31, however the code in 1.32 is the same. Server 2.10.9

Host environment

No response

Steps to reproduce

I have no idea on how to reproduce this as it's also sporadic on our end.

edit: changed the server version from 2.10.8 to 2.10.9 (it automatically updated with our release , as we set the version in docker-compose to 2.10) as this is the version we were running

piotrpio commented 9 months ago

Hello @mvrhov, thank you for reporting the issue.

That does look like we're getting nil info from the server - I'll investigate and let you know.

mvrhov commented 9 months ago

If it helps. There is a bunch of those in the NATS server log at the time of above crash. They start approximately 40s before crash. I did turn the feature flag of at the time those messages start in the NATS log, and turning feature off results in mass unsubscribe from multiple servers and multiple subjects.

[1] 2024/01/15 06:21:29.036589 [WRN] Internal subscription on "$JS.API.CONSUMER.CREATE.KV_STATUS.AV7oozAP3wLqzUtRQqMAUP" took too long: 2.105295582s
[1] 2024/01/15 06:21:29.036602 [WRN] X.Y.Z.W:37370 - cid:60 - Readloop processing time: 2.105367662s
[1] 2024/01/15 06:21:30.396007 [WRN] Internal subscription on "$JS.API.CONSUMER.CREATE.KV_STATUS.0jsgR82U" took too long: 3.156122306s
[1] 2024/01/15 06:21:30.396041 [WRN] Internal subscription on "$JS.API.CONSUMER.CREATE.KV_STATUS.0jsgR82U" took too long: 3.156166899s