nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

PeerInfo Lag contains exceptionally large uint64 value #2332

Closed: scottf closed this issue 3 years ago

scottf commented 3 years ago

Defect

THIS ISSUE IS AWAITING MORE INFO

Replica.PeerInfo.Lag value is 18446744073709551614 (fffffffffffffffe), which seems like an overflow?

From a user:

This issue occurs especially when we purge some data in the stream and the starting sequence number is other than 1, for an ephemeral consumer with deliver policy set to all.
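
For reference, 18446744073709551614 (fffffffffffffffe) is exactly what an unsigned 64-bit subtraction yields when the true result would be -2, so this looks like a wrapped lag calculation rather than a genuinely huge lag. A quick check of the arithmetic (nothing here is nats-server code):

package main

import "fmt"

func main() {
	// The Lag value reported in the PeerInfo above.
	var reported uint64 = 18446744073709551614 // 0xfffffffffffffffe

	// Reinterpreting the bits as a signed 64-bit integer shows what a
	// wrapped unsigned subtraction would have produced: -2.
	fmt.Printf("as int64: %d\n", int64(reported))
}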

Versions of nats-server and affected client libraries used:

Server: Java Client: 2.11.4

OS/Container environment:

Steps or code to reproduce the issue:

Expected result:

Actual result:

ripienaar commented 3 years ago

I tried a few things based on the description and can't trigger this case where the server would send that.

What's interesting is that it's said this is for ephemerals - ephemerals are always R1 and there's never any lag in theory, so the value will never be sent - always 0. We could of course have a bug where we decrement a 0 uint - in Go though I think that would panic, so probably that's not what the server is doing.

ripienaar commented 3 years ago

Ah, I can trigger it - probably the same issue - in another way, via stream create:

$ nats s rm X -f;nats s add X --config x.conf --trace

with x.conf:

{
  "name": "X",
  "subjects": [
    "js.in.x"
  ],
  "retention": "limits",
  "max_consumers": -1,
  "max_msgs_per_subject": -1,
  "max_msgs": -1,
  "max_bytes": -1,
  "max_age": 0,
  "max_msg_size": -1,
  "storage": "file",
  "discard": "old",
  "num_replicas": 3,
  "duplicate_window": 120000000000
}

Every 10 or so tries of this command, I get:

{
    "type": "io.nats.jetstream.api.v1.stream_create_response",
    "config": {
        "name": "X",
        "subjects": ["js.in.x"],
        "retention": "limits",
        "max_consumers": -1,
        "max_msgs": -1,
        "max_bytes": -1,
        "max_age": 0,
        "max_msgs_per_subject": -1,
        "max_msg_size": -1,
        "discard": "old",
        "storage": "file",
        "num_replicas": 3,
        "duplicate_window": 120000000000
    },
    "created": "2021-07-01T07:05:37.975672323Z",
    "state": {
        "messages": 0,
        "bytes": 0,
        "first_seq": 0,
        "first_ts": "0001-01-01T00:00:00Z",
        "last_seq": 0,
        "last_ts": "0001-01-01T00:00:00Z",
        "consumer_count": 0
    },
    "cluster": {
        "name": "lon",
        "leader": "n1-lon",
        "replicas": [{
            "name": "n2-lon",
            "current": true,
            "active": 245390
        }, {
            "name": "n3-lon",
            "current": false,
            "active": 1625123138370448476
        }]
    }
}

Note the active value on the last peer (n3-lon).
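
The magnitude of that active value is also a clue: 1625123138370448476 nanoseconds is the time elapsed since the Unix epoch as of roughly 2021-07-01T07:05:38Z, i.e. the moment this response was generated, which is what you would get if the peer's last-activity timestamp were still at its zero value when the duration was computed. A minimal sketch of that arithmetic, assuming a last-seen time tracked as Unix nanoseconds (the names here are illustrative, not the server's):

package main

import (
	"fmt"
	"time"
)

func main() {
	// Hypothetical: last-activity time for a peer that was never marked active.
	var lastSeenUnixNano int64 // zero value, i.e. the Unix epoch

	// "active" reported as nanoseconds since last activity.
	active := time.Now().UnixNano() - lastSeenUnixNano

	// Run around 2021-07-01, this prints roughly 1625123138370448476,
	// matching the value in the response above.
	fmt.Printf("active: %d ns (~%.0f years)\n",
		active, time.Duration(active).Hours()/24/365)
}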

gstaware commented 3 years ago

Server version: nats-server: v2.2.4

ripienaar commented 3 years ago

This is some variant of:

package main

import (
	"encoding/json"
	"fmt"
)

type thing struct {
	Val uint64
}

func main() {
	x := uint64(1)
	y := uint64(2)

	// Unsigned subtraction wraps around instead of going negative.
	bug := x - y

	// Converting to int reinterprets the wrapped value as -1 (on 64-bit platforms).
	fmt.Printf("uint64 - uint64: %d\n", int(bug))

	j, _ := json.Marshal(&thing{Val: bug})
	fmt.Printf("as json: %s\n", string(j))
}

This produces:

uint64 - uint64: -1
as json: {"Val":18446744073709551615}

Probably in the raft.Peers
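
Whatever the exact spot, the usual fix for this class of bug is to clamp the subtraction before it can wrap; a sketch of that pattern only (the function and parameter names are illustrative, not the actual nats-server change):

package main

import "fmt"

// lagBetween returns how far delivered trails last, without wrapping when
// delivered has moved past last (e.g. after a purge). Illustrative only.
func lagBetween(last, delivered uint64) uint64 {
	if delivered >= last {
		return 0
	}
	return last - delivered
}

func main() {
	fmt.Println(lagBetween(12, 10)) // 2
	fmt.Println(lagBetween(10, 12)) // 0 instead of 18446744073709551614
}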

derekcollison commented 3 years ago

On my list to take a look, thanks.