nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

stream has no quorum, stalled #4044

Closed guojianyu closed 1 year ago

guojianyu commented 1 year ago

I created a stream a long time ago but did not use it, and today found that I could not publish messages to the stream.

All three nodes of the NATS cluster in my k8s are running, and other streams can elect a leader and work.

The following error message is displayed on the server: [image]

The following error message is displayed on the client: [image]

The abnormal stream information: [image]

Versions of nats-server and affected client libraries used:

nats-server: nats:2.9.15-alpine
nats.go: v1.25.0

OS/Container environment:

Linux, Kubernetes

Steps or code to reproduce the issue:

Expected result:

The stream can elect a leader and work once the cluster is ready.

Actual result:

stream has no quorum

derekcollison commented 1 year ago

You can reset the raft layer, which may resolve this.

nats stream update <stream name> --replicas 1 -f

then

nats stream update <stream name> --replicas 3 -f
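
The two commands above can be wrapped in a small helper so they always run in the same order. This is only a minimal sketch: `reset_stream_raft` and the `DRY_RUN` variable are hypothetical names introduced here for illustration and are not part of the nats CLI, and the default of 3 replicas is an assumption taken from the three-node cluster described in this issue.

```shell
#!/bin/sh
# Sketch: scale a stream down to one replica and back up again.
# reset_stream_raft and DRY_RUN are hypothetical, not part of the nats CLI.

reset_stream_raft() {
    stream="$1"
    replicas="${2:-3}"   # replica count to restore (assumed default: 3)

    if [ -z "$stream" ]; then
        echo "usage: reset_stream_raft <stream name> [replicas]" >&2
        return 1
    fi

    # First drop to a single replica, then return to the original count.
    for r in 1 "$replicas"; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Print the command instead of running it.
            echo "nats stream update $stream --replicas $r -f"
        else
            # Stop at the first failed update.
            nats stream update "$stream" --replicas "$r" -f || return 1
        fi
    done
}
```

Running `DRY_RUN=1 reset_stream_raft ORDERS 3` (ORDERS is a placeholder stream name) prints the two commands without touching the cluster. It may be worth checking `nats stream info <stream name>` between the two steps to confirm the stream has a leader before scaling back up.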

guojianyu commented 1 year ago

Thank you for the response. I don't think a manual fix is a good approach; will there be a better solution at the JetStream level in the future? Resetting the raft layer fixed the stream, but the stream's raft state still looks a bit abnormal: [image]

derekcollison commented 1 year ago

We agree, but without knowing more details about your exact setup that was a suggested quick fix.

2.9.16, which is due out any day now and is available in nightly builds as a Docker container, has some fixes and improvements to the raft layer.

guojianyu commented 1 year ago

Thank you so much @derekcollison. I will update to 2.9.16; I hope it works better.

This is my NATS configuration:

pid_file: "/var/run/nats/nats.pid"
http: 8222

jetstream {
  max_mem: 8Gi
  store_dir: /data/
  max_file: 100Gi
}
server_name:$POD_NAME
tls: {
  verify:  true
  ca_file: /etc/edge-nats-server-tls/ca.crt
  cert_file: /etc/edge-nats-server-tls/tls.crt
  key_file: /etc/edge-nats-server-tls/tls.key
}
cluster {
  name: cloud
  port: 6222
  cluster_advertise: $CLUSTER_ADVERTISE
  routes [
    nats://cloud-jet-nats-0.jet-mgmt.nats.svc.cluster.local:6222
    nats://cloud-jet-nats-1.jet-mgmt.nats.svc.cluster.local:6222
    nats://cloud-jet-nats-2.jet-mgmt.nats.svc.cluster.local:6222
  ]
  tls: {
    ca_file: /etc/nats-routes-tls-certs/ca.crt
    cert_file: /etc/nats-routes-tls-certs/tls.crt
    key_file: /etc/nats-routes-tls-certs/tls.key
  }
  connect_retries: 10
}
leafnodes {
  port: 7422
  tls: {
    verify:   true
    ca_file: /etc/nats-server-tls-certs/ca.crt
    cert_file: /etc/nats-server-tls-certs/tls.crt
    key_file: /etc/nats-server-tls-certs/tls.key
  },
}
write_deadline: 30s
lame_duck_grace_period: 10s
lame_duck_duration: 30s
accounts:{
  "system":{"jetstream":true,"users":[{"pass":"system","user":"system"}]},
  "abe":{"jetstream":true,"users":[{"pass":"abe","user":"abe"}]}
}

I find that if the stream is abnormal, the consumer is also abnormal.

The stream info: [image]

The consumer of the stream: [image]

I hope this information is useful.

A couple of follow-up questions about using JetStream:

  1. Do you recommend using JetStream in production environments? (We urgently need to upgrade our NATS deployment to JetStream; what problems should we expect if we use it?)

  2. JetStream raft groups are a great design, but they also add complexity to the overall system. Thousands of streams and consumers are generated dynamically in our environment; can JetStream remain stable and performant under such conditions?

Looking forward to your reply!

derekcollison commented 1 year ago

Yes, JetStream is in production use at quite a lot of places, with many different architectures and topologies.

We recommend looking into a commercial agreement with Synadia for critical production usage.

guojianyu commented 1 year ago

Thank you for taking time to answer my questions. I'll do that if I have to.