nats-io / nats-operator

NATS Operator
https://nats-io.github.io/k8s/
Apache License 2.0

Reloading of server config seems choppy on multiple service role creation #297

Open sdeoras opened 3 years ago

sdeoras commented 3 years ago

I am using the NATS operator and ServiceRole creation in an automation workflow, where a bunch of new ServiceRoles get created in rapid succession. The server logs do show that the authorization users and the server config get reloaded, but the reload appears to happen prematurely and misses the update triggers from the ServiceRoles created afterwards.

I am attaching the server logs that highlight this issue:

[6] 2020/11/11 21:40:24.231281 [INF] Reloaded: authorization users
[6] 2020/11/11 21:40:24.231307 [INF] Reloaded server configuration
[6] 2020/11/11 21:40:41.427721 [ERR] 10.88.0.4:46878 - cid:5 - Authorization Error - User "nats-volta-2"
[6] 2020/11/11 21:40:42.435702 [ERR] 10.88.0.4:46886 - cid:6 - Authorization Error - User "nats-volta-3"
[6] 2020/11/11 21:40:43.449459 [ERR] 10.88.0.4:46888 - cid:7 - Authorization Error - User "nats-volta-4"
[6] 2020/11/11 21:40:44.459955 [ERR] 10.88.0.4:46890 - cid:8 - Authorization Error - User "nats-volta-5"

As you probably guessed, the reloading triggered on creation of user nats-volta-1, but it failed to trigger again for subsequent objects 2-through-5. As a result the authorization failed for each of them. All objects 1-through-5 were created together in one call.
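
For anyone hitting the same symptom, here is a rough sketch of how the affected users could be pulled out of the server log. It assumes the log format shown above; the function name `users_failing_since_reload` is made up for illustration and is not part of NATS or the operator.

```shell
# Sketch only: list users that hit an Authorization Error after the most
# recent "Reloaded server configuration" line. Reads log lines on stdin.
# The function name is hypothetical; the log format is taken from the logs above.
users_failing_since_reload() {
    awk '
        # A reload resets the list: earlier failures may already be resolved.
        /Reloaded server configuration/ { n = 0; next }
        # The user name is the last field, wrapped in double quotes.
        /Authorization Error - User/ {
            user = $NF
            gsub(/"/, "", user)
            failed[n++] = user
        }
        END { for (i = 0; i < n; i++) print failed[i] }
    '
}
```

It could be fed from something like `kubectl logs nats-cluster-1 | users_failing_since_reload`; depending on the pod layout a container may need to be selected with `-c`, which is an assumption about the deployment. A non-empty result after a batch of ServiceRoles has been created suggests that a reload was missed.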

I confirmed this by creating one more object: that triggered a reload of the server config, and the authZ errors shown above disappeared.

Is there anything I can do in the short term to avoid this issue, such as a server config setting that I might be missing?

Appreciate all your help and really loving the fact that NATS is available as an operator. Thank you so much!

wallyqs commented 3 years ago

Thanks @sdeoras for the report, there might be a bug in the reloader... One workaround is to send the reload signal out of band via kubectl exec, like this:

kubectl exec -it nats-cluster-1 -- /nats-server -sl=reload=/var/run/nats/nats.pid
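
In an automation workflow the same signal could be sent to every server pod once the batch of ServiceRoles has been created. The following is only a sketch of that idea: the function name `reload_nats_pods` and the `KUBECTL` override are hypothetical, any pod name other than `nats-cluster-1` is an assumption, and `-it` is dropped on the assumption that no interactive terminal is needed in a script.

```shell
# Sketch only: send the out-of-band reload signal to each NATS pod given
# as an argument. KUBECTL is a hypothetical override so that a namespace
# flag can be added (KUBECTL="kubectl -n my-ns") or a dry run done (KUBECTL=echo).
KUBECTL="${KUBECTL:-kubectl}"

reload_nats_pods() {
    status=0
    for pod in "$@"; do
        # Same command as above, without -it since no TTY is needed here.
        $KUBECTL exec "$pod" -- /nats-server -sl=reload=/var/run/nats/nats.pid \
            || status=1   # remember the failure but keep going
    done
    return "$status"
}
```

For example, `reload_nats_pods nats-cluster-1 nats-cluster-2 nats-cluster-3` right after the ServiceRoles are applied, assuming those are the pod names in the cluster.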

sdeoras commented 3 years ago

Thanks @wallyqs, I'll use the kubectl command in the meantime.

sdeoras commented 3 years ago

@wallyqs: Another auth question. Before the first ServiceRole is created, clients are able to connect without authenticating. Is this expected behavior, and is there a way to turn it off so that the server always denies unauthenticated client connections? Even after the first ServiceRole is created, it takes a bit before auth kicks in (as you can see below), which I am guessing is the reconciliation delay, but it would be good to always deny connections by default if the auth switch is enabled.

└─ $ ▶ nats-sub -s nats://${NATS_IP}:4222 hello
Listening on [hello]
^C

└─ $ ▶ nats-sub -s nats://${NATS_IP}:4222 hello
Listening on [hello]
^C

└─ $ ▶ nats-sub -s nats://${NATS_IP}:4222 hello
Listening on [hello]
^C

└─ $ ▶ nats-sub -s nats://${NATS_IP}:4222 hello
nats: Authorization Violation