Hi @Alagroc. Could you please confirm whether ACLs are enabled and bootstrapped on both the authoritative and federated clusters, and that the ACL replication token is set on the federated cluster servers?
Could you also please include additional logs from the servers in the authoritative region? The current log snippet makes it hard to identify exactly what is happening.
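For reference, the settings asked about above would look roughly like this on a federated (non-authoritative) server. This is only a minimal sketch: the region names are taken from this thread, while the token placeholder and the `bootstrap_expect` value are assumptions and need to be adapted to the actual setup.

```hcl
# Hypothetical federated (non-authoritative) server config for region "stagingb1".
region     = "stagingb1"
datacenter = "aws"

acl {
  enabled = true

  # Placeholder, not a real value: the secret ID of a token from the
  # authoritative region, used to replicate ACL policies and tokens.
  replication_token = "<token from the authoritative region>"
}

server {
  enabled = true

  # Region that owns ACL policies and global tokens.
  authoritative_region = "staging"

  # Assumption: the sub-environment runs a single server.
  bootstrap_expect = 1
}
```

ACLs also need to be bootstrapped in each region before the replicated policies and tokens can be used there.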
Hi @jrasell, here are the Nomad config file for the authoritative server and its log. I just noticed the authoritative server is quite outdated (version 0.12.3 instead of 1.2.5; this detail slipped through).
Config:
data_dir = "/opt/nomad"
addresses {
  http = [redacted]
  rpc = [redacted]
  serf = [redacted]
}
acl {
  enabled = true
  replication_token = [redacted]
}
ports {
  http = [redacted]
}
region = "staging"
datacenter = "aws"
tls {
  http = true
  rpc = true
  ca_file = [redacted]
  cert_file = [redacted]
  key_file = [redacted]
}
server {
  enabled = true
  authoritative_region = "staging"
  # minimum time in terminal state before garbage collection
  job_gc_threshold = "48h"
  eval_gc_threshold = "48h"
  bootstrap_expect = 3
  server_join {
    retry_max = 3
    retry_interval = "15s"
    retry_join = ["XXX.XXX.XXX.XXX", "XXX.XXX.XXX.XXX", "XXX.XXX.XXX.XXX"]
  }
}
Log content:
Aug 11 20:35:04 nom-srv-master-staging-01 nomad[5744]: 2022-08-11T20:35:04.352Z [ERROR] nomad.rpc: RPC error: error="rpc: can't find service Namespace.ListNamespaces" connection="&{[redacted] {{0 0 <nil>}} {{0 0 <nil>}}}"
Aug 11 20:35:15 nom-srv-master-staging-01 nomad[5744]: 2022-08-11T20:35:15.051Z [INFO] nomad: serf: EventMemberUpdate: nom-srv-stagingb1-01.[fqdn]
Aug 11 20:35:18 nom-srv-master-staging-01 nomad[5744]: 2022-08-11T20:35:18.142Z [ERROR] http: request failed: method=GET path=/v1/namespaces?region=stagingb1 error="Nomad Enterprise only endpoint" code=501
Aug 11 20:35:19 nom-srv-master-staging-01 nomad[5744]: 2022-08-11T20:35:19.555Z [ERROR] http: request failed: method=GET path=/v1/status/leader?region=stagingb1 error="rpc error: stream closed" code=500
Aug 11 20:35:19 nom-srv-master-staging-01 nomad[5744]: 2022-08-11T20:35:19.621Z [ERROR] http: request failed: method=GET path=/v1/namespaces?region=stagingb1 error="Nomad Enterprise only endpoint" code=501
Aug 11 20:35:42 nom-srv-master-staging-01 nomad[5744]: 2022-08-11T20:35:42.374Z [ERROR] nomad.rpc: RPC error: error="rpc: can't find service Namespace.ListNamespaces" connection="&{[redacted] {{0 0 <nil>}} {{0 0 <nil>}}}"
Hi @Alagroc, and thanks for the additional information.
Nomad namespaces were originally an Enterprise feature and were open sourced in v1.0.0. Your authoritative servers on 0.12.3 predate that change, so they cannot serve the namespace requests coming from the federated region, which is why your setup is not currently functioning as expected. This can be seen in the following log line:
Aug 11 20:35:19 nom-srv-master-staging-01 nomad[5744]: 2022-08-11T20:35:19.621Z [ERROR] http: request failed: method=GET path=/v1/namespaces?region=stagingb1 error="Nomad Enterprise only endpoint" code=501
I would therefore suggest upgrading your authoritative region to v1.2.5 so that it matches the federated cluster version and includes the namespace OSS code.
I will close this issue as I believe the version mismatch is the source of the problem. If you have further problems, please do not hesitate to reopen this issue, or raise a new one.
Thanks!
Nomad version
Nomad v1.2.5 (06d912a20ba029c7f4d6b371cd07594cba3ae3cd)
Operating system and Environment details
Debian GNU/Linux 10 (buster)
Issue
The error "rpc error: rpc: can't find service Namespace.ListNamespaces" appears on non-authoritative clusters when authoritative_region is specified on the non-authoritative Nomad servers.
Reproduction steps
1 - Create an authoritative Nomad cluster, e.g. staging: nomad.hcl
2 - Create a non-authoritative Nomad cluster under the first one (e.g. a sub-environment) and use the authoritative_region parameter to point it at the authoritative environment, e.g. create staging-b1 and point its authoritative region to staging: nomad.hcl
3 - The cluster works, but this keeps appearing in the Nomad logs:
Expected Result
No RPC errors in the logs
Actual Result
RPC errors in the logs
Job file (if appropriate)
Nomad Server logs (if appropriate)
For example, we have the staging cluster as authoritative (formed by a couple of Nomad servers), and staging b1 and staging b2 as non-authoritative sub-environments. Each sub-environment is made up of a single Nomad server. This is part of the output of staging b1:
Nomad Client logs (if appropriate)
No related errors on clients.