Closed phanama closed 4 years ago
@xiang90
I'm having the same issue. I tested changing the order of the servers in the address configuration option, and the servers that report the error change accordingly.
If I use:
storage "etcd" {
address = "https://etcd2.example.com:2379,https://etcd1.example.com:2379,https://etcd3.example.com:2379"
...
}
I get the error (error "remote error: tls: bad certificate", ServerName "etcd2.example.com") on etcd1.example.com and etcd3.example.com, but if I use:
storage "etcd" {
address = "https://etcd1.example.com:2379,https://etcd2.example.com:2379,https://etcd3.example.com:2379"
...
}
I get the error (error "remote error: tls: bad certificate", ServerName "etcd1.example.com") on etcd2.example.com and etcd3.example.com.
I have the exact same problem. Also, the Vault servers only talk to the first etcd server in the address list and do not fail over to the other etcd servers.
I think the HA problem and the TLS errors are symptoms of the same underlying problem.
See my comments on the related issue...
https://github.com/hashicorp/vault/issues/4961
Setting etcd_api = "v2" in the Vault server config solves the problem.
Hello. I have exactly the same issue. Is a fix in the works?
This problem still exists in Vault 1.0.0-beta1
We experienced the same issue: Vault uses only the first address in the list of etcd endpoints. When I stop that etcd instance, vault status fails. And when two Vault instances have different first etcd endpoints, both instances become master. This is a huge issue for us.
@gyuho @hexfusion @philips Is the etcd team still interested in and committed to maintaining this, as promised here, or more specifically https://github.com/hashicorp/vault/pull/2168#issuecomment-266090432 ?
Or has it been given up, so that it is now up to the community to drive it?
@raoofm we are not giving up on anything, but we simply don't have the manpower to cover all of these bases. As you are in the trenches here, bringing these problems to our attention is helpful. But we also need more cycles, so anything you can do to help bridge that gap would be great. This issue seems to be a misconfigured TLS SAN.
@yudiandreanp what is the output of
openssl x509 -in /etcd/client.crt -text -noout
@hexfusion awesome, thanks, that makes sense. I'll pitch in where I can; I just wanted to get a sense of where it's heading.
@hexfusion there's a more comprehensive description of the issue in general over at https://github.com/etcd-io/etcd/issues/9949 where I think the attention should be focused.
@jsok at a high level, we are working on improving the client balancer for 3.4, but basically clientv3 needs to handle this situation better in its balancer logic, so that if an endpoint is not available it moves on to the next.
2018-04-13 08:01:50.267438 I | embed: rejected connection from "10.10.145.64:32946" (error "remote error: tls: bad certificate", ServerName "etcd-0.example.com")
Regardless of the general balancer issue, though, I believe these errors are literal: the log is telling you that etcd-0.example.com should be in your TLS SAN and it is not. So I believe you have two separate issues here, and I would like to review the output of the openssl command above.
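To compare the name the client asked for with the host that logged the rejection, the ServerName can be pulled out of a line like the one quoted above. This is only a minimal Python sketch, assuming the log format matches that single sample; the pattern and the `parse_rejection` helper are illustrative names and not part of etcd.

```python
import re

# Pattern written against the single sample line quoted in this thread.
# Assumption: other etcd versions may format the message differently.
REJECTED = re.compile(
    r'rejected connection from "(?P<peer>[^"]+)" '
    r'\(error "(?P<error>[^"]+)", ServerName "(?P<server_name>[^"]*)"\)'
)


def parse_rejection(line):
    """Return peer, error and server_name from a rejection line, or None."""
    match = REJECTED.search(line)
    return match.groupdict() if match else None


LINE = (
    '2018-04-13 08:01:50.267438 I | embed: rejected connection from '
    '"10.10.145.64:32946" (error "remote error: tls: bad certificate", '
    'ServerName "etcd-0.example.com")'
)

info = parse_rejection(LINE)
print(info["server_name"])
```

If the extracted name differs from the host whose log contains the line, the client asked that host for a certificate valid for a different name.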
I'm fairly certain the issue here is that the first endpoint that the client balancer hits determines the expected ServerName for all other endpoints, which doesn't make sense.
You shouldn't expect every peer to have every other peer's FQDN and/or IP in its SAN. Yes, they will have a common subset of SANs (e.g. for SRV discovery), but not identical ones.
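The hypothesis above can be checked against the reordering experiment reported earlier in the thread with a small model. This is a sketch in Python under stated assumptions, not the clientv3 balancer code: it assumes the client pins the expected name to the host of the first endpoint, and that each server's certificate lists only its own name (the `SANS` table is hypothetical).

```python
from __future__ import annotations

from urllib.parse import urlsplit


def endpoint_hosts(address: str) -> list[str]:
    """Split a comma-separated endpoint list and return each host name."""
    return [urlsplit(url.strip()).hostname for url in address.split(",")]


def rejected_endpoints(address: str, sans: dict[str, set[str]]) -> list[str]:
    """Return hosts whose certificate does not cover the pinned name.

    Models the suspected behavior: every server is verified against the
    FIRST endpoint's host name instead of its own.
    """
    hosts = endpoint_hosts(address)
    pinned_name = hosts[0]  # assumption: name taken from the first endpoint
    return [host for host in hosts if pinned_name not in sans[host]]


# Hypothetical certificates: each server lists only its own name in its SAN.
SANS = {
    "etcd1.example.com": {"etcd1.example.com"},
    "etcd2.example.com": {"etcd2.example.com"},
    "etcd3.example.com": {"etcd3.example.com"},
}

ORDER_A = (
    "https://etcd2.example.com:2379,"
    "https://etcd1.example.com:2379,"
    "https://etcd3.example.com:2379"
)
ORDER_B = (
    "https://etcd1.example.com:2379,"
    "https://etcd2.example.com:2379,"
    "https://etcd3.example.com:2379"
)

print(rejected_endpoints(ORDER_A, SANS))
print(rejected_endpoints(ORDER_B, SANS))
```

Under these assumptions, the model rejects every endpoint except the first one in each ordering, which is the same pattern as the two reported configurations.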
Is there any news on this? I have exactly the same problem with Vault 1.0.2 (latest from the site) and etcd 3.3.8.
I tried regenerating Vault's client certificate with all hostnames and IPs in the SAN, but had no luck.
@prudnitskiy The etcd team has disappeared and I don't believe anyone in the community has worked on a fix, so no updates currently.
@prudnitskiy I've outlined the workarounds in the etcd issue. It's not pretty but does work for the time being.
Environment:
Vault Config File:
Startup Log Output:
Expected Behavior:
None of the etcd endpoints throws a TLS error; all of them accept the connection.
Actual Behavior:
The last two etcd endpoints threw a TLS error reporting a bad certificate and rejected the connection. I suspect this is because Vault sends requests to them using the first server's (etcd-0) server name.
Sample etcd error logs (from etcd-1; etcd-2 showed the same error):
Steps to Reproduce: