rancher / terraform-provider-rancher2

Terraform Rancher2 provider
https://www.terraform.io/docs/providers/rancher2/
Mozilla Public License 2.0

bootstrap fails on k3s with traefik and TLS termination on load balancer #317

Open dnoland1 opened 4 years ago

dnoland1 commented 4 years ago

On a k3s cluster with traefik, I get the following error when attempting to use the rancher2 bootstrap resource (with TF_LOG=debug set):

2020-04-13T22:37:23.999-0700 [DEBUG] plugin.terraform-provider-rancher2_v1.8.3_x4: 2020/04/13 22:37:23 Getting from  https://<myurl>/ping
2020-04-13T22:37:26.566-0700 [DEBUG] plugin.terraform-provider-rancher2_v1.8.3_x4: 2020/04/13 22:37:26 Time to get req:  2567  ms
2020-04-13T22:37:31.568-0700 [DEBUG] plugin.terraform-provider-rancher2_v1.8.3_x4: 2020/04/13 22:37:31 Getting from  https://<myurl>/ping
module.rancher-bootstrap.rancher2_bootstrap.admin: Still creating... [20s elapsed]
2020-04-13T22:37:32.573-0700 [DEBUG] plugin.terraform-provider-rancher2_v1.8.3_x4: 2020/04/13 22:37:32 Time to get req:  1005  ms
2020-04-13T22:37:37.578-0700 [DEBUG] plugin.terraform-provider-rancher2_v1.8.3_x4: 2020/04/13 22:37:37 Getting from  https://<myurl>/ping
2020-04-13T22:37:38.441-0700 [DEBUG] plugin.terraform-provider-rancher2_v1.8.3_x4: 2020/04/13 22:37:38 Time to get req:  863  ms
module.rancher-bootstrap.rancher2_bootstrap.admin: Still creating... [30s elapsed]
2020-04-13T22:37:43.442-0700 [DEBUG] plugin.terraform-provider-rancher2_v1.8.3_x4: 2020/04/13 22:37:43 Getting from  https://<myurl>/ping
2020-04-13T22:37:44.716-0700 [DEBUG] plugin.terraform-provider-rancher2_v1.8.3_x4: 2020/04/13 22:37:44 Time to get req:  1273  ms
2020-04-13T22:37:49.720-0700 [DEBUG] plugin.terraform-provider-rancher2_v1.8.3_x4: 2020/04/13 22:37:49 Getting from  https://<myurl>e/ping
2020-04-13T22:37:50.822-0700 [DEBUG] plugin.terraform-provider-rancher2_v1.8.3_x4: 2020/04/13 22:37:50 Time to get req:  1101  ms
module.rancher-bootstrap.rancher2_bootstrap.admin: Still creating... [40s elapsed]
2020/04/13 22:37:55 [DEBUG] module.rancher-bootstrap.rancher2_bootstrap.admin: apply errored, but we're indicating that via the Error pointer rather than returning it: [ERROR] Connecting with user/pass: Rancher is not ready: <nil>

This is because traefik intercepts /ping and returns OK and not the expected "pong":

$ curl  https://<myurl>/ping
OK

See https://docs.traefik.io/operations/ping/#configuration-options
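The mismatch described above can be reproduced locally with a minimal Go sketch. It is not the provider's actual code: `isRancherReady` and `newPingServer` are hypothetical helpers, and the check simply assumes that "ready" means `/ping` answers HTTP 200 with the body `pong`. Two local test servers stand in for Rancher and for traefik's own ping handler.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// isRancherReady is a hypothetical helper (not taken from the provider).
// It reports true only when GET <baseURL>/ping returns 200 with body "pong".
func isRancherReady(client *http.Client, baseURL string) (bool, error) {
	resp, err := client.Get(strings.TrimRight(baseURL, "/") + "/ping")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	// A 200 "OK" from the ingress is a successful request, but not a Rancher pong.
	ready := resp.StatusCode == http.StatusOK && strings.TrimSpace(string(body)) == "pong"
	return ready, nil
}

// newPingServer starts a local test server that answers every request
// with the given body (an assumption used only for this illustration).
func newPingServer(body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, body)
	}))
}

func main() {
	rancher := newPingServer("pong") // stands in for Rancher's healthcheck
	defer rancher.Close()
	ingress := newPingServer("OK") // stands in for traefik's ping handler
	defer ingress.Close()

	ready, err := isRancherReady(rancher.Client(), rancher.URL)
	fmt.Println("rancher-like endpoint:", ready, err)

	ready, err = isRancherReady(ingress.Client(), ingress.URL)
	fmt.Println("traefik-like endpoint:", ready, err)
}
```

In this sketch the traefik-like endpoint yields a "not ready" result with a nil error, because the HTTP request itself succeeds and only the body differs. That would be consistent with the `Rancher is not ready: <nil>` message in the log, though the provider's real logic may differ.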

Note that this works fine if we use the nginx-ingress controller instead of the default traefik ingress. The testing above was done with the rancher2 v1.8.3 provider.

rawmind0 commented 4 years ago

That sounds weird, @dnoland1. Traefik shouldn't intercept the /ping URI on every entrypoint, just on a specific one (`traefik` by default).

It seems that the traefik ingress is not routing requests to the proper entrypoint. Are you using the proper FQDN to access Rancher?

Tested on k3s v1.17.4-k3s1 with the traefik ingress and Rancher running:

$ curl -k https://172.17.0.5/ping 
OK
$ curl -k --header 'Host: rancher.my.org' https://172.17.0.5/ping 
pong

dnoland1 commented 4 years ago

@rawmind0 Could you try this in your environment:

curl --header 'Host: rancher.my.org' http://172.17.0.5/ping

and let me know the results. It appears that with https, pong is returned, but with http, OK is returned.

In our environment, we are using the tls external option in Rancher and doing TLS termination on an AWS ALB. The ALB forwards traffic on to traefik using http, not https.
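The curl checks in this thread can also be scripted. The following Go sketch is only an illustration: `pingWithHost` and `newFakeIngress` are hypothetical names, and the fake ingress merely imitates the behaviour reported above (`pong` for the Rancher host name, `OK` otherwise). Against a real endpoint, the same helper could be called once with an `http://` base URL and once with an `https://` one to compare the two schemes.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// pingWithHost is a hypothetical helper that mirrors
// `curl --header 'Host: <host>' <baseURL>/ping` and returns the trimmed body.
func pingWithHost(client *http.Client, baseURL, host string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, strings.TrimRight(baseURL, "/")+"/ping", nil)
	if err != nil {
		return "", err
	}
	if host != "" {
		// In net/http the Host header is overridden via req.Host,
		// not via req.Header.Set("Host", ...).
		req.Host = host
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(body)), nil
}

// newFakeIngress imitates the reported behaviour for illustration only:
// requests for rancherHost get "pong", everything else gets "OK".
func newFakeIngress(rancherHost string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Host == rancherHost {
			fmt.Fprint(w, "pong")
			return
		}
		fmt.Fprint(w, "OK")
	}))
}

func main() {
	srv := newFakeIngress("rancher.my.org")
	defer srv.Close()

	noHost, err := pingWithHost(srv.Client(), srv.URL, "")
	fmt.Println("without Host header:", noHost, err)

	withHost, err := pingWithHost(srv.Client(), srv.URL, "rancher.my.org")
	fmt.Println("with Host header:", withHost, err)
}
```

Setting `req.Host` rather than a `Host` entry in `req.Header` matters here, since Go's client takes the Host header from that field when it sends the request.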

rawmind0 commented 4 years ago

The issue is with the traefik and external LB configuration:

The provider needs to reach the proper Rancher healthcheck at /ping, because it checks Rancher readiness before connecting (https://github.com/terraform-providers/terraform-provider-rancher2/blob/master/rancher2/config.go#L73), and this configuration routes /ping requests over http to the ingress healthcheck instead of the Rancher healthcheck.

Two possible options to fix this: