kahkhang / kube-linode

:whale: Provision a Kubernetes/CoreOS cluster on Linode
MIT License

ERR_SSL_PROTOCOL_ERROR #71

Closed nbiles closed 6 years ago

nbiles commented 6 years ago

I previously provisioned a cluster successfully using this script. But after a teardown, I ran the same script with the same domain, and now I cannot access any web UI or any Traefik routes because the browser returns ERR_SSL_PROTOCOL_ERROR. I feel like I have tried everything, but I can't get past this error.

Please help!

PS: I would also like to see an example of a deployment using the same subdomain and SSL, and possibly an example of using this cluster with a different domain. I apologize; I am not a sysadmin or ops engineer.

kahkhang commented 6 years ago

Unfortunately, you might have hit the weekly certificate issuance rate limit: https://letsencrypt.org/docs/rate-limits. You'll probably need to wait until next week, or you can use the staging server (which has a higher rate limit, but the browser will show a warning because its certificates are not trusted).
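
For reference, a minimal sketch of pointing Traefik 1.x at the Let's Encrypt staging server. It assumes the Traefik TOML configuration is embedded in a ConfigMap under a `traefik.toml` key; that key name and the surrounding layout are assumptions, so check the actual manifest before copying. The URL shown is the ACME v2 staging directory; a Traefik release that only speaks ACME v1 would need a different endpoint.

```yaml
# Fragment only: the ConfigMap key name is an assumption, not taken from this repo.
data:
  traefik.toml: |
    [acme]
    # ... existing acme settings (email, storage, entryPoint) stay as they are ...
    # Let's Encrypt staging directory (ACME v2). Certificates issued here are
    # not trusted by browsers, so expect a warning.
    caServer = "https://acme-staging-v02.api.letsencrypt.org/directory"
```

Removing the `caServer` line switches back to the production server.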

nbiles commented 6 years ago

So, if I use a different domain this won't be an issue?

kahkhang commented 6 years ago

Yes, the rate limit is per registered domain.

nbiles commented 6 years ago

What about my second issue? Do you have an example/sample deployment?

I'm trying to use a single cluster as a staging environment. So, I'll need to create deployments on the fly that work with Traefik (and SSL) to serve:

project1.example.com
project2.example.com

Along with all the others created.

I would also like to know how I can serve another domain from this cluster, so that if I have project1.example.com I can also deploy project1.com (I assume I'll have to create another SSL/TLS certificate).

kahkhang commented 6 years ago

If you look in the manifests folder (for example alertmanager / kubernetes dashboard), you can see how the ingresses for those subdomains are deployed.

If you want to serve another domain, you can create corresponding A and CNAME records for your new domain in the Linode DNS Manager, pointing to the master node IP, then create a corresponding ingress for the new domain (see https://kubernetes.io/docs/concepts/services-networking/ingress/).
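
As a rough illustration of that last step, here is a minimal Ingress sketch for a second domain. The resource name, namespace, service name, and port are placeholders rather than values from this repo, and the `extensions/v1beta1` API group is the one in use around the time of this thread; newer clusters use `networking.k8s.io/v1`, which has a different backend syntax.

```yaml
# Hypothetical Ingress for a second domain; every name below is a placeholder.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: project1-com
  namespace: default
  annotations:
    # Optional while Traefik is the only ingress controller, but explicit.
    kubernetes.io/ingress.class: "traefik"
spec:
  rules:
    - host: project1.com            # must match the DNS record created above
      http:
        paths:
          - path: /
            backend:
              serviceName: project1 # placeholder Service in the same namespace
              servicePort: 80
```

The same shape works for project1.example.com and project2.example.com: one rule (or one Ingress) per host, each pointing at its own Service.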

nbiles commented 6 years ago

I looked at the example for alertmanager and created a deployment, but when I did, I got ERR_SSL_PROTOCOL_ERROR for the new subdomain. At that time, all the other subdomains created by the script were still working.

When I compared my Ingress, I noticed I was missing:

annotations:
    kubernetes.io/ingress.class: "traefik"

Is this required? Also, are the RoleBindings required?

kahkhang commented 6 years ago

You've probably hit the rate limit, since a certificate is issued on demand as soon as an ingress is created. Technically, I think you would be fine without that annotation since I've made Traefik the default ingress controller, but it helps in case another ingress controller is deployed (see https://docs.traefik.io/user-guide/kubernetes/). Yes, the RoleBindings are required for access control (see https://kubernetes.io/docs/admin/authorization/rbac/).

pmjohann commented 6 years ago

Same issue here, I get an ERR_SSL_PROTOCOL_ERROR in Chrome. If I port-forward port 8080 of the Traefik pod to localhost, I can see its dashboard, but I can't access any other service (Grafana, kube dashboard, etc.). How could I debug deeper? Thank you! EDIT: I also tried plain HTTP, but it redirects to HTTPS by default. Is there any way to disable the automatic redirect? HTTP would be fine for me, as my setup is just for testing and educational purposes. Thanks once more!

kahkhang commented 6 years ago

@pmjohann, could you look at the Traefik ingress controller pod logs to see what happens when the certificate is being created? I'm not sure if it is the same issue as https://github.com/kahkhang/kube-linode/issues/72. To support plain HTTP, you'll need to remove the redirect

[entryPoints.http.redirect]
      entryPoint = "https"

in traefik.yaml in the manifests folder (see the Traefik documentation for more details).
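
A sketch of what the entry points might look like once the redirect is gone, assuming the Traefik 1.x TOML is embedded in a ConfigMap inside traefik.yaml. The ConfigMap name, namespace, and key are placeholders; only the `[entryPoints...]` tables follow the Traefik 1.x configuration format.

```yaml
# Hypothetical ConfigMap layout; the metadata and key name are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-conf
  namespace: kube-system
data:
  traefik.toml: |
    defaultEntryPoints = ["http", "https"]
    [entryPoints]
      [entryPoints.http]
      address = ":80"
      # The [entryPoints.http.redirect] table was removed here, so requests
      # on port 80 are served as plain HTTP instead of being sent to HTTPS.
      [entryPoints.https]
      address = ":443"
        [entryPoints.https.tls]
```

After editing the manifest, the change has to be re-applied and the Traefik pod restarted so that it reloads its configuration.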

The error might be related to the recent TLS-SNI vulnerability (https://community.letsencrypt.org/t/2018-01-09-issue-with-tls-sni-01-and-shared-hosting-infrastructure/49996). I'm going to try to see if I can work around this if it is indeed the error, and not due to hitting the rate limit.

kahkhang commented 6 years ago

OK, I can confirm that this is an issue: it shows up as "ERR_SSL_PROTOCOL_ERROR" in Chrome and "SSL_ERROR_INTERNAL_ERROR_ALERT" in Firefox. Closing this issue in favor of #72 since it is a duplicate.