rancher / rio

Application Deployment Engine for Kubernetes
https://rio.io
Apache License 2.0

Public domain not returning the proper service content, http returns 404 [v0.7.0] #1021

Closed: citananda closed this issue 4 years ago

citananda commented 4 years ago

Describe the bug

Registering a domain to a service works, but requests to the domain wrongly return 404, and the HTTPS connection just stalls.

To Reproduce

  1. rio run -p 80:8080 https://github.com/rancher/rio-demo
  2. rio domain register my.domain service.name
  3. curl http://service.name-XXX.on-rio.io:31766 works
  4. curl http://my.domain returns "default backend - 404"

Expected behavior

The public domain should return the same content as the service's own endpoint.

Kubernetes version & type (GKE, on-prem), from kubectl version:

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:14:22Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:07:13Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}

Type:

Rio version, from rio info:

Rio Version: v0.7.0 (4afd4901)
Rio CLI Version: v0.7.0 (4afd4901)
Cluster Domain: wy1wnw.on-rio.io
Cluster Domain IPs: 46.105.42.108
System Namespace: rio-system
Wildcard certificates: wy1wnw.on-rio.io(false)

Additional context

Rio system logs:

rio-controller | time="2020-03-19T14:59:35Z" level=info msg="injecting acme http-01 path for domain my.domain"

/var/log/containers/gateway-proxy-xxx_rio-system_gateway-proxy-xxx.log

{"log":"[2020-03-19 14:59:32.889][7][warning][config] [external/envoy/source/server/listener_manager_impl.cc:345] adding listener '[::]:8443': filter chain match rules require TLS Inspector listener filter, but it isn't configured, trying to inject it (this might fail if Envoy is compiled without it)\n","stream":"stderr","time":"2020-03-19T14:59:32.889402009Z"}
{"log":"[2020-03-19 14:59:32.889][7][info][upstream] [external/envoy/source/server/lds_api.cc:63] lds: add/update listener 'listener-::-8443'\n","stream":"stderr","time":"2020-03-19T14:59:32.889613581Z"}
{"log":"[2020-03-19 14:59:36.873][7][info][upstream] [external/envoy/source/common/upstream/cds_api_impl.cc:70] cds: add 56 cluster(s), remove 2 cluster(s)\n","stream":"stderr","time":"2020-03-19T14:59:36.873453349Z"}
{"log":"[2020-03-19 14:59:36.874][7][info][upstream] [external/envoy/source/common/upstream/cds_api_impl.cc:86] cds: add/update cluster 'kube-svc:rio-system-cm-acme-http-solver-8fprb-8089_rio-system'\n","stream":"stderr","time":"2020-03-19T14:59:36.874412783Z"}

StrongMonkey commented 4 years ago

I see your endpoint has a NodePort appended (range 30000-32767). You should be able to reach your domain by appending the same port (my.domain:port). HTTPS won't work, because the ACME http-01 challenge requires your service load balancer to listen on port 80 (HTTP).
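StrongMonkey's suggestion can be sketched as a small Python helper that carries the NodePort from the generated endpoint over to the public domain. The helper name is hypothetical and not part of Rio; it assumes the default Kubernetes NodePort range (30000-32767) and that Rio's gateway answers for the public domain on that same NodePort.

```python
from urllib.parse import urlsplit

# Default Kubernetes NodePort range; a cluster can change it with the
# kube-apiserver --service-node-port-range flag.
NODE_PORT_MIN, NODE_PORT_MAX = 30000, 32767


def public_domain_url(endpoint: str, public_domain: str) -> str:
    """Hypothetical helper: reuse the NodePort of a Rio endpoint for a public domain."""
    parts = urlsplit(endpoint)
    port = parts.port
    if port is None:
        # No explicit port: the endpoint uses the scheme's default port,
        # so the public domain URL needs no port either.
        return f"{parts.scheme}://{public_domain}"
    if not NODE_PORT_MIN <= port <= NODE_PORT_MAX:
        raise ValueError(f"port {port} is outside the default NodePort range")
    return f"{parts.scheme}://{public_domain}:{port}"


print(public_domain_url("http://cool-snyder-XXX.on-rio.io:31766", "my.domain"))
# prints http://my.domain:31766
```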

citananda commented 4 years ago

@StrongMonkey Thanks for your help. http://my.domain:port gives "No page found at the address...". I don't know if this helps, but I see some strange behavior:

StrongMonkey commented 4 years ago

This is probably because your cluster doesn't support Service load balancers, so Rio uses a NodePort to expose the ingress gateway. Make sure you use the correct port: run rio inspect on the service name (shown in rio ps) and look at the endpoints section. There are two ports, one for HTTP and one for HTTPS. Let's Encrypt won't work for the public domain, since it requires your ingress to listen on port 80 (HTTP). You can work around that by manually injecting a TLS secret into your public domain.
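The port check StrongMonkey describes matters because a URL without an explicit port goes to the scheme's default port (80 for HTTP, 443 for HTTPS), not to the NodePort where, in this setup, Rio's gateway listens. The Python sketch below is an illustration, not Rio code: it reports the port each endpoint URL actually targets, using the URLs from the bug report as sample input.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Ports a client uses when the URL carries no explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_ports(endpoints: list[str]) -> list[tuple[str, str, int]]:
    """Return (scheme, host, port) per endpoint URL, filling in default ports."""
    rows = []
    for url in endpoints:
        parts = urlsplit(url)
        port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
        # Note: urlsplit().hostname is lowercased.
        rows.append((parts.scheme, parts.hostname, port))
    return rows


for scheme, host, port in effective_ports([
    "http://my.domain",
    "https://my.domain",
    "http://service.name-XXX.on-rio.io:31766",
]):
    print(f"{scheme:5} {host:32} {port}")
```

Only the third URL reaches the NodePort. A plain http://my.domain request lands on port 80 and is answered by whatever else listens there, which would be consistent with the "default backend - 404" response in the bug report.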

citananda commented 4 years ago

@StrongMonkey Thanks for your fast reply. I think the ports are OK; rio inspect cool-snyder gives:

---
apiVersion: rio.cattle.io/v1
kind: Service
metadata:
  creationTimestamp: "2020-03-19T14:06:06Z"
  generateName: cool-snyder-v0
  generation: 3
  name: cool-snyder-v0cp5h9
  namespace: default
  resourceVersion: "3038297"
  selfLink: /apis/rio.cattle.io/v1/namespaces/default/services/cool-snyder-v0cp5h9
  uid: 4e15d14a-cfbc-4f33-ab9d-c8ce9e441bca
spec:
  app: cool-snyder
  build:
    branch: master
    repo: https://github.com/rancher/rio-demo
    revision: f8fab97fddc8ed5e98e45cd6373ad6feff3197f9
  image: localhost:5442/default/cool-snyder-v0cp5h9:f8fab
  ports:
  - port: 80
    targetPort: 8080
  version: v0
status:
  appEndpoints:
  - http://my.domain
  - https://my.domain
  - http://cool-snyder-XXX.on-rio.io:31766
  buildLogToken: d844klq55hj52xrv5jk82rcpmlgmgcqgvd67jxgm2q8q6znp777sw6
  computedApp: cool-snyder
  computedVersion: v0
  computedWeight: 10000
  conditions:
  - lastUpdateTime: "2020-03-19T14:06:11Z"
    status: "True"
    type: BuildDeployed
  - lastUpdateTime: "2020-03-19T14:06:11Z"
    status: "True"
    type: ServiceDeployed
  - lastUpdateTime: "2020-03-19T14:06:11Z"
    status: "True"
    type: ServiceClusterRBAC
  deploymentReady: true
  endpoints:
  - http://cool-snyder-XXX.on-rio.io:31766
  scaleStatus:
    available: 1
  watch: true

citananda commented 4 years ago

@StrongMonkey Could you please elaborate? I'm not sure I understand what you mean, but here is what I have:

About the load balancer: I can create workloads on my cluster and then set up load balancing (using an Ingress) on a specific domain name. I have also installed cert-manager, so it creates SSL certificates automatically. This works fully; I can access my workloads at http://other.domain and https://other.domain.

This was installed before I installed Rio; maybe that is the problem.

One more thing, in case it helps: the rio dashboard command doesn't work and is stuck on the message "Waiting for dashboard service to be ready".

Thanks in advance for your help.

StrongMonkey commented 4 years ago

Did you deploy your own cert-manager? It might conflict with the one Rio deploys. You can also check the cert-manager pod in rio-system for error logs. rio dashboard is pending because it requires HTTPS, but it looks like Rio failed to provision a wildcard certificate for your cluster.

citananda commented 4 years ago

@StrongMonkey Yes, I did install my own cert-manager, so I removed its pods and reinstalled Rio. Rio's cert-manager is working fine: I can see in the logs that it now manages the existing certificates instead of my own cert-manager. There are no errors in the Rio cert-manager logs, but I don't see any message about a wildcard certificate either.

But rio info shows this line: Wildcard certificates: XXX.on-rio.io(false)

And I found this in the rio-controller logs:

time="2020-03-26T17:24:31Z" level=info msg="Starting rio-controller, version: v0.7.0, git commit: 4afd4901"
time="2020-03-26T17:24:31Z" level=info msg="Updating CRD services.rio.cattle.io"
time="2020-03-26T17:24:31Z" level=info msg="Updating CRD stacks.rio.cattle.io"
I0326 17:24:31.667474 1 leaderelection.go:241] attempting to acquire leader lease rio-system/rio...
time="2020-03-26T17:24:31Z" level=info msg="listening at :443"
I0326 17:24:35.124201 1 leaderelection.go:251] successfully acquired lease rio-system/rio
time="2020-03-26T17:24:36Z" level=info msg="Starting /v1, Kind=ConfigMap controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting apps/v1, Kind=StatefulSet controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=ClusterRole controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting apps/v1, Kind=Deployment controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting cert-manager.io/v1alpha2, Kind=Certificate controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting apps/v1, Kind=DaemonSet controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting /v1, Kind=Service controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting /v1, Kind=Endpoints controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting admin.rio.cattle.io/v1, Kind=PublicDomain controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting admin.rio.cattle.io/v1, Kind=ClusterDomain controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting extensions/v1beta1, Kind=Ingress controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting rio.cattle.io/v1, Kind=Service controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting rio.cattle.io/v1, Kind=ExternalService controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting rio.cattle.io/v1, Kind=Router controller"
time="2020-03-26T17:24:37Z" level=info msg="Starting gloo.solo.io/v1, Kind=Settings controller"
E0326 17:24:38.476019 1 controller.go:135] error syncing 'XXX.on-rio.io': handler clusterdomain-letsencrypt: failed to create rio-system/XXX.on-rio.io-tls cert-manager.io/v1alpha2, Kind=Certificate for clusterdomain-letsencrypt XXX.on-rio.io: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp 10.43.12.128:443: connect: connection refused, requeuing
E0326 17:24:39.495868 1 controller.go:135] error syncing 'XXX.on-rio.io': handler clusterdomain-letsencrypt: failed to create rio-system/XXX.on-rio.io-tls cert-manager.io/v1alpha2, Kind=Certificate for clusterdomain-letsencrypt XXX.on-rio.io: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp 10.43.12.128:443: connect: connection refused, requeuing
E0326 17:24:39.499893 1 controller.go:135] error syncing 'rio-system/rio-config': handler letsencrypt-issuer: failed to create rio-system/rio-dns-issuer cert-manager.io/v1alpha2, Kind=Issuer for letsencrypt-issuer rio-system/rio-config: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp 10.43.12.128:443: connect: connection refused, failed to create rio-system/rio-http-issuer cert-manager.io/v1alpha2, Kind=Issuer for letsencrypt-issuer rio-system/rio-config: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp 10.43.12.128:443: connect: connection refused, requeuing
E0326 17:24:40.551448 1 controller.go:135] error syncing 'XXX.on-rio.io': handler clusterdomain-letsencrypt: failed to create rio-system/XXX.on-rio.io-tls cert-manager.io/v1alpha2, Kind=Certificate for clusterdomain-letsencrypt XXX.on-rio.io: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp 10.43.12.128:443: connect: connection refused, requeuing
E0326 17:24:41.544090 1 controller.go:135] error syncing 'rio-system/rio-config': handler letsencrypt-issuer: failed to create rio-system/rio-dns-issuer cert-manager.io/v1alpha2, Kind=Issuer for letsencrypt-issuer rio-system/rio-config: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp 10.43.12.128:443: connect: connection refused, failed to create rio-system/rio-http-issuer cert-manager.io/v1alpha2, Kind=Issuer for letsencrypt-issuer rio-system/rio-config: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp 10.43.12.128:443: connect: connection refused, requeuing
E0326 17:24:41.611244 1 controller.go:135] error syncing 'XXX.on-rio.io': handler clusterdomain-letsencrypt: failed to create rio-system/XXX.on-rio.io-tls cert-manager.io/v1alpha2, Kind=Certificate for clusterdomain-letsencrypt XXX.on-rio.io: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp 10.43.12.128:443: connect: connection refused, requeuing
E0326 17:24:42.695989 1 controller.go:135] error syncing 'XXX.on-rio.io': handler clusterdomain-letsencrypt: failed to create rio-system/XXX.on-rio.io-tls cert-manager.io/v1alpha2, Kind=Certificate for clusterdomain-letsencrypt XXX.on-rio.io: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp 10.43.12.128:443: connect: connection refused, requeuing
E0326 17:24:43.596256 1 controller.go:135] error syncing 'rio-system/rio-config': handler letsencrypt-issuer: failed to create rio-system/rio-dns-issuer cert-manager.io/v1alpha2, Kind=Issuer for letsencrypt-issuer rio-system/rio-config: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp 10.43.12.128:443: connect: connection refused, failed to create rio-system/rio-http-issuer cert-manager.io/v1alpha2, Kind=Issuer for letsencrypt-issuer rio-system/rio-config: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp 10.43.12.128:443: connect: connection refused, requeuing
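All of the E-lines above repeat the same failure. A small Python sketch (an illustration, not part of Rio) can collapse such lines into one count per object and handler, which makes it easier to see that every failure is the same refused connection to the cert-manager webhook. The regular expressions assume exactly the log format shown in these lines.

```python
import re
from collections import Counter

SYNC_RE = re.compile(r"error syncing '(?P<key>[^']+)': handler (?P<handler>[\w-]+):")
WEBHOOK_RE = re.compile(r'failed calling webhook "(?P<webhook>[^"]+)"')
REFUSED_RE = re.compile(r"dial tcp (?P<target>[\d.]+:\d+): connect: connection refused")


def summarize(lines):
    """Count failed syncs per (object key, handler, webhook, dial target)."""
    counts = Counter()
    for line in lines:
        sync = SYNC_RE.search(line)
        if not sync:
            continue  # info lines and other messages are skipped
        webhook = WEBHOOK_RE.search(line)
        refused = REFUSED_RE.search(line)
        counts[(
            sync["key"],
            sync["handler"],
            webhook["webhook"] if webhook else None,
            refused["target"] if refused else None,
        )] += 1
    return counts
```

Fed the eight E-lines above, this would report two distinct failures: handler clusterdomain-letsencrypt on XXX.on-rio.io (five times) and letsencrypt-issuer on rio-system/rio-config (three times), both refused by webhook.cert-manager.io at 10.43.12.128:443.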

Something I don't get is that

So which pod is supposed to answer at cert-manager-webhook.cert-manager.svc? Do you have any idea how I can solve this problem?
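In Kubernetes, a name of the form <service>.<namespace>.svc is the in-cluster DNS name of a Service, not of a pod; the Service forwards to whichever pods match its selector. Splitting the name from the log shows that the webhook call targets a Service called cert-manager-webhook in the cert-manager namespace, not in rio-system, where this thread places Rio's own cert-manager. That would be consistent with the call still going to the separately installed cert-manager's webhook, and "connection refused" suggests nothing was listening behind it. Listing the Service's endpoints (for example with kubectl get endpoints -n cert-manager) would show which pods, if any, back it. The helper below is a hypothetical illustration only.

```python
from __future__ import annotations


def split_service_dns(name: str) -> tuple[str, str]:
    """Split a Kubernetes Service DNS name into (service, namespace).

    Accepts <service>.<namespace>.svc, optionally followed by a cluster
    domain such as cluster.local (the default cluster domain is assumed).
    """
    labels = name.rstrip(".").split(".")
    if len(labels) < 3 or labels[2] != "svc":
        raise ValueError(f"not a <service>.<namespace>.svc name: {name!r}")
    return labels[0], labels[1]


service, namespace = split_service_dns("cert-manager-webhook.cert-manager.svc")
print(f"service={service} namespace={namespace}")
# prints service=cert-manager-webhook namespace=cert-manager
```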

citananda commented 4 years ago

In the end, I uninstalled Rio and cert-manager, cleaned up, and reinstalled Rio; now the dashboard works. Thanks a lot for your help, @StrongMonkey!