rancher / rke

Rancher Kubernetes Engine (RKE), an extremely simple, lightning fast Kubernetes distribution that runs entirely within containers.
Apache License 2.0

RKE1: Ingress Controller and Ingress not working #3598

Closed: sourabhsharma487 closed this issue 1 month ago.

sourabhsharma487 commented 4 months ago

RKE version: 1.5.9

Docker version: 24.0.9

Operating system and kernel: RHEL 9.3

Type/provider of hosts: VMware (all server nodes run on on-premises infrastructure)

cluster.yml file: cluster-setup-file.zip

Steps to Reproduce: rke up --config cluster.yml (cluster.yml attached above)

Results: We have three server nodes, each assigned the controlplane, etcd, and worker roles. The ingress controller was deployed using the same cluster.yml (attached above).

Scenario 1: PASS

With the ingress controller deployed as a DaemonSet on all three nodes and the application workload deployed as a single replica, I receive a response from the server node on which the pod is running.

Scenario 2: FAIL

The ingress controller is again deployed as a DaemonSet on all three nodes, but the application workload was scaled up to three replicas, so one application pod runs on each server node. When accessing the application from a browser, I receive a 504 Gateway Timeout. Only one server node loads the page, and with very high latency; the other two nodes do not respond at all.

For example, the ingress controller logs show:

upstream timed out (110: Operation timed out) while connecting to upstream, client: X.X.X.X, server: abc.com, request "GET /abc/v3/api-docs
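The excerpt above is cut off before the upstream address. In complete nginx error-log lines (for example, as returned by `kubectl logs` on an ingress controller pod), this error is normally followed by an `upstream:` field naming the pod IP and port the controller tried to reach. Below is a minimal Python sketch, assuming such complete lines and IPv4 pod addresses, that counts timeouts per /24 pod subnet. With Canal/Flannel in RKE each node is usually given its own /24 from the cluster CIDR (10.42.0.0/16 by default), so if the timed-out upstreams all belong to nodes other than the one whose controller logged them, that would point toward cross-node pod traffic being lost rather than the application itself being slow. The field list and the /24-per-node mapping are assumptions to verify against your own cluster; the function names are hypothetical.

```python
import ipaddress
import re
from collections import Counter

# Keys that typically appear in an nginx "upstream timed out" error line.
# "upstream" and "host" are assumed: they are cut off in the issue's excerpt.
FIELD_RE = re.compile(r'\b(client|server|request|upstream|host): ("[^"]*"|[^,]+)')


def parse_upstream_error(line: str) -> dict:
    """Extract the key/value fields from one nginx error-log line."""
    fields = {key: value.strip().strip('"') for key, value in FIELD_RE.findall(line)}
    fields["timed_out"] = "upstream timed out" in line
    return fields


def timeouts_by_pod_subnet(lines, prefix: int = 24) -> Counter:
    """Count upstream timeouts per pod subnet (assumes IPv4 upstreams)."""
    counts = Counter()
    for line in lines:
        fields = parse_upstream_error(line)
        if not fields["timed_out"] or "upstream" not in fields:
            continue  # not a timeout, or the upstream field was truncated
        # e.g. "http://10.42.1.5:8080/abc/v3/api-docs" -> "10.42.1.5"
        host_port = fields["upstream"].split("//", 1)[-1].split("/", 1)[0]
        pod_ip = host_port.rsplit(":", 1)[0]
        subnet = ipaddress.ip_network(f"{pod_ip}/{prefix}", strict=False)
        counts[str(subnet)] += 1
    return counts
```

Running it separately over the logs of each of the three controller pods shows, per node, which pod subnets that node's controller fails to reach.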

I tried accessing the application both directly via the server node IPs and via a host entry added to the hosts file on my local machine.
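To compare the three nodes side by side without editing the hosts file for each one, a small Python sketch can send the same request to every node IP with the Host header set explicitly, recording the HTTP status and the elapsed time. The node addresses below are hypothetical placeholders from the documentation IP range, plain HTTP on port 80 is assumed, and `abc.com` and the path are taken from the log excerpt above. Distinguishing a node that returns 504 after a long delay from one that returns no HTTP response at all helps separate an ingress-to-pod problem from a client-to-node problem.

```python
import time
import urllib.error
import urllib.request

# Hypothetical node addresses (documentation range); replace with your own.
NODES = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]
HOST = "abc.com"           # Host header the Ingress rule matches (from the issue)
PATH = "/abc/v3/api-docs"  # path from the issue's log excerpt


def probe(node_addr: str, host: str, path: str, timeout: float = 5.0):
    """GET http://<node_addr><path> with an explicit Host header.

    Returns (status, seconds); status is None if no HTTP response arrived.
    """
    request = urllib.request.Request(f"http://{node_addr}{path}", headers={"Host": host})
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # e.g. a 504 generated by the ingress controller
    except OSError:
        status = None  # connection refused, reset, or timed out
    return status, time.monotonic() - start


if __name__ == "__main__":
    for node in NODES:
        status, seconds = probe(node, HOST, PATH)
        print(f"{node}: status={status} time={seconds:.2f}s")
```

If the Ingress redirects HTTP to HTTPS, `urlopen` may follow the redirect using normal DNS resolution for `abc.com`, so the result would no longer reflect the node you targeted.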

NOTE: All firewall ports are open, and all VMs are on the same subnet and the same ESXi host.

I would like help configuring the ingress controller for high availability, or an alternative approach that ensures high availability for each microservice running on the RKE Kubernetes cluster.

github-actions[bot] commented 2 months ago

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.