rancher / rke

Rancher Kubernetes Engine (RKE) is an extremely simple, lightning-fast Kubernetes distribution that runs entirely within containers.

Unset proxy env vars when using bastion #2525

Closed · pmorillon closed this issue 3 years ago

pmorillon commented 3 years ago

Hello! I use the RKE command line for a production cluster and everything works fine, thanks a lot for this product!

For development and testing purposes, I use the Terraform RKE provider (based on the rke lib) through an SSH bastion, and I use the Terraform Kubernetes provider through a SOCKS proxy configured with the HTTPS_PROXY env var. But when RKE creates the rke-job-deployer ServiceAccount, the Kubernetes client picks up the local HTTP proxy env vars while going through the SSH tunnel on the bastion, and it cannot connect to the Kubernetes control plane.
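(To illustrate the leak: assuming the client ends up on Go's default HTTP transport, which resolves proxies via http.ProxyFromEnvironment, a minimal standalone example behaves like this; the host and proxy address are simply the ones from the log below.)

package main

import (
	"fmt"
	"net/http"
	"os"
)

func main() {
	// The same variable the Terraform Kubernetes provider needs for its SOCKS tunnel.
	os.Setenv("HTTPS_PROXY", "socks5://localhost:51876")

	// A request to the control plane, as the Kubernetes client would issue it.
	req, _ := http.NewRequest("GET", "https://grimoire-2.nancy.grid5000.fr:6443/version", nil)

	// The default transport consults the proxy environment variables, so the
	// request is routed to the local SOCKS proxy instead of going straight
	// through the SSH tunnel to the bastion.
	proxyURL, _ := http.ProxyFromEnvironment(req)
	fmt.Println(proxyURL) // prints: socks5://localhost:51876
}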

terraform apply
...
module.k8s-full-stack.module.k8s_cluster.rke_cluster.cluster: Still creating... [3m40s elapsed]

Error: 
============= RKE outputs ==============
...
time="2021-04-19T11:15:38+02:00" level=info msg="[controlplane] Successfully started Controller Plane.."
time="2021-04-19T11:15:38+02:00" level=info msg="Using proxy environment variable HTTP_PROXY with value [socks5://localhost:51876]"
time="2021-04-19T11:15:38+02:00" level=info msg="Using proxy environment variable HTTPS_PROXY with value [socks5://localhost:51876]"
time="2021-04-19T11:15:38+02:00" level=info msg="[authz] Creating rke-job-deployer ServiceAccount"

Failed running cluster err:Failed to apply the ServiceAccount needed for job execution: Post "https://grimoire-2.nancy.grid5000.fr:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?timeout=30s": proxyconnect tcp: Unable to access the service on localhost:51876. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused)

I propose unsetting the HTTP proxy env vars when an SSH bastion is used in the RKE configuration.
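Roughly, the idea looks like this (an illustrative sketch, not the exact patch in the PR; the helper name is made up):

package cluster

import "os"

// unsetProxyEnvVars clears the proxy variables for the rke process so the
// Kubernetes client dials the control plane directly through the SSH tunnel
// instead of trying to reach the local SOCKS proxy.
func unsetProxyEnvVars() {
	for _, key := range []string{
		"HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
		"http_proxy", "https_proxy", "no_proxy",
	} {
		os.Unsetenv(key)
	}
}

It would only be called when a bastion host is configured, e.g. if rkeConfig.BastionHost.Address != "" { unsetProxyEnvVars() }.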

After recompiling the Terraform RKE provider with this patched rke lib, everything works fine:

terraform apply
...
module.k8s-full-stack.module.k8s_cluster.rke_cluster.cluster: Creating...
...
module.k8s-full-stack.module.k8s_cluster.rke_cluster.cluster: Creation complete after 3m11s [id=55602cee-a3e2-48fe-a164-c0d03f6401a8]
...
module.k8s-full-stack.kubernetes_namespace.metallb: Creating...
module.k8s-full-stack.kubernetes_namespace.rook-ceph: Creating...
module.k8s-full-stack.kubernetes_namespace.metallb: Creation complete after 0s [id=metallb]
module.k8s-full-stack.kubernetes_namespace.rook-ceph: Creation complete after 0s [id=rook-ceph]

Related to PR rancher/rke#2520

pmorillon commented 3 years ago

@superseb's comment:

Please create an issue with this so we can use that to track it. This sounds like changing this breaks the use-case where it is needed from the bastion host? So I guess we need a flag that defaults to the current behavior and can be used to disable it?

By flag, do you mean a CLI flag, or an option in the cluster configuration at the bastion_host level? Like:

# Bastion/Jump host configuration
bastion_host:
    address: x.x.x.x
    user: ubuntu
    port: 22
    ssh_key_path: /home/user/.ssh/bastion_rsa
    ignore_proxy_env_vars: true # Defaults to false

So we could use it with the Terraform RKE provider in the bastion_host block: https://github.com/rancher/terraform-provider-rke/blob/master/docs/resources/cluster.md#bastion_host
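A hedged sketch of how such an option might gate the behaviour in the rke lib (field, tag, and function names here are illustrative, not the final implementation):

package cluster

import "os"

// IgnoreProxyEnvVars would be a new, opt-in field on the bastion host
// configuration, defaulting to false so existing setups keep today's behaviour.
type BastionHost struct {
	Address            string `yaml:"address"`
	User               string `yaml:"user"`
	Port               string `yaml:"port"`
	SSHKeyPath         string `yaml:"ssh_key_path"`
	IgnoreProxyEnvVars bool   `yaml:"ignore_proxy_env_vars"`
}

// maybeUnsetProxyEnvVars only touches the environment when a bastion host is
// configured and the user explicitly asked for the proxy variables to be ignored.
func maybeUnsetProxyEnvVars(b BastionHost) {
	if b.Address == "" || !b.IgnoreProxyEnvVars {
		return
	}
	for _, key := range []string{
		"HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
		"http_proxy", "https_proxy", "no_proxy",
	} {
		os.Unsetenv(key)
	}
}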

superseb commented 3 years ago

Yes, that's how I would do it so we don't break existing setups.

pmorillon commented 3 years ago

I updated PR #2520 to take this discussion into account.

slickwarren commented 3 years ago

Tested on rke v1.2.11 -- repro steps:

Reopening for this issue:

superseb commented 3 years ago

The log line included in the PR is "Unset http proxy environment variables"; please share the debug log from the rke up that was performed, and the environment variables used.

slickwarren commented 3 years ago

Retested on v1.3.0-rc10: