hashicorp / terraform-provider-kubernetes

Terraform Kubernetes provider
https://www.terraform.io/docs/providers/kubernetes/
Mozilla Public License 2.0

Don't fallback to localhost cluster #1479

Open · ikarlashov opened this issue 3 years ago

ikarlashov commented 3 years ago

Hi folks,

We have GitLab CI runners running pipelines in an EKS cluster. Whenever the Kubernetes provider can't establish a connection to the desired cluster through the provider config block, it falls back to localhost and tries to modify the cluster the pipeline itself runs in. This is very dangerous behavior and should have to be enabled EXPLICITLY in the provider settings (if there's a real use case for it).

Terraform Version, Provider Version and Kubernetes Version

Terraform version: 1.0.1
Kubernetes provider version: 2.6.1
Kubernetes version: 1.19

Affected Resource(s)

Authentication mechanism for provider

Debug Output

Fallback to localhost: https://gist.github.com/ikarlashov/7af79c1225e9383bd6ca135cca2e0aa3

Steps to Reproduce

Misconfigured cluster connection settings in the kubernetes provider block

Expected Behavior

Fail with an error message (like it does when run in a non-Kubernetes environment)

Actual Behavior

Attempts to modify the wrong cluster (the one the pipeline itself runs in)

jrhouston commented 3 years ago

Thanks for opening this @ikarlashov. This seems to be the default behaviour of client-go (we don't set any explicit configuration for InCluster config, it's just what happens if no options are specified and the client is inside a cluster). I need to investigate if there is a way to disable this and make it configurable.

Is the KUBERNETES_MASTER environment variable being set in the pod you are running Terraform in? A workaround here may be to unset that variable before Terraform runs.
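
Not from this thread, but as a general mitigation: a minimal sketch of pinning the provider to an explicit kubeconfig and context, so client-go's in-cluster detection is never consulted. The path and context name below are placeholders.

provider "kubernetes" {
  # Explicit kubeconfig and context; with these set, the provider should not
  # need to derive anything from the in-cluster environment variables.
  config_path    = "/builds/kubeconfig"  # hypothetical path inside the CI pod
  config_context = "target-cluster"      # hypothetical context name
}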

ikarlashov commented 3 years ago

@jrhouston no problem :)

I don't think there's such an env variable. I exec'd into the gitlab-runner pod and these are the Kubernetes-related vars it has:

KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT=tcp://172.20.0.1:443
KUBERNETES_SERVICE_PORT=443
FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY=false
KUBERNETES_SERVICE_HOST=172.20.0.1
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT_443_TCP_ADDR=172.20.0.1
KUBERNETES_PORT_443_TCP=tcp://172.20.0.1:443

jrhouston commented 3 years ago

@ikarlashov Can you share some more information about how you are configuring the provider block in your Terraform config? After investigating it seems like you shouldn't fall back to the in-cluster config unless the provider block ends up with empty values.

jrhouston commented 3 years ago

Looks like client-go uses KUBERNETES_SERVICE_PORT and KUBERNETES_SERVICE_HOST to get the in-cluster config here. You could try unsetting those as a workaround for now.

chandankashyap19 commented 3 years ago

We're facing the same issue in our own environment. The Kubernetes provider works fine with Terraform 0.13 but not with 1.0.x; it falls back to localhost. Our cluster is AWS EKS.

Configuration used:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws-iam-authenticator"
    args = [
      "token",
      "-i",
      aws_eks_cluster.main.name,
      "--role",
      "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    ]
  }
}

Error : Error: Get "http://localhost/api/v1/namespaces/xxxxxxxx": dial tcp 127.0.0.1:80: connect: connection refused
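
For comparison only (not a confirmed fix): a hedged sketch of the same EKS setup using the aws_eks_cluster_auth data source for the token instead of an exec plugin, so no external binary has to run in the CI pod. The data source names mirror the config above and are assumptions.

data "aws_eks_cluster_auth" "cluster" {
  # Issues a short-lived token for the cluster referenced above.
  name = data.aws_eks_cluster.cluster.name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}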

simwak commented 2 years ago

In our case it even tries to connect to a completely different service (a NoMachine web interface), because that service runs on localhost and has a redirect, and this happens even when the cluster endpoint is available.

Get "https://127.0.0.1/nxwebplayer": x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs
with module.eks.module.eks.kubernetes_config_map.aws_auth[0],
on .terraform/modules/eks.eks/main.tf line 298, in resource "kubernetes_config_map" "aws_auth":
298: resource "kubernetes_config_map" "aws_auth"

With the Helm provider it says the following; somehow the configuration went missing.

Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
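
If the Helm provider is in play, the same explicit pinning can be expressed in its nested kubernetes block (Helm provider 2.x block syntax; the path and context name are placeholders):

provider "helm" {
  kubernetes {
    # Same idea as for the kubernetes provider: point at an explicit kubeconfig
    # so nothing falls back to localhost or in-cluster detection.
    config_path    = "/builds/kubeconfig"  # hypothetical path
    config_context = "target-cluster"      # hypothetical context name
  }
}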

apeabody commented 2 years ago

Hi Team - We have observed a related issue when the provider "kubernetes" {} block is omitted entirely, resulting in the unexpected behavior of the provider attempting to contact localhost. From a UX standpoint, an invalid-configuration error or warning for omitted values would be strongly preferable to silently falling back to localhost.

Terraform version: 1.1.6
Kubernetes provider version: v2.10.0_x5

kaihendry commented 2 years ago

How does one check the kubernetes configuration? https://www.reddit.com/r/Terraform/comments/vsme03/how_do_i_verify_the_kubernetes_provider/if25eb2/?context=3

streamnsight commented 1 year ago

I have a similar issue with the Kubernetes provider on a different cloud: the interesting part is that the provider config works fine on the first run, but on subsequent plan or apply it fails with this issue. It seems like it is just not re-running the 'exec' block, so there is no config / no token, and it defaults to an empty configuration, which somehow turns into localhost.

The root of the problem seems to lie in the exec block, though...
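
For reference, a hedged sketch of the kind of exec block being discussed, using EKS for concreteness since that is the cluster type elsewhere in this thread; the variables are placeholders. The exec command is invoked again on every plan and apply, so it has to be installed and on PATH in whatever environment runs those steps.

provider "kubernetes" {
  host                   = var.cluster_endpoint          # hypothetical variable
  cluster_ca_certificate = base64decode(var.cluster_ca)  # hypothetical variable

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name]  # hypothetical variable
  }
}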

casualuser commented 1 year ago

Hello here! Any news about this issue? It looks like I've hit the same problem described here and in #2127.

github-actions[bot] commented 6 days ago

Marking this issue as stale due to inactivity. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. This helps our maintainers find and focus on the active issues. Maintainers may also remove the stale label at their discretion. Thank you!