pulumi / pulumi-kubernetes

A Pulumi resource provider for Kubernetes to manage API resources and workloads in running clusters
https://www.pulumi.com/docs/reference/clouds/kubernetes/
Apache License 2.0

Installing traefik helm chart hangs with "Finding Pods to direct traffic to" message and then fails. #1456

Closed davidroth closed 3 years ago

davidroth commented 3 years ago

Problem description

I am trying to deploy Traefik to an EKS cluster with Pulumi. Unfortunately, installing the Traefik Helm chart via Pulumi does not work:

// Does not work because of CRDs
const traefik = new k8s.helm.v3.Chart("traefik", {
    chart: "traefik",
    repo: "traefik",
    namespace: "traefik",
    version: "9.14.0"
}, { provider: cluster.provider });

Errors & Logs


Hangs:

kubernetes:core/v1:Service     traefik          creating     [1/3] Finding Pods to direct traffic to

Pulumi gets stuck while creating the "kubernetes:core/v1:Service" resource. The Service itself is created (e.g. kubectl get svc -A lists the traefik Service), but Pulumi hangs at this step for about 5 minutes.

After a timeout, pulumi outputs the following error:

kubernetes:core/v1:Service (traefik):
    error: 3 errors occurred:
        * resource default/traefik was successfully created, but the Kubernetes API server reported that it failed to fully initialize or become live: 'traefik' timed out waiting to be Ready
        * Service does not target any Pods. Selected Pods may not be ready, or field '.spec.selector' may not match labels on any Pods
        * Service was not allocated an IP address; does your cloud provider support this?

  pulumi:pulumi:Stack (kubefleet-dev):
    I0202 09:11:33.118961   18008 request.go:655] Throttling request took 1.0286782s, request: GET:https://*******************.gr7.eu-central-1.eks.amazonaws.com/apis/apiregistration.k8s.io/v1beta1?timeout=32s
    I0202 09:11:43.121888   18008 request.go:655] Throttling request took 1.547302s, request: GET:https://*******************.gr7.eu-central-1.eks.amazonaws.com/apis/networking.k8s.io/v1beta1?timeout=32s

    error: update failed

When checking the pods with kubectl, it looks like they have been created:

$ kubectl get pod -A
traefik       traefik-788d868666-8j46d   1/1     Running   0          16m

However, the Service was created in the wrong namespace (default instead of traefik) and has no external IP:

$ kubectl get svc -A
default       traefik      LoadBalancer   172.20.***.**   <pending>     80:30729/TCP,443:31531/TCP   20m

Affected product version(s)

v2.19.0

Reproducing the issue

Application to reproduce:

Dependencies:

{
    "name": "kubefleet",
    "devDependencies": {
        "@types/node": "^10.0.0"
    },
    "dependencies": {
        "@pulumi/aws": "^3.26.1",
        "@pulumi/awsx": "^0.23.0",
        "@pulumi/eks": "^0.21.0",
        "@pulumi/kubernetes": "^2.7.8",
        "@pulumi/pulumi": "^2.19.0"
    }
}

Program:

import * as pulumi from "@pulumi/pulumi";
import * as awsx from "@pulumi/awsx";
import * as eks from "@pulumi/eks";
import * as k8s from "@pulumi/kubernetes";

const name = "kubefleet";

const vpc = new awsx.ec2.Vpc(name, {
    subnets: [{ type: "public" }, { type: "private" } ],
    tags: { Name: name },
});

async function getAllVpcSubnetIds(vpc: awsx.ec2.Vpc): Promise<pulumi.Output<string>[]>
{
    const publicIds = await vpc.publicSubnetIds;
    const privateIds = await vpc.privateSubnetIds;
    return publicIds.concat(privateIds);
}

const cluster = new eks.Cluster(name, {
    vpcId: vpc.id,
    subnetIds: getAllVpcSubnetIds(vpc),
    desiredCapacity: 2,
    minSize: 1,
    maxSize: 2,
    storageClasses: "gp2",
    nodeAssociatePublicIpAddress: false,
});

export const kubeconfig = cluster.kubeconfig;
export const clusterName = cluster.eksCluster.name;

function createKubernetesNamespace(name: string): k8s.core.v1.Namespace {
    return new k8s.core.v1.Namespace(
        name,
        { metadata: { name: name } },
        { provider: cluster.provider }
    );
}

const traefikNamespace = createKubernetesNamespace("traefik").metadata.apply(m => m.name);

const traefik = new k8s.helm.v3.Chart("traefik", {
    chart: "traefik",
    repo: "traefik",
    namespace: "traefik",
    version: "9.14.0"
}, { provider: cluster.provider });

Questions

What does the "Finding Pods to direct traffic to" message mean? The error persists even when re-running "pulumi up".
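Judging only from the three errors in the log above, the Service await appears to check roughly two conditions once the Service exists: that some ready Pods are selected (visible as addresses in the Service's Endpoints object), and, for a LoadBalancer Service, that the cloud provider has assigned an ingress IP or hostname. If these never hold, it reports the "timed out waiting to be Ready" error. The TypeScript sketch below models only that reading. It is a hypothetical simplification, not Pulumi's actual await implementation, and the names ServiceSnapshot and notReadyReasons are invented for illustration.

```typescript
// Hypothetical, simplified model of the Service readiness conditions implied
// by the error messages; this is NOT Pulumi's real await logic.
type ServiceType = "ClusterIP" | "NodePort" | "LoadBalancer";

interface ServiceSnapshot {
  type: ServiceType;
  // Number of ready Pod addresses listed in the Service's Endpoints object.
  readyEndpointAddresses: number;
  // IPs or hostnames reported in .status.loadBalancer.ingress.
  loadBalancerIngress: string[];
}

// Returns the reasons the Service is not (yet) considered ready.
// An empty array means both modeled conditions are satisfied.
function notReadyReasons(svc: ServiceSnapshot): string[] {
  const reasons: string[] = [];
  if (svc.readyEndpointAddresses === 0) {
    reasons.push("Service does not target any Pods");
  }
  if (svc.type === "LoadBalancer" && svc.loadBalancerIngress.length === 0) {
    reasons.push("Service was not allocated an IP address");
  }
  return reasons;
}

// The state reported in this issue: no selected Pods, external IP still <pending>.
const stuck = notReadyReasons({
  type: "LoadBalancer",
  readyEndpointAddresses: 0,
  loadBalancerIngress: [],
});
// stuck → ["Service does not target any Pods", "Service was not allocated an IP address"]
```

A real await loop would re-evaluate these conditions as the cluster state changes until the list is empty or the timeout expires; that loop is deliberately left out of this sketch.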

lblackstone commented 3 years ago

I'm investigating a related issue right now, and I think the problem is specific to our v3 SDK (it appears that we're not creating resources that include Helm hooks).

I expect to have a fix soon, but you can also try using the v2 SDK in the meantime.
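To make the suspected v3 SDK problem concrete: Helm marks hook resources (for example a pre-install Job) with the helm.sh/hook annotation. A tool that renders a chart and then applies only the non-hook resources would silently skip those objects. The sketch below is a hypothetical illustration of that failure mode, not code from pulumi-kubernetes; the Manifest shape, partitionHooks, and the resource names are invented.

```typescript
// Only the manifest fields this sketch needs (hypothetical shape).
interface Manifest {
  kind: string;
  metadata: { name: string; annotations?: Record<string, string> };
}

// Helm marks hook resources with this annotation (e.g. "pre-install").
const HELM_HOOK_ANNOTATION = "helm.sh/hook";

function isHelmHook(m: Manifest): boolean {
  const annotations = m.metadata.annotations ?? {};
  return HELM_HOOK_ANNOTATION in annotations;
}

// Hypothetical renderer step: split rendered manifests into regular resources
// and hook resources. A tool that applies only `regular` drops every hook.
function partitionHooks(manifests: Manifest[]): { regular: Manifest[]; hooks: Manifest[] } {
  const regular: Manifest[] = [];
  const hooks: Manifest[] = [];
  for (const m of manifests) {
    (isHelmHook(m) ? hooks : regular).push(m);
  }
  return { regular, hooks };
}

// Example input with hypothetical resource names.
const rendered: Manifest[] = [
  { kind: "Deployment", metadata: { name: "web" } },
  { kind: "Job", metadata: { name: "db-migrate", annotations: { "helm.sh/hook": "pre-install" } } },
];
const { regular, hooks } = partitionHooks(rendered);
// regular contains only "web"; hooks contains only "db-migrate".
```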

lblackstone commented 3 years ago

I confirmed that I can install the chart successfully with the latest release if the namespace isn't specified. I was seeing the same Service timeout that you reported if the namespace was set. From a cursory look with kubectl, it appears that the Service may not be properly selecting the Deployment in the traefik namespace you specified. Perhaps that needs to be set as part of the chart configuration?
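The Kubernetes rule that likely explains this observation is that a Service's selector only matches Pods in the Service's own namespace, and only Pods whose labels contain every key/value pair of .spec.selector. The sketch below models that rule with hypothetical types and a hypothetical label (the traefik chart's actual labels may differ): the same selector matches nothing from the default namespace but matches the Pod once the Service lives in traefik.

```typescript
// Hypothetical minimal shapes for this sketch.
interface PodMeta {
  namespace: string;
  labels: Record<string, string>;
}

interface ServiceSpec {
  namespace: string;
  selector: Record<string, string>;
}

// A Service only targets Pods in its own namespace whose labels include
// every key/value pair of the selector.
function targetedPods(svc: ServiceSpec, pods: PodMeta[]): PodMeta[] {
  const required = Object.entries(svc.selector);
  // Without a selector, Kubernetes does not populate Endpoints automatically.
  if (required.length === 0) return [];
  return pods.filter(
    (pod) =>
      pod.namespace === svc.namespace &&
      required.every(([key, value]) => pod.labels[key] === value),
  );
}

// Hypothetical label; the real chart's labels may differ.
const labels = { "app.kubernetes.io/name": "traefik" };
const pod: PodMeta = { namespace: "traefik", labels };

// Service rendered into "default" while the Pod runs in "traefik".
const misplaced = targetedPods({ namespace: "default", selector: labels }, [pod]);
// misplaced.length === 0, i.e. "Service does not target any Pods"

const colocated = targetedPods({ namespace: "traefik", selector: labels }, [pod]);
// colocated.length === 1
```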

I don't think this is a Pulumi bug at this point, so I'm going to close this out. Feel free to reopen if you are still having problems.

eljoth commented 3 years ago

I encountered a similar issue with kube-prometheus-stack when the Pods that need to be created are defined by a custom resource (CRD). The CRD is installed, but the console reports that the resource is not available and that CRDs may be missing. As a result, the Pods never come up and the Service waits indefinitely for them to become ready. I've created an issue in the pulumi repo: pulumi/pulumi#6326.
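A custom resource (for example a Prometheus object from kube-prometheus-stack) cannot be created until its CustomResourceDefinition has been registered, so ordering is one plausible contributor to this kind of "CRDs may be missing" error. The sketch below is a hypothetical pre-processing step, not something Pulumi or Helm is known to do in exactly this form. It reorders rendered manifests so that CRDs come first; the Manifest shape and crdsFirst are invented names.

```typescript
// Hypothetical minimal manifest shape.
interface Manifest {
  apiVersion: string;
  kind: string;
  metadata: { name: string };
}

const CRD_KIND = "CustomResourceDefinition";

// Stable reordering: CRDs first, then all other resources in their original
// relative order (Array.prototype.filter preserves element order).
function crdsFirst(manifests: Manifest[]): Manifest[] {
  const crds = manifests.filter((m) => m.kind === CRD_KIND);
  const others = manifests.filter((m) => m.kind !== CRD_KIND);
  return crds.concat(others);
}

const ordered = crdsFirst([
  { apiVersion: "monitoring.coreos.com/v1", kind: "Prometheus", metadata: { name: "example" } },
  { apiVersion: "apiextensions.k8s.io/v1", kind: CRD_KIND, metadata: { name: "prometheuses.monitoring.coreos.com" } },
]);
// ordered.map((m) => m.kind) → ["CustomResourceDefinition", "Prometheus"]
```

Ordering alone is not a complete fix: the API server must also report the CRD as Established before it accepts objects of the new kind, so a real tool would additionally wait on that condition.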