vmware / cluster-api-provider-cloud-director

Cluster API Provider for VMware Cloud Director. The project is an open source implementation of K8s ClusterAPI project and allows customers to provision resources directly from VMware Cloud Director. It enables Cloud Director powered Clouds to be treated as yet-another-cloud in the multi-cloud journey for VMware Cloud Providers.
Apache License 2.0

Some API calls are not using https_proxy/no_proxy set at controller pod level. #648

Open FrancoisKlieberOrange opened 6 months ago

FrancoisKlieberOrange commented 6 months ago

Describe the bug

It appears that not all calls made by cluster-api-provider-cloud-director are utilizing the proxy settings defined in the environment variables.

Environment Details:

Here is the deployment configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    cluster.x-k8s.io/provider: infrastructure-vcd
    clusterctl.cluster.x-k8s.io: ""
    control-plane: controller-manager
  name: capvcd-controller-manager
  namespace: capvcd-system
spec:
  replicas: 1
  selector:
    matchLabels:
      cluster.x-k8s.io/provider: infrastructure-vcd
      control-plane: controller-manager
  template:
    metadata:
      labels:
        cluster.x-k8s.io/provider: infrastructure-vcd
        control-plane: controller-manager
    spec:
      containers:
      - command:
        - /opt/vcloud/bin/cluster-api-provider-cloud-director
        env:
        - name: https_proxy
          value: <proxy settings>
        - name: no_proxy
          value: localhost,.svc,.cluster.local,.svc.cluster.local,<pod cidr>,<service cidr>
        image: projects.registry.vmware.com/vmware-cloud-director/cluster-api-provider-cloud-director:v1.3.0
        imagePullPolicy: IfNotPresent

Issue Observed:

During the load balancer creation step, the controller tries to resolve the Cloud Director hostname directly through CoreDNS (10.96.0.10:53) instead of sending the request through the proxy. Below is the error message encountered:

Reconciler error    
{"controller": "vcdcluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "error": "failed to create gateway manager using the workload client to reconcile cluster [<cluster name>]: error caching gateway related details: [unable to get OVDC network [<network name>]: [unable to get all ovdc networks: [<nil>]  : [Get \"https://<vcd>/cloudapi/1.0.0/orgVdcNetworks?page=1&pageSize=32\": dial tcp: lookup <vcd> on 10.96.0.10:53: read udp 10.244.0.13:36578->10.96.0.10:53: i/o timeout]]]", "errorVerbose": "error caching gateway related details: [unable to get OVDC network [<network name>]: [unable to get all ovdc networks: [<nil>]: [Get \"https://<vcd>/cloudapi/1.0.0/orgVdcNetworks?page=1&pageSize=32\": dial tcp: lookup <vcd> on 10.96.0.10:53: read udp 10.244.0.13:36578->10.96.0.10:53: i/o timeout]]]\nfailed to create gateway manager using the workload client to reconcile cluster [<cluster name>]

Interestingly, other API calls succeed, such as token creation and determining which API version to use. These calls fail if the proxy is not set in the environment variables, which indicates that at least some calls respect the proxy settings:

auth.go:50] Using VCD OpenAPI version [37.2]
client.go:201] Client is sysadmin: [false]  
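
One plausible explanation for this mixed behaviour, offered as an assumption rather than something the logs above confirm: in Go's net/http, http.DefaultTransport consults http.ProxyFromEnvironment, but a hand-built &http.Transport{} has a nil Proxy field and always dials the target directly. The self-contained sketch below reproduces that difference with two local test servers. The helper names (fetch, demo) are hypothetical and are not taken from the capvcd code base.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"sync/atomic"
)

// fetch issues a GET with the given client and returns the response body.
func fetch(c *http.Client, target string) (string, error) {
	resp, err := c.Get(target)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// demo sends the same request through a proxy-configured client and a bare
// client, and reports what each received plus how many requests the proxy saw.
func demo() (string, string, int64, error) {
	// Stand-in for the Cloud Director endpoint.
	target := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "direct")
	}))
	defer target.Close()

	// Stand-in for the forward proxy; it only counts and labels requests.
	var hits int64
	proxy := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt64(&hits, 1)
		fmt.Fprint(w, "via proxy")
	}))
	defer proxy.Close()

	proxyURL, err := url.Parse(proxy.URL)
	if err != nil {
		return "", "", 0, err
	}

	// Transport with an explicit Proxy function: requests go to the proxy.
	proxied := &http.Client{Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)}}
	// Transport with a nil Proxy field: requests are dialed directly.
	bare := &http.Client{Transport: &http.Transport{}}

	viaProxy, err := fetch(proxied, target.URL)
	if err != nil {
		return "", "", 0, err
	}
	direct, err := fetch(bare, target.URL)
	if err != nil {
		return "", "", 0, err
	}
	return viaProxy, direct, atomic.LoadInt64(&hits), nil
}

func main() {
	viaProxy, direct, hits, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("proxied client got:", viaProxy)
	fmt.Println("bare client got:", direct)
	fmt.Println("requests seen by proxy:", hits)
}
```

In a pod that can only reach Cloud Director through the proxy, a client built like the bare one here would attempt its own DNS lookup and TCP dial, which would be consistent with the "lookup <vcd> on 10.96.0.10:53" timeout shown above.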

Additional Information:

Reproduction steps

  1. Deploy a cluster in an environment that requires a proxy to connect to Cloud Director.
  2. Set the https_proxy and no_proxy environment variables in the cluster-api-provider-cloud-director deployment.
  3. Observe the error during the load balancer creation step, as detailed above.
  4. Deploy the same cluster in an environment that does not require a proxy and observe that all API calls are successful.

Expected behavior

All API calls made by cluster-api-provider-cloud-director should utilize the proxy settings defined in the environment variables.

Additional context

No response

rocknes commented 4 months ago

Searching for http.Client{ in the cpi-vcd 1.6.1 code base (https://github.com/vmware/cloud-provider-for-cloud-director/tree/1.6.z) shows that the proxy is not configured consistently where HTTP clients are created:

vcdsdk/client.go, method RefreshBearerToken:
https://github.com/vmware/cloud-provider-for-cloud-director/blob/a0a0e916a5eda50705f9f3e3b7da8471bd6ff763/pkg/vcdsdk/client.go#L113
https://github.com/vmware/cloud-provider-for-cloud-director/blob/a0a0e916a5eda50705f9f3e3b7da8471bd6ff763/pkg/vcdsdk/client.go#L125

vcdsdk/client.go, method NewVCDClientFromSecrets -> vcdsdk/auth.go, method GetSwaggerClientFromSecrets:
https://github.com/vmware/cloud-provider-for-cloud-director/blob/a0a0e916a5eda50705f9f3e3b7da8471bd6ff763/pkg/vcdsdk/auth.go#L99

This will need a fix in cpi-vcd first, and capvcd then needs to consume that fix.
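
As a rough illustration of what such a fix could look like, the sketch below routes every client construction through one helper that sets Proxy to http.ProxyFromEnvironment, which reads HTTP_PROXY, HTTPS_PROXY and NO_PROXY (or their lowercase forms). The names newProxyAwareClient, newBareClient and hasProxyFunc are hypothetical, and the code is not copied from vcdsdk; the actual change in cpi-vcd may look different.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

// newProxyAwareClient is a hypothetical helper: it builds an HTTP client whose
// transport consults the proxy environment variables for every request.
// The insecure flag mirrors the common "skip TLS verification" option.
func newProxyAwareClient(insecure bool) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			Proxy:           http.ProxyFromEnvironment,
			TLSClientConfig: &tls.Config{InsecureSkipVerify: insecure},
		},
	}
}

// newBareClient shows the problematic pattern for comparison: a custom
// transport without a Proxy function never uses a proxy.
func newBareClient(insecure bool) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: insecure},
		},
	}
}

// hasProxyFunc reports whether the client's transport has a Proxy function set.
func hasProxyFunc(c *http.Client) bool {
	t, ok := c.Transport.(*http.Transport)
	return ok && t.Proxy != nil
}

func main() {
	fmt.Println("bare client consults proxy settings:", hasProxyFunc(newBareClient(false)))
	fmt.Println("helper client consults proxy settings:", hasProxyFunc(newProxyAwareClient(false)))
}
```

Centralizing client creation in one helper keeps later call sites from silently dropping the proxy configuration when they need a custom TLS setup.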