vmware-tanzu / community-edition

VMware Tanzu Community Edition is no longer an actively maintained project. Code is available for historical purposes only.
https://tanzucommunityedition.io/
Apache License 2.0
1.34k stars 306 forks source link

Contour fails to install if envoy.service.type is set to ClusterIP. #1675

Closed GrahamDumpleton closed 2 years ago

GrahamDumpleton commented 2 years ago

Bug Report

Documentation for Contour package at:

says for the envoy.service.type option:

The type of Kubernetes service to provision for Envoy. Valid values are LoadBalancer, NodePort, and ClusterIP.

Using a type of ClusterIP results in a failure when installing the Contour package as it fails some sort of validation step done by kapp.

The values file contained:

$ cat contour-config.yaml 
envoy:
  service:
    type: ClusterIP
contour:
  replicas: 1

and the tanzu package install command failed as follows:

$ tanzu package install contour --package-name contour.community.tanzu.vmware.com --version 1.17.1 --values-file ~/exercises/contour-config.yaml
- Installing package 'contour.community.tanzu.vmware.com' 
| Getting namespace 'default' 
| Getting package metadata for 'contour.community.tanzu.vmware.com' 
| Creating service account 'contour-default-sa' 
| Creating cluster admin role 'contour-default-cluster-role' 
| Creating cluster role binding 'contour-default-cluster-rolebinding' 
| Creating secret 'contour-default-values' 
- Creating package resource 
- Package install status: Reconciling 

Please consider using 'tanzu package installed update' to update the installed package with correct settings

Error: package reconciliation failed: kapp: Error: Applying create service/envoy (v1) namespace: projectcontour:
  Creating resource service/envoy (v1) namespace: projectcontour:
    Service "envoy-x-projectcontour-x-lab-tce-w07-s001-vcluster" is invalid: spec.externalTrafficPolicy:
      Invalid value: "Local":
        ExternalTrafficPolicy can only be set on NodePort and LoadBalancer service (reason: Invalid)
Usage:
  tanzu package install INSTALLED_PACKAGE_NAME --package-name PACKAGE_NAME --version VERSION [flags]

Examples:

    # Install package contour with installed package name as 'contour-pkg' with specified version and without waiting for package reconciliation to complete 
    tanzu package install contour-pkg --package-name contour.tanzu.vmware.com --namespace test-ns --version 1.15.1-tkg.1-vmware1 --wait=false

    # Install package contour with kubeconfig flag and waiting for package reconcilaition to complete
    tanzu package install contour-pkg --package-name contour.tanzu.vmware.com --namespace test-ns --version 1.15.1-tkg.1-vmware1 --kubeconfig path/to/kubeconfig

Flags:
      --create-namespace              Create namespace if the target namespace does not exist, optional
  -h, --help                          help for install
      --kubeconfig string             The path to the kubeconfig file, optional
  -n, --namespace string              Target namespace to install the package, optional (default "default")
  -p, --package-name string           Name of the package to be installed
      --poll-interval duration        Time interval between subsequent polls of package reconciliation status, optional (default 1s)
      --poll-timeout duration         Timeout value for polls of package reconciliation status, optional (default 5m0s)
      --service-account-name string   Name of an existing service account used to install underlying package contents, optional
  -f, --values-file string            The path to the configuration values file, optional
  -v, --version string                Version of the package to be installed
      --wait                          Wait for the package reconciliation to complete, optional. To disable wait, specify --wait=false (default true)

Global Flags:
      --log-file string   Log file path
      --verbose int32     Number for the log level verbosity(0-9)

Error: exit status 1

✖  exit status 1

Expected Behavior

Documentation says that ClusterIP is a valid value so it should work.

If it isn't actually supported, then the documentation is wrong.

For use case being tested had required ClusterIP to work as advertised so the preferred outcome is that ClusterIP works.

Steps to Reproduce the Bug

Install Contour package using values file shown above.

Environment Details

$ tanzu version
version: v0.2.0
buildDate: 2021-09-10
sha: f5a8717
$ uname -a
Linux lab-tce-w07-s001-57bcf47764-2mftn 4.14.231-173.361.amzn2.x86_64 #1 SMP Mon Apr 26 20:57:08 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

cc @jorgemoralespou

jorgemoralespou commented 2 years ago

ExternalTrafficPolicy can only be set on NodePort and LoadBalancer service means that if ClusterIP is going to be a valid value, there needs to be an overlay that will remove this entry. Something like:

#@overlay/match by=overlay.subset({"kind":"Service", "metadata": {"name": "envoy"}})
---
spec:
#@overlay/remove
#@overlay/match missing_ok=True
  externalTrafficPolicy: Local

cc @vmware-tanzu/pkg-contour-owners

GrahamDumpleton commented 2 years ago

Note one question is whether it should automatically change the externalTrafficPolicy when the service type is changed. There is actually a separate envoy.service.externalTrafficPolicy setting and so technically a user could override it in the configuration at the same time. If one takes this stance, then the issue is really that the documentation needs to make clear that if envoy.service.type is changed to ClusterIP that it is the users responsibility to separately override envoy.service.externalTrafficPolicy and set it to Cluster, presuming that works.

GrahamDumpleton commented 2 years ago

Actually, that will not work, as the issue is externalTrafficPolicy being set to any value when ClusterIP is chosen, not its particular value:

Error: package reconciliation failed: kapp: Error: Applying create service/envoy (v1) namespace: projectcontour:
  Creating resource service/envoy (v1) namespace: projectcontour:
    Service "envoy-x-projectcontour-x-lab-tce-w07-s007-vcluster" is invalid: spec.externalTrafficPolicy:
      Invalid value: "Cluster":
        ExternalTrafficPolicy can only be set on NodePort and LoadBalancer service (reason: Invalid)

So an overlay to remove it when ClusterIP specifically is set would be required, else envoy.service.externalTrafficPolicy would have to allow being set to null to have it not added.

GrahamDumpleton commented 2 years ago

FWIW, setting envoy.service.externalTrafficPolicy to either null or empty string doesn't work either as defaults back to Local.

skriss commented 2 years ago

Yep seems like a bug, we should not set externalTrafficPolicy when the service type is ClusterIP. I will work on a patch.