solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0

Gateway proxy restart causes all routes to fail when having misconfiguration on responseHeadersToAdd #9013

Open edubonifs opened 9 months ago

edubonifs commented 9 months ago

Gloo Edge Product

Enterprise

Gloo Edge Version

1.15.x

Kubernetes Version

1.26

Describe the bug

A misconfiguration in one VirtualService (an invalid Envoy flag in responseHeadersToAdd) causes all VirtualServices to fail when gateway-proxy is restarted.

Expected Behavior

A misconfigured VirtualService should report the misconfiguration, but this should not affect other VirtualServices. A customer would expect that a misconfiguration in one VirtualService does not affect every route.

Steps to reproduce the bug

First, deploy one VirtualService that works:

apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: auth-tutorial
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - '*'
    routes:
    - matchers:
      - prefix: /
      routeAction:
        single:
          upstream:
            name: httpbin-httpbin-8000
            namespace: gloo-system

Then, deploy another VirtualService with an Envoy flag that doesn't exist, in this case %RES(x-original-host)%:

apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: edu
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - edu
    routes:
    - matchers:
      - prefix: /
      options:
        headerManipulation:
          responseHeadersToAdd:
          - header:
              key: location
              value: https://login.staging.drdropin.com/login?redirectUrl=%RES(x-original-host)%
      routeAction:
        single:
          upstream:
            name: httpbin-httpbin-8000
            namespace: gloo-system

If you run glooctl check, you will see that proxies report errors:

% glooctl check
----------
glooctl binary version (1.14.6) differs from server components (v1.15.17) by at least a minor version.
Consider running:
glooctl upgrade --release=v1.15.17
----------

Checking deployments... OK
Checking pods... OK
Checking upstreams... OK
Checking upstream groups... OK
Checking auth configs... OK
Checking rate limit configs... OK
Checking VirtualHostOptions... OK
Checking RouteOptions... OK
Checking secrets... OK
Checking virtual services... OK
Checking gateways... OK
Checking proxies... 1 Errors!
Gloo has detected that the data plane is out of sync. The following types of resources have not been accepted: [{resource="type.googleapis.com/envoy.config.route.v3.RouteConfiguration"}]. Gloo will not be able to process any other configuration updates until these errors are resolved.
Problem while checking for gloo xds errors
Error: 2 errors occurred:
    * An update to your gateway-proxy deployment was rejected due to schema/validation errors. The envoy_http_rds_update_rejected{envoy_rds_route_config="listener-__-8080-routes",envoy_http_conn_manager_prefix="http"} metric increased.
You may want to try using the `glooctl proxy logs` or `glooctl debug logs` commands.

    * Problem while checking for gloo xds errors

However, the other VirtualServices are still working.

If you restart the gateway-proxy, no VirtualService will work unless you remove the wrong configuration:

% curl http://a6cb968159450437eba36b7c7e021c32-861695558.us-east-1.elb.amazonaws.com/get -H "original-location: www.google.com" -i
HTTP/1.1 404 Not Found
date: Fri, 22 Dec 2023 08:37:52 GMT
server: envoy
content-length: 0

Additional Environment Detail

No response

Additional Context

No response

Related Issues

nfuden commented 9 months ago

This is another case where our validation is not strict enough. If we ever create a configuration that is rejected by Envoy, then we will have this upgrade behavior.
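
To illustrate what a stricter check could look like, here is a minimal sketch of a pre-apply lint that flags unknown `%NAME(...)%` tokens in a header value. It is not Gloo's validation code: the function name `find_unknown_operators`, the regular expression, and the allowlist are all hypothetical, and the allowlist is only a small, non-exhaustive subset of Envoy's command operators.

```python
import re
from typing import List

# Hypothetical, partial allowlist of Envoy command operators.
# A real check would need the full set for the Envoy version in use.
KNOWN_OPERATORS = {
    "REQ", "RESP", "TRAILER", "START_TIME", "PROTOCOL",
    "RESPONSE_CODE", "RESPONSE_FLAGS", "DOWNSTREAM_REMOTE_ADDRESS",
    "UPSTREAM_HOST", "HOSTNAME",
}

# Matches %NAME%, %NAME(args)% and %NAME(args):LEN%.
# Assumption: operator names are upper-case letters and underscores.
OPERATOR_RE = re.compile(r"%([A-Z_]+)(?:\([^)%]*\))?(?::\d+)?%")


def find_unknown_operators(value: str) -> List[str]:
    """Return operator names in `value` that are not in the allowlist."""
    return [
        match.group(1)
        for match in OPERATOR_RE.finditer(value)
        if match.group(1) not in KNOWN_OPERATORS
    ]


# Header value from the failing VirtualService in this issue.
bad_value = (
    "https://login.staging.drdropin.com/login"
    "?redirectUrl=%RES(x-original-host)%"
)
# Same value with REQ instead of RES; that REQ was the intent is an assumption.
ok_value = bad_value.replace("%RES(", "%REQ(")

print(find_unknown_operators(bad_value))
print(find_unknown_operators(ok_value))
```

A lint like this only catches misspelled operator names. It can also misfire on percent-encoded URL text such as `%AB%`, so it would complement, not replace, validation against Envoy itself.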

github-actions[bot] commented 3 months ago

This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.