solo-io / gloo

The Cloud-Native API Gateway and AI Gateway
https://docs.solo.io/
Apache License 2.0

Matching sslConfigs in multiple VSs cause listener filter chain merging and loss of sniDomains config #7739

Open bdecoste opened 1 year ago

bdecoste commented 1 year ago

Gloo Edge Version

1.12.x

Kubernetes Version

None

Describe the bug

I have 2 VSs with matching sslConfig except for different sniDomains:

spec:
  sslConfig:
    oneWayTls: true
    parameters:
      cipherSuites:
      - ECDHE-ECDSA-AES128-GCM-SHA256
      - ECDHE-RSA-AES128-GCM-SHA256
      - ECDHE-ECDSA-AES256-GCM-SHA384
      - ECDHE-RSA-AES256-GCM-SHA384
      - ECDHE-ECDSA-CHACHA20-POLY1305
      - ECDHE-RSA-CHACHA20-POLY1305
      minimumProtocolVersion: TLSv1_2
    secretRef:
      name: server-tls-foo
      namespace: gloo-system
    sniDomains:
    - bar.com
spec:
  sslConfig:
    oneWayTls: true
    parameters:
      cipherSuites:
      - ECDHE-ECDSA-AES128-GCM-SHA256
      - ECDHE-RSA-AES128-GCM-SHA256
      - ECDHE-ECDSA-AES256-GCM-SHA384
      - ECDHE-RSA-AES256-GCM-SHA384
      - ECDHE-ECDSA-CHACHA20-POLY1305
      - ECDHE-RSA-CHACHA20-POLY1305
      minimumProtocolVersion: TLSv1_2
    secretRef:
      name: server-tls-foo
      namespace: gloo-system

NOTE: there is no sniDomains in the second VS

I have tested with both settings.gateway.isolateVirtualHostsBySslConfig: true and settings.gateway.isolateVirtualHostsBySslConfig: false; the behavior is the same in both cases.
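
For reference, the flag was toggled through the Settings resource, roughly as follows (a minimal sketch; the exact field layout is assumed from the settings.gateway.isolateVirtualHostsBySslConfig path above and may differ between versions):

apiVersion: gloo.solo.io/v1
kind: Settings
metadata:
  name: default
  namespace: gloo-system
spec:
  gateway:
    isolateVirtualHostsBySslConfig: true   # also tested with false; same result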

This creates a single listener filter chain with the following filter_chain_match; the bar.com server_name is lost:

       "filter_chains": [
        {
         "filter_chain_match": {},

Steps to reproduce the bug

  1. Create the 2 VSs described above.
  2. Note that there is only a single listener filter chain and that one VS's SNI config is lost.

Expected Behavior

Two listener filter chains are created, preserving the sniDomains (or the absence thereof).
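
For illustration, the expected listener output would look roughly like this (hand-written sketch, filters omitted): one chain matching bar.com and one chain with an empty match for the VS without sniDomains.

       "filter_chains": [
        {
         "filter_chain_match": {
          "server_names": [
           "bar.com"
          ]
         }
        },
        {
         "filter_chain_match": {}
        }
       ]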

Additional Context

No response

sam-heilbron commented 1 year ago

Our filter chain logic is where we attempt to decide which set of SslConfigs exists for a given listener. This calls out to Consolidate Ssl Configurations.

Here we assume that sslConfigs which differ only by SNI can be merged, which seems to contain our logical error.
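
A hypothetical sketch of that assumption (not the actual Gloo code; the types and the merge rule are simplified for illustration):

package main

import "fmt"

// SslConfig is a simplified stand-in for the real type: everything except
// SniDomains is collapsed into SecretRef for this illustration.
type SslConfig struct {
	SecretRef  string
	SniDomains []string
}

// consolidate merges configs that are identical apart from SniDomains,
// mirroring the assumption described above.
func consolidate(configs []SslConfig) []SslConfig {
	byKey := map[string]int{} // "everything but SNI" identity -> index into out
	var out []SslConfig
	for _, c := range configs {
		key := c.SecretRef
		i, seen := byKey[key]
		if !seen {
			byKey[key] = len(out)
			out = append(out, c)
			continue
		}
		// Merge SNI lists. If either config has no SNI (i.e. it should match
		// every server name), the merged chain has to match everything, so
		// the explicit SNI list is dropped -- which is the observed loss of
		// the bar.com server_name.
		if len(out[i].SniDomains) == 0 || len(c.SniDomains) == 0 {
			out[i].SniDomains = nil
		} else {
			out[i].SniDomains = append(out[i].SniDomains, c.SniDomains...)
		}
	}
	return out
}

func main() {
	fmt.Println(consolidate([]SslConfig{
		{SecretRef: "server-tls-foo", SniDomains: []string{"bar.com"}},
		{SecretRef: "server-tls-foo"}, // no sniDomains
	}))
	// Prints a single config with an empty SNI list: one filter chain,
	// empty filter_chain_match.
}

Under a rule like this, the second report below (two different sniDomains) also collapses into a single chain, which would line up with the single filter chain bdecoste sees.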

bdecoste commented 1 year ago

This is also an issue if the 2 VSs have differing sniDomains:

spec:
  sslConfig:
    oneWayTls: true
    parameters:
      cipherSuites:
      - ECDHE-ECDSA-AES128-GCM-SHA256
      - ECDHE-RSA-AES128-GCM-SHA256
      - ECDHE-ECDSA-AES256-GCM-SHA384
      - ECDHE-RSA-AES256-GCM-SHA384
      - ECDHE-ECDSA-CHACHA20-POLY1305
      - ECDHE-RSA-CHACHA20-POLY1305
      minimumProtocolVersion: TLSv1_2
    secretRef:
      name: server-tls-foo
      namespace: gloo-system
    sniDomains:
    - foo.bar
  virtualHost:
    domains:
    - foo.com
    options:
      jwt:
        providers:
          valid:
            audiences:
            - solo
            claimsToHeaders:
            - claim: iss
              header: issuer
            - claim: sub
              header: subheader
            issuer: http://a9ef341cdcc0f4a5ab2c5c8a14b18940-177396470.us-west-1.elb.amazonaws.com:8080/auth/realms/solo
            jwks:
              remote:
                cacheDuration: 3600s
                upstreamRef:
                  name: keycloak
                  namespace: gloo-system
                url: http://jwt.example.com/auth/realms/solo/protocol/openid-connect/certs
            keepToken: true
            tokenSource:
              headers:
              - header: access_token
              - header: Authorization
                prefix: Bearer
spec:
  sslConfig:
    oneWayTls: true
    parameters:
      cipherSuites:
      - ECDHE-ECDSA-AES128-GCM-SHA256
      - ECDHE-RSA-AES128-GCM-SHA256
      - ECDHE-ECDSA-AES256-GCM-SHA384
      - ECDHE-RSA-AES256-GCM-SHA384
      - ECDHE-ECDSA-CHACHA20-POLY1305
      - ECDHE-RSA-CHACHA20-POLY1305
      minimumProtocolVersion: TLSv1_2
    secretRef:
      name: server-tls-foo
      namespace: gloo-system
    sniDomains:
    - bar.com
  virtualHost:
    domains:
    - bar.com
    options:
      extauth:
        configRef:
          name: keycloak-jwt-or-oidc
          namespace: gloo-system
      jwt:
        providers:
          valid:
            audiences:
            - solo
            claimsToHeaders:
            - claim: iss
              header: issuer
            - claim: sub
              header: subheader
            issuer: http://a9ef341cdcc0f4a5ab2c5c8a14b18940-177396470.us-west-1.elb.amazonaws.com:8080/auth/realms/solo
            jwks:
              remote:
                cacheDuration: 3600s
                upstreamRef:
                  name: keycloak
                  namespace: gloo-system
                url: http://foo.example.com/auth/realms/solo/protocol/openid-connect/certs
            keepToken: true
            tokenSource:
              headers:
              - header: access_token
              - header: Authorization
                prefix: Bearer

Note the differences between the sniDomains and the JWT JWKS urls.

The resulting Envoy config has a single filter chain, with the JWT filter configured with:

"uri": "http://foo.example.com/auth/realms/solo/protocol/openid-connect/certs",

nfuden commented 1 year ago

Looks like it's not the core sniDomain diffing logic: https://github.com/solo-io/gloo/compare/poc/no-sni-no-problem?expand=1. Putting this here for whoever picks this up.

bdecoste commented 1 year ago

waf.tar.gz

The default VS has ModSecurity intervention!, the duplicate has ModSecurity intervention!!. When I add the duplicate, the config from the default disappears.

sam-heilbron commented 1 year ago

After some investigation and attempts to reproduce, I have encountered the following:

  1. When I apply 2 VirtualServices, as documented above, with identical sslConfig except that one has SNI defined and the other does not, the produced Envoy configuration is a single FilterChain, with the FilterChainMatch empty.

However, the generated Http Filter is:

{
   "name":"io.solo.filters.http.solo_jwt_authn_staged",
   "typed_config":{
      "@type":"type.googleapis.com/envoy.config.filter.http.solo_jwt_authn.v2.JwtWithStage",
      "jwt_authn":{
         "providers":{
            "gloo-system_vs-east_valid":{
               "issuer":"http://a9ef341cdcc0f4a5ab2c5c8a14b18940-177396470.us-west-1.elb.amazonaws.com:8080/auth/realms/solo",
               "audiences":[
                  "solo"
               ],
               "remote_jwks":{
                  "http_uri":{
                     "uri":"http://jwt.example.com/auth/realms/solo/protocol/openid-connect/certs",
                     "cluster":"keycloak_gloo-system",
                     "timeout":"5s"
                  },
                  "cache_duration":"3600s"
               },
               "forward":true,
               "from_headers":[
                  {
                     "name":"access_token"
                  },
                  {
                     "name":"Authorization",
                     "value_prefix":"Bearer"
                  }
               ],
               "payload_in_metadata":"gloo-system_vs-east_valid"
            },
            "gloo-system_vs-west_valid":{
               "issuer":"http://a9ef341cdcc0f4a5ab2c5c8a14b18940-177396470.us-west-1.elb.amazonaws.com:8080/auth/realms/solo",
               "audiences":[
                  "solo"
               ],
               "remote_jwks":{
                  "http_uri":{
                     "uri":"http://foo.example.com/auth/realms/solo/protocol/openid-connect/certs",
                     "cluster":"keycloak_gloo-system",
                     "timeout":"5s"
                  },
                  "cache_duration":"3600s"
               },
               "forward":true,
               "from_headers":[
                  {
                     "name":"access_token"
                  },
                  {
                     "name":"Authorization",
                     "value_prefix":"Bearer"
                  }
               ],
               "payload_in_metadata":"gloo-system_vs-west_valid"
            }
         },
         "filter_state_rules":{
            "name":"stage0-filterState",
            "requires":{
               "gloo-system_vs-east":{
                  "provider_name":"gloo-system_vs-east_valid"
               },
               "gloo-system_vs-west":{
                  "provider_name":"gloo-system_vs-west_valid"
               }
            }
         }
      }
   }
}

Additionally, within the RouteConfiguration, details from both VirtualHosts are present and distinct (i.e., they have not been merged).

  2. When I attempted to apply a new VirtualService, also with no SNI domains defined, I ran into an error:
    Validating *v1.VirtualService failed: validating *v1.VirtualService name:"default" namespace:"gloo-system": failed to validate Proxy [namespace: gloo-system, name: gateway-proxy] with gloo validation: Listener Error: SSLConfigError. Reason: Tried to apply multiple filter chains with the same FilterChainMatch {}. This is usually caused by overlapping sniDomains or multiple empty sniDomains in virtual services

This is to be expected, as we are attempting to create multiple FilterChains, each of which matches on everything (their matches are empty); see the sketch after this list.

  3. I investigated the code and couldn't find any reference to merging of configuration.
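
For reference, the configuration rejected in (2) would amount to something like the following, which is ambiguous for Envoy since both chains match every connection (illustrative sketch only):

       "filter_chains": [
        {
         "filter_chain_match": {}
        },
        {
         "filter_chain_match": {}
        }
       ]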

I have been unable to reproduce this behavior myself, and my next steps are to follow up directly to try to get reproduction steps.

bdecoste commented 1 year ago

vs.tar.gz

Apply default-virtualservice.yaml. Note the SNI config is correctly applied in config_dump. Then apply duplicate-virtualservice.yaml. Note the SNI config is now gone in config_dump. Also note that the 2 different WAF configs are there.
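
One way to inspect the config_dump referenced here (a sketch, assuming the default Envoy admin port 19000 on the gateway-proxy deployment; names may differ per install):

kubectl -n gloo-system port-forward deploy/gateway-proxy 19000:19000 &
curl -s localhost:19000/config_dump | grep -A 5 '"filter_chain_match"'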

sam-heilbron commented 1 year ago

We have been unable to reproduce this behavior. The linked PR fixes a similar issue in this space. Since we are unable to reproduce, I am moving this back to Triage and un-assigning myself.

cc @bdecoste @SantoDE

github-actions[bot] commented 4 months ago

This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.