nirmata / kyverno-notation-aws

Kyverno extension service for Notation and the AWS signer
Apache License 2.0

failed to fetch data for APICall: failed to execute HTTP request for APICall response: Post #124

Closed vponoikoait closed 9 months ago

vponoikoait commented 9 months ago

"failed to load data" err="failed to fetch data for APICall: failed to execute HTTP request for APICall response: Post \"https://kyverno-notation-aws-fullname-override-svc:443/checkimage\": dial tcp 172.20.80.74:443: connect: connection timed out" logger="DefaultContextLoaderFactory" name="response"

My policy looks as follows:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: validate-images-20
spec:
  validationFailureAction: Enforce
  failurePolicy: Fail
  background: true
  webhookTimeoutSeconds: 30
  schemaValidation: false
  rules:
    - name: call-aws-signer-extension
      match:
        any:
          - resources:
              namespaces:
                - my-namespace
              kinds:
                - Pod
              operations:
                - CREATE
                - UPDATE
      context:
        - name: tlscerts
          apiCall:
            urlPath: "/api/v1/namespaces/security/secrets/kyverno-notation-aws-fullname-override-svc.security.svc.tls-pair"
            jmesPath: "base64_decode( data.\"tls.crt\" )"
        - name: response
          apiCall:
            method: POST
            data:
              - key: namespace
                value: "{{request.namespace}}"
              - key: images
                value: "{{ request.object.spec.[ephemeralContainers, initContainers, containers][].image }}"
              - key: imageReferences
                value:
                  - "*"
            service:
              url: https://kyverno-notation-aws-fullname-override-svc:443/checkimage
              caBundle: '{{ tlscerts }}'
      validate:
        message: "THIS ISSUE RESPONSE: {{response}}"
        deny:
          conditions:
            all:
              - key: "{{ response.verified }}"
                operator: EQUALS
                value: false

I have also verified on my side that the hostname is accessible and that I am able to get the certificate value. I've also tried to curl the same service, and it does respond. Right now I have

  kyverno_admission_controller_image_verison = "v1.10.3"
  kyverno_helm_version = "3.1.3"

installed in my cluster. Could you please suggest whether any additional debugging is possible from the Kyverno side?

To install kyverno-notation-aws I've used the Helm chart: I forked it into my own repo and published the current latest version.

vponoikoait commented 9 months ago

curl -k https://kyverno-notation-aws-fullname-override-svc/checkimages -X POST -d '{"images": ["844333597536.dkr.ecr.us-west-2.amazonaws.com/kyverno-demo:v1"]}'

Response

Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 172.20.80.74:443...
* Connected to kyverno-notation-aws-fullname-override-svc (172.20.80.74) port 443 (#0)
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=kyverno-notation-aws-fullname-override-svc
*  start date: Jan 18 16:48:03 2024 GMT
*  expire date: Jun 16 17:48:03 2024 GMT
*  issuer: CN=*.kyverno.svc
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* using HTTP/2
* h2h3 [:method: POST]
* h2h3 [:path: /checkimages]
* h2h3 [:scheme: https]
* h2h3 [:authority: kyverno-notation-aws-fullname-override-svc]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [accept: */*]
* h2h3 [content-length: 76]
* h2h3 [content-type: application/x-www-form-urlencoded]
* Using Stream ID: 1 (easy handle 0x7f93e31fba90)
> POST /checkimages HTTP/2
> Host: kyverno-notation-aws-fullname-override-svc
> user-agent: curl/8.0.1
> accept: */*
> content-length: 76
> content-type: application/x-www-form-urlencoded
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* We are completely uploaded and fine
< HTTP/2 401 
< content-type: text/plain; charset=utf-8
< x-content-type-options: nosniff
< content-length: 34
< date: Fri, 19 Jan 2024 10:43:35 GMT
< 
Authorization header not supplied
* Connection #0 to host kyverno-notation-aws-fullname-override-svc left intact

vponoikoait commented 9 months ago

So far, I have found that the error occurs because of https://github.com/nirmata/kyverno-notation-aws/blob/main/main.go#L89. I'll try to use an image built before this update came in, assuming that the check was added after the initial README.md was written. @vishal-chdhry, could you please suggest whether additional documentation can be added to README.md, so that https://github.com/nirmata/kyverno-notation-aws/blob/main/README.md#troubleshoot works the way the README.md describes?

vishal-chdhry commented 9 months ago

To debug it using curl, you will have to update the deployment and set the reviewKyvernoToken flag to false; that will disable the Kyverno token review.

I will update the documentation to add this. Thanks for pointing this out.
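
For readers following along, here is a minimal Go sketch of how a boolean flag such as reviewKyvernoToken could gate the check that produced the 401 above. It is not the service's actual code: the function checkAuth, the default value, and the flag description are assumptions for illustration, and only the flag name and the error text come from this thread.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
)

// checkAuth is a hypothetical stand-in for the service's request gate.
// With token review enabled, a request without an Authorization header is
// rejected with 401, which matches the curl response shown earlier.
func checkAuth(reviewToken bool, authHeader string) (int, string) {
	if !reviewToken {
		// Review disabled: plain curl requests are let through for debugging.
		return http.StatusOK, ""
	}
	if authHeader == "" {
		return http.StatusUnauthorized, "Authorization header not supplied"
	}
	// A real service would validate the token at this point; that step is
	// omitted in this sketch.
	return http.StatusOK, ""
}

func main() {
	// The flag name comes from the comment above; the default value and the
	// description are assumptions.
	review := flag.Bool("reviewKyvernoToken", true, "review the caller's Kyverno token")
	flag.Parse()

	status, msg := checkAuth(*review, "")
	fmt.Println(status, msg)
}
```

Running the sketch with -reviewKyvernoToken=false skips the header check, which is the behaviour that makes plain curl debugging possible.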

vponoikoait commented 9 months ago

@vishal-chdhry it seems that my issue with the token is

func isKyverno(username string) bool {
    return username == "system:serviceaccount:kyverno:kyverno-admission-controller" || username == "system:serviceaccount:kyverno:kyverno-reports-controller"
}

https://github.com/nirmata/kyverno-notation-verifier/blob/main/verifier/client.go#L248 Specifically, it locks access to the kyverno namespace and these service accounts. This means that whenever Kyverno is deployed in a namespace other than kyverno, or with different service account names for the Kyverno components, we might face this issue.

vponoikoait commented 9 months ago

In the meantime I will redeploy my Kyverno to the kyverno namespace, as the source code assumes, since I am not sure that I will be able to build the image with my CI fast enough to disable the token check. Redeploying should just be faster. I will update you on the results @vishal-chdhry

vishal-chdhry commented 9 months ago

Specifically, as it locks out kyverno namespace & SA accounts as well.

@vponoikoait That is a valid point, we can add a flag to specify kyverno SA name and namespace to allow flexibility 🤔
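
A minimal sketch of what such a flag could look like, assuming the check stays a plain username comparison as in the isKyverno snippet quoted above. The flag names kyvernoNamespace and kyvernoServiceAccounts and the helper allowedUsernames are hypothetical; they do not exist in the repository at the time of this thread.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// allowedUsernames builds the set of service-account usernames that may call
// the service, in the "system:serviceaccount:<namespace>:<name>" form used by
// the hardcoded check.
func allowedUsernames(namespace string, serviceAccounts []string) map[string]bool {
	allowed := make(map[string]bool, len(serviceAccounts))
	for _, sa := range serviceAccounts {
		sa = strings.TrimSpace(sa)
		if sa == "" {
			continue
		}
		allowed["system:serviceaccount:"+namespace+":"+sa] = true
	}
	return allowed
}

// isKyverno reports whether username belongs to one of the configured
// Kyverno service accounts.
func isKyverno(allowed map[string]bool, username string) bool {
	return allowed[username]
}

func main() {
	// Hypothetical flags; the defaults reproduce the currently hardcoded values.
	ns := flag.String("kyvernoNamespace", "kyverno", "namespace Kyverno runs in")
	sas := flag.String("kyvernoServiceAccounts",
		"kyverno-admission-controller,kyverno-reports-controller",
		"comma-separated Kyverno service account names")
	flag.Parse()

	allowed := allowedUsernames(*ns, strings.Split(*sas, ","))
	fmt.Println(isKyverno(allowed, "system:serviceaccount:kyverno:kyverno-admission-controller"))
}
```

With the defaults the behaviour is unchanged, so existing installations in the kyverno namespace would not be affected, while a deployment in another namespace could pass its own values.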

vponoikoait commented 9 months ago

@vishal-chdhry that seems like a valid point, but if it's hardcoded and this library is used anywhere else, wouldn't it basically mean that inter-component auth is broken there as well? That's a separate concern of mine, as I am using a different namespace rather than the default one. If so, it would be great to have it stated somewhere that a non-default namespace won't work for some specific use cases (versions), to avoid additional confusion.

vponoikoait commented 9 months ago

Status after I've removed auth

curl -k https://kyverno-notation-aws-fullname-override-svc.my-namespace.svc.cluster.local:443/checkimages -X POST -d '{"images": ["844333597536.dkr.ecr.us-west-2.amazonaws.com/kyverno-demo:v1"]}'  -H 'Authorization: [value1]' -v
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 172.20.190.217:443...
* Connected to kyverno-notation-aws-fullname-override-svc.security.svc.cluster.local (172.20.190.217) port 443 (#0)
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=kyverno-notation-aws-fullname-override-svc
*  start date: Jan 19 12:16:32 2024 GMT
*  expire date: Jun 17 13:16:32 2024 GMT
*  issuer: CN=*.kyverno.svc
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* using HTTP/2
* h2h3 [:method: POST]
* h2h3 [:path: /checkimages]
* h2h3 [:scheme: https]
* h2h3 [:authority: kyverno-notation-aws-fullname-override-svc.security.svc.cluster.local]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [accept: */*]
* h2h3 [authorization: [value1]]
* h2h3 [content-length: 76]
* h2h3 [content-type: application/x-www-form-urlencoded]
* Using Stream ID: 1 (easy handle 0x7ff6bebe7a90)
> POST /checkimages HTTP/2
> Host: kyverno-notation-aws-fullname-override-svc.security.svc.cluster.local
> user-agent: curl/8.0.1
> accept: */*
> authorization: [value1]
> content-length: 76
> content-type: application/x-www-form-urlencoded
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* We are completely uploaded and fine
< HTTP/2 406 
< content-type: text/plain; charset=utf-8
< x-content-type-options: nosniff
< content-length: 94
< date: Fri, 19 Jan 2024 20:01:59 GMT
< 
json: cannot unmarshal array into Go struct field RequestData.images of type types.ImageInfos
* Connection #0 to host kyverno-notation-aws-fullname-override-svc.security.svc.cluster.local left intact

Application side

2024-01-19T20:01:52.413Z    INFO    verifier/client.go:184  failed to decode {"images": ["844333597536.dkr.ecr.us-west-2.amazonaws.com/kyverno-demo:v1"]}: json: cannot unmarshal array into Go struct field RequestData.images of type types.ImageInfos
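
The decode error can be reproduced in isolation. The following is a minimal Go sketch, assuming that the images field is decoded into a struct keyed by container type (the shape of Kyverno's built-in images variable, which the final working policy later in this thread passes as "{{images}}"). The type definitions are hypothetical stand-ins, not the actual types from the verifier, and the "app" container entry is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ImageInfos is a hypothetical stand-in for types.ImageInfos: an object keyed
// by container type, not a flat list of image strings.
type ImageInfos struct {
	InitContainers      map[string]json.RawMessage `json:"initContainers,omitempty"`
	Containers          map[string]json.RawMessage `json:"containers,omitempty"`
	EphemeralContainers map[string]json.RawMessage `json:"ephemeralContainers,omitempty"`
}

// RequestData mirrors only the field named in the error message.
type RequestData struct {
	Images ImageInfos `json:"images"`
}

// decode returns the JSON decoding error for a payload, or nil on success.
func decode(payload string) error {
	var req RequestData
	return json.Unmarshal([]byte(payload), &req)
}

func main() {
	// Same shape as the curl payload above: images is a JSON array.
	asArray := `{"images": ["844333597536.dkr.ecr.us-west-2.amazonaws.com/kyverno-demo:v1"]}`
	fmt.Println("array payload: ", decode(asArray))

	// Object shape; the inner fields are illustrative only.
	asObject := `{"images": {"containers": {"app": {"registry": "844333597536.dkr.ecr.us-west-2.amazonaws.com"}}}}`
	fmt.Println("object payload:", decode(asObject))
}
```

Under these assumptions the array payload fails with an unmarshal type error of the same kind as in the log, while the object payload decodes without error.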

I will try to figure out what's wrong with the payload. Kyverno -> AWS Signer extension communication still doesn't work at all, though:

E0119 20:28:42.291762       1 deferred.go:42] "failed to load data" err="failed to fetch data for APICall: failed to execute HTTP request for APICall response: Post \"https://kyverno-notation-aws-fullname-override-svc.security.svc.cluster.local:443/checkimages\": dial tcp 172.20.190.217:443: connect: connection timed out" logger="DefaultContextLoaderFactory" name="response"

vponoikoait commented 9 months ago

I've also got

policy validate-images-21/autogen-call-aws-signer-extension error: failed to check deny conditions: failed to substitute variables in condition key: failed to resolve response.verified at path : failed to fetch data for APICall: failed to execute HTTP request for APICall response: Post "http://kyverno-notation-aws-fullname-override-svc-mux.security.svc.cluster.local:443/checkimages": dial tcp 172.20.141.27:443: connect: connection timed out

This appeared when I added a mux for the endpoint, so it seems likely that there is an issue specific to my Kyverno version. I'll try to update to the latest version and let you know the results.

vponoikoait commented 9 months ago

So, basically, the initial issue

... dial tcp 172.20.80.74:443: connect: connection timed out" logger="DefaultContextLoaderFactory" name="response"

was related to the network, specifically to the security group configuration.
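
For anyone hitting the same "connection timed out" message, a small TCP probe run from a pod on the same node or in the same namespace as the Kyverno admission controller can help separate a network block from an application error such as the 401 or 406 responses above. This is only a sketch under that assumption; the helper names probe and classify are made up for illustration, and the address is passed on the command line.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// classify turns a dial error into a short verdict.
func classify(err error) string {
	if err == nil {
		return "reachable"
	}
	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		// No answer at all: typically a firewall or security group drop.
		return "timeout"
	}
	// For example "connection refused": the network path works, but nothing
	// is listening on that port.
	return "failed: " + err.Error()
}

// probe attempts a plain TCP connection to address (host:port).
func probe(address string, timeout time.Duration) string {
	conn, err := net.DialTimeout("tcp", address, timeout)
	if err == nil {
		conn.Close()
	}
	return classify(err)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: probe host:port")
		return
	}
	fmt.Println(os.Args[1], probe(os.Args[1], 5*time.Second))
}
```

A "timeout" verdict from the Kyverno side combined with a successful curl from another pod points at network rules between nodes rather than at the policy or the extension.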
The current issue with the policy is the following:

2024-01-20T12:57:09.276Z    INFO    verifier/client.go:184  failed to decode {"imageReferences":["*"],"images":["000008000354.dkr.ecr.eu-west-1.amazonaws.com/vponoiko_test_com-mgmt-ecr-django-ai-site:84575e5440c65ca01c9b53ea1e30d51e1800946d"],"namespace":"django-ai-website"}

I will keep posting in this issue in case somebody later searches for related errors, so that they have additional context or a possible solution in the following comments. Current policy state:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: validate-images-21
spec:
  validationFailureAction: Enforce
  failurePolicy: Fail
  background: true
  webhookTimeoutSeconds: 30
  schemaValidation: false
  rules:
    - name: call-aws-signer-extension
      match:
        any:
          - resources:
              namespaces:
                - django-ai-website
              kinds:
                - Pod
              operations:
                - CREATE
                - UPDATE
      context:
        - name: tlscerts
          apiCall:
            urlPath: "/api/v1/namespaces/security/secrets/kyverno-notation-aws-fullname-override-svc.security.svc.tls-pair"
            jmesPath: "base64_decode( data.\"tls.crt\" )"
        - name: response
          apiCall:
            method: POST
            data:
              - key: namespace
                value: "{{request.namespace}}"
              - key: images
                value: "{{ request.object.spec.[ephemeralContainers, initContainers, containers][].image }}"
              - key: imageReferences
                value:
                  - "*"
            service:
              url:  http://kyverno-notation-aws-fullname-override-svc.namespace.svc.cluster.local:443/checkimages
              caBundle: '{{ tlscerts }}'
      validate:
        message: "THIS ISSUE RESPONSE: {{response}}"
        deny:
          conditions:
            all:
              - key: "{{ response.verified }}"
                operator: EQUALS
                value: false

And

kyverno_admission_controller_image_verison = "v1.10.3"
kyverno_admission_controller_sa_name = "kyverno-admission-controller"
kyverno_helm_version = "3.1.3"
kyverno_namespace = "kyverno"

The build is the current latest, with the auth token check disabled.

vponoikoait commented 9 months ago

There are a few more issues I would like to highlight in this thread, so that they can be shared further and addressed if there are enough resources to cover them.

Issue 1

The official AWS guide has not been updated to match the current state of the repository, as the processing has changed. The following

              - key: images
                value: "{{ request.object.spec.[ephemeralContainers, initContainers, containers][].image }}"

doesn't work anymore, which may cause further confusion. More details: https://aws.amazon.com/blogs/containers/announcing-container-image-signing-with-aws-signer-and-amazon-eks/

Proposal: ask for an update on the AWS side, or prepare a separate, fully working guide on the Kyverno site. Currently it's complicated, as the guide is listed on the Kyverno site within the blog.

Issue 2

Most of the variables inside the application itself aren't configurable, which creates some issues: I have to rebuild the application myself in order to change some of its flags, which adds overhead to using and debugging this integration. Example: https://github.com/nirmata/kyverno-notation-aws/blob/main/main.go#L59

    var flagNotationPluginConfigMap string
    flag.StringVar(&flagNotationPluginConfigMap, "pluginConfigMap", "notation-plugin-config", "ConfigMap with notation plugin configuration")

This name is hardcoded, but we still have the option to configure it on the Helm chart side, which leads to errors. The ConfigMap name is overridable with the fullname override.

failed to fetch plugin configmap notation-plugin-config: configmap "notation-plugin-config" not found

There's actually one more issue:

Verification failed with error failed to create notation verifier: no trust policy found for trust policy aws-signer-trust-policy

No trust policy is created by default, but I guess that's okay to some extent.

Issue 3

There's no actual release of the Helm chart, despite the fact that some preparations have been made for it. It's not clear where I can fetch the Helm chart from if I am using ArgoCD or any other tool, which requires me to publish the Helm chart myself. https://github.com/nirmata/kyverno-notation-aws/actions/workflows/helm-release.yaml - 0 workflow runs

vponoikoait commented 9 months ago

It's worth highlighting that the final configuration which works for me is

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: validate-images-
spec:
  validationFailureAction: Enforce
  failurePolicy: Fail
  background: true
  webhookTimeoutSeconds: 30
  schemaValidation: false
  rules:
    - name: call-aws-signer-extension
      match:
        any:
          - resources:
              namespaces:
                - myapp-namespace
              kinds:
                - Pod
              operations:
                - CREATE
                - UPDATE
      context:
        - name: tlscerts
          apiCall:
            urlPath: "/api/v1/namespaces/kyverno/secrets/kyverno-notation-aws-fullname-override-svc.security.svc.tls-pair"
            jmesPath: "base64_decode( data.\"tls.crt\" )"
        - name: response
          apiCall:
            method: POST
            data:
              - key: namespace
                value: "{{request.namespace}}"
              - key: images
                value: "{{images}}"
              - key: trustPolicy
                value: "aws-signer-trust-policy"
              - key: imageReferences
                value:
                  - "*"
            service:
              url:  https://kyverno-notation-aws-fullname-override-svc.security:443/checkimages
              caBundle: '{{ tlscerts }}'
      validate:
        message: "THIS ISSUE RESPONSE: {{response}}"
        deny:
          conditions:
            all:
              - key: "{{ response.verified }}"
                operator: EQUALS
                value: false

with

apiVersion: notation.nirmata.io/v1alpha1
kind: TrustPolicy
metadata:
  name: aws-signer-trust-policy
spec:
  version: '1.0'
  trustPolicies:
    - name: aws-signer-trust-policy
      registryScopes:
        - "*"
      signatureVerification:
        level: strict
        override: {}
      trustStores:
        - signingAuthority:aws-signer-ts
      trustedIdentities:
        - "arn:aws:signer:eu-west-1:310003200000:/signing-profiles/my-profile"
  trustPolicyName: aws-signer-trust-policy

And

apiVersion: notation.nirmata.io/v1alpha1
kind: TrustStore
metadata:
  name: aws-signer-ts
spec:
  trustStoreName: aws-signer-ts
  type: signingAuthority
  caBundle: |-
    -----BEGIN CERTIFICATE-----
    MIICWTCCAd6gAwIBAgIRAMq5Lmt4rqnUdi8qM4eIGbYwCgYIKoZIzj0EAwMwbDEL
    MAkGA1UEBhMCVVMxDDAKBgNVBAoMA0FXUzEVMBMGA1UECwwMQ3J5cHRvZ3JhcGh5
    MQswCQYDVQQIDAJXQTErMCkGA1UEAwwiQVdTIFNpZ25lciBDb2RlIFNpZ25pbmcg
    Um9vdCBDQSBHMTAgFw0yMjEwMjcyMTMzMjJaGA8yMTIyMTAyNzIyMzMyMlowbDEL
    MAkGA1UEBhMCVVMxDDAKBgNVBAoMA0FXUzEVMBMGA1UECwwMQ3J5cHRvZ3JhcGh5
    MQswCQYDVQQIDAJXQTErMCkGA1UEAwwiQVdTIFNpZ25lciBDb2RlIFNpZ25pbmcg
    Um9vdCBDQSBHMTB2MBAGByqGSM49AgEGBSuBBAAiA2IABM9+dM9WXbVyNOIP08oN
    IQW8DKKdBxP5nYNegFPLfGP0f7+0jweP8LUv1vlFZqVDep5ONus9IxwtIYBJLd36
    5Q3Z44Xnm4PY/wSI5xRvB/m+/B2PHc7Smh0P5s3Dt25oVKNCMEAwDwYDVR0TAQH/
    BAUwAwEB/zAdBgNVHQ4EFgQUONhd3abPX87l4YWKxjysv28QwAYwDgYDVR0PAQH/
    BAQDAgGGMAoGCCqGSM49BAMDA2kAMGYCMQCd32GnYU2qFCtKjZiveGfs+gCBlPi2
    Hw0zU52LXIFC2GlcvwcekbiM6w0Azlr9qvMCMQDl4+Os0yd+fVlYMuovvxh8xpjQ
    NPJ9zRGyYa7+GNs64ty/Z6bzPHOKbGo4In3KKJo=
    -----END CERTIFICATE-----

Here, the certificate

-----BEGIN CERTIFICATE----- MIICWTCCAd6gAwIBAgIRAMq5Lmt4rqnUdi8qM4eIGbYwCgYIKoZIzj0EAwMwbDEL MAkGA1UEBhMCVVMxDDAKBgNVBAoMA0FXUzEVMBMGA1UECwwMQ3J5cHRvZ3JhcGh5 ... ... -----END CERTIFICATE-----

is the one I've downloaded from https://docs.aws.amazon.com/signer/latest/developerguide/image-signing-prerequisites.html. To be more specific, the "Trust store and root certificate" section there is the valid link at the time I am adding this comment.

I have listed some of the issues which have occurred so far in this thread. @vishal-chdhry, please check them if you have some time for fixes; it may create a much smoother experience when integrating this repository.