ns1 / cert-manager-webhook-ns1

ACME webhook for NS1
Apache License 2.0

"error"="ns1.acme.nsone.net is forbidden #14

Closed haimari closed 3 years ago

haimari commented 3 years ago

I get this RBAC error when trying to create a test certificate:

I0706 09:59:46.054056       1 dns.go:88] cert-manager/controller/challenges/Present "msg"="presenting DNS01 challenge for domain" "dnsName"="staging.test.com" "domain"="staging.test.com" "resource_kind"="Challenge" "resource_name"="staging.test.com-4bfrn-3780284384-1554266999" "resource_namespace"="stg" "resource_version"="v1" "type"="DNS-01"
E0706 09:59:46.055739       1 controller.go:158] cert-manager/controller/challenges "msg"="re-queuing item  due to error processing" "error"="ns1.acme.nsone.net is forbidden: User \"system:serviceaccount:cert-manager:cert-manager\" cannot create resource \"ns1\" in API group \"acme.nsone.net\" at the cluster scope" "key"="stg/staging.test.com-4bfrn-3780284384-1554266999"

Installation

Installed via Flux2 using Helm (as described in the instructions).

cert-manager Helm chart versions tested: 1.0.4, 1.0.3, 1.0.2
cert-manager-webhook-ns1 Helm chart version: 0.4.0

Kubernetes version: v1.20.6

I set up the RoleBinding as described in the instructions, and also tried many variations of it using a ClusterRole and ClusterRoleBinding.

For example, I've tried:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cert-manager-webhook-ns1:secret-reader
rules:
- apiGroups: ["", "acme.nsone.net"]
  resources: ["*"]
  resourceNames: ["*"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cert-manager-webhook-ns1:secret-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cert-manager-webhook-ns1:secret-reader
subjects:
  - apiGroup: ""
    kind: ServiceAccount
    name: cert-manager
    namespace: cert-manager

I still get the error mentioned above. Can you please advise?
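
A note on the manifest above: in Kubernetes RBAC, `resourceNames` is a literal list of object names, not a wildcard pattern, and a `create` request has no object name at authorization time, so a rule that sets `resourceNames` does not match it. That may be why the "forbidden" error persisted with this ClusterRole. A minimal sketch of a rule that matches the request named in the log (create on resource `ns1` in API group `acme.nsone.net` for the `cert-manager` service account) could look like the following. The role and binding name is hypothetical, and this is not necessarily what the chart itself ships.

```yaml
# Sketch only. The metadata name below is a hypothetical choice.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cert-manager-webhook-ns1:domain-solver
rules:
- apiGroups: ["acme.nsone.net"]   # API group from the error message
  resources: ["ns1"]              # resource from the error message
  verbs: ["create"]
  # No resourceNames here: a create request carries no object name
  # when it is authorized, so a name-restricted rule would not match.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cert-manager-webhook-ns1:domain-solver
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cert-manager-webhook-ns1:domain-solver
subjects:
- kind: ServiceAccount
  name: cert-manager        # service account named in the error message
  namespace: cert-manager
```

Whether the binding takes effect can be checked with `kubectl auth can-i create ns1.acme.nsone.net --as=system:serviceaccount:cert-manager:cert-manager`.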

haimari commented 3 years ago

Update

I removed everything (Helm charts, CRDs, role bindings, etc.) and reinstalled it all. Now I'm facing this issue:

I0706 14:02:37.809158       1 dns.go:88] cert-manager/controller/challenges/Present "msg"="presenting DNS01 challenge for domain" "dnsName"="staging.test.com" "domain"="staging.test.com" "resource_kind"="Challenge" "resource_name"="staging.test.com-k8jlx-1827472776-2468205786" "resource_namespace"="production" "resource_version"="v1" "type"="DNS-01"
E0706 14:02:37.812671       1 controller.go:158] cert-manager/controller/challenges "msg"="re-queuing item  due to error processing" "error"="the server could not find the requested resource (post ns1.acme.nsone.net)" "key"="production/staging.test.com-k8jlx-1827472776-2468205786"
I0706 14:03:57.812901       1 dns.go:88] cert-manager/controller/challenges/Present "msg"="presenting DNS01 challenge for domain" "dnsName"="staging.test.com" "domain"="staging.test.com" "resource_kind"="Challenge" "resource_name"="staging.test.com-k8jlx-1827472776-2468205786" "resource_namespace"="production" "resource_version"="v1" "type"="DNS-01"

kubectl get certificaterequest -A

NAMESPACE    NAME                     READY   AGE
production   staging.test.com-k8jlx   False   10m

kubectl describe certificaterequest staging.test.com-k8jlx -n production

Status:
  Conditions:
    Last Transition Time:  2021-07-06T14:01:30Z
    Message:               Waiting on certificate issuance from order production/staging.test.com-k8jlx-1827472776: "pending"
    Reason:                Pending
    Status:                False
    Type:                  Ready
Events:
  Type    Reason        Age   From          Message
  ----    ------        ----  ----          -------
  Normal  OrderCreated  10m   cert-manager  Created Order resource production/staging.test.com-k8jlx-1827472776

kubectl describe challenges -A

    Manager:    controller
    Operation:  Update
    Time:       2021-07-06T14:01:32Z
  Owner References:
    API Version:           acme.cert-manager.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Order
    Name:                  staging.test.com-k8jlx-1827472776
    UID:                   bb69efa4-817c-4f01-8add-ae2580e218db
  Resource Version:        22516746
  UID:                     40c834a6-d857-496f-bf6f-747dceb84d4c
Spec:
  Authorization URL:  https://acme-v02.api.letsencrypt.org/acme/authz-v3/14583974184
  Dns Name:           staging.test.com
  Issuer Ref:
    Group:  cert-manager.io
    Kind:   ClusterIssuer
    Name:   letsencrypt-prod
  Key:      XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  Solver:
    dns01:
      Webhook:
        Config:
          API Key Secret Ref:
            Key:       apiKey
            Name:      ns1-credentials
          Endpoint:    https://api.nsone.net/v1/
          Ignore SSL:  false
        Group Name:    acme.nsone.net
        Solver Name:   ns1
  Token:               XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  Type:                DNS-01
  URL:                 https://acme-v02.api.letsencrypt.org/acme/chall-v3/14583974184/Hbk__Q
  Wildcard:            false
Status:
  Presented:   false
  Processing:  true
  Reason:      the server could not find the requested resource (post ns1.acme.nsone.net)
  State:       pending
Events:
  Type     Reason        Age                  From          Message
  ----     ------        ----                 ----          -------
  Normal   Started       15m                  cert-manager  Challenge scheduled for processing
  Warning  PresentError  5m23s (x8 over 15m)  cert-manager  Error presenting challenge: the server could not find the requested resource (post ns1.acme.nsone.net)
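
For reference, the solver settings shown in the Challenge spec above would correspond to a ClusterIssuer roughly like the sketch below. The issuer name, group name, solver name, secret reference, and endpoint are taken from the describe output; the camelCase field names under `config` are inferred from the humanized keys in that output, and the `email` and `privateKeySecretRef` values are placeholders that do not appear in this thread.

```yaml
# Sketch reconstructed from the Challenge spec; not the poster's actual manifest.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com                  # placeholder, not from the thread
    privateKeySecretRef:
      name: letsencrypt-prod-account-key      # hypothetical secret name
    solvers:
    - dns01:
        webhook:
          groupName: acme.nsone.net           # "Group Name" in the describe output
          solverName: ns1                     # "Solver Name" in the describe output
          config:
            apiKeySecretRef:
              name: ns1-credentials
              key: apiKey
            endpoint: https://api.nsone.net/v1/
            ignoreSSL: false
```

cert-manager sends the challenge to the aggregated API identified by `groupName` and `solverName`, which is why both RBAC for that group and a registered API for it are needed before the challenge can be presented.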

haimari commented 3 years ago

Update

I removed the Helm release and tried installing from plain YAML files.

Now I'm facing this error:

Error from server (InternalError): error when creating "certificate.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": x509: certificate signed by unknown authority

I tried to follow: https://github.com/jetstack/cert-manager/issues/2640#issuecomment-601872165

But I'm not sure whether it is the same issue. This now happens on 2 RKE clusters.

haimari commented 3 years ago

After checking further, I found two issues:

  1. For some reason, even after fixing everything (below), Flux has issues installing this Helm chart, and it seems to be related to the Helm structure validation.
  2. I had to manually delete the CRDs plus the following:

kubectl delete ValidatingWebhookConfiguration cert-manager-webhook

kubectl delete mutatingwebhookconfiguration cert-manager-webhook

I then installed the Helm chart manually, and it now successfully creates the challenge and resolves it. The certificate is valid:

  commonName: staging.test.com
  dnsNames:
  - staging.test.com
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: letsencrypt-prod
  secretName: staging.test.com-tls
status:
  conditions:
  - lastTransitionTime: "2021-07-07T06:40:49Z"
    message: Certificate is up to date and has not expired
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2021-10-05T05:40:47Z"
  notBefore: "2021-07-07T05:40:48Z"
  renewalTime: "2021-09-05T05:40:47Z"
  revision: 1

By the way, the cert-manager version is 1.0.4.