karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0

Creating Karmada 1.7.0 with the operator fails in an offline environment #4125

Closed MolisXYliu closed 5 months ago

MolisXYliu commented 6 months ago

Creating Karmada 1.7.0 with the operator in an offline environment fails. The operator log is:

E1012 07:47:21.428138       1 planner.go:93] "failed to executed the workflow" err="failed to install etcd component, err: error when creating etcd client service, err: Service \"karmada-etcd\" is invalid: metadata.resourceVersion: Invalid value: \"\": must be specified for an update" workflow=init karmada="karmada-system/karmada"
I1012 07:47:21.435441       1 controller.go:51] "Finished syncing karmada" karmada="karmada-system/karmada" duration="71.770163ms"
E1012 07:47:21.435477       1 controller.go:324] "Reconciler error" err="failed to install etcd component, err: error when creating etcd client service, err: Service \"karmada-etcd\" is invalid: metadata.resourceVersion: Invalid value: \"\": must be specified for an update" controller="karmada" controllerGroup="operator.karmada.io" controllerKind="Karmada" Karmada="karmada-system/karmada" namespace="karmada-system" name="karmada" reconcileID=4e417ee7-25e7-4330-8d8c-5580a391460a
I1012 07:47:31.675809       1 controller.go:49] "Started syncing karmada" karmada="karmada-system/karmada" startTime="2023-10-12 07:47:31.675788066 +0000 UTC m=+536.870029385"
I1012 07:47:31.675902       1 controller.go:84] "Reconciling karmada" name="karmada"
I1012 07:47:31.675937       1 planner.go:87] "Start execute the workflow" workflow=init karmada="karmada-system/karmada"
I1012 07:47:31.687851       1 crd.go:48] "[prepare-crds] Running prepare-crds task" karmada="karmada-system/karmada"
I1012 07:47:31.687866       1 crd.go:49] "[prepare-crds] Using crd folder" folder="/var/lib/karmada/1.6.0" karmada="karmada-system/karmada"
I1012 07:47:31.688168       1 crd.go:69] "[download-crds] Skip download crd yaml files, the crd tar exists on disk" karmada="karmada-system/karmada"
I1012 07:47:31.688186       1 crd.go:126] "[unpack] These crds yaml files have been decompressed in the path" path="/var/lib/karmada/1.6.0/crds" karmada="karmada-system/karmada"
I1012 07:47:31.688193       1 crd.go:129] "[unpack] Successfully unpacked crd tar" karmada="karmada-system/karmada"
I1012 07:47:31.690973       1 cert.go:53] "[certs] Successfully loaded certs form secret" secret="karmada-cert" karmada="karmada-system/karmada"
I1012 07:47:31.690989       1 cert.go:54] "[certs] Skip certs task, found previous certificates in secret" karmada="karmada-system/karmada"
I1012 07:47:31.690995       1 namespace.go:28] "[namespace] Running namespace task" karmada="karmada-system/karmada"
I1012 07:47:31.692876       1 upload.go:168] "[upload-certs] Running upload-certs task" karmada="karmada-system/karmada"
I1012 07:47:31.702091       1 upload.go:201] "[upload-KarmadaCert] Successfully uploaded karmada certs to secret" karmada="karmada-system/karmada"
I1012 07:47:31.709210       1 upload.go:235] "[upload-etcdCert] Successfully uploaded etcd certs to secret" karmada="karmada-system/karmada"
I1012 07:47:31.715711       1 upload.go:262] "[upload-webhookCert] Successfully uploaded webhook certs to secret" karmada="karmada-system/karmada"
I1012 07:47:31.715725       1 etcd.go:39] "[etcd] Running etcd task" karmada="karmada-system/karmada"
E1012 07:47:31.732013       1 planner.go:93] "failed to executed the workflow" err="failed to install etcd component, err: error when creating etcd client service, err: Service \"karmada-etcd\" is invalid: metadata.resourceVersion: Invalid value: \"\": must be specified for an update" workflow=init karmada="karmada-system/karmada"
I1012 07:47:31.739111       1 controller.go:51] "Finished syncing karmada" karmada="karmada-system/karmada" duration="63.313038ms"
E1012 07:47:31.739149       1 controller.go:324] "Reconciler error" err="failed to install etcd component, err: error when creating etcd client service, err: Service \"karmada-etcd\" is invalid: metadata.resourceVersion: Invalid value: \"\": must be specified for an update" controller="karmada" controllerGroup="operator.karmada.io" controllerKind="Karmada" Karmada="karmada-system/karmada" namespace="karmada-system" name="karmada" reconcileID=bf68d217-5e8e-4334-983a-3ee95245a2ec
I1012 07:47:52.220098       1 controller.go:49] "Started syncing karmada" karmada="karmada-system/karmada" startTime="2023-10-12 07:47:52.220075637 +0000 UTC m=+557.414316956"
I1012 07:47:52.220199       1 controller.go:84] "Reconciling karmada" name="karmada"
I1012 07:47:52.220225       1 planner.go:87] "Start execute the workflow" workflow=init karmada="karmada-system/karmada"
I1012 07:47:52.235017       1 crd.go:48] "[prepare-crds] Running prepare-crds task" karmada="karmada-system/karmada"
I1012 07:47:52.235034       1 crd.go:49] "[prepare-crds] Using crd folder" folder="/var/lib/karmada/1.6.0" karmada="karmada-system/karmada"
I1012 07:47:52.235348       1 crd.go:69] "[download-crds] Skip download crd yaml files, the crd tar exists on disk" karmada="karmada-system/karmada"
I1012 07:47:52.235368       1 crd.go:126] "[unpack] These crds yaml files have been decompressed in the path" path="/var/lib/karmada/1.6.0/crds" karmada="karmada-system/karmada"
I1012 07:47:52.235376       1 crd.go:129] "[unpack] Successfully unpacked crd tar" karmada="karmada-system/karmada"
I1012 07:47:52.237969       1 cert.go:53] "[certs] Successfully loaded certs form secret" secret="karmada-cert" karmada="karmada-system/karmada"
I1012 07:47:52.237984       1 cert.go:54] "[certs] Skip certs task, found previous certificates in secret" karmada="karmada-system/karmada"
I1012 07:47:52.237991       1 namespace.go:28] "[namespace] Running namespace task" karmada="karmada-system/karmada"
I1012 07:47:52.239915       1 upload.go:168] "[upload-certs] Running upload-certs task" karmada="karmada-system/karmada"
I1012 07:47:52.255555       1 upload.go:201] "[upload-KarmadaCert] Successfully uploaded karmada certs to secret" karmada="karmada-system/karmada"
I1012 07:47:52.262936       1 upload.go:235] "[upload-etcdCert] Successfully uploaded etcd certs to secret" karmada="karmada-system/karmada"
I1012 07:47:52.268960       1 upload.go:262] "[upload-webhookCert] Successfully uploaded webhook certs to secret" karmada="karmada-system/karmada"
I1012 07:47:52.268972       1 etcd.go:39] "[etcd] Running etcd task" karmada="karmada-system/karmada"
E1012 07:47:52.285762       1 planner.go:93] "failed to executed the workflow" err="failed to install etcd component, err: error when creating etcd client service, err: Service \"karmada-etcd\" is invalid: metadata.resourceVersion: Invalid value: \"\": must be specified for an update" workflow=init karmada="karmada-system/karmada"
I1012 07:47:52.294515       1 controller.go:51] "Finished syncing karmada" karmada="karmada-system/karmada" duration="74.429307ms"
E1012 07:47:52.294557       1 controller.go:324] "Reconciler error" err="failed to install etcd component, err: error when creating etcd client service, err: Service \"karmada-etcd\" is invalid: metadata.resourceVersion: Invalid value: \"\": must be specified for an update" controller="karmada" controllerGroup="operator.karmada.io" controllerKind="Karmada" Karmada="karmada-system/karmada" namespace="karmada-system" name="karmada" reconcileID=1e552e68-6720-4670-a911-10bfd9ee32d7
I1012 07:47:56.343634       1 reflector.go:788] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Watch close - *v1alpha1.Karmada total 105 items received
liangyuanpeng commented 6 months ago

Could you please share the steps you used to install the operator and apply the Karmada CR? @MolisXYliu any YAML would be great.

MolisXYliu commented 6 months ago

> Could you please share the steps you used to install the operator and apply the Karmada CR? @MolisXYliu any YAML would be great.

The operator version is the latest. The Karmada YAML is:

apiVersion: operator.karmada.io/v1alpha1
kind: Karmada
metadata:
  name: karmada
  namespace: karmada-system
spec:
  components:
    etcd:
      local:
        imageRepository: xxx/etcd
        imageTag: 3.5.9-0
    karmadaAPIServer:
      imageRepository: xxx/kube-apiserver
      imageTag: v1.25.4
      serviceType: NodePort
    karmadaAggregatedAPIServer:
      imageRepository: xxx/karmada-aggregated-apiserver
      imageTag: v1.7.0
    karmadaControllerManager:
      imageRepository: xxx/karmada-controller-manager
      imageTag: v1.7.0
    karmadaScheduler:
      imageRepository: xxx/karmada-scheduler
      imageTag: v1.7.0
    karmadaWebhook:
      imageRepository: xxx/karmada-webhook
      imageTag: v1.7.0
    kubeControllerManager:
      imageRepository: xxx/kube-controller-manager
      imageTag: v1.25.4

The environment is offline, and I copied the CRDs into /var/lib/karmada/1.6.0.

RainbowMango commented 6 months ago

> I1012 07:47:31.688186 1 crd.go:126] "[unpack] These crds yaml files have been decompressed in the path" path="/var/lib/karmada/1.6.0/crds" karmada="karmada-system/karmada"

It doesn't seem right for the operator@v1.7 to read and use CRDs of v1.6.

I can see the default Karmada version is still v1.6.0 on the master. https://github.com/karmada-io/karmada/blob/master/operator/pkg/constants/constants.go#L18

cc @calvin0327 I think we should update it to v1.7.0 now. Can you confirm if I missed anything?

RainbowMango commented 6 months ago

@MolisXYliu Would you like to send a PR for it? You can find an example from https://github.com/karmada-io/karmada/pull/3718.

MolisXYliu commented 6 months ago

> @MolisXYliu Would you like to send a PR for it? You can find an example from #3718.

I sent the PR: https://github.com/karmada-io/karmada/pull/4127. Is this the reason why the operator cannot deploy Karmada v1.7.0?

RainbowMango commented 6 months ago

I'm not sure if this is the only reason. Can you try again after #4127? (Note that you might need to update the operator to the latest version instead of v1.7.0.)

I don't have a clue about another suspicious log yet:

E1012 07:47:21.428138 1 planner.go:93] "failed to executed the workflow" err="failed to install etcd component, err: error when creating etcd client service, err: Service \"karmada-etcd\" is invalid: metadata.resourceVersion: Invalid value: \"\": must be specified for an update" workflow=init karmada="karmada-system/karmada"

liangyuanpeng commented 6 months ago

> Is this the reason why the operator cannot deploy Karmada v1.7.0?

I tested this yesterday and it works for Karmada v1.7.0. The difference is that I am not in an offline environment.

liangyuanpeng commented 6 months ago

> I copied the CRDs into /var/lib/karmada/1.6.0

Try using the CRDs of 1.7.0.

chaosi-zju commented 6 months ago

I think the above guess may not be the reason. According to the karmada.yaml that @MolisXYliu provided, the image version is already v1.7.0.

I tested in my local environment, and it installed fine.

Can you try this:

helm install karmada-operator -n karmada-system  --create-namespace --dependency-update ./charts/karmada-operator --debug

kubectl apply -f https://raw.githubusercontent.com/karmada-io/karmada/release-1.7/operator/config/crds/operator.karmada.io_karmadas.yaml

kubectl apply -f karmada.yaml

where karmada.yaml looks like this:

apiVersion: operator.karmada.io/v1alpha1
kind: Karmada
metadata:
  name: karmada
  namespace: karmada-system
spec:
  components:
    etcd:
      local:
        imageRepository: registry.k8s.io/etcd
        imageTag: 3.5.9-0
    karmadaAPIServer:
      imageRepository: registry.k8s.io/kube-apiserver
      imageTag: v1.25.4
      serviceType: NodePort
    karmadaAggregatedAPIServer:
      imageRepository: docker.io/karmada/karmada-aggregated-apiserver
      imageTag: v1.7.0
    karmadaControllerManager:
      imageRepository: docker.io/karmada/karmada-controller-manager
      imageTag: v1.7.0
    karmadaScheduler:
      imageRepository: docker.io/karmada/karmada-scheduler
      imageTag: v1.7.0
    karmadaWebhook:
      imageRepository: docker.io/karmada/karmada-webhook
      imageTag: v1.7.0
    kubeControllerManager:
      imageRepository: registry.k8s.io/kube-controller-manager
      imageTag: v1.25.4
  hostCluster:
    networking:
      dnsDomain: cluster.local
chaosi-zju commented 6 months ago

> > I copied the CRDs into /var/lib/karmada/1.6.0

> Try using the CRDs of 1.7.0.

Yes, I think the cause is that the CRD version is 1.6.0. +1 to "try using the CRDs of 1.7.0".

MolisXYliu commented 6 months ago

> > > I copied the CRDs into /var/lib/karmada/1.6.0

> > Try using the CRDs of 1.7.0.

> Yes, I think the cause is that the CRD version is 1.6.0. +1 to "try using the CRDs of 1.7.0".

The CRD version is 1.7.0. The operator reads CRDs from /var/lib/karmada/1.6.0, so I put the 1.7.0 CRDs in that path.

MolisXYliu commented 6 months ago
I1012 07:47:31.687866       1 crd.go:49] "[prepare-crds] Using crd folder" folder="/var/lib/karmada/1.6.0" karmada="karmada-system/karmada"
I1012 07:47:31.688168       1 crd.go:69] "[download-crds] Skip download crd yaml files, the crd tar exists on disk" karmada="karmada-system/karmada"

The operator reads CRDs from /var/lib/karmada/1.6.0, so I put the 1.7.0 CRDs in that path.

chaosi-zju commented 6 months ago

> helm install karmada-operator -n karmada-system --create-namespace --dependency-update ./charts/karmada-operator --debug

> kubectl apply -f https://raw.githubusercontent.com/karmada-io/karmada/release-1.7/operator/config/crds/operator.karmada.io_karmadas.yaml

> kubectl apply -f karmada.yaml

I have tested the approach described in the comments above twice and it works. Can you give it a try?

calvin0327 commented 6 months ago

@MolisXYliu It looks like an error was thrown when creating the etcd service.

MolisXYliu commented 6 months ago

If I put the CRDs in /var/lib/karmada/1.7.0, the operator log is:

I1013 06:16:12.393055       1 planner.go:87] "Start execute the workflow" workflow=init karmada="karmada-system/karmada"
I1013 06:16:12.505169       1 crd.go:48] "[prepare-crds] Running prepare-crds task" karmada="karmada-system/karmada"
I1013 06:16:12.505192       1 crd.go:49] "[prepare-crds] Using crd folder" folder="/var/lib/karmada/1.6.0" karmada="karmada-system/karmada"
E1013 06:16:14.514554       1 planner.go:93] "failed to executed the workflow" err="failed to download crd tar, err: Get \"https://github.com/karmada-io/karmada/releases/download/v1.6.0/crds.tar.gz\": dial tcp: lookup github.com on 172.16.0.3:53: server misbehaving" workflow=init karmada="karmada-system/karmada"

When I put the CRDs in /var/lib/karmada/1.6.0, the operator log is:

Finished syncing karmada" karmada="karmada-system/karmada" duration="70.10357ms"
E1013 06:20:46.289053       1 controller.go:324] "Reconciler error" err="failed to install etcd component, err: error when creating etcd client service, err: Service \"karmada-etcd\" is invalid: metadata.resourceVersion: Invalid value: \"\": must be specified for an update" controller="karmada" controllerGroup="operator.karmada.io" controllerKind="Karmada" Karmada="karmada-system/karmada" namespace="karmada-system" name="karmada" reconcileID=b8f099e9-3d27-4177-9a71-7442376a179d
I1013 06:20:48.849268       1 controller.go:49] "Started syncing karmada" karmada="karmada-system/karmada" startTime="2023-10-13 06:20:48.849238012 +0000 UTC m=+403.966856303"
I1013 06:20:48.849391       1 controller.go:84] "Reconciling karmada" name="karmada"
MolisXYliu commented 6 months ago

> @MolisXYliu It looks like an error was thrown when creating the etcd service.

Yes, but why does it happen? Is there a problem with my environment?

MolisXYliu commented 6 months ago

> > @MolisXYliu It looks like an error was thrown when creating the etcd service.

> Yes, but why does it happen? Is there a problem with my environment?

And if I use the old version of the operator and CRDs, the errors disappear.

RainbowMango commented 6 months ago

> E1013 06:16:14.514554 1 planner.go:93] "failed to executed the workflow" err="failed to download crd tar, err: Get \"https://github.com/karmada-io/karmada/releases/download/v1.6.0/crds.tar.gz\": dial tcp: lookup github.com on 172.16.0.3:53: server misbehaving" workflow=init karmada="karmada-system/karmada"

This log clearly shows that the karmada-operator cannot download the CRDs. Does the operator support running in an offline environment?

MolisXYliu commented 6 months ago

> > E1013 06:16:14.514554 1 planner.go:93] "failed to executed the workflow" err="failed to download crd tar, err: Get \"https://github.com/karmada-io/karmada/releases/download/v1.6.0/crds.tar.gz\": dial tcp: lookup github.com on 172.16.0.3:53: server misbehaving" workflow=init karmada="karmada-system/karmada"

> This log clearly shows that the karmada-operator cannot download the CRDs. Does the operator support running in an offline environment?

Yes, it is an offline environment, so I copied the CRDs into this path, and the error is:

I1013 06:41:40.613207       1 etcd.go:39] "[etcd] Running etcd task" karmada="karmada-system/karmada"
E1013 06:41:40.628679       1 planner.go:93] "failed to executed the workflow" err="failed to install etcd component, err: error when creating etcd client service, err: Service \"karmada-etcd\" is invalid: metadata.resourceVersion: Invalid value: \"\": must be specified for an update" workflow=init karmada="karmada-system/karmada"

The Kubernetes version is v1.20.7.

liangyuanpeng commented 6 months ago

@MolisXYliu Could you please try another Kubernetes version, such as v1.26.0, v1.27.0, or v1.28.0?

liangyuanpeng commented 6 months ago

> And if I use the old version of the operator and CRDs, the errors disappear.

Which old version of the operator is that?

MolisXYliu commented 6 months ago

> > And if I use the old version of the operator and CRDs, the errors disappear.

> Which old version of the operator is that?

The 1.6.0 operator does not have the etcd service error.

chaosi-zju commented 6 months ago

> Yes, it is an offline environment, so I copied the CRDs into this path, and the error is:

Hi @MolisXYliu, I'm curious about one thing, though it may not be the root cause:

The operator downloads the CRDs to a path inside its own container (and skips the download if they already exist). In other words, /var/lib/karmada/1.6.0 should be a path inside the operator pod. Did you perform your "replace CRDs" operation inside the pod?
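
The "skip if the tar already exists" behaviour described above can be sketched roughly as follows. This is only an illustration that assumes the behaviour implied by the log lines, not the operator's actual code: `crdFolder`, `tarExists`, and the on-disk file name `crds.tar.gz` are assumptions (the file name is borrowed from the download URL in the log).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// tarName is an assumed on-disk file name, borrowed from the download URL in the log.
const tarName = "crds.tar.gz"

// crdFolder maps a version such as "v1.6.0" to a per-version folder such as
// /var/lib/karmada/1.6.0, the shape of the "Using crd folder" log line.
func crdFolder(dataDir, version string) string {
	return filepath.Join(dataDir, strings.TrimPrefix(version, "v"))
}

// tarExists reports whether a regular file named tarName is present in folder.
// Any stat error is treated as "not present".
func tarExists(folder string) bool {
	info, err := os.Stat(filepath.Join(folder, tarName))
	return err == nil && !info.IsDir()
}

func main() {
	folder := crdFolder("/var/lib/karmada", "v1.6.0")
	if tarExists(folder) {
		fmt.Println("skip download, tar exists in", folder)
	} else {
		fmt.Println("would download into", folder)
	}
}
```

Under these assumptions, the folder is derived from the operator's default version rather than from the image tags in the CR, which would explain why a v1.7.0 install still looks in the 1.6.0 folder.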

RainbowMango commented 6 months ago

> Yes, it is an offline environment, so I copied the CRDs into this path, and the error is:

Please stop doing that until the author @calvin0327 confirms if it is the right way to do so.

MolisXYliu commented 6 months ago

> > Yes, it is an offline environment, so I copied the CRDs into this path, and the error is:

> Hi @MolisXYliu, I'm curious about one thing, though it may not be the root cause:

> The operator downloads the CRDs to a path inside its own container (and skips the download if they already exist). In other words, /var/lib/karmada/1.6.0 should be a path inside the operator pod. Did you perform your "replace CRDs" operation inside the pod?

Yes, I copied the CRDs inside the pod.

chaosi-zju commented 6 months ago

Hi @MolisXYliu, I found that the v1.7.0 karmada-operator may currently have some bugs: we cannot correctly apply the v1.7.0 CRDs. I am working on it right now~

This is because the CRDs path is different in v1.7.0 (the screenshot showing it is not preserved here).

If it goes smoothly, after my change is merged you will no longer need this hacky way of replacing the CRDs. I am hurrying on it~

liangyuanpeng commented 6 months ago

> I found that the v1.7.0 karmada-operator may currently have some bugs: we cannot correctly apply the v1.7.0 CRDs.

@chaosi-zju I have a PR that addresses this, https://github.com/karmada-io/karmada/pull/4130, but it is not the cause of this issue.

More info:

@MolisXYliu Using k8s v1.23.0 or higher together with the latest operator will work for you.

The operator logs:

E1012 07:47:21.428138 1 planner.go:93] "failed to executed the workflow" err="failed to install etcd component, err: error when creating etcd client service, err: Service \"karmada-etcd\" is invalid: metadata.resourceVersion: Invalid value: \"\": must be specified for an update" workflow=init karmada="karmada-system/karmada"

This happens at https://github.com/karmada-io/karmada/blob/f1f1a82dc73ae3828971fbdaf763c087f87f1291/operator/pkg/util/apiclient/idempotency.go#L78-L79

I have not yet checked the reason for this change.

chaosi-zju commented 6 months ago

Hi @liangyuanpeng, you did a great job~

Besides, I have looked at your PR #4130. I see you changed the default version constant to 1.7.0. However, when we upgrade to 1.8.0 or higher, we would have to submit the same kind of PR manually again, so I have another way to fix this problem in #4133.

I think we can bind the default Karmada image version to the operator's own version. I mean, if you want to install Karmada v1.7.0, use the v1.7.0 karmada-operator.

Also, we discovered the CRDs path issue at the same time; my PR includes that fix too.
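
The idea of binding the default Karmada version to the operator's own version could look roughly like the sketch below. It is an illustration under stated assumptions, not the change made in #4133: `operatorVersion`, `karmadaVersion`, and `crdDownloadURL` are hypothetical names, and the URL simply follows the pattern that appears in the operator log earlier in this thread.

```go
package main

import "fmt"

// operatorVersion is a hypothetical variable. One common Go technique is to
// set such a value at build time, for example:
//
//	go build -ldflags "-X main.operatorVersion=v1.7.0"
var operatorVersion = "v0.0.0-dev"

// karmadaVersion returns the explicitly requested version if there is one,
// and otherwise falls back to the operator's own version.
func karmadaVersion(requested string) string {
	if requested != "" {
		return requested
	}
	return operatorVersion
}

// crdDownloadURL follows the URL pattern seen in the operator log.
func crdDownloadURL(version string) string {
	return fmt.Sprintf("https://github.com/karmada-io/karmada/releases/download/%s/crds.tar.gz", version)
}

func main() {
	operatorVersion = "v1.7.0" // stands in for the build-time value
	fmt.Println(crdDownloadURL(karmadaVersion("")))
}
```

With a fallback like this, no hard-coded default constant needs to be bumped by hand for each release.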

calvin0327 commented 6 months ago

karmada/operator/pkg/util/apiclient/idempotency.go

@liangyuanpeng @chaosi-zju Thanks a lot for finding the useful error message. Based on it, should we specify resourceVersion when updating the etcd service?
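
One way to read that question is the create-then-update pattern sketched below: try to create the object, and if it already exists, copy the stored resourceVersion into the desired object before updating. This is a minimal sketch against an in-memory `fakeStore` that stands in for the API server. It is not the code in idempotency.go and does not use client-go; all type and function names are hypothetical, and the port numbers are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Service is a simplified, hypothetical stand-in for a Kubernetes Service.
type Service struct {
	Name            string
	ResourceVersion string
	Port            int
}

var errAlreadyExists = errors.New("already exists")

// fakeStore is an in-memory stand-in for the API server, for illustration only.
type fakeStore struct {
	items map[string]Service
	rev   int
}

func newFakeStore() *fakeStore { return &fakeStore{items: map[string]Service{}} }

func (s *fakeStore) Create(svc Service) error {
	if _, ok := s.items[svc.Name]; ok {
		return errAlreadyExists
	}
	s.rev++
	svc.ResourceVersion = strconv.Itoa(s.rev)
	s.items[svc.Name] = svc
	return nil
}

func (s *fakeStore) Get(name string) (Service, error) {
	svc, ok := s.items[name]
	if !ok {
		return Service{}, errors.New("not found")
	}
	return svc, nil
}

// Update mimics the validation seen in the log: an empty resourceVersion is rejected.
func (s *fakeStore) Update(svc Service) error {
	old, ok := s.items[svc.Name]
	if !ok {
		return errors.New("not found")
	}
	if svc.ResourceVersion == "" {
		return errors.New(`metadata.resourceVersion: Invalid value: "": must be specified for an update`)
	}
	if svc.ResourceVersion != old.ResourceVersion {
		return errors.New("conflict: object has been modified")
	}
	s.rev++
	svc.ResourceVersion = strconv.Itoa(s.rev)
	s.items[svc.Name] = svc
	return nil
}

// createOrUpdate creates the Service or, if it already exists, copies the
// stored resourceVersion into the desired object before updating it.
func createOrUpdate(s *fakeStore, desired Service) error {
	err := s.Create(desired)
	if err == nil {
		return nil
	}
	if !errors.Is(err, errAlreadyExists) {
		return err
	}
	existing, err := s.Get(desired.Name)
	if err != nil {
		return err
	}
	desired.ResourceVersion = existing.ResourceVersion
	return s.Update(desired)
}

func main() {
	store := newFakeStore()
	fmt.Println(createOrUpdate(store, Service{Name: "karmada-etcd", Port: 2379}))
	fmt.Println(createOrUpdate(store, Service{Name: "karmada-etcd", Port: 2380}))
}
```

In this sketch the second call succeeds only because the resourceVersion is carried over; calling `Update` with an empty resourceVersion reproduces the kind of rejection shown in the operator log.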