hc2p opened 1 year ago
Can you paste the cns-manager container logs for volume migration?
2023-07-06T08:35:39.617Z INFO AuthHandler go/logger.go:40 HTTP request header dump {"Header": {"Accept":["application/json"],"Accept-Encoding":["gzip, deflate"],"Authorization":["Basic XYZ"],"Connection":["close"],"Content-Length":["0"],"Postman-Token":["403600d6-bc5a-45ea-bbce-16d223006367"],"User-Agent":["PostmanRuntime/7.30.0"]}}
2023-07-06T08:35:39.674Z INFO client/client.go:131 New session ID for 'service-account' = 5251f4cb-54ac-4571-d51b-ae195731d17c
2023-07-06T08:35:39.682Z INFO MigrateVolumes volume/cnsops.go:43 Querying volumes with offset and limit {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292", "Offset": 0, "Limit": 500}
2023-07-06T08:35:39.778Z INFO MigrateVolumes volume/cnsops.go:68 Query volume result retrieved for all requested volumes {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292"}
2023-07-06T08:35:39.778Z INFO MigrateVolumes datastore/migrate_volumes.go:151 Container volume list returned from CNS to migrate {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292", "Volumes": ["287e1630-26f6-419e-a375-b2bdba724710"]}
2023-07-06T08:35:39.781Z INFO MigrateVolumes datastore/utils.go:222 Cluster/PV details for the volume {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292", "FcdId": "287e1630-26f6-419e-a375-b2bdba724710", "ClusterId": "demo", "PVName": "pvc-9d6aa201-2bae-4b96-91f7-9a48d574b98d"}
W0706 08:35:39.782001 1 client_config.go:615] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2023-07-06T08:35:39.805Z ERROR MigrateVolumes datastore/utils.go:393 unable to list CSINodeTopologis for cluster {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292", "clusterID": "demo", "error": "the server could not find the requested resource"}
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/datastore.enrichClusterComponents
/go/src/pkg/datastore/utils.go:393
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/datastore.checkVolumeAccessibilityOnTargetDatastore
/go/src/pkg/datastore/utils.go:241
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/datastore.MigrateVolumes
/go/src/pkg/datastore/migrate_volumes.go:156
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.MigrateVolumes.func1
/go/src/go/api_datastore_operations.go:225
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.MigrateVolumes
/go/src/go/api_datastore_operations.go:239
net/http.HandlerFunc.ServeHTTP
/usr/local/go/src/net/http/server.go:2047
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.AuthHandler.func1
/go/src/go/logger.go:70
net/http.HandlerFunc.ServeHTTP
/usr/local/go/src/net/http/server.go:2047
github.com/gorilla/mux.(*Router).ServeHTTP
/go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210
net/http.serverHandler.ServeHTTP
/usr/local/go/src/net/http/server.go:2879
net/http.(*conn).serve
/usr/local/go/src/net/http/server.go:1930
2023-07-06T08:35:39.805Z ERROR MigrateVolumes datastore/utils.go:243 Failed to enrich cluster components {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292", "error": "the server could not find the requested resource"}
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/datastore.checkVolumeAccessibilityOnTargetDatastore
/go/src/pkg/datastore/utils.go:243
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/datastore.MigrateVolumes
/go/src/pkg/datastore/migrate_volumes.go:156
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.MigrateVolumes.func1
/go/src/go/api_datastore_operations.go:225
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.MigrateVolumes
/go/src/go/api_datastore_operations.go:239
net/http.HandlerFunc.ServeHTTP
/usr/local/go/src/net/http/server.go:2047
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.AuthHandler.func1
/go/src/go/logger.go:70
net/http.HandlerFunc.ServeHTTP
/usr/local/go/src/net/http/server.go:2047
github.com/gorilla/mux.(*Router).ServeHTTP
/go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210
net/http.serverHandler.ServeHTTP
/usr/local/go/src/net/http/server.go:2879
net/http.(*conn).serve
/usr/local/go/src/net/http/server.go:1930
2023-07-06T08:35:39.805Z ERROR MigrateVolumes go/error_utils.go:29 failed to migrate volumes. {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292", "error": "the server could not find the requested resource"}
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.WriteError
/go/src/go/error_utils.go:29
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.MigrateVolumes.func1
/go/src/go/api_datastore_operations.go:227
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.MigrateVolumes
/go/src/go/api_datastore_operations.go:239
net/http.HandlerFunc.ServeHTTP
/usr/local/go/src/net/http/server.go:2047
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.AuthHandler.func1
/go/src/go/logger.go:70
net/http.HandlerFunc.ServeHTTP
/usr/local/go/src/net/http/server.go:2047
github.com/gorilla/mux.(*Router).ServeHTTP
/go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210
net/http.serverHandler.ServeHTTP
/usr/local/go/src/net/http/server.go:2879
net/http.(*conn).serve
/usr/local/go/src/net/http/server.go:1930
2023-07-06T08:35:39.805Z INFO AuthHandler go/logger.go:71 POST /1.0.0/migratevolumes?datacenter=MyDatacenter&targetDatastore=TargetDatastore&fcdIdsToMigrate=287e1630-26f6-419e-a375-b2bdba724710 MigrateVolumes 187.796207ms
2023-07-06T08:35:39.805Z ERROR MigrateVolumes datastore/utils.go:393 unable to list CSINodeTopologis for cluster {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292", "clusterID": "demo", "error": "the server could not find the requested resource"}
Can you check whether you have any csinodetopology objects in your source k8s cluster? The output should look something like this:
> kubectl get csinodetopologies
NAME                         AGE
k8s-control-418-1685678575   34d
k8s-control-572-1685678615   34d
k8s-control-830-1685678594   34d
k8s-node-174-1685678656      34d
k8s-node-366-1685678635      34d
k8s-node-873-1685678675      34d
In your setup, CNS manager is unable to find the CSINodeTopology CR that the vSphere CSI driver creates during node discovery, which contains the topology labels for a CSI node. CNS manager relies on it to get the topology information for k8s cluster nodes.
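If the objects do exist, you can dump one to check the topology labels the driver discovered; a minimal sketch, using an object name from the sample output above (the exact field layout depends on your CSI driver version):

```
# Dump one CSINodeTopology object; the discovered topology labels
# are recorded in its status (layout varies by driver version)
kubectl get csinodetopologies k8s-node-174-1685678656 -o yaml
```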
What Kubernetes distribution are you using? Is it a packaged distribution like OpenShift or Anthos? It would be good to know that, along with the CSI driver version.
Hey, same issue here.
The cluster was deployed with Rancher. The "add-on" section during cluster deployment in Rancher, with the vSphere node driver, was used to deploy the CSI storage plugin.
When I try to get csinodetopologies, there are no resources :/
kubectl get csinodetopologies
error: the server doesn't have a resource type "csinodetopologies"
Is it possible to create it after the CSI deployment?
Rancher version: v2.6.8
Node CSI:
rancher/mirrored-sig-storage-csi-node-driver-registrar:v2.5.1
rancher/mirrored-cloud-provider-vsphere-csi-release-driver:v2.6.2
rancher/mirrored-sig-storage-livenessprobe:v2.7.0
CSI controller:
rancher/mirrored-sig-storage-csi-provisioner:v3.2.1
rancher/mirrored-cloud-provider-vsphere-csi-release-syncer:v2.6.2
rancher/mirrored-sig-storage-livenessprobe:v2.7.0
rancher/mirrored-cloud-provider-vsphere-csi-release-driver:v2.6.2
rancher/mirrored-sig-storage-csi-attacher:v3.4.0
k8s.gcr.io/sig-storage/csi-snapshotter:v4.1.1
CCM:
rancher/mirrored-cloud-provider-vsphere-cpi-release-manager:v1.24.2
Thanks :)
Hey,
I updated the kube cluster and also updated the CSI config (through the Rancher add-on).
Options used for the CSI: [screenshot attached]
Now I can submit new volume migration jobs, but I have an x509 issue with the cluster registered to the cns-manager. During registration I tried using a kubeconfig file with the insecure-skip-tls-verify: true option, but I get the same error.
apiVersion: v1
kind: Config
clusters:
- cluster:
    insecure-skip-tls-verify: true
    server: https://xxx
  name: cnsmgr-cluster
...
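As a side note, a quick way to sanity-check such a kubeconfig independently of cns-manager is to point kubectl at it directly; a sketch (the file name is hypothetical):

```
# Verify the kubeconfig can reach the cluster on its own
kubectl --kubeconfig ./cnsmgr-cluster.kubeconfig get nodes
```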
kubectl get volumemigrationjobs.cnsmanager.cns.vmware.com -n cns-manager
NAME                                                      AGE
volumemigrationjob-07d2b76e-4d95-11ee-8a33-764b5b0165e1   36m
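To see what state the job is in, the object itself can be inspected; a sketch using the job name from the output above:

```
# Show the job's spec and status, including any error the controller recorded
kubectl describe volumemigrationjobs.cnsmanager.cns.vmware.com \
  volumemigrationjob-07d2b76e-4d95-11ee-8a33-764b5b0165e1 -n cns-manager
```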
The error returned by the cns-manager:
2023-09-07T16:17:54.704Z INFO VolumeMigrationJobController.controller.volumemigrationjob-controller volumemigrationjob/volumemigrationjob_controller.go:175 Reconciling volumemigrationjob {"name": "volumemigrationjob-07d2b76e-4d95-11ee-8a33-764b5b0165e1", "namespace": "cns-manager", "TraceId": "2a2c5d19-2451-40b8-b212-3e7991e9d332"}
2023-09-07T16:17:54.713Z ERROR VolumeMigrationJobController.controller.volumemigrationjob-controller volumemigrationjob/volumemigrationjob_controller.go:216 failed to get volume migration tasks for the job {"name": "volumemigrationjob-07d2b76e-4d95-11ee-8a33-764b5b0165e1", "namespace": "cns-manager", "TraceId": "2a2c5d19-2451-40b8-b212-3e7991e9d332", "error": "Get \"https://rancher.proserv.ovh/k8s/clusters/c-m-bbhf5vqm/apis/cnsmanager.cns.vmware.com/v1alpha1/namespaces/cns-manager/volumemigrationtasks?labelSelector=volume-migration-job%3Dvolumemigrationjob-07d2b76e-4d95-11ee-8a33-764b5b0165e1\": x509: certificate signed by unknown authority"}
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/cnsoperator/controller/volumemigrationjob.(*ReconcileVolumeMigrationJob).Reconcile
/go/src/pkg/cnsoperator/controller/volumemigrationjob/volumemigrationjob_controller.go:216
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3/pkg/internal/controller/controller.go:214
If you have any ideas, thanks!
Hey,
Just for tracking, in case it can help someone.
I had this x509 issue because Rancher acts as an auth proxy for my workload cluster.
In my case, Rancher was deployed with cert-manager issuing an SSL cert from Let's Encrypt. In this setup, Rancher doesn't manage the SSL certs, and so it doesn't generate the certificate-authority-data field in the kubeconfig files for the workload cluster.
If this field is not provided, or if insecure-skip-tls-verify: true is set in the kubeconfig, cns-manager returns the error x509: certificate signed by unknown authority.
After providing the base64-encoded full chain of the Rancher server certificate in the kubeconfig (in the certificate-authority-data field), the volume migration job succeeded; a sketch of how to capture that chain follows.
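A minimal sketch of grabbing the chain and encoding it, assuming the Rancher host from the error message above; note that the base64 flag for single-line output varies by platform (-w0 on GNU, -b 0 on macOS):

```
# Fetch the full certificate chain presented by the Rancher server
openssl s_client -connect rancher.proserv.ovh:443 -showcerts </dev/null 2>/dev/null \
  | sed -n '/BEGIN CERTIFICATE/,/END CERTIFICATE/p' > rancher-chain.pem

# Base64-encode it on a single line, then paste the output into the
# certificate-authority-data field of the kubeconfig
base64 -w0 rancher-chain.pem
```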
Thanks for the help, kiki
@ccleouf66 We have not tested this with Rancher. Glad you were able to overcome the various issues and make it run on Rancher.
IIUC from the screenshot you attached, enabling the CSI topology plugin would have created the csinodetopologies objects.
@gohilankit Yes. To be honest, I don't remember which one was not enabled before, but with these features enabled in Rancher when deploying the CSI plugin, the csinodetopologies objects were created.
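For anyone deploying the upstream driver directly rather than through Rancher: in the vanilla vSphere CSI driver v2.x manifests, topology support is toggled via the internal feature states ConfigMap. A hedged sketch of the relevant snippet (key and namespace names as in the upstream v2.x manifests; verify against your driver version):

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: internal-feature-states.csi.vsphere.vmware.com
  namespace: vmware-system-csi
data:
  # assumed to be the switch behind Rancher's CSI topology option;
  # with it enabled the driver creates CSINodeTopology objects
  improved-volume-topology: "true"
```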
Describe the bug
I used the /datastoreresources endpoint to find the fcdId of the volume I want to migrate. When sending the following request, I get the error below:
Reproduction steps
Invoke /migratevolumes? with the fcdId from /datastoreresources (see the example request sketched below).
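A sketch of the request, with the endpoint and parameters taken from the request log earlier in the thread (host and credentials are placeholders):

```
# POST the migration request; the fcdId comes from /datastoreresources
curl -X POST -u '<user>:<password>' \
  'https://<cns-manager-host>/1.0.0/migratevolumes?datacenter=MyDatacenter&targetDatastore=TargetDatastore&fcdIdsToMigrate=287e1630-26f6-419e-a375-b2bdba724710'
```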
Expected behavior
Should return a job ID.
Additional context
No response