vmware-samples / cloud-native-storage-self-service-manager

Cloud Native Storage (CNS) Manager is a diagnostic and self-service tool that helps detect and auto-remediate some of the known issues in the storage control plane.
Apache License 2.0

the server could not find the requested resource #22

Open hc2p opened 1 year ago

hc2p commented 1 year ago

Describe the bug

I used the /datastoreresources endpoint to find the fcdId of the volume I want to migrate.

{
    "datacenter": "MyDatacenter",
    "datastore": "SourceDatastore",
    "totalVolumes": 132,
    "containerVolumes": [
        {
            "fcdId": "287e1630-26f6-419e-a375-b2bdba724710",
            "fcdName": "pvc-9d6aa201-2bae-4b96-91f7-9a48d574b98d",
            "attachmentDetails": {
                "attached": true,
                "vm": "staging-master-pool-7ccf8d9669-97m66"
            },
            "host": "10.165.32.191"
        },

When sending the following request I get the error below:

curl --location --request POST 'http://10.166.153.15:30008/1.0.0/migratevolumes?datacenter=MyDatacenter&targetDatastore=TargetDatastore&fcdIdsToMigrate=287e1630-26f6-419e-a375-b2bdba724710' \
--header 'Accept: application/json' \
--header 'Authorization: Basic XYZ' \
--data-raw ''
{
    "message": "failed to migrate volumes.",
    "error": "the server could not find the requested resource"
}

Reproduction steps

  1. POST to /migratevolumes with an fcdId obtained from /datastoreresources

Expected behavior

Should return a job ID

Additional context

No response

gohilankit commented 1 year ago

Can you paste the cns-manager container logs for volume migration?

hc2p commented 1 year ago
2023-07-06T08:35:39.617Z    INFO    AuthHandler go/logger.go:40 HTTP request header dump    {"Header": {"Accept":["application/json"],"Accept-Encoding":["gzip, deflate"],"Authorization":["Basic XYZ"],"Connection":["close"],"Content-Length":["0"],"Postman-Token":["403600d6-bc5a-45ea-bbce-16d223006367"],"User-Agent":["PostmanRuntime/7.30.0"]}}
2023-07-06T08:35:39.674Z    INFO    client/client.go:131    New session ID for 'service-account' = 5251f4cb-54ac-4571-d51b-ae195731d17c
2023-07-06T08:35:39.682Z    INFO    MigrateVolumes  volume/cnsops.go:43 Querying volumes with offset and limit  {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292", "Offset": 0, "Limit": 500}
2023-07-06T08:35:39.778Z    INFO    MigrateVolumes  volume/cnsops.go:68 Query volume result retrieved for all requested volumes {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292"}
2023-07-06T08:35:39.778Z    INFO    MigrateVolumes  datastore/migrate_volumes.go:151    Container volume list returned from CNS to migrate  {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292", "Volumes": ["287e1630-26f6-419e-a375-b2bdba724710"]}
2023-07-06T08:35:39.781Z    INFO    MigrateVolumes  datastore/utils.go:222  Cluster/PV details for the volume   {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292", "FcdId": "287e1630-26f6-419e-a375-b2bdba724710", "ClusterId": "demo", "PVName": "pvc-9d6aa201-2bae-4b96-91f7-9a48d574b98d"}
W0706 08:35:39.782001       1 client_config.go:615] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2023-07-06T08:35:39.805Z    ERROR   MigrateVolumes  datastore/utils.go:393  unable to list CSINodeTopologis for cluster {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292", "clusterID": "demo", "error": "the server could not find the requested resource"}
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/datastore.enrichClusterComponents
    /go/src/pkg/datastore/utils.go:393
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/datastore.checkVolumeAccessibilityOnTargetDatastore
    /go/src/pkg/datastore/utils.go:241
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/datastore.MigrateVolumes
    /go/src/pkg/datastore/migrate_volumes.go:156
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.MigrateVolumes.func1
    /go/src/go/api_datastore_operations.go:225
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.MigrateVolumes
    /go/src/go/api_datastore_operations.go:239
net/http.HandlerFunc.ServeHTTP
    /usr/local/go/src/net/http/server.go:2047
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.AuthHandler.func1
    /go/src/go/logger.go:70
net/http.HandlerFunc.ServeHTTP
    /usr/local/go/src/net/http/server.go:2047
github.com/gorilla/mux.(*Router).ServeHTTP
    /go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210
net/http.serverHandler.ServeHTTP
    /usr/local/go/src/net/http/server.go:2879
net/http.(*conn).serve
    /usr/local/go/src/net/http/server.go:1930
2023-07-06T08:35:39.805Z    ERROR   MigrateVolumes  datastore/utils.go:243  Failed to enrich cluster components {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292", "error": "the server could not find the requested resource"}
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/datastore.checkVolumeAccessibilityOnTargetDatastore
    /go/src/pkg/datastore/utils.go:243
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/datastore.MigrateVolumes
    /go/src/pkg/datastore/migrate_volumes.go:156
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.MigrateVolumes.func1
    /go/src/go/api_datastore_operations.go:225
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.MigrateVolumes
    /go/src/go/api_datastore_operations.go:239
net/http.HandlerFunc.ServeHTTP
    /usr/local/go/src/net/http/server.go:2047
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.AuthHandler.func1
    /go/src/go/logger.go:70
net/http.HandlerFunc.ServeHTTP
    /usr/local/go/src/net/http/server.go:2047
github.com/gorilla/mux.(*Router).ServeHTTP
    /go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210
net/http.serverHandler.ServeHTTP
    /usr/local/go/src/net/http/server.go:2879
net/http.(*conn).serve
    /usr/local/go/src/net/http/server.go:1930
2023-07-06T08:35:39.805Z    ERROR   MigrateVolumes  go/error_utils.go:29    failed to migrate volumes.  {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292", "error": "the server could not find the requested resource"}
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.WriteError
    /go/src/go/error_utils.go:29
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.MigrateVolumes.func1
    /go/src/go/api_datastore_operations.go:227
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.MigrateVolumes
    /go/src/go/api_datastore_operations.go:239
net/http.HandlerFunc.ServeHTTP
    /usr/local/go/src/net/http/server.go:2047
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/go.AuthHandler.func1
    /go/src/go/logger.go:70
net/http.HandlerFunc.ServeHTTP
    /usr/local/go/src/net/http/server.go:2047
github.com/gorilla/mux.(*Router).ServeHTTP
    /go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210
net/http.serverHandler.ServeHTTP
    /usr/local/go/src/net/http/server.go:2879
net/http.(*conn).serve
    /usr/local/go/src/net/http/server.go:1930
2023-07-06T08:35:39.805Z    INFO    AuthHandler go/logger.go:71 POST /1.0.0/migratevolumes?datacenter=MyDatacenter&targetDatastore=TargetDatastore&fcdIdsToMigrate=287e1630-26f6-419e-a375-b2bdba724710 MigrateVolumes 187.796207ms
gohilankit commented 1 year ago

2023-07-06T08:35:39.805Z ERROR MigrateVolumes datastore/utils.go:393 unable to list CSINodeTopologis for cluster {"TraceId": "92df8cff-8b4c-4c76-a440-2c930787c292", "clusterID": "demo", "error": "the server could not find the requested resource"}

Can you list and see if you have any csinodetopology objects in your source volume's k8s cluster? The output should look something like this:

> kubectl get csinodetopologies

NAME                         AGE
k8s-control-418-1685678575   34d
k8s-control-572-1685678615   34d
k8s-control-830-1685678594   34d
k8s-node-174-1685678656      34d
k8s-node-366-1685678635      34d
k8s-node-873-1685678675      34d

In your setup, CNS manager is unable to find the CSINodeTopology CR that the vSphere CSI driver creates during node discovery, which contains the topology labels for a CSI node. CNS manager relies on it to get the topology information for the k8s cluster nodes.
What Kubernetes distribution are you using? Is it a packaged distribution like OpenShift or Anthos? It would be good to know that, along with the CSI driver version.

ccleouf66 commented 1 year ago

Hey, same issue here.

The cluster was deployed with Rancher. The CSI storage plugin was deployed via the "add-on" section in Rancher (with the vSphere node driver) during cluster deployment.

When I try to get csinodetopologies, there are no resources :/

kubectl get csinodetopologies
error: the server doesn't have a resource type "csinodetopologies"

Is it possible to create it after the CSI deployment?

rancher version : v2.6.8

node CSI
rancher/mirrored-sig-storage-csi-node-driver-registrar:v2.5.1
rancher/mirrored-cloud-provider-vsphere-csi-release-driver:v2.6.2
rancher/mirrored-sig-storage-livenessprobe:v2.7.0

CSI controller
rancher/mirrored-sig-storage-csi-provisioner:v3.2.1
rancher/mirrored-cloud-provider-vsphere-csi-release-syncer:v2.6.2
rancher/mirrored-sig-storage-livenessprobe:v2.7.0
rancher/mirrored-cloud-provider-vsphere-csi-release-driver:v2.6.2
rancher/mirrored-sig-storage-csi-attacher:v3.4.0
k8s.gcr.io/sig-storage/csi-snapshotter:v4.1.1

CCM
rancher/mirrored-cloud-provider-vsphere-cpi-release-manager:v1.24.2

Thanks :)

ccleouf66 commented 1 year ago

Hey,

I updated the kube cluster and also updated the CSI config (through the Rancher add-on).

Options used for the CSI: (see attached screenshot)

Now I can submit new volume migration jobs, but I hit an x509 issue for the cluster registered with cns-manager. During registration I tried a kubeconfig file with the insecure-skip-tls-verify: true option, but I get the same error.

apiVersion: v1
kind: Config
clusters:
- cluster:
    insecure-skip-tls-verify: true
    server: https://xxx
  name: cnsmgr-cluster
...
kubectl get volumemigrationjobs.cnsmanager.cns.vmware.com -n cns-manager
NAME                                                      AGE
volumemigrationjob-07d2b76e-4d95-11ee-8a33-764b5b0165e1   36m

The error returned by the cns-manager:

2023-09-07T16:17:54.704Z    INFO    VolumeMigrationJobController.controller.volumemigrationjob-controller   volumemigrationjob/volumemigrationjob_controller.go:175 Reconciling volumemigrationjob  {"name": "volumemigrationjob-07d2b76e-4d95-11ee-8a33-764b5b0165e1", "namespace": "cns-manager", "TraceId": "2a2c5d19-2451-40b8-b212-3e7991e9d332"}
2023-09-07T16:17:54.713Z    ERROR   VolumeMigrationJobController.controller.volumemigrationjob-controller   volumemigrationjob/volumemigrationjob_controller.go:216 failed to get volume migration tasks for the job    {"name": "volumemigrationjob-07d2b76e-4d95-11ee-8a33-764b5b0165e1", "namespace": "cns-manager", "TraceId": "2a2c5d19-2451-40b8-b212-3e7991e9d332", "error": "Get \"https://rancher.proserv.ovh/k8s/clusters/c-m-bbhf5vqm/apis/cnsmanager.cns.vmware.com/v1alpha1/namespaces/cns-manager/volumemigrationtasks?labelSelector=volume-migration-job%3Dvolumemigrationjob-07d2b76e-4d95-11ee-8a33-764b5b0165e1\": x509: certificate signed by unknown authority"}
gitlab.eng.vmware.com/calatrava/storage-sre/cns-manager/pkg/cnsoperator/controller/volumemigrationjob.(*ReconcileVolumeMigrationJob).Reconcile
    /go/src/pkg/cnsoperator/controller/volumemigrationjob/volumemigrationjob_controller.go:216
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3/pkg/internal/controller/controller.go:214

If you have any ideas, thanks!

ccleouf66 commented 1 year ago

Hey,

Just for tracking, in case it helps someone.

I had this x509 issue because Rancher acts as an auth proxy for my workload clusters. In my case Rancher was deployed with cert-manager issuing an SSL certificate from Let's Encrypt. In this setup Rancher doesn't manage the SSL certs itself, and so doesn't generate the certificate-authority-data field in the kubeconfig files for workload clusters.

If this field is not provided, or if insecure-skip-tls-verify: true is set in the kubeconfig, cns-manager returns the error x509: certificate signed by unknown authority.

After providing the base64-encoded full chain of the Rancher server certificate in the kubeconfig (in the certificate-authority-data field), the volume migration job succeeded.
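For reference, the working kubeconfig had roughly this shape. The server URL, cluster ID, and certificate data below are placeholders; the value can be produced with something like base64 -w0 fullchain.pem:

```yaml
apiVersion: v1
kind: Config
clusters:
- cluster:
    # Placeholder: base64-encoded full chain (server + intermediate + root)
    # of the Rancher server certificate
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS4uLg==
    server: https://rancher.example.com/k8s/clusters/<cluster-id>
  name: cnsmgr-cluster
```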

Thanks for the help, kiki

gohilankit commented 1 year ago

@ccleouf66 We have not tested this with Rancher. Glad you were able to overcome the various issues to make it run on Rancher.
IIUC from the screenshot you attached, enabling the CSI topology plugin would have created the csinodetopologies objects.
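For readers deploying the vSphere CSI driver manually rather than through the Rancher add-on: in the driver's v2.x manifests, topology support is toggled via the internal feature-states ConfigMap. A sketch of the relevant fragment (key name and namespace taken from the upstream v2.x manifests; verify against your driver version):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: internal-feature-states.csi.vsphere.vmware.com
  namespace: vmware-system-csi
data:
  # Enables topology-aware provisioning; with this on, the driver creates
  # a CSINodeTopology CR per node during node registration
  improved-volume-topology: "true"
```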

ccleouf66 commented 10 months ago

@gohilankit yes, to be honest I don't remember which one was not enabled before, but with these features enabled in the Rancher add-on deploying the CSI plugin, csinodetopologies was created.