kubernetes-sigs / azuredisk-csi-driver

Azure Disk CSI Driver

Containerize sanity test and deploy it to an Azure cluster #283

Closed · chewong closed this issue 3 years ago

chewong commented 4 years ago

Is your feature request related to a problem?/Why is this needed

We hit a test failure when running the StageVolume sanity test, since StageVolume is invoked from a container running on Google Cloud rather than on an Azure node.

Describe the solution you'd like in detail

Containerize the sanity test and run it on an Azure cluster.

Describe alternatives you've considered

Additional context
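
For context, a rough sketch of how the sanity suite can be driven from a Go test against the driver's socket (the field names assume the csi-test pkg/sanity API referenced in the logs below; the repo's actual harness may wire this up differently):

```go
package sanity_test

import (
	"testing"

	sanity "github.com/kubernetes-csi/csi-test/pkg/sanity"
)

// TestSanity runs the upstream CSI sanity suite against a driver that is
// already serving on the given UNIX socket (e.g. in the same pod/container).
func TestSanity(t *testing.T) {
	config := &sanity.Config{
		Address:     "unix:///tmp/csi.sock", // endpoint the driver listens on
		TargetPath:  "/tmp/csi-mount",       // NodePublishVolume target
		StagingPath: "/tmp/csi-staging",     // NodeStageVolume staging dir
	}
	sanity.Test(t, config)
}
```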

/assign

chewong commented 4 years ago

@andyzhangx I tried to containerize the sanity test and ran it as a pod on an Azure cluster, but `should fail when volume does not exist on the specified path` is still failing.

Node Service NodeGetVolumeStats
  should fail when volume does not exist on the specified path
  /go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/node.go:458
STEP: reusing connection to CSI driver at unix:///tmp/csi.sock
STEP: creating mount and staging directories
I0213 21:16:17.888545       8 utils.go:112] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0213 21:16:17.888555       8 utils.go:113] GRPC request:
I0213 21:16:17.888559       8 identityserver.go:59] Using default capabilities
I0213 21:16:17.888562       8 utils.go:119] GRPC response: capabilities:<service:<type:CONTROLLER_SERVICE > > capabilities:<service:<type:VOLUME_ACCESSIBILITY_CONSTRAINTS > >
I0213 21:16:17.888791       8 utils.go:112] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0213 21:16:17.888800       8 utils.go:113] GRPC request:
I0213 21:16:17.888803       8 controllerserver.go:448] Using default ControllerGetCapabilities
I0213 21:16:17.888809       8 utils.go:119] GRPC response: capabilities:<rpc:<type:CREATE_DELETE_VOLUME > > capabilities:<rpc:<type:PUBLISH_UNPUBLISH_VOLUME > > capabilities:<rpc:<type:CREATE_DELETE_SNAPSHOT > > capabilities:<rpc:<type:LIST_SNAPSHOTS > > capabilities:<rpc:<type:CLONE_VOLUME > > capabilities:<rpc:<type:EXPAND_VOLUME > >
I0213 21:16:17.889067       8 utils.go:112] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0213 21:16:17.889084       8 utils.go:113] GRPC request:
I0213 21:16:17.889088       8 nodeserver.go:263] Using default NodeGetCapabilities
I0213 21:16:17.889173       8 utils.go:119] GRPC response: capabilities:<rpc:<type:STAGE_UNSTAGE_VOLUME > > capabilities:<rpc:<type:EXPAND_VOLUME > > capabilities:<rpc:<type:GET_VOLUME_STATS > >
I0213 21:16:17.889396       8 utils.go:112] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0213 21:16:17.889403       8 utils.go:113] GRPC request:
I0213 21:16:17.889407       8 nodeserver.go:263] Using default NodeGetCapabilities
I0213 21:16:17.889410       8 utils.go:119] GRPC response: capabilities:<rpc:<type:STAGE_UNSTAGE_VOLUME > > capabilities:<rpc:<type:EXPAND_VOLUME > > capabilities:<rpc:<type:GET_VOLUME_STATS > >
STEP: creating a single node writer volume
I0213 21:16:17.889685       8 utils.go:112] GRPC call: /csi.v1.Controller/CreateVolume
I0213 21:16:17.889695       8 utils.go:113] GRPC request: name:"sanity-node-get-volume-stats-79E3D069-F597B0A6" volume_capabilities:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > >
I0213 21:16:17.907724       8 azure_diskclient.go:122] Received error in disk.get.request: resourceID: /subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6, error: Retriable: true, RetryAfter: 0s, HTTPStatusCode: 404, RawError: Retriable: true, RetryAfter: 0s, HTTPStatusCode: 404, RawError: {"error":{"code":"ResourceNotFound","message":"The Resource 'Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6' under resource group 'chuwon' was not found."}}
I0213 21:16:17.907761       8 controllerserver.go:177] begin to create azure disk(sanity-node-get-volume-stats-79E3D069-F597B0A6) account type(Standard_LRS) rg(chuwon) location() size(1) selectedAvailabilityZone()
I0213 21:16:17.907771       8 azure_managedDiskController.go:83] azureDisk - creating new managed Name:sanity-node-get-volume-stats-79E3D069-F597B0A6 StorageAccountType:Standard_LRS Size:1
I0213 21:16:18.334675       8 azure_managedDiskController.go:196] azureDisk - created new MD Name:sanity-node-get-volume-stats-79E3D069-F597B0A6 StorageAccountType:Standard_LRS Size:1
I0213 21:16:18.334702       8 controllerserver.go:256] create azure disk(sanity-node-get-volume-stats-79E3D069-F597B0A6) account type(Standard_LRS) rg(chuwon) location() size(1) successfully
I0213 21:16:18.334710       8 utils.go:119] GRPC response: volume:<volume_id:"/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6" content_source:<> accessible_topology:<segments:<key:"topology.disk.csi.azure.com/zone" value:"" > > >
STEP: getting a node id
I0213 21:16:18.335353       8 utils.go:112] GRPC call: /csi.v1.Node/NodeGetInfo
I0213 21:16:18.335437       8 utils.go:113] GRPC request:
I0213 21:16:18.335508       8 nodeserver.go:272] NodeGetInfo: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0213 21:16:18.335583       8 azure_zones.go:75] Availability zone is not enabled for the node, falling back to fault domain
I0213 21:16:18.335661       8 utils.go:119] GRPC response: node_id:"k8s-agentpool1-95854893-0" max_volumes_per_node:8 accessible_topology:<segments:<key:"topology.disk.csi.azure.com/zone" value:"" > >
I0213 21:16:18.336696       8 utils.go:112] GRPC call: /csi.v1.Controller/ControllerPublishVolume
I0213 21:16:18.336715       8 utils.go:113] GRPC request: volume_id:"/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6" node_id:"k8s-agentpool1-95854893-0" volume_capability:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > >
I0213 21:16:18.336722       8 controllerserver.go:321] ControllerPublishVolume: called with args {VolumeId:/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6 NodeId:k8s-agentpool1-95854893-0 VolumeCapability:mount:<> access_mode:<mode:SINGLE_NODE_WRITER >  Readonly:false Secrets:map[] VolumeContext:map[] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
STEP: controller publishing volume
I0213 21:16:18.354855       8 controllerserver.go:353] GetDiskLun returned: <nil>. Initiating attaching volume "/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6" to node "k8s-agentpool1-95854893-0".
I0213 21:16:18.486303       8 controllerserver.go:370] Trying to attach volume "/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6" to node "k8s-agentpool1-95854893-0"
I0213 21:16:18.507740       8 azure_controller_common.go:167] Trying to attach volume "/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6" lun 0 to node "k8s-agentpool1-95854893-0".
I0213 21:16:18.507835       8 azure_controller_standard.go:89] azureDisk - update(chuwon): vm(k8s-agentpool1-95854893-0) - attach disk(sanity-node-get-volume-stats-79E3D069-F597B0A6, /subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6) with DiskEncryptionSetID()
I0213 21:16:59.340816       8 azure_controller_standard.go:109] azureDisk - attach disk(sanity-node-get-volume-stats-79E3D069-F597B0A6, /subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6) succeeded
STEP: node staging volume
I0213 21:16:59.340947       8 controllerserver.go:376] Attach operation successful: volume "/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6" attached to node "k8s-agentpool1-95854893-0".
I0213 21:16:59.340970       8 controllerserver.go:381] attach volume "/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6" to node "k8s-agentpool1-95854893-0" successfully
I0213 21:16:59.340981       8 utils.go:119] GRPC response: publish_context:<key:"devicePath" value:"0" >
I0213 21:16:59.341818       8 utils.go:112] GRPC call: /csi.v1.Node/NodeStageVolume
I0213 21:16:59.341830       8 utils.go:113] GRPC request: volume_id:"/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6" publish_context:<key:"devicePath" value:"0" > staging_target_path:"/tmp/csi-staging" volume_capability:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > >
I0213 21:16:59.341837       8 nodeserver.go:63] NodeStageVolume: called with args {VolumeId:/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6 PublishContext:map[devicePath:0] StagingTargetPath:/tmp/csi-staging VolumeCapability:mount:<> access_mode:<mode:SINGLE_NODE_WRITER >  Secrets:map[] VolumeContext:map[] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0213 21:17:03.227044       8 azure_common_linux.go:171] azureDisk - found /devhost/disk/azure/scsi1/lun0 by sdc under /devhost/disk/azure/scsi1/
I0213 21:17:03.227083       8 nodeserver.go:146] NodeStageVolume: formatting /devhost/disk/azure/scsi1/lun0 and mounting at /tmp/csi-staging with mount options([])
I0213 21:17:03.227094       8 mount_linux.go:441] Checking for issues with fsck on disk: /devhost/disk/azure/scsi1/lun0
I0213 21:17:03.255634       8 mount_linux.go:454] `fsck` error fsck from util-linux 2.33.1
fsck.ext2: Bad magic number in super-block while trying to open /devhost/sdc
/devhost/sdc:
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

I0213 21:17:03.255652       8 mount_linux.go:460] Attempting to mount disk: ext4 /devhost/disk/azure/scsi1/lun0 /tmp/csi-staging
I0213 21:17:03.255662       8 mount_linux.go:142] Mounting cmd (mount) with arguments ([-t ext4 -o defaults /devhost/disk/azure/scsi1/lun0 /tmp/csi-staging])
E0213 21:17:03.270499       8 mount_linux.go:147] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /devhost/disk/azure/scsi1/lun0 /tmp/csi-staging
Output: mount: /tmp/csi-staging: wrong fs type, bad option, bad superblock on /devhost/sdc, missing codepage or helper program, or other error.

I0213 21:17:03.270566       8 mount_linux.go:515] Attempting to determine if disk "/devhost/disk/azure/scsi1/lun0" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /devhost/disk/azure/scsi1/lun0])
I0213 21:17:03.286509       8 mount_linux.go:518] Output: "", err: exit status 2
I0213 21:17:03.286539       8 mount_linux.go:489] Disk "/devhost/disk/azure/scsi1/lun0" appears to be unformatted, attempting to format as type: "ext4" with options: [-F -m0 /devhost/disk/azure/scsi1/lun0]
I0213 21:17:03.902939       8 mount_linux.go:493] Disk successfully formatted (mkfs): ext4 - /devhost/disk/azure/scsi1/lun0 /tmp/csi-staging
I0213 21:17:03.902978       8 mount_linux.go:142] Mounting cmd (mount) with arguments ([-t ext4 -o defaults /devhost/disk/azure/scsi1/lun0 /tmp/csi-staging])
I0213 21:17:03.913552       8 nodeserver.go:152] NodeStageVolume: format /devhost/disk/azure/scsi1/lun0 and mounting at /tmp/csi-staging successfully.
I0213 21:17:03.913565       8 utils.go:119] GRPC response:
I0213 21:17:03.914073       8 utils.go:112] GRPC call: /csi.v1.Node/NodePublishVolume
I0213 21:17:03.914104       8 utils.go:113] GRPC request: volume_id:"/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6" publish_context:<key:"devicePath" value:"0" > staging_target_path:"/tmp/csi-staging" target_path:"/tmp/csi-mount/target" volume_capability:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > >
I0213 21:17:03.914111       8 nodeserver.go:182] NodePublishVolume: called with args {VolumeId:/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6 PublishContext:map[devicePath:0] StagingTargetPath:/tmp/csi-staging TargetPath:/tmp/csi-mount/target VolumeCapability:mount:<> access_mode:<mode:SINGLE_NODE_WRITER >  Readonly:false Secrets:map[] VolumeContext:map[] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0213 21:17:03.914255       8 nodeserver.go:227] NodePublishVolume: mounting /tmp/csi-staging at /tmp/csi-mount/target
I0213 21:17:03.914265       8 mount_linux.go:142] Mounting cmd (mount) with arguments ([-o bind /tmp/csi-staging /tmp/csi-mount/target])
STEP: publishing the volume on a node
I0213 21:17:03.916241       8 mount_linux.go:142] Mounting cmd (mount) with arguments ([-o bind,remount /tmp/csi-staging /tmp/csi-mount/target])
I0213 21:17:03.918335       8 nodeserver.go:234] NodePublishVolume: mount /tmp/csi-staging at /tmp/csi-mount/target successfully
I0213 21:17:03.918352       8 utils.go:119] GRPC response:
STEP: Get node volume stats
I0213 21:17:03.918820       8 utils.go:112] GRPC call: /csi.v1.Node/NodeGetVolumeStats
I0213 21:17:03.918829       8 utils.go:113] GRPC request: volume_id:"/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6" volume_path:"some/path"
E0213 21:17:03.949233       8 utils.go:117] GRPC error: rpc error: code = Internal desc = Failed to determine whether some/path is block device: path "some/path" does not exist
cleanup: deleting sanity-node-get-volume-stats-79E3D069-F597B0A6 = /subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6
I0213 21:17:03.949880       8 utils.go:112] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0213 21:17:03.949895       8 utils.go:113] GRPC request: volume_id:"/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6" target_path:"/tmp/csi-mount/target"
I0213 21:17:03.949900       8 nodeserver.go:241] NodeUnPublishVolume: called with args {VolumeId:/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6 TargetPath:/tmp/csi-mount/target XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0213 21:17:03.949909       8 nodeserver.go:251] NodeUnpublishVolume: unmounting volume /subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6 on /tmp/csi-mount/target
I0213 21:17:03.949913       8 mount_linux.go:209] Unmounting /tmp/csi-mount/target
I0213 21:17:03.970460       8 nodeserver.go:256] NodeUnpublishVolume: unmount volume /subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6 on /tmp/csi-mount/target successfully
I0213 21:17:03.970472       8 utils.go:119] GRPC response:
I0213 21:17:03.970678       8 utils.go:112] GRPC call: /csi.v1.Node/NodeUnstageVolume
I0213 21:17:03.970688       8 utils.go:113] GRPC request: volume_id:"/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6" staging_target_path:"/tmp/csi-staging"
I0213 21:17:03.970693       8 nodeserver.go:159] NodeUnstageVolume: called with args {VolumeId:/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6 StagingTargetPath:/tmp/csi-staging XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0213 21:17:03.970705       8 nodeserver.go:170] NodeUnstageVolume: unmounting /tmp/csi-staging
I0213 21:17:03.970708       8 mount_linux.go:209] Unmounting /tmp/csi-staging
I0213 21:17:03.990236       8 nodeserver.go:175] NodeUnstageVolume: unmount /tmp/csi-staging successfully
I0213 21:17:03.990281       8 utils.go:119] GRPC response:
I0213 21:17:03.990563       8 utils.go:112] GRPC call: /csi.v1.Controller/ControllerUnpublishVolume
I0213 21:17:03.990573       8 utils.go:113] GRPC request: volume_id:"/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6" node_id:"k8s-agentpool1-95854893-0"
I0213 21:17:03.990580       8 controllerserver.go:390] ControllerUnpublishVolume: called with args {VolumeId:/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6 NodeId:k8s-agentpool1-95854893-0 Secrets:map[] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0213 21:17:03.990605       8 controllerserver.go:407] Trying to detach volume /subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6 from node k8s-agentpool1-95854893-0
I0213 21:17:04.045693       8 azure_controller_common.go:192] detach /subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6 from node "k8s-agentpool1-95854893-0"
I0213 21:17:04.045723       8 azure_controller_standard.go:137] azureDisk - detach disk: name "sanity-node-get-volume-stats-79E3D069-F597B0A6" uri "/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6"
I0213 21:17:04.045732       8 azure_controller_standard.go:157] azureDisk - update(chuwon): vm(k8s-agentpool1-95854893-0) - detach disk(sanity-node-get-volume-stats-79E3D069-F597B0A6, /subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6)
I0213 21:17:24.510103       8 azure_controller_common.go:226] azureDisk - detach disk(sanity-node-get-volume-stats-79E3D069-F597B0A6, /subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6) succeeded
I0213 21:17:24.510121       8 controllerserver.go:418] detach volume /subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6 from node k8s-agentpool1-95854893-0 successfully
I0213 21:17:24.510128       8 utils.go:119] GRPC response:
I0213 21:17:24.510564       8 utils.go:112] GRPC call: /csi.v1.Controller/DeleteVolume
I0213 21:17:24.510585       8 utils.go:113] GRPC request: volume_id:"/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6"
I0213 21:17:24.510593       8 controllerserver.go:304] deleting azure disk(/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6)
I0213 21:17:29.654893       8 azure_managedDiskController.go:224] azureDisk - deleted a managed disk: /subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6
I0213 21:17:29.654914       8 controllerserver.go:314] delete azure disk(/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6) successfully
I0213 21:17:29.654922       8 utils.go:119] GRPC response:

• Failure [71.767 seconds]
Node Service
/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/tests.go:44
  NodeGetVolumeStats
  /go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/node.go:408
    should fail when volume does not exist on the specified path [It]
    /go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/node.go:458

    Expected
        <codes.Code>: 13
    to equal
        <codes.Code>: 5

    /go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/node.go:585

Test cases after running the above test case also failed due to:

  failed to create target directory
  Unexpected error:
      <*os.PathError | 0xc0004208d0>: {
          Op: "mkdir",
          Path: "/tmp/csi-mount",
          Err: 0x11,
      }
      mkdir /tmp/csi-mount: file exists
  occurred

I was wondering if you see anything wrong with the pod manifest or Dockerfile in https://github.com/kubernetes-sigs/azuredisk-csi-driver/compare/master...chewong:fix-sanity?expand=1

andyzhangx commented 4 years ago

Great, I think we are close to making it work. @ZeroMagic, it looks like there's a bug in the NodeGetVolumeStats feature:

I0213 21:17:03.918820       8 utils.go:112] GRPC call: /csi.v1.Node/NodeGetVolumeStats
I0213 21:17:03.918829       8 utils.go:113] GRPC request: volume_id:"/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-79E3D069-F597B0A6" volume_path:"some/path"
E0213 21:17:03.949233       8 utils.go:117] GRPC error: rpc error: code = Internal desc = Failed to determine whether some/path is block device: path "some/path" does not exist
ZeroMagic commented 4 years ago

Yes, it is a bug. In this case, we need to return code 5 (NotFound) rather than 13 (Internal).

Thanks, @chewong
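
For illustration, a minimal sketch (not the driver's actual implementation) of mapping a missing volume_path to codes.NotFound, assuming the CSI spec Go bindings and the gRPC status/codes packages:

```go
package driver

import (
	"context"
	"os"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// Driver is a stand-in for the node server; the real driver has more state.
type Driver struct{}

// NodeGetVolumeStats: an absent volume_path should surface as NOT_FOUND (5)
// rather than INTERNAL (13), which is what the sanity test above expects.
func (d *Driver) NodeGetVolumeStats(ctx context.Context, req *csi.NodeGetVolumeStatsRequest) (*csi.NodeGetVolumeStatsResponse, error) {
	volumePath := req.GetVolumePath()
	if volumePath == "" {
		return nil, status.Error(codes.InvalidArgument, "volume path is required")
	}
	if _, err := os.Stat(volumePath); err != nil {
		if os.IsNotExist(err) {
			return nil, status.Errorf(codes.NotFound, "path %s does not exist", volumePath)
		}
		return nil, status.Errorf(codes.Internal, "failed to stat %s: %v", volumePath, err)
	}
	// ...collect filesystem usage for volumePath and fill the response...
	return &csi.NodeGetVolumeStatsResponse{}, nil
}
```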

chewong commented 4 years ago

@andyzhangx @ZeroMagic Looks like `should return empty when the specified snapshot id does not exist` does not work when the test is run inside a container. This check should get triggered, but it didn't. The raw error below is `context canceled`, but it should be `{"error":{"code":"ResourceNotFound",...`. Also, I've never seen azure_retry.go being executed in a previous sanity run. Any idea?

ListSnapshots [Controller Server]
  should return empty when the specified snapshot id does not exist
  /go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/controller.go:1667
STEP: reusing connection to CSI driver at unix:///tmp/csi.sock
STEP: creating mount and staging directories
I0214 15:06:20.421800       8 utils.go:112] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0214 15:06:20.421808       8 utils.go:113] GRPC request:
I0214 15:06:20.421813       8 controllerserver.go:448] Using default ControllerGetCapabilities
I0214 15:06:20.421865       8 utils.go:119] GRPC response: capabilities:<rpc:<type:CREATE_DELETE_VOLUME > > capabilities:<rpc:<type:PUBLISH_UNPUBLISH_VOLUME > > capabilities:<rpc:<type:CREATE_DELETE_SNAPSHOT > > capabilities:<rpc:<type:LIST_SNAPSHOTS > > capabilities:<rpc:<type:CLONE_VOLUME > > capabilities:<rpc:<type:EXPAND_VOLUME > >
I0214 15:06:20.422133       8 utils.go:112] GRPC call: /csi.v1.Controller/ListSnapshots
I0214 15:06:20.422143       8 utils.go:113] GRPC request: snapshot_id:"none-exist-id"
I0214 15:06:20.422147       8 controllerserver.go:598] ListSnapshots called with request {0   none-exist-id map[] {} [] 0}
I0214 15:06:25.766889       8 azure_retry.go:183] Backoff retrying GET "https://management.azure.com/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/snapshots/none-exist-id?api-version=2019-07-01" with error &{true 404 0001-01-01 00:00:00 +0000 UTC {"error":{"code":"ResourceNotFound","message":"The Resource 'Microsoft.Compute/snapshots/none-exist-id' under resource group 'chuwon' was not found."}}}
I0214 15:06:34.491193       8 azure_retry.go:183] Backoff retrying GET "https://management.azure.com/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/snapshots/none-exist-id?api-version=2019-07-01" with error &{true 404 0001-01-01 00:00:00 +0000 UTC {"error":{"code":"ResourceNotFound","message":"The Resource 'Microsoft.Compute/snapshots/none-exist-id' under resource group 'chuwon' was not found."}}}
I0214 15:06:46.848675       8 azure_retry.go:183] Backoff retrying GET "https://management.azure.com/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/snapshots/none-exist-id?api-version=2019-07-01" with error &{true 404 0001-01-01 00:00:00 +0000 UTC {"error":{"code":"ResourceNotFound","message":"The Resource 'Microsoft.Compute/snapshots/none-exist-id' under resource group 'chuwon' was not found."}}}
I0214 15:07:08.817252       8 azure_retry.go:183] Backoff retrying GET "https://management.azure.com/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/snapshots/none-exist-id?api-version=2019-07-01" with error &{true 404 0001-01-01 00:00:00 +0000 UTC {"error":{"code":"ResourceNotFound","message":"The Resource 'Microsoft.Compute/snapshots/none-exist-id' under resource group 'chuwon' was not found."}}}
I0214 15:07:38.467268       8 azure_snapshotclient.go:124] Received error in snapshot.get.request: resourceID: /subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/snapshots/none-exist-id, error: Retriable: true, RetryAfter: 0s, HTTPStatusCode: 404, RawError: context canceled
E0214 15:07:38.467311       8 utils.go:117] GRPC error: rpc error: code = Internal desc = get snapshot none-exist-id from rg(chuwon) error: Retriable: true, RetryAfter: 0s, HTTPStatusCode: 404, RawError: context canceled

• Failure [78.046 seconds]
ListSnapshots [Controller Server]
/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/tests.go:44
  should return empty when the specified snapshot id does not exist [It]
  /go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/controller.go:1667

  Unexpected error:
      <*status.statusError | 0xc00042b180>: {
          Code: 14,
          Message: "transport is closing",
          Details: nil,
          XXX_NoUnkeyedLiteral: {},
          XXX_unrecognized: nil,
          XXX_sizecache: 0,
      }
      rpc error: code = Unavailable desc = transport is closing
  occurred

  /go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/controller.go:1672
andyzhangx commented 4 years ago

> @andyzhangx @ZeroMagic Looks like `should return empty when the specified snapshot id does not exist` does not work when the test is run inside a container. [...] Also, I've never seen azure_retry.go being executed in a previous sanity run. Any idea?

@chewong that's because the returned error changed with the latest azure cloud provider client PR; I will fix that. You can track this issue: https://github.com/kubernetes-sigs/azuredisk-csi-driver/issues/286

Thanks, we are very close to making this work. Just hold on for a bit.
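
For reference, a rough sketch of the behavior the sanity test expects here, with a hypothetical getSnapshotByID helper standing in for the Azure snapshot client call (this is not the actual fix tracked in #286):

```go
package driver

import (
	"context"
	"strings"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// Driver is a stand-in for the controller server.
type Driver struct{}

// ListSnapshots sketch: when snapshot_id is set but the snapshot does not
// exist, the CSI spec expects an empty response rather than an error.
func (d *Driver) ListSnapshots(ctx context.Context, req *csi.ListSnapshotsRequest) (*csi.ListSnapshotsResponse, error) {
	if id := req.GetSnapshotId(); id != "" {
		entry, err := d.getSnapshotByID(ctx, id)
		if err != nil {
			// Crude string check for illustration only: a not-found snapshot
			// maps to an empty list instead of codes.Internal.
			if strings.Contains(err.Error(), "ResourceNotFound") || strings.Contains(err.Error(), "NotFound") {
				return &csi.ListSnapshotsResponse{}, nil
			}
			return nil, status.Errorf(codes.Internal, "get snapshot %s error: %v", id, err)
		}
		return &csi.ListSnapshotsResponse{Entries: []*csi.ListSnapshotsResponse_Entry{entry}}, nil
	}
	// ...otherwise list all snapshots in the resource group...
	return &csi.ListSnapshotsResponse{}, nil
}

// getSnapshotByID is a hypothetical helper; the real driver calls into the
// cloud provider's snapshot client and converts the result to a CSI entry.
func (d *Driver) getSnapshotByID(ctx context.Context, id string) (*csi.ListSnapshotsResponse_Entry, error) {
	return nil, status.Errorf(codes.NotFound, "snapshot %s not found", id)
}
```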

andyzhangx commented 4 years ago

@chewong could you use the latest master branch? I think that failure is due to the backoff retrying, and the master branch should not have this issue.

chewong commented 4 years ago

@andyzhangx @ZeroMagic `should fail when volume does not exist on the specified path` has passed. However, it looks like it caused all the test cases after it to fail with the following error message:

CreateSnapshot [Controller Server]
  should not fail when creating snapshot with maximum-length name
  /go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/controller.go:2038
STEP: reusing connection to CSI driver at unix:///tmp/csi.sock
STEP: creating mount and staging directories

• Failure in Spec Setup (BeforeEach) [0.000 seconds]
CreateSnapshot [Controller Server]
/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/tests.go:44
  should not fail when creating snapshot with maximum-length name [BeforeEach]
  /go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/controller.go:2038

  failed to create target directory
  Unexpected error:
      <*os.PathError | 0xc00041fa10>: {
          Op: "mkdir",
          Path: "/tmp/csi-mount",
          Err: 0x11,
      }
      mkdir /tmp/csi-mount: file exists
  occurred

  /go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/sanity.go:234
andyzhangx commented 4 years ago

@chewong if it failed here: https://github.com/kubernetes-csi/csi-test/blob/33de331fc3d2ab181f6d207863f0a85ae0b5d629/pkg/sanity/sanity.go#L280, then it looks like the sanity test previously ran in a container on Google Cloud where /tmp/csi-mount never existed beforehand, while in your setup it always exists. Did you specify a volume mount for the /tmp path in your tests?

ZeroMagic commented 4 years ago

@chewong In the csi-sanity tests, if the NodeUnpublishVolume call fails, the target directory /tmp/csi-mount/target is not deleted, so the /tmp/csi-mount directory remains in the filesystem. Ref: https://github.com/kubernetes-csi/csi-test/pull/225

I am not sure which test ran before this failing one. Maybe NodeUnpublishVolume failed when it was invoked.

If you run in a local environment, you can delete the directory manually before the next test round to work around the problem.

chewong commented 4 years ago

@ZeroMagic Thanks for the info. NodeUnpublishVolume did fail in the `should fail when volume does not exist on the specified path` test case with the following error:

I0219 01:33:12.963421       9 nodeserver.go:237] NodeUnPublishVolume: called with args {VolumeId:/subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-0331AFFD-94238AF6 TargetPath:/tmp/csi-mount/target XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0219 01:33:12.963525       9 nodeserver.go:247] NodeUnpublishVolume: unmounting volume /subscriptions/2d31b5ab-0ddc-4991-bf8d-61b6ad196f5a/resourceGroups/chuwon/providers/Microsoft.Compute/disks/sanity-node-get-volume-stats-0331AFFD-94238AF6 on /tmp/csi-mount/target
I0219 01:33:12.963600       9 mount_linux.go:209] Unmounting /tmp/csi-mount/target
E0219 01:33:12.965082       9 utils.go:117] GRPC error: rpc error: code = Internal desc = Unmount failed: exit status 32
Unmounting arguments: /tmp/csi-mount/target
Output: umount: /tmp/csi-mount/target: not mounted.

cleanup: warning: NodeUnpublishVolume: rpc error: code = Internal desc = Unmount failed: exit status 32
Unmounting arguments: /tmp/csi-mount/target
Output: umount: /tmp/csi-mount/target: not mounted.

It looks like the test case did not throw an error when calling Expect(err).NotTo(HaveOccurred()). I think your PR would fix the issue. Here is the detailed log of the sanity run: sanity.log

ZeroMagic commented 4 years ago

@chewong I think the merged PR #304 should fix the issue. You can try it again.
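
For illustration, a common way to make NodeUnpublishVolume idempotent so that an already-unmounted or missing target does not fail cleanup (a generic sketch assuming the k8s.io/utils/mount helpers; not necessarily what #304 implements):

```go
package driver

import (
	"context"
	"os"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	"k8s.io/utils/mount"
)

// Driver holds a mounter, e.g. mount.New("").
type Driver struct {
	mounter mount.Interface
}

// NodeUnpublishVolume sketch: treat "target already gone" and "not mounted"
// as success so that cleanup after a failed test case stays idempotent.
func (d *Driver) NodeUnpublishVolume(ctx context.Context, req *csi.NodeUnpublishVolumeRequest) (*csi.NodeUnpublishVolumeResponse, error) {
	target := req.GetTargetPath()
	if target == "" {
		return nil, status.Error(codes.InvalidArgument, "target path is required")
	}
	if _, err := os.Stat(target); os.IsNotExist(err) {
		return &csi.NodeUnpublishVolumeResponse{}, nil // nothing to unmount
	}
	notMnt, err := d.mounter.IsLikelyNotMountPoint(target)
	if err != nil {
		return nil, status.Errorf(codes.Internal, "check mount point %s: %v", target, err)
	}
	if !notMnt {
		if err := d.mounter.Unmount(target); err != nil {
			return nil, status.Errorf(codes.Internal, "unmount %s: %v", target, err)
		}
	}
	if err := os.Remove(target); err != nil && !os.IsNotExist(err) {
		return nil, status.Errorf(codes.Internal, "remove %s: %v", target, err)
	}
	return &csi.NodeUnpublishVolumeResponse{}, nil
}
```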

chewong commented 4 years ago

@ZeroMagic thanks for the fix. Sanity test works now and I will create a PR soon.

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 3 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/azuredisk-csi-driver/issues/283#issuecomment-734060588):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.