kubevirt / containerized-data-importer

Data Import Service for kubernetes, designed with kubevirt in mind.
Apache License 2.0

qemu-img: Failed to lock byte 100: upload succeeds but cannot clone. #1108

Closed zakkg3 closed 4 years ago

zakkg3 commented 4 years ago

/kind bug

What happened: `virtctl image-upload` of a CentOS 7 image reports success but logs errors. I can then start VMs from the resulting PVC, but I cannot clone it (the clone stays at 0%).

What you expected to happen: either the upload fails when something is wrong, or the clone succeeds and my VM starts, as it does with Ubuntu.

Environment:

Reproduce like this:

wget https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2                                  
--2020-02-11 14:40:11--  https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2
Resolving cloud.centos.org (cloud.centos.org)... 94.130.67.19
Connecting to cloud.centos.org (cloud.centos.org)|94.130.67.19|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 942407680 (899M)
Saving to: ‘CentOS-7-x86_64-GenericCloud.qcow2’

CentOS-7-x86_64-GenericCloud.qcow2           100%[=============================================================================================>] 898.75M  90.2MB/s    in 10s

2020-02-11 14:40:22 (86.7 MB/s) - ‘CentOS-7-x86_64-GenericCloud.qcow2’ saved [942407680/942407680]

I created the source PVC like this:

virtctl -n kt image-upload --uploadproxy-url 10.206.144.239 --pvc-name=centos7-x86-64 --pvc-size=9Gi --image-path ./CentOS-7-x86_64-GenericCloud.qcow2 --insecure
PVC kt/centos7-x86-64 created
Waiting for PVC centos7-x86-64 upload pod to be running...
Pod now running
Uploading data to https://10.206.144.239

 898.75 MiB / 898.75 MiB [=======================================================================] 100.00% 1m41s

Uploading ./CentOS-7-x86_64-GenericCloud.qcow2 completed successfully

cdi-upload pod logs:

I0211 13:54:24.443059       1 uploadserver.go:62] Upload destination: /data/disk.img
I0211 13:54:24.443180       1 uploadserver.go:64] Running server on 0.0.0.0:8443
I0211 13:54:31.928212       1 uploadserver.go:263] Content type header is ""
I0211 13:54:31.928279       1 data-processor.go:253] Calculating available size
I0211 13:54:31.952370       1 data-processor.go:261] Checking out file system volume size.
I0211 13:54:31.953280       1 data-processor.go:265] Request image size not empty.
I0211 13:54:31.953413       1 data-processor.go:270] Target size 9663479808.
I0211 13:54:31.969153       1 data-processor.go:183] New phase: TransferScratch
I0211 13:54:31.973879       1 util.go:169] Writing data...
I0211 13:54:41.011074       1 data-processor.go:183] New phase: Process
I0211 13:54:41.011161       1 data-processor.go:183] New phase: Convert
I0211 13:54:41.011234       1 data-processor.go:189] Validating image
I0211 13:56:43.884051       1 data-processor.go:183] New phase: Resize
W0211 13:56:44.338421       1 data-processor.go:240] Available space less than requested size, resizing image to available space 9663479808.
I0211 13:56:44.338463       1 data-processor.go:246] Expanding image size to: 9663479808
E0211 13:56:50.801641       1 prlimit.go:164] qemu-img failed output is:

E0211 13:56:50.801675       1 prlimit.go:165] Unexpected error in raw_apply_lock_bytes() at /builddir/build/BUILD/qemu-3.0.1/block/file-posix.c:696:
qemu-img: Failed to lock byte 100

I0211 13:57:05.205424       1 data-processor.go:183] New phase: Complete
I0211 13:57:05.206785       1 util.go:37] deleting file: /scratch/tmpimage
I0211 13:57:05.266748       1 uploadserver.go:282] Wrote data to /data/disk.img
I0211 13:57:05.266838       1 uploadserver.go:152] Shutting down http server after successful upload
I0211 13:57:05.267183       1 uploadserver.go:72] UploadServer successfully exited

Then I can start VMs using this PVC, but when I try to clone it I get these logs:

I0211 13:47:10.775101      10 clone-source.go:106] content_type is "filesystem-clone"
I0211 13:47:10.775313      10 clone-source.go:107] upload_bytes is 9573744640
I0211 13:47:10.775344      10 clone-source.go:117] Starting cloner target
I0211 13:47:11.125647      10 clone-source.go:129] Set header to filesystem-clone
I0211 13:47:11.775877      10 prometheus.go:67] 0.00
I0211 13:47:12.776045      10 prometheus.go:67] 0.00
and so on...

Maybe related to: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1709818 https://forum.opennebula.org/t/nfs-v3-datastore-and-failed-to-lock-byte-100/7482/2

Not sure if it's related to https://github.com/kubevirt/containerized-data-importer/issues/1073
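For what it's worth, the "Failed to lock byte 100" message comes from qemu's image file locking, and the reports linked above suggest it shows up when the volume's filesystem does not support POSIX byte-range locks (NFSv3 in particular). As a rough diagnostic sketch (the image path is a placeholder; this would be run inside the upload pod or on the node mounting the PV), `qemu-img` can be told to skip its own locking with `-U`/`--force-share` so the image can at least be inspected and checked:

```shell
IMG=/data/disk.img   # placeholder: the image path inside the upload pod

# -U (--force-share) tells qemu-img to skip taking its own file locks,
# so the image can be inspected even where byte-range locking is broken.
if command -v qemu-img >/dev/null 2>&1 && [ -f "$IMG" ]; then
  qemu-img info -U "$IMG"
  qemu-img check -U "$IMG"
fi
```

If `qemu-img check -U` reports the image as clean, the lock error was about the locking mechanism itself rather than image corruption.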

thanks!

zakkg3 commented 4 years ago

Update: I did the process again, this time with a .qcow2.xz image, and the upload no longer complains:

virtctl -n kt image-upload --uploadproxy-url 10.206.144.239 --pvc-name=centos7-x86-64 --pvc-size=9Gi --image-path ./CentOS-7-x86_64-GenericCloud.qcow2.xz --insecure
PVC kt/centos7-x86-64 created
Waiting for PVC centos7-x86-64 upload pod to be running...
Pod now running
Uploading data to https://10.206.144.239

 262.52 MiB / 262.52 MiB [================================================================] 100.00% 2m57s

Uploading ./CentOS-7-x86_64-GenericCloud.qcow2.xz completed successfully

logs:

k logs -n kt cdi-upload-centos7-x86-64 -f
I0211 14:40:58.386178       1 uploadserver.go:62] Upload destination: /data/disk.img
I0211 14:40:58.386529       1 uploadserver.go:64] Running server on 0.0.0.0:8443
I0211 14:41:06.611615       1 uploadserver.go:263] Content type header is ""
I0211 14:41:06.611700       1 data-processor.go:253] Calculating available size
I0211 14:41:06.621214       1 data-processor.go:261] Checking out file system volume size.
I0211 14:41:06.622002       1 data-processor.go:265] Request image size not empty.
I0211 14:41:06.622120       1 data-processor.go:270] Target size 9663479808.
I0211 14:41:06.675138       1 data-processor.go:183] New phase: TransferScratch
I0211 14:41:06.678426       1 util.go:169] Writing data...
I0211 14:42:28.682506       1 data-processor.go:183] New phase: Process
I0211 14:42:28.682545       1 data-processor.go:183] New phase: Convert
I0211 14:42:28.682585       1 data-processor.go:189] Validating image
I0211 14:43:38.738446       1 data-processor.go:183] New phase: Resize
W0211 14:43:39.423928       1 data-processor.go:240] Available space less than requested size, resizing image to available space 9663479808.
I0211 14:43:39.423978       1 data-processor.go:246] Expanding image size to: 9663479808
I0211 14:44:04.336900       1 data-processor.go:183] New phase: Complete
I0211 14:44:04.341251       1 util.go:37] deleting file: /scratch/tmpimage
I0211 14:44:04.415078       1 uploadserver.go:282] Wrote data to /data/disk.img
I0211 14:44:04.415137       1 uploadserver.go:152] Shutting down http server after successful upload
I0211 14:44:04.915451       1 uploadserver.go:72] UploadServer successfully exited

Cloning is still failing :(

k logs -n kt cdi-clone-source-centos7-x86-64-r8cmh -f
VOLUME_MODE=filesystem
MOUNT_POINT=/var/run/cdi/clone/source
/var/run/cdi/clone/source /
UPLOAD_BYTES=9529253888
./
I0211 14:47:57.723940      10 clone-source.go:106] content_type is "filesystem-clone"
I0211 14:47:57.724052      10 clone-source.go:107] upload_bytes is 9529253888
I0211 14:47:57.724069      10 clone-source.go:117] Starting cloner target
I0211 14:47:58.112942      10 clone-source.go:129] Set header to filesystem-clone
I0211 14:47:58.724651      10 prometheus.go:67] 0.00
I0211 14:47:59.724894      10 prometheus.go:67] 0.00
I0211 14:48:00.725144      10 prometheus.go:67] 0.00
I0211 14:48:01.725407      10 prometheus.go:67] 0.00
I0211 14:48:02.725800      10 prometheus.go:67] 0.00
I0211 14:48:03.726014      10 prometheus.go:67] 0.00
awels commented 4 years ago

Is there anything in the target pod logs about receiving the data?

zakkg3 commented 4 years ago

In the cdi-upload-volume pod:

k logs -n kt cdi-upload-volume-centos-dv -f     
I0211 14:47:28.099688       1 uploadserver.go:62] Upload destination: /data/disk.img
I0211 14:47:28.099849       1 uploadserver.go:64] Running server on 0.0.0.0:8443
I0211 14:47:58.136162       1 uploadserver.go:263] Content type header is "filesystem-clone"

BTW: I can clone other OSes, like Ubuntu, with no problems.

awels commented 4 years ago

That almost looks like the communication between the pods is failing, yet the target does receive the header. I need to ponder this.

kubevirt-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubevirt-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kubevirt-bot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

kubevirt-bot commented 4 years ago

@kubevirt-bot: Closing this issue.

In response to [this](https://github.com/kubevirt/containerized-data-importer/issues/1108#issuecomment-656832233):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
awels commented 2 years ago

This was likely on NFSv3; we don't support NFSv3.
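Given that, a quick way to confirm is to look at the mount options for the PV on the node: `vers=3` in the options means NFSv3. A minimal sketch (the mount line below is a made-up example; on a real node you would feed it lines from `/proc/mounts`):

```shell
# Extract the NFS protocol version from a /proc/mounts-style line.
check_nfs_vers() {
  echo "$1" | grep -o 'vers=[0-9.]*' | head -n1
}

# On a real node, feed it actual mount lines, e.g.:
#   grep nfs /proc/mounts
# The server address and paths below are made-up examples.
sample='10.0.0.5:/export /var/lib/kubelet/pv nfs rw,vers=3,rsize=1048576 0 0'
check_nfs_vers "$sample"   # prints: vers=3
```

If this prints `vers=3`, remounting the share as NFSv4 (or moving to storage with working byte-range locks) is the likely fix.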