oVirt / ovirt-imageio

The oVirt image transfer daemon and clients.
GNU General Public License v2.0

Uploading huge sparse images is too slow #76

Closed: nirs closed this issue 2 years ago

nirs commented 2 years ago

Uploading an 8 TiB Fedora image containing 1.5 GiB of data takes 5.5 minutes. It should be much faster.

Client log

$ python3 /usr/share/doc/python3-ovirt-engine-sdk4/examples/upload_disk.py -c engine --sd-name fc-01 --disk-format qcow2 --disk-sparse fedora-35-8t.raw 
[   0.0 ] Checking image...
[   0.0 ] Image format: raw
[   0.0 ] Disk format: cow
[   0.0 ] Disk content type: data
[   0.0 ] Disk provisioned size: 8796093022208
[   0.0 ] Disk initial size: 3228303360
[   0.0 ] Disk name: fedora-35-8t.qcow2
[   0.0 ] Disk backup: False
[   0.0 ] Connecting...
[   0.0 ] Creating disk...
[   5.7 ] Disk ID: bdce52a0-d2ac-4391-81ba-88566f323a0d
[   5.7 ] Creating image transfer...
[   7.4 ] Transfer ID: b79eaba6-b633-4f54-bf27-1d206046ac96
[   7.4 ] Transfer host name: host4
[   7.4 ] Uploading image...
[ 100.00% ] 8.00 TiB, 323.11 seconds, 25.35 GiB/s                              
[ 330.5 ] Finalizing image transfer...
[ 333.6 ] Upload completed successfully

Server log

2022-05-25 17:11:17,585 INFO    (Thread-1157) [tickets] [local] ADD transfer={'dirty': False, 'ops': ['write'], 'size': 8796093022208, 'sparse': False, 'inactivity_timeout': 60, 'transfer_id': 'b79eaba6-b633-4f54-bf27-1d206046ac96', 'uuid': '3f677555-3f29-47fc-beea-5e6339370a9c', 'timeout': 300, 'url': 'nbd:unix:/run/vdsm/nbd/b79eaba6-b633-4f54-bf27-1d206046ac96.sock'}
2022-05-25 17:11:17,588 INFO    (Thread-1157) [http] CLOSE connection=1157 client=local [connection 1 ops, 0.004801 s] [dispatch 1 ops, 0.003341 s]
2022-05-25 17:11:18,671 INFO    (Thread-1158) [http] OPEN connection=1158 client=::ffff:192.168.122.23
2022-05-25 17:11:18,674 INFO    (Thread-1158) [images] [::ffff:192.168.122.23] OPTIONS transfer=b79eaba6-b633-4f54-bf27-1d206046ac96
2022-05-25 17:11:18,676 INFO    (Thread-1158) [http] CLOSE connection=1158 client=::ffff:192.168.122.23 [connection 1 ops, 0.004505 s] [dispatch 1 ops, 0.002779 s]
...
2022-05-25 17:16:41,724 INFO    (Thread-1162) [http] CLOSE connection=1162 client=local [connection 1 ops, 322.983955 s] [dispatch 18170 ops, 300.784821 s] [zero 16399 ops, 261.142007 s, 1.99 TiB, 7.82 GiB/s] [zero.zero 16399 ops, 260.809016 s, 1.99 TiB, 7.83 GiB/s] [write 1770 ops, 33.256568 s, 396.00 MiB, 11.91 MiB/s] [write.read 1805 ops, 2.648875 s, 396.00 MiB, 149.50 MiB/s] [write.write 1805 ops, 30.517316 s, 396.00 MiB, 12.98 MiB/s] [flush 1 ops, 0.011099 s]
2022-05-25 17:16:41,730 INFO    (Thread-1160) [http] CLOSE connection=1160 client=local [connection 1 ops, 323.007664 s] [dispatch 18258 ops, 300.452418 s] [zero 16572 ops, 263.889244 s, 2.01 TiB, 7.81 GiB/s] [zero.zero 16572 ops, 263.547431 s, 2.01 TiB, 7.82 GiB/s] [write 1685 ops, 30.176603 s, 385.29 MiB, 12.77 MiB/s] [write.read 1716 ops, 2.402275 s, 385.29 MiB, 160.39 MiB/s] [write.write 1716 ops, 27.683614 s, 385.29 MiB, 13.92 MiB/s] [flush 1 ops, 0.016295 s]
2022-05-25 17:16:41,734 INFO    (Thread-1161) [http] CLOSE connection=1161 client=local [connection 1 ops, 323.002675 s] [dispatch 18175 ops, 300.702131 s] [write 1711 ops, 31.959381 s, 376.45 MiB, 11.78 MiB/s] [write.read 1741 ops, 2.570673 s, 376.45 MiB, 146.44 MiB/s] [write.write 1741 ops, 29.300085 s, 376.45 MiB, 12.85 MiB/s] [zero 16463 ops, 262.263352 s, 2.00 TiB, 7.81 GiB/s] [zero.zero 16463 ops, 261.923150 s, 2.00 TiB, 7.82 GiB/s] [flush 1 ops, 0.016891 s]
2022-05-25 17:16:41,753 INFO    (Thread-1159) [http] CLOSE connection=1159 client=local [connection 1 ops, 323.073054 s] [dispatch 18034 ops, 300.610736 s] [write 1651 ops, 30.661714 s, 373.35 MiB, 12.18 MiB/s] [write.read 1684 ops, 2.506353 s, 373.35 MiB, 148.96 MiB/s] [write.write 1684 ops, 28.072643 s, 373.35 MiB, 13.30 MiB/s] [zero 16382 ops, 263.549683 s, 1.99 TiB, 7.74 GiB/s] [zero.zero 16382 ops, 263.212377 s, 1.99 TiB, 7.75 GiB/s] [flush 1 ops, 0.018416 s]
2022-05-25 17:16:42,261 INFO    (Thread-1201) [http] OPEN connection=1201 client=local
2022-05-25 17:16:42,262 INFO    (Thread-1201) [tickets] [local] REMOVE transfer=b79eaba6-b633-4f54-bf27-1d206046ac96

Connection stats

connection 1 ops, 322.983955 s
dispatch 18170 ops, 300.784821 s
zero 16399 ops, 261.142007 s, 1.99 TiB, 7.82 GiB/s
zero.zero 16399 ops, 260.809016 s, 1.99 TiB, 7.83 GiB/s
write 1770 ops, 33.256568 s, 396.00 MiB, 11.91 MiB/s
write.read 1805 ops, 2.648875 s, 396.00 MiB, 149.50 MiB/s
write.write 1805 ops, 30.517316 s, 396.00 MiB, 12.98 MiB/s
flush 1 ops, 0.011099 s

zero takes most of the time: 16k requests, 261 seconds.
write.write looks very slow: 11.91 MiB/s.
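
For reference, the ~16k zero requests per connection match the virtual size divided into MAX_ZERO_SIZE chunks and spread over the 4 connections seen in the server log. A back-of-the-envelope check (numbers taken from the stats above; the even split across connections is an approximation):

# Rough check of the zero request count and per-request cost, using the
# numbers from the connection stats above. 4 connections assumed, matching
# the 4 CLOSE lines in the server log.
image_size = 8 * 1024**4          # 8 TiB virtual disk size
max_zero_size = 128 * 1024**2     # MAX_ZERO_SIZE in io.py
connections = 4

total_zero_ops = image_size // max_zero_size      # 65536 requests
print(total_zero_ops // connections)              # 16384 per connection (~16.4k in the stats)

per_request = 261.142007 / 16399                  # seconds per 128 MiB zero request
print(f"{per_request * 1000:.1f} ms per zero request")   # ~15.9 ms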

Fix

nirs commented 2 years ago

I tested this change with a qcow2 file on block storage:

diff --git a/ovirt_imageio/_internal/io.py b/ovirt_imageio/_internal/io.py
index 93aa758..0ba285b 100644
--- a/ovirt_imageio/_internal/io.py
+++ b/ovirt_imageio/_internal/io.py
@@ -21,7 +21,7 @@ from . backends import Wrapper

 # Limit maximum zero and copy size to spread the workload better to multiple
 # workers and ensure frequent progress updates when handling large extents.
-MAX_ZERO_SIZE = 128 * 1024**2
+MAX_ZERO_SIZE = 1 * 1024**3
 MAX_COPY_SIZE = 128 * 1024**2

 # NBD hard limit.

This does not make a significant difference. Testing with qemu-img convert shows a similar slowdown, confirming that the issue is in the qemu qcow2 driver. It looks like the driver is calling fdatasync() when handling the NBD_WRITE_ZEROES command.

Regardless of the qemu issue, there is no reason to zero a new qcow2 image. We could get the destination image extents, see that the image is already zeroed, and skip all zero calls. This works only if we know that the qcow2 image does not have a backing file. The NBD protocol does not expose this information, but we can add more details to the ticket.
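
A minimal sketch of the skip, assuming the destination backend can report extents with length and zero attributes and that the ticket tells us the image has no backing file (both are assumptions here, not the current ovirt_imageio API):

# Hypothetical sketch only: Extent and the extent list are stand-ins for
# whatever the backend would report, not the current ovirt_imageio API.
from collections import namedtuple

Extent = namedtuple("Extent", "start length zero")

def destination_is_zero(extents, size):
    # True when every reported extent is a zero extent and together they
    # cover the whole image, e.g. a new qcow2 image without a backing file.
    covered = 0
    for ext in extents:
        if not ext.zero:
            return False
        covered += ext.length
    return covered >= size

# Example: an 8 TiB destination reported as a single zero extent.
size = 8 * 1024**4
dst_extents = [Extent(0, size, True)]

if destination_is_zero(dst_extents, size):
    # The copy loop can then skip zero requests completely and only
    # write the data extents reported by the source.
    skip_zero = True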