tkhai / dm-qcow2

Block driver to mount QCOW2 images

How to avoid data loss? #3

Closed huzhifeng closed 1 year ago

huzhifeng commented 1 year ago

I found that if the machine loses power while a qcow2 image is mounted, data may be lost.

Here are the steps to reproduce:

[huzhifeng@localhost ~]$ sudo modprobe dm-qcow2 kernel_sets_dirty_bit=y
[huzhifeng@localhost ~]$ qemu-img create -f qcow2 test.qcow2 1G
[huzhifeng@localhost ~]$ sudo ./qcow2-dm.sh create test.qcow2 qcow2-test
[huzhifeng@localhost ~]$ sudo mkfs.ext4 /dev/mapper/qcow2-test
[huzhifeng@localhost ~]$ sudo mount /dev/mapper/qcow2-test /mnt/
[huzhifeng@localhost ~]$ sudo mkdir -p /pub
[huzhifeng@localhost ~]$ echo hello > test.txt
[huzhifeng@localhost ~]$ cat cp.sh
#!/bin/bash

for i in {1..100}; do sudo cp test.txt /mnt/test-${i}.txt; sudo cp test.txt /pub/test-${i}.txt; done
sleep 3
echo 'done'
[huzhifeng@localhost ~]$ ./cp.sh 
done

I powered the machine off as soon as "done" was printed. After the reboot, the files copied to /pub are all there, but the files copied to /mnt (the qcow2 image) are lost.

[huzhifeng@localhost ~]$ sudo modprobe dm-qcow2 kernel_sets_dirty_bit=y
[huzhifeng@localhost ~]$ qemu-img check -r all test.qcow2
ERROR cluster 4 refcount=0 reference=1
...
ERROR cluster 789 refcount=0 reference=1
Rebuilding refcount structure
Repairing cluster 1 refcount=1 reference=0
Repairing cluster 2 refcount=1 reference=0
The following inconsistencies were found and repaired:

    0 leaked clusters
    786 corruptions

Double checking the fixed image now...
No errors were found on the image.
784/16384 = 4.79% allocated, 0.51% fragmented, 0.00% compressed clusters
Image end offset: 51904512
[huzhifeng@localhost ~]$
[huzhifeng@localhost ~]$ sudo ./qcow2-dm.sh create test.qcow2 qcow2-test
[huzhifeng@localhost ~]$ sudo mount /dev/mapper/qcow2-test /mnt/
[huzhifeng@localhost ~]$ ls /mnt/
[huzhifeng@localhost ~]$ ls /pub/
test-10.txt  test-14.txt  test-18.txt  test-21.txt  test-25.txt  test-29.txt  test-32.txt  test-36.txt  test-3.txt   test-43.txt  test-47.txt  test-50.txt  test-54.txt  test-58.txt  test-61.txt  test-65.txt  test-69.txt  test-72.txt  test-76.txt  test-7.txt   test-83.txt  test-87.txt  test-90.txt  test-94.txt
test-11.txt  test-15.txt  test-19.txt  test-22.txt  test-26.txt  test-2.txt   test-33.txt  test-37.txt  test-40.txt  test-44.txt  test-48.txt  test-51.txt  test-55.txt  test-59.txt  test-62.txt  test-66.txt  test-6.txt   test-73.txt  test-77.txt  test-80.txt  test-84.txt  test-88.txt  test-91.txt  test-95.txt
test-12.txt  test-16.txt  test-1.txt   test-23.txt  test-27.txt  test-30.txt  test-34.txt  test-38.txt  test-41.txt  test-45.txt  test-49.txt  test-52.txt  test-56.txt  test-5.txt   test-63.txt  test-67.txt  test-70.txt  test-74.txt  test-78.txt  test-81.txt  test-85.txt  test-89.txt  test-92.txt  test-96.txt
test-13.txt  test-17.txt  test-20.txt  test-24.txt  test-28.txt  test-31.txt  test-35.txt  test-39.txt  test-42.txt  test-46.txt  test-4.txt   test-53.txt  test-57.txt  test-60.txt  test-64.txt  test-68.txt  test-71.txt  test-75.txt  test-79.txt  test-82.txt  test-86.txt  test-8.txt   test-93.txt  test-9.txt
[huzhifeng@localhost ~]$

If I replace sleep 3 with sync, the files copied to /mnt (the qcow2 image) are not lost either.

[huzhifeng@localhost ~]$ cat cp.sh
#!/bin/bash

for i in {1..100}; do sudo cp test.txt /mnt/test-${i}.txt; sudo cp test.txt /pub/test-${i}.txt; done
sync
echo 'done'
[huzhifeng@localhost ~]$

It looks like a dm-qcow2 driver problem. Is there a good way to avoid data loss?

tkhai commented 1 year ago

No, this is not a dm-qcow2 driver problem; it is just how Linux buffered writes work. There is no guarantee that data has actually been written to disk until you explicitly call fsync/sync. See "Notes" in man 2 write for details.
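To make the buffering visible, here is a minimal sketch that prints how much data is still waiting in the page cache, along with the kernel's writeback timers. It reads only standard Linux procfs entries. The timers commonly default to 3000 and 500 centiseconds, but that is an assumption about your system's configuration, so check the values it prints.

```shell
#!/bin/bash
# Sketch: inspect pending buffered writes and the kernel writeback timers.
# Uses only standard Linux procfs paths; values differ between systems.
set -eu

# Dirty = modified pages not yet written out; Writeback = pages being written now.
grep -E '^(Dirty|Writeback):' /proc/meminfo

# Age (in 1/100 s) after which dirty data becomes eligible for writeback.
expire=$(cat /proc/sys/vm/dirty_expire_centisecs)
# Interval (in 1/100 s) between wakeups of the kernel flusher threads.
interval=$(cat /proc/sys/vm/dirty_writeback_centisecs)

echo "dirty data becomes eligible for writeback after $((expire / 100)) s"
echo "flusher threads wake up every $((interval / 100)) s"
```

If the reported expiry is much longer than 3 seconds, a `sleep 3` before power-off gives the kernel no reason to have flushed the new files yet.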

The difference between /pub and /mnt in your example is that /mnt has an additional layer, the dm-qcow2 block device, so the data reaches the disk even later in the /mnt case.

Note that after I ran your test, all files in /pub were empty, even though they had been created in the directory:

root@qemu:~# ls -s /pub/test-*
0 /pub/test-100.txt  0 /pub/test-28.txt  0 /pub/test-46.txt  0 /pub/test-64.txt  0 /pub/test-82.txt
0 /pub/test-10.txt   0 /pub/test-29.txt  0 /pub/test-47.txt  0 /pub/test-65.txt  0 /pub/test-83.txt
0 /pub/test-11.txt   0 /pub/test-2.txt   0 /pub/test-48.txt  0 /pub/test-66.txt  0 /pub/test-84.txt
0 /pub/test-12.txt   0 /pub/test-30.txt  0 /pub/test-49.txt  0 /pub/test-67.txt  0 /pub/test-85.txt
0 /pub/test-13.txt   0 /pub/test-31.txt  0 /pub/test-4.txt   0 /pub/test-68.txt  0 /pub/test-86.txt
0 /pub/test-14.txt   0 /pub/test-32.txt  0 /pub/test-50.txt  0 /pub/test-69.txt  0 /pub/test-87.txt
0 /pub/test-15.txt   0 /pub/test-33.txt  0 /pub/test-51.txt  0 /pub/test-6.txt   0 /pub/test-88.txt
0 /pub/test-16.txt   0 /pub/test-34.txt  0 /pub/test-52.txt  0 /pub/test-70.txt  0 /pub/test-89.txt
0 /pub/test-17.txt   0 /pub/test-35.txt  0 /pub/test-53.txt  0 /pub/test-71.txt  0 /pub/test-8.txt
0 /pub/test-18.txt   0 /pub/test-36.txt  0 /pub/test-54.txt  0 /pub/test-72.txt  0 /pub/test-90.txt
0 /pub/test-19.txt   0 /pub/test-37.txt  0 /pub/test-55.txt  0 /pub/test-73.txt  0 /pub/test-91.txt
0 /pub/test-1.txt    0 /pub/test-38.txt  0 /pub/test-56.txt  0 /pub/test-74.txt  0 /pub/test-92.txt
0 /pub/test-20.txt   0 /pub/test-39.txt  0 /pub/test-57.txt  0 /pub/test-75.txt  0 /pub/test-93.txt
0 /pub/test-21.txt   0 /pub/test-3.txt   0 /pub/test-58.txt  0 /pub/test-76.txt  0 /pub/test-94.txt
0 /pub/test-22.txt   0 /pub/test-40.txt  0 /pub/test-59.txt  0 /pub/test-77.txt  0 /pub/test-95.txt
0 /pub/test-23.txt   0 /pub/test-41.txt  0 /pub/test-5.txt   0 /pub/test-78.txt  0 /pub/test-96.txt
0 /pub/test-24.txt   0 /pub/test-42.txt  0 /pub/test-60.txt  0 /pub/test-79.txt  0 /pub/test-97.txt
0 /pub/test-25.txt   0 /pub/test-43.txt  0 /pub/test-61.txt  0 /pub/test-7.txt   0 /pub/test-98.txt
0 /pub/test-26.txt   0 /pub/test-44.txt  0 /pub/test-62.txt  0 /pub/test-80.txt  0 /pub/test-99.txt
0 /pub/test-27.txt   0 /pub/test-45.txt  0 /pub/test-63.txt  0 /pub/test-81.txt  0 /pub/test-9.txt
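
For copies that must survive a power loss, one option is to flush each file explicitly instead of relying on a delay. The following is a minimal sketch, not part of dm-qcow2: `durable_cp` is a hypothetical helper name, and it assumes GNU coreutils 8.24 or later, where `sync FILE` calls fsync(2) on the given file. It also assumes that the storage stack under the target directory honors flush requests, which is consistent with the observation earlier in this thread that a plain `sync` avoided the loss. The demo writes to a scratch directory; in the reproduction the destination would be /mnt.

```shell
#!/bin/bash
# Sketch: copy a file and flush it to stable storage before returning.
# Assumes GNU coreutils >= 8.24, where `sync FILE` calls fsync(2) on FILE.
set -eu

# durable_cp is a hypothetical helper, not something dm-qcow2 provides.
durable_cp() {
    local src=$1 dst=$2
    cp -- "$src" "$dst"
    sync -- "$dst"                    # flush the file's data and metadata
    sync -- "$(dirname -- "$dst")"    # flush the directory entry as well
}

# Demo on a scratch directory (stand-in for /mnt in the reproduction).
workdir=$(mktemp -d)
echo hello > "$workdir/test.txt"
mkdir "$workdir/out"
for i in 1 2 3; do
    durable_cp "$workdir/test.txt" "$workdir/out/test-$i.txt"
done
ls "$workdir/out"
```

Flushing per file costs more than a single `sync` at the end of the loop, so for a batch like the 100-file copy a single `sync -f /mnt` after the loop (flush only the filesystem containing /mnt) may be the cheaper choice.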