mbideau / btrfs-diff-go

Analyze differences between two BTRFS snapshots (like GNU diff for directories)
GNU General Public License v3.0

Performance comparison with the sh version #2

Open Mek101 opened 3 years ago

Mek101 commented 3 years ago
[root@home-server /home/mek101/Progetti/go/bin]# time ./btrfs-diff-go /mountpoints/raid1/anime/data_snapshot.20210816/ /mountpoints/raid1/anime/bk_data_snapshot.2021094/
unexpected ChangeType on original /MyNextLifeAsAVillainess/wget-log: !!!unexpected ChangeType on original /MyNextLifeAsAVillainess/Hamefura_Ep_01_SUB_ITA.mp4: !!!       !!!: /MyNextLifeAsAVillainess/Hamefura_Ep_01_SUB_ITA.mp4
       !!!: /MyNextLifeAsAVillainess/wget-log
     added: /86
     added: /Kobayashi-san Chi no Maid Dragon
     added: /Kyousougiga
     added: /Otome Game no Hametsu Flag shika Nai Akuyaku Reijou ni Tensei shiteshimatta (Hamefura)
...
495G    data_snapshot.20210816/
535G    bk_data_snapshot.20210904

with a time of only!

real    0m2,062s
user    0m0,364s
sys 0m2,162s
Mek101 commented 3 years ago

Although

Due to BTRFS implementation, some files appear as changed, when they are not (according to diff utility). I have absolutely no idea why BTRFS is acting like this… If someone can help me figure this out, I'll be glad.

Makes it hard to use: I redirected the output of both versions into files, and...

[mek101@home-server ~/Progetti/go/bin]$ wc -l sh_out go_out 
    33 sh_out
  1723 go_out
Mek101 commented 3 years ago

In regard to that, I ran ./btrfs-diff-go --debug /mountpoints/raid1/anime/data_snapshot.20210816/ /mountpoints/raid1/anime/bk_data_snapshot.2021094/. All debug outputs are a variant of this:

[DEBUG] Cmd [15 0 54 0 49 49 101 121 101 115 47 91 102 114 111 122 101 110 100 97 108 101 93 49 49 95 101 121 101 115 95 48 49 95 91 120 118 105 100 95 105 116 97 93 91 53 56 49 100 52 52 55 102 93 46 97 118 105 11 0 12 0 241 74 30 97 0 0 0 0 160 65 202 43 10 0 12 0 144 150 206 95 0 0 0 0 154 157 48 44 9 0 12 0 109 53 191 96 0 0 0 0 131 75 240 22]; type BTRFS_SEND_C_UTIMES
[DEBUG] TRACE       BTRFS_SEND_C_UTIMES 11eyes/[frozendale]11_eyes_01_[xvid_ita][581d447f].avi
[DEBUG] TRACE    changed 11eyes/[frozendale]11_eyes_01_[xvid_ita][581d447f].avi
[DEBUG] peekAndDiscard() need to read more bytes '4' than there are buffered '0'
[DEBUG] peekAndDiscard() increasing the buffer size to match the need

And with ./btrfs-diff-go --debug /mountpoints/raid1/anime/data_snapshot.20210816/ /mountpoints/raid1/anime/bk_data_snapshot.2021094/ 2>1 | grep " BTRFS_SEND_"

Only false positives appear in the output
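
The `Cmd [...]` byte dump in the debug output above can be decoded by hand, which helps to see what a BTRFS_SEND_C_UTIMES command actually carries. The following is a minimal sketch, not the code used by btrfs-diff-go. It assumes the dump is the command payload (after the command header) and that it follows the TLV layout from the kernel's btrfs `send.h`: a little-endian 16-bit attribute type, a little-endian 16-bit length, then the value, with attribute numbers 15 (path), 11 (atime), 10 (mtime) and 9 (ctime), and timestamps stored as 64-bit seconds plus 32-bit nanoseconds. The names `parseTLVs`, `decodeTimespec` and `samplePayload` are hypothetical helpers introduced for this illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

// Attribute type numbers, assumed from the kernel's btrfs send.h.
const (
	attrCtime = 9
	attrMtime = 10
	attrAtime = 11
	attrPath  = 15
)

// tlv is one decoded attribute: a type tag and its raw value bytes.
type tlv struct {
	Type  uint16
	Value []byte
}

// parseTLVs splits a command payload into its type/length/value attributes.
func parseTLVs(b []byte) ([]tlv, error) {
	var out []tlv
	for len(b) > 0 {
		if len(b) < 4 {
			return nil, fmt.Errorf("truncated TLV header: %d bytes left", len(b))
		}
		typ := binary.LittleEndian.Uint16(b[0:2])
		n := int(binary.LittleEndian.Uint16(b[2:4]))
		if len(b) < 4+n {
			return nil, fmt.Errorf("truncated TLV value: need %d bytes, have %d", n, len(b)-4)
		}
		out = append(out, tlv{Type: typ, Value: b[4 : 4+n]})
		b = b[4+n:]
	}
	return out, nil
}

// decodeTimespec reads 64-bit seconds followed by 32-bit nanoseconds.
func decodeTimespec(v []byte) (time.Time, error) {
	if len(v) != 12 {
		return time.Time{}, fmt.Errorf("timespec must be 12 bytes, got %d", len(v))
	}
	sec := int64(binary.LittleEndian.Uint64(v[0:8]))
	nsec := int64(binary.LittleEndian.Uint32(v[8:12]))
	return time.Unix(sec, nsec).UTC(), nil
}

// samplePayload rebuilds the bytes shown in the [DEBUG] Cmd line.
func samplePayload() []byte {
	path := "11eyes/[frozendale]11_eyes_01_[xvid_ita][581d447f].avi"
	b := []byte{15, 0, 54, 0} // type 15 (path), length 54
	b = append(b, path...)
	b = append(b, 11, 0, 12, 0, 241, 74, 30, 97, 0, 0, 0, 0, 160, 65, 202, 43)
	b = append(b, 10, 0, 12, 0, 144, 150, 206, 95, 0, 0, 0, 0, 154, 157, 48, 44)
	b = append(b, 9, 0, 12, 0, 109, 53, 191, 96, 0, 0, 0, 0, 131, 75, 240, 22)
	return b
}

func main() {
	attrs, err := parseTLVs(samplePayload())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	names := map[uint16]string{attrPath: "path", attrAtime: "atime", attrMtime: "mtime", attrCtime: "ctime"}
	for _, a := range attrs {
		switch a.Type {
		case attrPath:
			fmt.Printf("%-5s %s\n", names[a.Type], a.Value)
		case attrAtime, attrMtime, attrCtime:
			t, err := decodeTimespec(a.Value)
			if err != nil {
				fmt.Println("error:", err)
				return
			}
			fmt.Printf("%-5s %s\n", names[a.Type], t.Format(time.RFC3339Nano))
		default:
			fmt.Printf("unknown attribute %d (%d bytes)\n", a.Type, len(a.Value))
		}
	}
}
```

Under these assumptions the payload holds only a path and three timestamps. The atime seconds value (1629375217) falls on 19 August 2021 UTC, after the date in the first snapshot's name, while the mtime and ctime values fall earlier. That would be consistent with a timestamp-only update being reported as "changed", though it does not by itself prove the cause.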

mbideau commented 3 years ago

Hey @Mek101,

Regarding your first post in that thread, it reveals that there is still a bug (this one is really tricky) and I created an issue to track it. It also shows incredible speed compared to the shell version, which is good news: it makes it practical.

About your second post, it seems that there are inconsistencies between the Go and shell reports. Which one do you think is right?

Finally, regarding your last comment, I would love to be able to reproduce all of that, to debug both issues, and stop bothering you :sweat_smile: To help me reproduce it, could you produce a btrfs send stream file and send it to me? You can either post it here, if making the file names public is okay for you, or send it to me privately (I won't publish it).

The command to run is like the following (timing it is also interesting) :

~> sudo time btrfs send --quiet --no-data -f btrfs.stream -p older_subvolume newer_subvolume

For the first post, (the one with the tricky bug), it would be:

~> sudo time btrfs send --quiet --no-data -f btrfs.stream -p /mountpoints/raid1/anime/data_snapshot.20210816 /mountpoints/raid1/anime/bk_data_snapshot.2021094

If the btrfs.stream file is too big to be sent by email, you can drop it on a website like WeTransfer.

It would allow me to reproduce this bug, which I have not managed to trigger so far...

Thank you a lot in advance.

Mek101 commented 3 years ago

About your second post, it seems that there are inconsistencies between the Go and shell reports. Which one do you think is right?

It should be the shell one, since the Go report includes supposed changes to files I know for sure haven't been modified since the creation of the subvolume (i.e., before data_snapshot.20210816).

Mek101 commented 3 years ago
time btrfs send --quiet --no-data -f btrfs.stream -p /mountpoints/raid1/anime/data_snapshot.20210816 /mountpoints/raid1/anime/bk_data_snapshot.2021094

real    0m1,621s
user    0m0,015s
sys 0m1,991s
mbideau commented 3 years ago
time btrfs send --quiet --no-data -f btrfs.stream -p /mountpoints/raid1/anime/data_snapshot.20210816 /mountpoints/raid1/anime/bk_data_snapshot.2021094

real  0m1,621s
user  0m0,015s
sys   0m1,991s

OKay, your first run with btrfs-diff-go showed :

real    0m2,062s
user    0m0,364s
sys 0m2,162s

We could interpret this as the Go processing overhead costing about 0m0,4xx, which is incredibly fast. Good news.

Also, thank you very much for the btrfs.stream file :wink:
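
The overhead estimate above is the difference between the two `real` times. A minimal sketch of that arithmetic follows. It assumes the `time` output uses a comma as the decimal separator (as in the logs in this thread) and that subtracting wall-clock times from two separate runs is a fair approximation. `parseShellTime` and `overhead` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parseShellTime converts a `time` value such as "0m2,062s" into a Duration.
// The comma decimal separator is replaced so time.ParseDuration accepts it.
func parseShellTime(s string) (time.Duration, error) {
	return time.ParseDuration(strings.Replace(strings.TrimSpace(s), ",", ".", 1))
}

// overhead returns total minus baseline, both given in `time` output format.
func overhead(total, baseline string) (time.Duration, error) {
	t, err := parseShellTime(total)
	if err != nil {
		return 0, err
	}
	b, err := parseShellTime(baseline)
	if err != nil {
		return 0, err
	}
	return t - b, nil
}

func main() {
	// real time of btrfs-diff-go vs. real time of the bare `btrfs send`.
	d, err := overhead("0m2,062s", "0m1,621s")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("estimated Go overhead:", d)
}
```

With the two `real` values from this thread the difference is 441ms, which matches the "0m0,4xx" estimate.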

Mek101 commented 3 years ago

I'll try diffing a couple subvolumes on the OS ssd and see how it goes

Mek101 commented 3 years ago

I was trying to benchmark both versions on the snapshots of my root SSD, but I encountered a couple of fatal errors:

4,9G    root_snapshot.20210816/
6,7G    root_snapshot.20210905/
[root@home-server /mountpoints/root/snapshots/root]# time sudo btrfs-diff root_snapshot.20210816/ root_snapshot.20210905/
Warning: unknown raw line 'mksock          ./root_snapshot.20210905/o59141-618-0'
Fatal error: when renaming './root_snapshot.20210905/o59141-618-0' to './root_snapshot.20210905/etc/samba/private/msg.sock/6555', the source wasn't found in the objects buffer

real    12m40,858s
user    13m0,688s
sys 4m48,570s
# time ./btrfs-diff-go /mountpoints/root/snapshots/root/root_snapshot.20210816/ /mountpoints/root/snapshots/root/root_snapshot.20210905/
btrfsSendSyscall returns broken pipe

real    0m8,920s
user    0m1,639s
sys 0m10,044s
mbideau commented 3 years ago

OKay, you are definitely a good crash-tester :wink: :sweat_smile:

Regarding the bug of the shell version, I think I should read the btrfs receive sources to be sure that I haven't forgotten about another case... I wanted to avoid that... (lazy me !).

About the Go version, I am currently improving the debugging information, to better catch and resolve future issues.

I'll keep you posted.

Again, thank you very much for your contributions. I will add you to a Contributors section in the documentation, if you agree ... If so, how do you want me to name you ? Mek101 or your real name ?

mbideau commented 3 years ago

In the meantime, could you generate the btrfs stream of this diff please and send it to me ? It would be a great help to debug...

Command to do so :

~> sudo btrfs send --quiet --no-data -f btrfs.stream.1 -p root_snapshot.20210816 root_snapshot.20210905

Thank you in advance.

Mek101 commented 3 years ago

Sent the btrfs.stream.1 file through email

mbideau commented 3 years ago

I have the pleasure to tell you that I have added an Authors and contributors section to the README (commit 63c329c) and you are appearing as the first contributor ever :smiley: :champagne:

mbideau commented 3 years ago

Hey @Mek101,

No more known bugs or inconsistencies (all fixed), so I think you can do more testing if you want :wink: If you find another bug, I'll buy you a beer :beer: Ahahah

mbideau commented 2 years ago

Ciao @Mek101, how are you? Did you have the chance to run another test with this Go version? I still don't have a real-life-size data set to test against (waiting to transfer my 3 TB to btrfs for that), so I kind of rely on you (for now). I would like to deprecate the sh version if this one is faster and without (known) bugs. Thank you in advance.

Mek101 commented 1 year ago

Excuse me, would it be possible to wipe my name from the repository history?