Wishes for 2022

vt-alt commented 2 years ago

Not to blame, but here is a list of burp weaknesses we sometimes run into. (Btw, it seems development is stalled?)

Sometimes big trees (like an unpacked Linux kernel source) are backed up very, very slowly (many hours) with very small CPU load. We see this in about 1 case out of 10 on notebooks. I would like to debug it more, but it's hard to reproduce (it's 100% reproducible for people once it starts to occur for them, but I cannot take their notebooks for experiments).
Under normal circumstances (and most importantly for first backups), backup speed is limited by zlib compression keeping a single CPU at 100% load. I would suggest using a faster, parallelizable compression algorithm like zstd.
Restoring a particular directory is very slow. Maybe this is related to the fact that we can only restore by regexp.

Recently I wanted to restore the package database for several days, like this:

$ burp -ar -b 0000687 -d 2021-11-21 -r '^/var/[^/]+/(apt|rpm)/' -v

It's ~400M, but one restore takes about an hour. Plus, when I wanted to relaunch the command with 'time', I could not re-run the restore quickly because of the repository lock, and I still had to wait an hour for the server process to finish. The inability to run restores in parallel is bad.

I only use protocol 1. Protocol 2 is permanently not production ready, while competitors have been using chunk-based deduplicated backups for a long time.

grke commented 2 years ago

Hello,

Yes I am a bit stalled at the moment, due to lack of time, and I am the only developer. I intend to keep working on burp when I get some time.

Thank you for the suggestions. I don't think implementing zstd is as simple as you might think. It requires parallel threads, which I think would basically require rewriting most of the internals of burp. And that wouldn't help if you had multiple clients backing up at the same time. Actually - which part of the backup are you talking about here - phase2 or something else?

Some ideas for two of the speed issues above, if you are not doing these already:

If you have lots of small files to back up, you might want to turn off librsync (set librsync=0).
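As a minimal sketch, the setting could look like this in a burp configuration file. Which file it belongs in (the server config or a per-client override) is not stated in this thread, so treat the placement as an assumption and check the burp documentation before relying on it.

```conf
# Assumption: this line goes in the server-side config or a per-client override.
# Turns off librsync, as suggested above for trees with lots of small files.
librsync=0
```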

For faster restores, you might want to try using hardlinked_archive=1. When a backup is hardlinked, the restore doesn't have to apply any diffs when it comes to restoring a file, so it can just feed the bytes straight off the disk. You can see which backups are already hardlinked by standing in the client's storage directory on the server and doing an 'ls */hardlinked'.
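The 'ls */hardlinked' check can also be wrapped in a small helper that lists the backups without the marker too. This is only a sketch: the function name list_hardlinked is made up for illustration, the layout (one subdirectory per backup, with a 'hardlinked' entry inside hardlinked ones) is inferred from the tip above, and the storage path in the usage comment is hypothetical.

```shell
#!/bin/sh
# Sketch: report, for each backup directory under a client's storage
# directory, whether it contains a 'hardlinked' marker.
# Assumed layout: <client_dir>/<backup>/hardlinked

list_hardlinked() {
    client_dir=$1
    for backup in "$client_dir"/*/; do
        # If there are no subdirectories, the glob stays literal; skip it.
        [ -d "$backup" ] || continue
        # Skip symlinks to backups (e.g. a 'current' link; an assumption
        # about the layout) so that each backup is reported only once.
        [ -L "${backup%/}" ] && continue
        name=$(basename "$backup")
        if [ -e "$backup/hardlinked" ]; then
            printf '%s hardlinked\n' "$name"
        else
            printf '%s not-hardlinked\n' "$name"
        fi
    done
}

# Usage (hypothetical storage path):
#   list_hardlinked /var/spool/burp/myclient
```

Backups reported as not-hardlinked are the ones where, per the explanation above, a restore may still have to apply diffs.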

vt-alt commented 2 years ago

Thanks for the reply and suggestions!

> phase2 or something else?

Yes, where file transfer occurs.

grke commented 2 years ago

> Yes, where file transfer occurs.

Do you see the 100% cpu on the client, or server, or both?

pagalba-com commented 2 years ago

I think if these are Windows clients, they may be hitting the Windows Task Scheduler reduced-priority issue. Please look at https://aavtech.site/2018/01/windows-task-scheduler-changing-task-priority/ for details. After some update, Windows changed the default task priority.

pagalba-com commented 2 years ago

One more thing @vt-alt: when the rsync library is used on large files, low CPU and network usage can be seen on both client and server while it is busy finding differences. If data is already duplicated, it is neither sent nor processed. In some cases it is faster to set a file size cut-off for the rsync library in the config file.

grke / burp

Wishes for 2022 #896