grke / burp

burp - backup and restore program
http://burp.grke.net

Wishes for 2022 #896

Open vt-alt opened 2 years ago

vt-alt commented 2 years ago

Not to blame, but here is a list of burp weaknesses we sometimes run into. (By the way, it seems development has stalled?)

Recently I wanted to restore package database for several days like this:

$ burp -ar -b 0000687 -d 2021-11-21 -r '^/var/[^/]+/(apt|rpm)/' -v

It's only ~400M, but one restore takes about an hour. On top of that, when I wanted to relaunch the command with time, I could not re-run the restore quickly because of the repository lock, so I still had to wait an hour until the server process finished. The inability to run restores in parallel is bad.

grke commented 2 years ago

Hello,

Yes I am a bit stalled at the moment, due to lack of time, and I am the only developer. I intend to keep working on burp when I get some time.

Thank you for the suggestions. I don't think implementing zstd is as simple as you might think. It requires parallel threads, which I think would basically require rewriting most of the internals of burp. And that wouldn't help if you had multiple clients backing up at the same time. Actually - which part of the backup are you talking about here - phase2 or something else?

Some ideas for two of the speed issues above, if you are not doing these already:

If you have lots of small files to back up, you might want to turn off librsync (set librsync=0).
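To check whether a setting such as librsync is already in place, a small helper can print the values assigned to an option in a config file. This is only a minimal sketch: the function name `burp_opt` is hypothetical, the path in the usage comment is an assumed default location, and the helper does not follow included files or per-client overrides.

```shell
# burp_opt FILE OPTION
# Print every value assigned to OPTION in FILE, one per line, assuming
# burp's "option = value" syntax. Prints nothing if the option is not set.
burp_opt() {
    awk -v opt="$2" '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            pos = index($0, "=")
            if (pos == 0) next               # not an assignment
            key = substr($0, 1, pos - 1)
            val = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key) # trim surrounding whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == opt) print val
        }
    ' "$1"
}

# Usage (path is an assumption, adjust to your installation):
#   burp_opt /etc/burp/burp-server.conf librsync
```

An empty result means the option is not set in that file, so whatever default burp applies is in effect.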

For faster restores, you might want to try using hardlinked_archive=1. When a backup is hardlinked, the restore doesn't have to apply any diffs when restoring a file, so it can just feed the bytes straight off the disk. You can see which backups are already hardlinked by standing in the client's storage directory on the server and doing an 'ls */hardlinked'.
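The 'ls */hardlinked' check can also be wrapped in a small function that takes the client's storage directory as an argument, so you don't have to cd into it first. This is a sketch only: the function name `list_hardlinked` is hypothetical, and the directory in the usage comment is an assumed location that depends on your server configuration.

```shell
# list_hardlinked CLIENT_DIR
# Print the name of each directory directly under CLIENT_DIR that contains
# a "hardlinked" marker file (the same entries `ls */hardlinked` would match).
list_hardlinked() {
    for marker in "$1"/*/hardlinked; do
        [ -e "$marker" ] || continue         # glob matched nothing
        basename "$(dirname "$marker")"
    done
}

# Usage (directory is an assumption, adjust to your storage location):
#   list_hardlinked /var/spool/burp/myclient
```

The quoting keeps directory names that contain spaces intact, and each matching directory is printed on its own line.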

vt-alt commented 2 years ago

Thanks for the reply and suggestions!

> phase2 or something else?

Yes, where file transfer occurs.

grke commented 2 years ago

> Yes, where file transfer occurs.

Do you see the 100% cpu on the client, or server, or both?

pagalba-com commented 2 years ago

I think if these are Windows clients, they may be hitting the Windows Task Scheduler reduced-priority issue. Please look at https://aavtech.site/2018/01/windows-task-scheduler-changing-task-priority/ After some update, Windows changed the default task priority.

pagalba-com commented 2 years ago

One more thing @vt-alt: when the rsync library is used, low CPU and network usage can be seen on both client and server while it is finding differences, especially for large files. If the data is already duplicated, it is neither sent nor processed. In some cases it is faster to set a file size cut-off for the rsync library in the config file.