Support incremental backup #14

rescuezilla / rescuezilla

The Swiss Army Knife of System Recovery

GNU General Public License v3.0

Open shasheene opened 5 years ago

shasheene commented 5 years ago

An anonymous user in 2010 asked for incremental backups. Many other users have requested incremental/updateable backups.

This is a good idea, but will require some thought. Clonezilla does not support incremental backup. Supporting incremental file-based backup (rather than partition-based backup) does not contradict Rescuezilla's roadmap. Indeed, it would be a very useful thing.

This is very low priority, and it will be years until it gets implemented unless there is sufficient Patreon funding. Please comment/react if you would like this feature.

shasheene commented 4 years ago

Mid-October 2020 update: The recently released Rescuezilla v2.0 has made Rescuezilla drop-in interoperable with Clonezilla images.

Because of this, unilaterally adding this feature would prevent Rescuezilla backups from being restored with Clonezilla (which would not be cool). To implement this feature, work will need to be done in collaboration with Clonezilla author Steven Shiau (and Partclone author Thomas Tsai). The Clonezilla feature request for this issue is here.

On 2016-06-05, Clonezilla author Steven Shiau wrote:

As for the differential & incremental backups using Clonezilla, it's almost impossible due to the nature of "block-based" imaging mechanism in Clonezilla.

pfrouleau commented 3 years ago

Have you heard of the bup project?

It could be used to achieve differential backup. The way I see it, bup could be used instead of an archiver+split. That's what I wanted to add to Clonezilla back in 2013 😅, and it is also the reason why I contributed to Partclone to make version 2 of the image file.

With bup, the 1st backup is about the same size as a regular zipped clone, but the next backup only stores the modified sectors. Let's say we back up a 200GB partition. The 1st backup would be ~100GB, but the next backup could be only a few GB. It depends on how much activity the partition had and how often it is backed up.
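To make the size argument concrete, here is a minimal Python sketch of a deduplicating block store. It is not bup's implementation: bup splits data with a rolling checksum, whereas this toy uses fixed-size blocks, and `BlockStore`, `BLOCK_SIZE` and `demo` are hypothetical names chosen for illustration. It only shows why a second backup of a mostly unchanged image writes very little new data.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


class BlockStore:
    """Toy content-addressed store: each unique block is kept once."""

    def __init__(self):
        self.blocks = {}  # sha256 hex digest -> block bytes

    def save(self, image):
        """Store an image; return (manifest, number of bytes newly written)."""
        manifest = []
        written = 0
        for offset in range(0, len(image), BLOCK_SIZE):
            block = image[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                # Only blocks never seen before consume space.
                self.blocks[digest] = block
                written += len(block)
            manifest.append(digest)
        return manifest, written

    def restore(self, manifest):
        """Rebuild an image from its list of block digests."""
        return b"".join(self.blocks[d] for d in manifest)


def demo():
    # 256 distinct blocks stand in for the partition image.
    first = b"".join(i.to_bytes(4, "big") * (BLOCK_SIZE // 4) for i in range(256))
    # Simulate disk activity: touch one byte in each of three blocks.
    second = bytearray(first)
    for block_no in (5, 100, 200):
        second[block_no * BLOCK_SIZE] ^= 0xFF
    second = bytes(second)

    store = BlockStore()
    _manifest1, written1 = store.save(first)
    manifest2, written2 = store.save(second)
    return written1, written2, store.restore(manifest2) == second
```

In this toy run the first save writes all 256 blocks, while the second writes only the 3 blocks that changed, and the second image can still be restored in full from its manifest.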

shasheene commented 3 years ago

Great work @pfrouleau on authoring the partclone v2 image format and adding partclone v2 mounting support to a fork of partclone-utils.

I hadn't heard of bup. It might be a good tool upon which to build a graphical incremental backup tool. Previously I was thinking rsync would be best.

I am open to integrating an incremental backup approach into the Rescuezilla frontend (rather than having it as a separate application). Keeping Rescuezilla's user interface simple and not confusing is vital. As you may have read in the Rescuezilla Roadmap, right now I'm focused on Rescuezilla's bread-and-butter "imaging" functionality, including improving the performance of Image Explorer (beta), but I will be adding a "cloning" feature to match Clonezilla. I also feel like adding a ddrescue-based "data salvage" mode (such raw "dd" images will be accessed using the existing "Image Explorer (beta)" mode) and a graphical frontend to the powerful TestDisk file undelete program. Finally, I also intend to add the ability to restore (and explore) images made with other partclone-based tools like Redo Rescue and Foxclone.

Given this roadmap, integrating incremental backup into Rescuezilla will be best done as a separate mode, such as how AOMEI Backupper does it. This will probably require a redesign of Rescuezilla's Welcome screen so that even newbies can understand the difference between cloning, imaging and file-based backup.

pfrouleau commented 3 years ago

Well, I did not finish the modifications for partclone-utils. I got side-tracked on other things. At least you salvaged a few commits.

I took a quick look at the code. It looks very clean and nicely documented. I may try to hack it a little bit to see how well bup would do during a live session. My progress will probably be very slow because I like to do something other than programming after work. I already did many tests by decompressing Clonezilla backups and storing them in bup to free disk space while keeping the backups. That works very well, but if I had to restore one of these, I would have to extract it first.

ddrescue can be very helpful when the hard drive is dying, but with the size of the partitions we have now, the odds of the disk dying on us before ddrescue completes are pretty high. And SSDs are not making this easier. My experience with SSDs is pretty much "No, I don't want to talk to you anymore" right when the computer starts. That's why I try to back up my computer's boot disk at least once a month to be able to restore it quickly on a new disk.

As for using rsync from Rescuezilla, I am not sure about the use cases. It would not allow making a backup of the partitions, MBR, and the like. It is a nice tool to back up the files, in particular when the hardlink trick is used, but that can be done from the installed OS, even on Windows. Doing it from a live USB key seems a little bit overkill. However, it could be useful to rescue the files before reinstalling the OS because it became unbootable.
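For readers unfamiliar with the hardlink trick mentioned above (with rsync it is commonly done through the `--link-dest` option), the following is a minimal Python sketch of the idea. The `snapshot` and `demo` functions and the file names are hypothetical, and unlike rsync's default quick check this sketch compares file contents. Each snapshot looks like a full copy, but unchanged files share an inode with the previous snapshot and take no extra space.

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(source, dest, previous=None):
    """Copy source into dest, hard-linking files unchanged since previous."""
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(dest, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dest, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: share the existing inode
            else:
                shutil.copy2(src, dst)  # new or modified: store a fresh copy


def demo():
    base = tempfile.mkdtemp()
    src = os.path.join(base, "src")
    os.makedirs(src)
    for name, text in (("a.txt", "unchanged"), ("b.txt", "version 1")):
        with open(os.path.join(src, name), "w") as fh:
            fh.write(text)

    snap1 = os.path.join(base, "snap1")
    snap2 = os.path.join(base, "snap2")
    snapshot(src, snap1)

    # Modify one file between the two snapshots.
    with open(os.path.join(src, "b.txt"), "w") as fh:
        fh.write("version 2, now longer")
    snapshot(src, snap2, previous=snap1)

    shared = os.path.samefile(os.path.join(snap1, "a.txt"),
                              os.path.join(snap2, "a.txt"))
    copied = not os.path.samefile(os.path.join(snap1, "b.txt"),
                                  os.path.join(snap2, "b.txt"))
    shutil.rmtree(base)
    return shared, copied
```

Note that this only works within a single filesystem that supports hard links, and, as pointed out above, it captures files only, not partitions or the MBR.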

shasheene commented 3 years ago

Fair point on the partclone-utils changes. That reminds me: I will spend some time today and get my open Merge Request closed on the partclone-utils project (after I left it open for 2 months).

Thanks for the compliment about the Rescuezilla codebase. Everything in the 'parser' subdirectory is standalone and has associated unit tests that act as worked examples. As you may know, it's a GTK application and you can open the XML file src/apps/rescuezilla/rescuezilla/usr/share/rescuezilla/rescuezilla.glade using the (often buggy) application "Glade". It's this XML file which registers the signals (e.g. button clicked) to a function name. The file handler.py contains the logic to handle the signals for the entire application (not a clean design at all, but I don't know if there's a better way). You may have problems building the application using the non-Docker instructions due to the issues described in #161, so it may be worth sticking to make deb to avoid building the Linux images, or using Docker. If you don't have time or interest to work on Rescuezilla, no big deal.

Your idea on ddrescue is pretty reasonable. Though the ability to make a raw backup may be useful for forensic undeleting using a future TestDisk frontend (but no reason this can't be done directly).

I may have misunderstood the differences between rsync- and bup-based incremental backups. So bup is actually an imaging tool that operates on partitions and the MBR, rather than a file-based tool like rsync. That actually might be very useful!
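As a rough illustration of how a Glade file maps signals to function names, here is a small Python sketch that reads the signal registrations out of a Glade-style XML snippet and dispatches to a handler object by name. The widget id `backup_button`, the handler name `on_backup_button_clicked` and the `Handler` class are made up for this example and are not taken from rescuezilla.glade or handler.py. In a real PyGObject application the name lookup is typically done by `Gtk.Builder.connect_signals` rather than by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical Glade-style snippet: one button with one registered signal.
GLADE_SNIPPET = """
<interface>
  <object class="GtkButton" id="backup_button">
    <signal name="clicked" handler="on_backup_button_clicked" swapped="no"/>
  </object>
</interface>
"""


def signal_handlers(glade_xml):
    """Return (widget id, signal name, handler function name) triples."""
    root = ET.fromstring(glade_xml)
    result = []
    for obj in root.iter("object"):
        for sig in obj.findall("signal"):
            result.append((obj.get("id"), sig.get("name"), sig.get("handler")))
    return result


class Handler:
    """Stand-in for a class holding all signal handlers (names hypothetical)."""

    def __init__(self):
        self.calls = []

    def on_backup_button_clicked(self, widget_id):
        self.calls.append(widget_id)


def dispatch(glade_xml, handler, widget_id, signal_name):
    """Call the handler method that the XML registers for this signal."""
    for wid, sig, func_name in signal_handlers(glade_xml):
        if (wid, sig) == (widget_id, signal_name):
            getattr(handler, func_name)(widget_id)
```

Calling `dispatch(GLADE_SNIPPET, Handler(), "backup_button", "clicked")` looks up the method by the name stored in the XML, which is why renaming a handler function without updating the Glade file breaks the connection.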

pfrouleau commented 3 years ago

Thanks for the pointers about Glade. I am not a Python developer, so that will help me to find what I need.

bup supports many modes. The most used one is to back up directories and files, e.g.: bup index /some/dir && bup save -n b200204 /some/dir

But we can also pipe the data directly to it, and that is what I use to capture partclone output, e.g.: partclone ... | bup split -n b200204

bup can also import backups made with rsnapshot or duplicity.

One of the crazy things I want to experiment more with is how much deduplication bup is able to give if I back up my disk twice:

1. first with partclone, to be able to restore the disk in full;
2. then with bup-index+save.

That way I could restore with partclone OR simply browse the files with bup directly. The 1st backup would take twice as long to run, but the next backups should be faster because step 2 would only read the modified & new files.
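The claim that the second step only needs to read modified and new files can be sketched in Python as follows. This is a simplified stand-in for what an index step does conceptually, not bup's actual index format or logic: `build_index`, `files_to_read` and `demo` are hypothetical helpers that record size and modification time per file and compare two such indexes.

```python
import os
import shutil
import tempfile


def build_index(top):
    """Map each relative file path under top to (size, mtime in ns)."""
    index = {}
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            index[os.path.relpath(path, top)] = (st.st_size, st.st_mtime_ns)
    return index


def files_to_read(old_index, new_index):
    """Files that are new, or whose size or mtime changed since old_index."""
    return sorted(p for p, meta in new_index.items() if old_index.get(p) != meta)


def demo():
    top = tempfile.mkdtemp()
    for name, text in (("a.txt", "same"), ("b.txt", "old")):
        with open(os.path.join(top, name), "w") as fh:
            fh.write(text)
    before = build_index(top)

    # One file grows, one file is added, one is left alone.
    with open(os.path.join(top, "b.txt"), "w") as fh:
        fh.write("new and longer content")
    with open(os.path.join(top, "c.txt"), "w") as fh:
        fh.write("brand new")
    after = build_index(top)

    changed = files_to_read(before, after)
    shutil.rmtree(top)
    return changed
```

Here only `b.txt` and `c.txt` would be re-read on the second run, while `a.txt` is skipped. A file rewritten with the same size within the timestamp resolution would be missed by this naive check, which is one reason real tools track more metadata.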

shasheene commented 3 years ago

That does sound interesting. The concept definitely needs derisking with command-line applications before any thought is put into designing a Rescuezilla interface.

I will create a "how to get started developing Rescuezilla" page on the GitHub wiki at some point. By the way, I forgot to mention that I found this PyGTK tutorial quite useful: https://python-gtk-3-tutorial.readthedocs.io/en/latest/. I like Python, but I find GTK and its tooling (like Glade) not that great. I think you'll also find GTK and Glade lacking in some places, so I reiterate that there's no need to modify Rescuezilla just yet: we'll stick to analyzing bup and its trade-offs until a path forward has been determined.

lucatrv commented 1 year ago

Please have a look at this old discussion.

My proposal was to use a data deduplication tool like rdedup, either downstream of Clonezilla while backing up, or upstream of Clonezilla while restoring.