restic / restic

Fast, secure, efficient backup program
https://restic.net
BSD 2-Clause "Simplified" License

Restic check on local repository writes the same amount of data as the repository size to OS drive, reducing lifetime of SSD #3375

Closed JsBergbau closed 2 years ago

JsBergbau commented 3 years ago

Output of restic version

restic 0.12.0 compiled with go1.15.8 on linux/arm

How did you run restic exactly?

export RESTIC_REPOSITORY=.
export RESTIC_PASSWORD=<SECRET_PW>
cd /home/pi/extern/restic
/home/pi/restic check --read-data --no-cache

Output:

~/extern/restic $ /home/pi/restic check --read-data --no-cache
repository xxxxxxx opened successfully, password is correct
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
read all data
[1:19] 100.00%  16 / 16 snapshots
[3:39:52] 5.90%  20405 / 346096 packs

What backend/server/service did you use to store the repository?

A 3.5" external USB HDD with an ext4 filesystem, mounted at /home/pi/extern

Expected behavior

restic reads all data from the local repository and checks it without writing a copy to the temp folder.

Actual behavior

Restic copies all data stored on the directly attached hard drive to the temp directory.

lsof | grep restic lists a lot of entries like these:

restic    2810                              pi    6u      REG      179,2   4386561       5494 /tmp/restic-temp-421096629 (deleted)
restic    2810                              pi    8u      REG      179,2   6581784      11442 /tmp/restic-temp-657619758 (deleted)
restic    2810                              pi    9u      REG      179,2   4294885       1945 /tmp/restic-temp-362046705 (deleted)
restic    2810                              pi   10u      REG      179,2   5544330      10340 /tmp/restic-temp-922992203 (deleted)

This is just an excerpt; the list is really long.

Steps to reproduce the behavior

Check a local restic repository with the restic check command (on a Raspberry Pi 3). On an idle system you can measure the write load caused by restic via cat /sys/fs/ext4/mmcblk0p2/lifetime_write_kbytes; replace mmcblk0p2 with the device that holds your root partition "/".
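To repeat this measurement, the counter can be read once before and once after a check run. A minimal Go sketch, assuming Linux with an ext4 root filesystem; the default device name mmcblk0p2 is taken from this report and must be adjusted, and the helper names are illustrative only:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseKB parses the single decimal number stored in
// /sys/fs/ext4/<device>/lifetime_write_kbytes.
func parseKB(raw string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
}

// readLifetimeKB returns the lifetime write counter (in KiB) of an ext4 device.
func readLifetimeKB(device string) (uint64, error) {
	b, err := os.ReadFile("/sys/fs/ext4/" + device + "/lifetime_write_kbytes")
	if err != nil {
		return 0, err
	}
	return parseKB(string(b))
}

func main() {
	device := "mmcblk0p2" // assumption: adjust to the device holding "/"
	if len(os.Args) > 1 {
		device = os.Args[1]
	}
	before, err := readLifetimeKB(device)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("run restic check in another shell, then press Enter")
	_, _ = bufio.NewReader(os.Stdin).ReadString('\n')
	after, err := readLifetimeKB(device)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// KiB -> GiB
	fmt.Printf("written meanwhile: %.2f GiB\n", float64(after-before)/(1<<20))
}
```

On an otherwise idle system, nearly all of the difference should come from the restic run.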

After checking 1000 packs, lifetime writes had grown by about 5 GB.

restic/data/00 $ ls -lh | head
total 6,8G
restic/data/00 $ ls -lh | wc -l
1381

So the average pack size is about 4.92 MB, and reading 1000 of these accounts for roughly 5 GB of additional writes.
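The estimate can be checked with a short calculation. Two details are worth noting: ls -h prints binary units (6,8G means 6.8 GiB), and ls -lh | wc -l also counts the leading "total" line, so the directory holds 1380 pack files. The sketch works in binary units with the rounded ls values, so the result is approximate, and it assumes that every checked pack passes through the temp dir exactly once:

```go
package main

import "fmt"

// avgPackMiB returns the mean pack size in MiB, given the directory total in
// GiB (as printed by ls -h) and the number of pack files.
func avgPackMiB(totalGiB float64, packs int) float64 {
	return totalGiB * 1024 / float64(packs)
}

func main() {
	// ls -lh | wc -l printed 1381 lines; one of them is the "total" line.
	packs := 1381 - 1
	avg := avgPackMiB(6.8, packs)
	fmt.Printf("average pack size: %.2f MiB\n", avg)
	// Assumption: each checked pack is written to the temp dir once.
	fmt.Printf("temp data for 1000 packs: %.2f GiB\n", avg*1000/1024)
}
```

Either way, whether in MB or MiB, 1000 packs come to roughly 5 GB, which matches the observed increase in lifetime writes.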

Do you have any idea what may have caused this?

Restic treats the local repository like a remote one, from which it first has to download the data before it can do anything useful with it.

I suspect this behaviour also occurs when restoring data.

Possibly this happens because the system is so slow that the temp files get written to disk before restic deletes them again.

Do you have an idea how to solve the issue?

My current workaround is:

sudo mkdir /run/restic
sudo chown pi:pi /run/restic
export TMPDIR=/run/restic

I know that restic needs a temp directory for data it is about to upload during a backup. It was just very unexpected that restic copies the whole local repository when doing a check. I've done this several times so far, which means a few TB have been written to the SD card, so I should now replace it with a new one.

Because of the way restic works, changing its behaviour so that it does not copy the file data for local repositories may be a lot of work. Perhaps this only occurs because I run it on a Raspberry Pi 3B, which is so slow that the pack files get written to disk before they are deleted again. In that case a simple warning when executing check would help: this operation may cause the whole repository to be written to the local disk, and one should consider setting TMPDIR to a directory on a RAM disk.
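A warning like the one proposed here could first check whether the temp directory already lives on a tmpfs. A Linux-only sketch; TMPFS_MAGIC (0x01021994) comes from <linux/magic.h>, while isTmpfs and the warning text are hypothetical and not an existing restic feature:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// tmpfsMagic is TMPFS_MAGIC from <linux/magic.h>.
const tmpfsMagic = 0x01021994

// isTmpfs reports whether dir is on a tmpfs by comparing the filesystem
// type returned by statfs(2). Only meaningful on Linux.
func isTmpfs(dir string) (bool, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(dir, &st); err != nil {
		return false, err
	}
	return int64(st.Type) == tmpfsMagic, nil
}

func main() {
	dir := os.TempDir()
	onTmpfs, err := isTmpfs(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if !onTmpfs {
		fmt.Printf("warning: %s is not a tmpfs; check --read-data may write "+
			"all checked pack data to this disk. Consider setting TMPDIR "+
			"to a RAM disk.\n", dir)
	}
}
```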

Did restic help you today? Did it make you happy in any way?

Restic is a really great program. I really like it very much, and I'm very much looking forward to being able to increase the pack file size via the command line, see https://github.com/restic/restic/pull/2750

I now back up all my machines with restic, even the complete OS of my Raspberry Pis, and have built a full restore procedure which I can share if someone is interested. I really like restic. It is the best backup program of all those I've tried (Duplicati, Hardlinkbackup, Windows' integrated backup, WinRAR, robocopy, and Syncthing used as backup software).

rawtaz commented 3 years ago

I have never seen restic copy the entire repository (so that all of it is contained on disk at a given point in time) when checking, with or without --read-data. I have however never checked a local repository since all my repositories are on other systems. So either it's a matter of something unusual happening when you check local repositories, or you have found something that happens to only some people.

Can you please edit your initial post such that it contains all of the restic command and any environment variables used (presumably you use some because you don't provide a -r to specify the repo) and also all of the command's output? Thanks!

JsBergbau commented 3 years ago

or you have found something that happens to only some people.

I think it's because the Raspberry Pi is rather slow at checking the repository. In https://github.com/restic/restic/pull/2750#issuecomment-761210369 I read that on Linux, temporary files are written to disk after a few seconds:

In addition, on Linux the temporary files start to get written to disk after a few seconds, which causes quite a lot of disk traffic when not using a memdisk. With large packfiles, it is basically guaranteed that these get old enough to get written to disk.

So if restic hasn't fully processed a file within that period, the file gets written to disk and is then deleted a few seconds later. More than 90 % of the CPU on all 4 cores is in use, so htop shows more than 360 % CPU. This leads to thermal throttling of the Raspberry Pi 3B, which makes it even slower and leaves even more time for the temp files to be written to disk. As you can see in the first post, checking about 20,400 pack files took almost 4 hours.

Current system status: (screenshot)

Can you please edit your initial post such that it contains all of the restic command and any environment variables used (presumably you use some because you don't provide a -r to specify the repo) and also all of the command's output? Thanks!

Done. If you need any further information, don't hesitate to ask.

rawtaz commented 3 years ago

I realize now that you are not saying that restic makes a full copy of your entire repository and then starts checking it, but that you're just saying that restic writes the parts of the repository that it checks to disk instead of just keeping those parts in memory, correct?

The temp files are explicitly created by the DownloadAndHash function in repository.go and this is only done when you use the --read-data command, so I don't think it's something that just happens because of the performance of the RPi, but someone with more insight into check will have to comment on whether these temporary files are needed or not.

JsBergbau commented 3 years ago

I realize now that you are not saying that restic makes a full copy of your entire repository and then starts checking it, but that you're just saying that restic writes the parts of the repository that it checks to disk instead of just keeping those parts in memory, correct?

Yes, sorry my subject was misleading. I've changed it. Is it more clear now?

The temp files are explicitly created by the DownloadAndHash function in repository.go and this is only done when you use the --read-data command, so I don't think it's something that just happens because of the performance of the RPi, but someone with more insight into check will have to comment on whether these temporary files are needed or not.

Perhaps it always creates temporary files, but faster systems check and delete them quickly, and that's why the files are never written to disk there. So one only notices this on slow systems. But let's see what people with more knowledge about restic say.

JsBergbau commented 3 years ago

I tested with a Pi 400, which can check about 25 MB/s at full CPU load. There, too, all data is (first) written to disk, so this does not only affect the slow, throttled Raspberry Pi 3.

For just checking data, this speed also seems quite low. I mean, it just has to decrypt each block and compute its checksum; that shouldn't take so many CPU cycles.

JsBergbau commented 3 years ago

A few more tests. On Windows, restic check --read-data writes nothing to the OS drive / temp folder. Next, another Linux machine with an Intel(R) Core(TM) i5-8265U, so definitely a powerful machine, booted from a USB thumb drive. Here, too, roughly the same amount of data that is read is also written to the OS drive: (screenshot)

Very interesting: when putting the temp dir on a 4 TB spinning disk (sdc, used only as the temp drive here), almost nothing is written to that disk:

(screenshot)

However, when another 8 GB USB thumb drive (sdd) is inserted and used instead, the temp data is written to it: (screenshot)

All disks in this test are USB attached disks.

MichaelEischer commented 3 years ago

As a stopgap measure it would be relatively simple to let the check command keep small pack files (<8MB or so) only in memory. However, that will increase the memory usage a bit and won't work with larger pack file sizes. An alternative would be to have temporary files which are only written to disk when absolutely necessary, but I'm not sure whether such a mechanism exists on all operating systems. A more complex variant would be to stream pack files and only keep a small part of the file around. However, that would require larger changes to the check command as the pack file header is at the end of the file which complicates streaming a lot. It would also be possible to read the pack file header first and then stream the whole pack file again, but that would at least double the number of backend requests necessary.

[Edit] On Linux, /dev/shm is apparently always a tmpfs, which would prevent unnecessary disk writes. But it is unclear to me whether it would provide enough free space for the temporary files. [/Edit]
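Whether /dev/shm has room for the temp files can be checked with statfs. A sketch for Unix-like systems; the number of packs held at once (inFlight) is a hypothetical parameter, since the thread does not say how many packs check keeps open concurrently:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// freeBytes returns the space available to unprivileged users on the
// filesystem that holds dir.
func freeBytes(dir string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(dir, &st); err != nil {
		return 0, err
	}
	return uint64(st.Bavail) * uint64(st.Bsize), nil
}

// fitsPacks reports whether n packs of packSize bytes fit into free bytes.
func fitsPacks(free uint64, n int, packSize uint64) bool {
	return uint64(n)*packSize <= free
}

func main() {
	free, err := freeBytes("/dev/shm")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	const packSize = 5 << 20 // roughly the average pack size in this thread
	const inFlight = 8       // hypothetical number of packs held at once
	fmt.Printf("/dev/shm free: %d MiB\n", free>>20)
	fmt.Println("enough for", inFlight, "packs:", fitsPacks(free, inFlight, packSize))
}
```

With default-sized packs of around 5 MiB, even a /dev/shm of a few hundred MB holds a few dozen packs; the picture changes once much larger pack sizes are allowed.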

JsBergbau commented 3 years ago

From my point of view it would help a lot if restic just warned that doing a full check can be very stressful for hard drives and SSDs. Even on a Raspberry Pi Zero W with 512 MB RAM, /dev/shm is 239 MB in size and unused by default, so it would be sufficient for the standard pack size. If larger pack sizes become possible, as in the prospective pull request https://github.com/restic/restic/pull/2750, then it may not be sufficient.