laktak / chkbit

Check your files for data corruption
MIT License

chkbit

chkbit is a tool that ensures the safety of your files by checking if their data integrity remains intact over time, especially during transfers and backups. It helps detect issues like disk damage, filesystem errors, and malware interference.

Some filesystems (like Btrfs and ZFS, but not APFS or NTFS) already protect your files with checksums. However, when you move files between locations, separate checks have the advantage of confirming that the data was not modified in transit, so you know the photo on your disk is the same as the copy in your cloud backup. This also protects you from overwriting good data with bad copies.


How it works

Remember to always maintain multiple backups for comprehensive data protection.

Installation

Binary releases

You can download the official chkbit binaries from the releases page and place the binary in your PATH.

Prerelease versions can be found directly in GitHub Actions. Click on the latest CI run and look for prerelease-artifacts at the bottom.

Homebrew (macOS and Linux)

For macOS and Linux it can also be installed via Homebrew:

brew install chkbit

Build from Source

Building from source requires Go.

Either install it directly:

go install github.com/laktak/chkbit/v5/cmd/chkbit@latest

or clone and build:

git clone https://github.com/laktak/chkbit
chkbit/scripts/build
# binary:
ls -l chkbit/chkbit

Usage

Run chkbit -u PATH to create/update the chkbit index.

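chkbit will create a .chkbit index in every directory below PATH, record a hash of every file (blake3 by default, see --algo), and report damage for files that failed the integrity check since the last run.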

Run chkbit PATH to verify only.

Usage: chkbit [<paths> ...] [flags]

Ensures the safety of your files by verifying that their data integrity remains
intact over time, especially during transfers and backups.

    For help tips run "chkbit -H" or go to
    https://github.com/laktak/chkbit

Arguments:
  [<paths> ...]    directories to check

Flags:
  -h, --help                    Show context-sensitive help.
  -H, --tips                    Show tips.
  -m, --[no-]show-missing       show missing files/directories
  -d, --[no-]include-dot        include dot files
  -S, --[no-]skip-symlinks      do not follow symlinks
  -R, --[no-]no-recurse         do not recurse into subdirectories
  -D, --[no-]no-dir-in-index    do not track directories in the index
      --force                   force update of damaged items (advanced usage
                                only)
  -l, --log-file=STRING         write to a logfile if specified
      --[no-]log-verbose        verbose logging
      --algo="blake3"           hash algorithm: md5, sha512, blake3 (default:
                                blake3)
      --index-name=".chkbit"    filename where chkbit stores its hashes,
                                needs to start with '.' (default: .chkbit)
      --ignore-name=".chkbitignore"
                                filename that chkbit reads its ignore list from,
                                needs to start with '.' (default: .chkbitignore)
  -w, --workers=5               number of workers to use (default: 5)
      --[no-]plain              show plain status instead of being fancy
  -q, --[no-]quiet              quiet, don't show progress/information
  -v, --[no-]verbose            verbose output
  -V, --version                 show version information

mode
  -c, --check                check mode: chkbit will verify files in readonly
                             mode (default mode)
  -u, --update               update mode: add and update indices
  -a, --add-only             add mode: only add new and modified files, do not
                             check existing (quicker)
  -i, --show-ignored-only    show-ignored mode: only show ignored files
$ chkbit -H

.chkbitignore rules:
- each line should contain exactly one name
- you may use Unix shell-style wildcards
  - *       matches everything except /
  - ?       matches any single character except /
  - [seq]   matches any character/range in seq
  - [^seq]  matches any character/range not in seq
  - \\      escape to match the following character
- lines starting with '#' are skipped
- lines starting with '/' are only applied to the current directory

Status codes:
  DMG: error, data damage detected
  EIX: error, index damaged
  old: warning, file replaced by an older version
  new: new file
  upd: file updated
  ok : check ok
  del: file/directory removed
  ign: ignored (see .chkbitignore)
  EXC: exception/panic

Configuration file (json):
- location /home/spark/.config/chkbit/config.json
- key names are the option names with '-' replaced by '_'
- for example --include-dot is written as:
  { "include_dot": true }

chkbit uses only 5 workers by default so that it does not slow your system to a crawl. You can specify a higher number with -w/--workers to make it considerably faster, provided your IO throughput can keep up.

Repair

chkbit is designed to detect "damage", not to repair it. To be able to repair your files, you need to think ahead and keep backups you can restore from.

Ignore files

Add a .chkbitignore file containing the names of the files/directories you wish to ignore, one per line (see the .chkbitignore rules in the chkbit -H output above).
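The .chkbitignore rules listed in the chkbit -H output map closely onto Go's path.Match, which supports *, ?, [seq], [^seq] and backslash escapes and never lets * or ? match '/'. The following is a sketch under assumptions, not chkbit's code: parseIgnore, isIgnored and ignoreRule are hypothetical names, and treating rules without a leading '/' as also applying to subdirectories is an interpretation of "lines starting with '/' are only applied to the current directory".

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// ignoreRule is one parsed line of a .chkbitignore file (hypothetical type).
type ignoreRule struct {
	pattern  string
	rootOnly bool // line started with '/': applies only to this directory
}

// parseIgnore turns the text of a .chkbitignore file into rules,
// skipping empty lines and lines starting with '#'.
func parseIgnore(text string) []ignoreRule {
	var rules []ignoreRule
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimRight(line, "\r") // tolerate CRLF line endings
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		r := ignoreRule{pattern: line}
		if strings.HasPrefix(line, "/") {
			r.rootOnly = true
			r.pattern = line[1:]
		}
		rules = append(rules, r)
	}
	return rules
}

// isIgnored reports whether name matches a rule. inOwnDir is true when name
// is in the directory containing the .chkbitignore file; in subdirectories
// only rules without a leading '/' are applied (assumption).
func isIgnored(rules []ignoreRule, name string, inOwnDir bool) bool {
	for _, r := range rules {
		if r.rootOnly && !inOwnDir {
			continue
		}
		// path.Match implements the shell-style wildcards listed in the tips.
		if ok, err := path.Match(r.pattern, name); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	rules := parseIgnore("# caches\n*.tmp\n/build\n")
	fmt.Println(isIgnored(rules, "notes.tmp", false)) // true
	fmt.Println(isIgnored(rules, "build", true))      // true
	fmt.Println(isIgnored(rules, "build", false))     // false
}
```

To check what your own ignore file actually excludes, run chkbit with -i/--show-ignored-only.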

chkbit as a Go module

chkbit can also be used as a library in other Go programs.

go get github.com/laktak/chkbit/v5

For more information see the documentation on pkg.go.dev.

FAQ

Should I run chkbit on my whole drive?

You would typically run it only on content that you keep for a long time (e.g. your pictures, music, videos).

.chkbit files vs .chkbitdb database

Note: a .chkbitdb database approach is being worked on in #22; help with testing is welcome.

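The advantage of the .chkbit files is that when you move a directory, its index moves with it, and when you back up a directory, its index is backed up too.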

The disadvantage is obviously that you get hidden .chkbit files in your content folders.

How does chkbit work?

chkbit operates on files.

When run for the first time it records a hash of the file contents as well as the file modification time.

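When you run it again it first checks the modification time. If the time changed (because you made an update), it records a new hash. Otherwise it compares the current hash to the recorded value and reports an error if they do not match.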
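The decision chkbit makes per file (compare the modification time first, then the hash) can be sketched in Go as below, using the status codes from chkbit -H. This is a simplified model under stated assumptions: record and checkFile are hypothetical, SHA-512 from the standard library stands in for blake3, and reporting "old" while recording the new hash when the modification time goes backwards is a guess at that case, not chkbit's documented behaviour.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
)

// record is a hypothetical index entry: the hash and mtime seen on the last run.
type record struct {
	Hash    string
	ModTime int64 // e.g. Unix milliseconds (assumed unit)
}

func hashBytes(data []byte) string {
	sum := sha512.Sum512(data)
	return hex.EncodeToString(sum[:])
}

// checkFile compares the current state of a file with its previous record
// and returns a status code plus the record to keep in the index.
func checkFile(prev *record, modTime int64, data []byte) (string, record) {
	cur := record{Hash: hashBytes(data), ModTime: modTime}
	switch {
	case prev == nil:
		return "new", cur // first time this file is seen
	case modTime > prev.ModTime:
		return "upd", cur // file was updated: record the new hash
	case modTime < prev.ModTime:
		return "old", cur // replaced by an older version (assumed handling)
	case cur.Hash != prev.Hash:
		return "DMG", *prev // same mtime, different content: keep the good hash
	default:
		return "ok", *prev
	}
}

func main() {
	st, rec := checkFile(nil, 1000, []byte("foo1\n"))
	fmt.Println(st) // new
	st, rec = checkFile(&rec, 2000, []byte("foo2\n"))
	fmt.Println(st) // upd
	st, rec = checkFile(&rec, 2000, []byte("foo3\n"))
	fmt.Println(st) // DMG
	st, _ = checkFile(&rec, 2000, []byte("foo2\n"))
	fmt.Println(st) // ok
}
```

The main function replays the same sequence as the "Can I test if chkbit is working correctly?" walkthrough: a new file, an update with a newer modification time, and a content change with an unchanged modification time.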

I wish to use a different hash algorithm

chkbit now uses blake3 by default. You can also specify --algo sha512 or --algo md5.

Note that existing index files will use the hash that they were created with. If you wish to update all hashes, you need to delete your existing indexes first. A conversion mode may be added later (PR welcome).
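Why an existing index keeps its original algorithm can be shown with a small Go sketch: the algorithm name is looked up from what the index recorded, not from the current --algo flag. newHasher and hashWith are hypothetical helpers, and storing the algorithm name alongside the hashes is an assumption about the index. blake3 is not in the Go standard library, so this sketch only covers md5 and sha512.

```go
package main

import (
	"crypto/md5"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"hash"
)

// newHasher maps an --algo value to a hash.Hash (md5 and sha512 only;
// blake3 would need a third-party package and is left out of this sketch).
func newHasher(algo string) (hash.Hash, error) {
	switch algo {
	case "md5":
		return md5.New(), nil
	case "sha512":
		return sha512.New(), nil
	default:
		return nil, fmt.Errorf("algorithm %q not supported in this sketch", algo)
	}
}

// hashWith hashes data with the algorithm an (assumed) index recorded,
// so verification keeps using the algorithm the index was created with.
func hashWith(indexAlgo string, data []byte) (string, error) {
	h, err := newHasher(indexAlgo)
	if err != nil {
		return "", err
	}
	h.Write(data) // hash.Hash.Write never returns an error
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	sum, err := hashWith("sha512", []byte("foo1\n"))
	if err != nil {
		panic(err)
	}
	fmt.Println(sum)
}
```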

How can I delete the index files?

List them with

find . -name .chkbit

and add -delete to delete them.

Can I test if chkbit is working correctly?

On Linux/macOS you can try:

Create a file named test and set its modification time:

$ echo foo1 > test; touch -t 201501010000 test
$ chkbit -u .
new ./test

Processed 1 file.
- 0:00:00 elapsed
- 192.31 files/second
- 0.00 MB/second
- 1 directory was updated
- 1 file hash was added
- 0 file hashes were updated

new indicates a new file was added.

Now update test with a new modification time:

$ echo foo2 > test; touch -t 201501010001 test # update test & modification time
$ chkbit -u .
upd ./test

Processed 1 file.
- 0:00:00 elapsed
- 191.61 files/second
- 0.00 MB/second
- 1 directory was updated
- 0 file hashes were added
- 1 file hash was updated

upd indicates the file was updated.

Now update test with the same modification time to simulate damage:

$ echo foo3 > test; touch -t 201501010001 test
$ chkbit -u .
DMG ./test

Processed 1 file.
- 0:00:00 elapsed
- 173.93 files/second
- 0.00 MB/second
chkbit detected damage in these files:
./test
error: detected 1 file with damage!

DMG indicates damage.