kopia / kopia

Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.
https://kopia.io
Apache License 2.0

Kopia Cache Directory Exceeds Specified Limits During Snapshot Restore #4140

Open r4rajat opened 3 weeks ago

r4rajat commented 3 weeks ago

Environment:

Description:

When creating and restoring a snapshot containing around 1 million small files (about 1.5 KB each), the Kopia cache directory grows beyond the configured cache limits (--content-cache-size-limit-mb=500 and --metadata-cache-size-limit-mb=500). Despite a combined hard limit of ~1000 MiB, the cache directory grows to more than 1.2 GiB.

Steps to Reproduce:

  1. Create 1 million small files using the following command:
    mkdir data_to_backup
    for ((i=1; i<=1000000; i++)); do
     head -c 1536 /dev/urandom > ./data_to_backup/"file_$i.txt"
    done
  2. Create a Kopia repository using:
    kopia repository create s3 \
      --bucket=$AWS_S3_BUCKET \
      --access-key=$AWS_ACCESS_KEY_ID \
      --secret-access-key=$AWS_SECRET_ACCESS_KEY \
      --region=$AWS_REGION \
      --cache-directory=./kopia-cache \
      --content-cache-size-limit-mb=500 \
      --metadata-cache-size-limit-mb=500 \
      --config-file=./kopia.config
  3. Connect to the repository using:
    kopia repository connect s3 \
      --bucket=$AWS_S3_BUCKET \
      --access-key=$AWS_ACCESS_KEY_ID \
      --secret-access-key=$AWS_SECRET_ACCESS_KEY \
      --region=$AWS_REGION \
      --cache-directory=./kopia-cache \
      --content-cache-size-limit-mb=500 \
      --metadata-cache-size-limit-mb=500 \
      --config-file=./kopia.config
  4. Take a snapshot of the data:
    kopia --config-file=./kopia.config --log-dir=./kopia-log snapshot create ./data_to_backup
  5. Retrieve the snapshot ID:
    kopia --config-file=./kopia.config --log-dir=./kopia-log snapshot list
  6. Restore the snapshot:
    kopia --config-file=./kopia.config --log-dir=./kopia-log snapshot restore <snapshot-id> ./to_restore_dir

Expected Result:

The size of the Kopia cache directory should remain under the specified hard limit (~1000 MiB).

Actual Result:

The Kopia cache directory size exceeds the specified limit, reaching more than 1.2 GiB.

Additional Information:

[Image: Screenshot 2024-09-27 at 2 25 32 PM]

As the screenshot above shows, even though the content cache hard limit is set to 500 MiB, the cache grows well past 690 MiB, keeps increasing, and eventually exceeds 1.2 GiB.

kaovilai commented 2 weeks ago

cc: @mpryc

r4rajat commented 2 weeks ago

This appears to be caused by how Kopia calculates file size: it uses filePointer.Size(), which returns the logical size of the file rather than the space the file actually occupies on disk.

For example, if the file system's minimum block size is 4 KB and we create a 1 KB file, the logical size is 1 KB but the file occupies 4 KB on disk, four times as much.

As a result, when files are smaller than the minimum block size (usually 4 KB), the Kopia cache directory can grow past its limit on disk even though hard limits are set for it.
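
The rounding described above can be sketched as a small calculation. This is only an illustration: `allocatedSize` is a hypothetical helper, not a Kopia function, and the 4096-byte block size is an assumption (real file systems vary, and some store very small files inline without allocating a full block).

```go
package main

import "fmt"

// allocatedSize rounds a logical file size up to a whole number of blocks.
// blockSize is an assumed allocation unit; 4096 is common but not universal.
func allocatedSize(logicalSize, blockSize int64) int64 {
	if logicalSize <= 0 {
		return 0
	}
	blocks := (logicalSize + blockSize - 1) / blockSize // ceiling division
	return blocks * blockSize
}

func main() {
	const blockSize = 4096
	for _, size := range []int64{1024, 1536, 4096, 4097} {
		fmt.Printf("logical=%d allocated=%d\n", size, allocatedSize(size, blockSize))
	}
}
```

Under these assumptions, a 1536-byte file like the ones in the reproduction steps is charged 4096 bytes on disk, so a limit enforced on logical sizes can be exceeded by a large factor when entries are small.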

Maybe we could use something like

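    // Logical size: the byte length reported for the file.
    logicalSize := fileInfo.Size()
    fmt.Printf("Logical Size: %d bytes\n", logicalSize)

    // Physical size: the blocks actually allocated on disk.
    // Unix-only: fileInfo.Sys() is not a *syscall.Stat_t on Windows,
    // so check the type assertion instead of letting it panic.
    if stat, ok := fileInfo.Sys().(*syscall.Stat_t); ok {
        physicalSize := int64(stat.Blocks) * 512 // on Linux, Blocks counts 512-byte units
        fmt.Printf("Physical Size on Disk: %d bytes\n", physicalSize)
    }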