tasket / wyng-backup

Fast backups for logical volumes & disk images
GNU General Public License v3.0

Alt deduplication mode to handle extremely large archives #180

Open tasket opened 7 months ago

tasket commented 7 months ago

Problem

Currently Wyng's deduplication code is RAM-bound (as are most deduplicators), which puts an effective limit on the size of an archive that can be deduplicated.

Possible solution

  1. detect the large archive condition and available RAM resources
  2. move the lion's share of dedup indexes out of RAM (and out of /tmp)

This would trade off performance for the ability to perform the dedup at all.
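To make the two steps above concrete, here is a minimal sketch, not Wyng's actual code. It assumes a Linux-style `os.sysconf` for reading free RAM, a made-up per-entry cost (`BYTES_PER_ENTRY`), and an SQLite file standing in for the on-disk index. The names `needs_disk_index` and `DiskDedupIndex` are hypothetical and exist only for illustration.

```python
import os
import sqlite3

# Assumed rough RAM cost of one in-memory index entry (hypothetical figure).
BYTES_PER_ENTRY = 200


def available_ram_bytes():
    """Best-effort free physical RAM; these sysconf names are Linux-specific."""
    try:
        return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError, AttributeError):
        return None  # unknown platform: caller should assume the worst


def needs_disk_index(chunk_count, avail_bytes, headroom=0.5):
    """Step 1: decide whether an in-RAM index would exceed the RAM budget."""
    if avail_bytes is None:
        return True
    return chunk_count * BYTES_PER_ENTRY > avail_bytes * headroom


class DiskDedupIndex:
    """Step 2: keep the digest -> location map on disk instead of in RAM."""

    def __init__(self, path):
        # 'path' should live on real storage, not a RAM-backed /tmp.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS chunks "
            "(digest BLOB PRIMARY KEY, location TEXT NOT NULL) WITHOUT ROWID"
        )

    def add(self, digest, location):
        """Record a chunk. Return the location of an earlier identical
        chunk, or None if this digest has not been seen before."""
        row = self.db.execute(
            "SELECT location FROM chunks WHERE digest = ?", (digest,)
        ).fetchone()
        if row:
            return row[0]
        self.db.execute("INSERT INTO chunks VALUES (?, ?)", (digest, location))
        return None

    def close(self):
        self.db.commit()
        self.db.close()
```

Each lookup becomes a disk-backed B-tree query rather than a dict access, which is where the performance cost would come from.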

Alternate solution (workaround)

For un-encrypted archives, users could have jdupes (or a similar utility) do a hardlink or reflink dedup on the archive dir. Otherwise, a dedup-capable filesystem like Btrfs or ZFS could be utilized. (These options would not work on encrypted archives unless Wyng started offering a deterministic encryption mode.)
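For reference, a hardlink pass of the kind described above could look roughly like this. It is a simplified sketch and not how jdupes itself is implemented: it trusts a size plus SHA-256 match without a byte-by-byte comparison, and it assumes the archive files are never modified in place after linking. The function names are hypothetical.

```python
import hashlib
import os


def file_digest(path, bufsize=1 << 20):
    """SHA-256 of a file, read in blocks so large files are not loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(bufsize)
            if not block:
                break
            h.update(block)
    return h.digest()


def hardlink_dedup(root):
    """Replace duplicate regular files under 'root' with hard links to the
    first copy seen. Returns the number of files that were relinked."""
    seen = {}  # (size, digest) -> path of the first file with that content
    relinked = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            key = (os.path.getsize(path), file_digest(path))
            first = seen.setdefault(key, path)
            if first == path or os.path.samefile(first, path):
                continue  # first occurrence, or already linked together
            # Link under a temporary name, then rename over the duplicate,
            # so the duplicate is never missing if the link step fails.
            tmp = path + ".dedup-tmp"
            os.link(first, tmp)
            os.replace(tmp, path)
            relinked += 1
    return relinked
```

This only finds duplicates when identical data produces identical files on disk, which is why it would need a deterministic encryption mode to help with encrypted archives.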