mandarons / icloud-drive-docker

Dockerized iCloud Client - make a local copy of your iCloud documents and photos, and keep it automatically up-to-date.
BSD 3-Clause "New" or "Revised" License
962 stars 46 forks

[FEATURE] Save (lots of) storage space by using hard links for duplicate photos and videos when running full backup #176

Open d-EScape opened 7 months ago

d-EScape commented 7 months ago

I just discovered iCloud-drive-docker and first impressions are great. Thank you for your work and sharing this project.

Use case: As a user with lots of photos and videos organized in many albums on iCloud, I want to make a full (structured) backup so that I can always restore my files in case of an iCloud disaster.

What is the problem? iCloud-drive-docker seems to do exactly what I want with the configuration "all_albums: true", BUT... as described in the config file, it will store duplicates of the same file in different albums. There will always be at least one duplicate because of the "All Photos" album. Videos are even worse. These large files are duplicated in the album folder(s) AND the Videos folder AND the All Photos folder. GoPro files even have their own GoPro album by default, so that adds another duplicate. The storage space used adds up quickly.

Describe the solution you'd like: First sync the "All Photos" album. Then, when syncing the other albums, create a hard link to the already existing file in "All Photos" instead of copying the entire file again.

Considerations: If All Photos is synced first, it should not be necessary to check every other album on the (target) filesystem for existing duplicates. So having "All Photos" synced first becomes a requirement to make this deduplication as simple as possible. Hard links behave just like the original file, and the physical file remains intact until the last link to it is removed, so this should be a safe approach. Even if someone manually removes files from the "All Photos" folder or other folders, the files will still be accessible through the remaining hard links.
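The hard link behavior described above can be checked with a small, self-contained sketch. The folder and file names here are made up for illustration and are not taken from iCloud-drive-docker; the only calls used are from the Python standard library. It assumes a filesystem that supports hard links, and that both names live on the same filesystem, since a hard link cannot span two filesystems.

```python
import os
import tempfile


def demo_hard_link(base_dir):
    """Create a file, hard-link it, delete the original name, and report the result."""
    # Hypothetical folder and file names, for illustration only.
    original = os.path.join(base_dir, "All Photos", "IMG_0001.JPG")
    linked = os.path.join(base_dir, "Holiday", "IMG_0001.JPG")
    os.makedirs(os.path.dirname(original))
    os.makedirs(os.path.dirname(linked))

    with open(original, "wb") as f:
        f.write(b"photo bytes")

    # Add a second name for the same data; nothing is copied.
    os.link(original, linked)
    same_inode = os.stat(original).st_ino == os.stat(linked).st_ino
    links_before = os.stat(linked).st_nlink

    # Removing one name leaves the data reachable through the other.
    os.remove(original)
    with open(linked, "rb") as f:
        survived = f.read() == b"photo bytes"
    links_after = os.stat(linked).st_nlink

    return same_inode, links_before, links_after, survived


with tempfile.TemporaryDirectory() as tmp:
    print(demo_hard_link(tmp))
```

On a typical Linux filesystem the link count should drop from 2 to 1 after the removal, while the content stays readable through the album path.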

So why do I look at iCloud-drive-docker for a backup use case? I can't do a client-side backup using the Apple software, because I have set all my Apple photo clients to "optimize storage", so the original versions are not always available on every client. There are more photos in iCloud than would fit on the local SSD.

mandarons commented 7 months ago

This is a good one. Thanks for submitting. 👍🏼

d-EScape commented 6 months ago

I created a little proof of concept by editing the download_photo function (see below). Without this modification my (test) iCloud photos backup took 79GB. It is now 42GB with exactly the same photo library!

# Proof of concept. Relies on the module's existing imports (os, shutil, time,
# LOGGER) and on its photo_exists() helper.
def download_photo(photo, file_size, destination_path):
    """Download photo from server, or hard-link it if All Photos already has it."""
    ALL_PHOTOS_PATH = "/app/icloud/photos/All Photos"  # hardcoded for this test
    if not (photo and file_size and destination_path):
        return False
    # Look for a file with the same name in the already-synced All Photos folder.
    existing_path = os.path.join(ALL_PHOTOS_PATH, os.path.basename(destination_path))
    LOGGER.info(f"Check if exists {existing_path}")
    # Never link a file to itself: the All Photos copy is always downloaded.
    if existing_path != destination_path and photo_exists(photo, file_size, existing_path):
        LOGGER.info(f"Existing photo. Try and link {destination_path} to {existing_path}")
        try:
            # Create a second directory entry for the same data instead of a copy.
            os.link(existing_path, destination_path)
        except OSError as e:
            LOGGER.error(f"Failed to link {destination_path} to {existing_path}: {e}")
            return False
    else:
        LOGGER.info(f"Downloading {destination_path} ...")
        try:
            download = photo.download(file_size)
            with open(destination_path, "wb") as file_out:
                shutil.copyfileobj(download.raw, file_out)
            # Set the file times to the date the photo was added to the library.
            local_modified_time = time.mktime(photo.added_date.timetuple())
            os.utime(destination_path, (local_modified_time, local_modified_time))
        except Exception as e:  # also covers API response and file-not-found errors
            LOGGER.error(f"Failed to download {destination_path}: {e}")
            return False
    return True
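
A saving like the one reported here (79GB versus 42GB) can be measured by counting every physical file only once. The following is a minimal sketch, not part of iCloud-drive-docker: `disk_usage` is a hypothetical helper that walks a backup folder and compares the apparent size (every path counted) with the unique size (each device and inode pair counted once).

```python
import os
import stat


def disk_usage(root):
    """Return (apparent_bytes, unique_bytes) for the regular files under root."""
    apparent = 0
    unique = 0
    seen = set()  # (device, inode) pairs that were already counted
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks and other special files
            apparent += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return apparent, unique
```

Calling it on the backup root, for example `disk_usage("/app/icloud/photos")`, returns both totals; the difference between them is the space that hard links avoid using.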

mandarons commented 6 months ago

Please feel free to submit a PR, if possible.

d-EScape commented 6 months ago

It’s just a proof of concept and far from ready for a PR. The All Photos path is hardcoded and I’m getting the photo filename by splitting the destination path. I haven’t really figured out how you generate and share these kinds of variables. I was hoping you could use this in a future release.
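
One way the hardcoded path could become a parameter is sketched here. This is only an illustration under stated assumptions: `link_from_all_photos` and its `all_photos_dir` argument are hypothetical names, not part of the project, and the sketch matches files by name only. A real version would also need to verify the file size (as the proof of concept does with photo_exists), because two different photos can share a filename.

```python
import os


def link_from_all_photos(all_photos_dir, destination_path):
    """Try to hard-link destination_path to the same-named file in all_photos_dir.

    Returns True if the link was created, False if the caller should fall back
    to a normal download.
    """
    filename = os.path.basename(destination_path)
    existing_path = os.path.join(all_photos_dir, filename)
    if os.path.abspath(existing_path) == os.path.abspath(destination_path):
        return False  # this is the All Photos copy itself
    if not os.path.isfile(existing_path):
        return False  # nothing to link to yet
    try:
        os.link(existing_path, destination_path)
    except OSError:
        # For example: different filesystems, or the destination already exists.
        return False
    return True
```

Returning False instead of raising keeps the download path as the fallback, so a failed link costs disk space but never a missing file.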