tommyblue / smugmug-backup

Makes a full backup of a SmugMug account
MIT License
88 stars 16 forks source link

duplicate filenames in album not handled #9

Closed dsporter closed 4 years ago

dsporter commented 4 years ago

A complete backup is not possible if an album has images with duplicate filenames (e.g. two distinct items named DSC_0001.jpg). Filenames in the backup should probably always include ImageKey to avoid this problem.

tommyblue commented 4 years ago

Hey, thanks for reporting. I never noticed as I always use unique names in albums. I'm wondering what could be the best way to handle the issue. Always adding the ImageKey is something I'd like to avoid so that the dumped files have the same names as the original files. A mixed approach could be to only add the key to the duplicated filenames. What do you think?

russross commented 4 years ago

I agree that keeping the original filename is a good default. When detecting a duplicate, I think adding a unique ID to the filename (something like "originalname-imagekey.ext") for both/all duplicate names would be the most consistent (as opposed to only mangling the 2nd file detected).

dsporter commented 4 years ago

Adding the key only for duplicate names would work, but is thornier to implement. Edge cases such as transition from one to duplicate and back are annoying to handle.

Another issue that occurred to me is that simply adding ImageKey won't be sufficient for correctness on case insensitive filesystems.

tommyblue commented 4 years ago

Adding the key only for duplicate names would work, but is thornier to implement. Edge cases such as transition from one to duplicate and back are annoying to handle.

yes, I didn't think to the option that a duplicated file is added later, that would be tricky to handle to avoid file duplication. Adding the key to all files is probably less prone to errors

Another issue that occurred to me is that simply adding ImageKey won't be sufficient for correctness on case insensitive filesystems.

do you mean that both <Filename>+<ImageKey> are not enough on case insensitive fs? even if possible, it means that you have a collision on both of them (regardless of the case), I'd have thought to it as a quite-impossible occurrence 😅 On the image json I see ArchivedMD5 or UploadKey that could be used in addition to the other fields, in particular ArchivedMD5 seems interesting: quite unique and lower-case

tommyblue commented 4 years ago

I'm thinking about adding a configuration file so I can fix the issue with the names and rid of the environment variables (they'll still be used if the configuration values are missing). In the configuration file I'd like to add a template for the filenames. It will be something like:

filenames = "{{.ImageKey}}-{{.Filename}}" // or any other field from the image struct

If the configuration is missing, the current behaviour is maintained. In this way it's possible to fix the object of this issue without introducing breaking changes. In fact if the fix would be to change the default file naming (filename+imageKey), that would lead to a lot of file duplication. In my case, where I use a case-sensitive filesystem and no duplicated filenames in smugmug, that would totally duplicate the files (old names with only the filename and new names with filename+imageKey)

tommyblue commented 4 years ago

Fixed with https://github.com/tommyblue/smugmug-backup/pull/11 and https://github.com/tommyblue/smugmug-backup/pull/10 (that introduces configuration file). I've updated the README file with instructions