hsuominen opened this issue 5 years ago
I think supporting this would require a --resumable flag passed to the import command. Every import run with --resumable would then store a list of the files it has attempted to import. I believe this means adding a new file to track the progress of resumable imports, but I'm open to other ideas.
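For illustration, here is a minimal sketch of what such a progress file could look like, assuming an append-only log of attempted source paths; the path ~/.elodie/import-progress.log and the import_file() callable are hypothetical placeholders, not part of elodie's current code.

```python
# Sketch only: a --resumable progress log kept as an append-only file of attempted paths.
# PROGRESS_PATH and import_file() are hypothetical, not elodie's real API.
import os

PROGRESS_PATH = os.path.expanduser('~/.elodie/import-progress.log')

def load_progress():
    """Paths already attempted in previous runs, one per line."""
    if not os.path.exists(PROGRESS_PATH):
        return set()
    with open(PROGRESS_PATH) as f:
        return set(line.rstrip('\n') for line in f)

def resumable_import(source_paths, import_file):
    attempted = load_progress()
    with open(PROGRESS_PATH, 'a') as log:
        for path in source_paths:
            if path in attempted:
                continue               # already attempted in an earlier run
            import_file(path)          # stand-in for the real per-file import
            log.write(path + '\n')     # append immediately so a crash loses at most one entry
            log.flush()
```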
Maybe this can be handled by changing the workflow and adding a couple of features (in particular PR https://github.com/jmathai/elodie/pull/297 and issue https://github.com/jmathai/elodie/issues/299). If all your unsorted files live in one folder and everything else in another (not nested inside each other), you can use --move-source to move source files from the unsorted directory into the sorted directory, and also clear duplicates (--delete-duplicates, perhaps) as you go. Then if Elodie crashes (by the way, see PR https://github.com/jmathai/elodie/pull/298, which fixes one nasty crash) or some other interruption occurs, you can simply start over with no overhead from previously read files, because every file is either moved or deleted on import. You see what I mean, @jmathai? That's the approach I'm trying to implement.
I've added a PR for the delete-duplicates functionality: https://github.com/jmathai/elodie/pull/301
@hsuominen if you want to test out my combined approach, it's here in my fork: https://github.com/DZamataev/elodie/tree/feature/move-source-and-separate-media-folders. The full command would look like this:
elodie.py import --debug --delete-duplicates --move-source --destination="G:\LIBRARY\sorted" G:\LIBRARY\unsorted
I'd recommend operating on a sorted list of files. Then, on resume, the script can check the last file written, find it in the list, and continue from there. For me it crashed at around 90%; now I'm waiting on the second run.
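For what it's worth, a rough sketch of that sorted-list resume idea, assuming there is some way (a hypothetical last_imported_source() helper, say) to recover the last source file the previous run handled:

```python
# Sketch of resuming from a sorted source list; last_source would come from
# whatever record exists of the previous run (a log, a destination scan, etc.).

def resume_point(source_paths, last_source):
    """Return the sorted list and the index to resume from."""
    sources = sorted(source_paths)
    if last_source is None:
        return sources, 0
    try:
        return sources, sources.index(last_source) + 1
    except ValueError:
        return sources, 0  # last file not found in the list; fall back to a full pass

# Usage (import_file() is a hypothetical stand-in for the real per-file import):
# sources, start = resume_point(all_source_files, last_imported_source())
# for path in sources[start:]:
#     import_file(path)
```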
I'm running elodie to back up on the order of 150,000+ images scattered across a fairly messy file/folder structure. It has crashed a few times, most recently after 2 days, having processed 50,000 files or so.
It seems that restarting the import requires rereading each file to check whether it has already been imported, which by my estimate will take at least 13 hours (about 1s per file, presumably to calculate the hash). Is there any built-in way to speed this up?
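Not an existing elodie option as far as I know, but one generic way to cut that per-file cost is to cache checksums keyed by path, size, and mtime, so unchanged files are never re-hashed on a second run. The cache location and the sha256 choice below are assumptions for the sketch, not elodie behavior.

```python
# Sketch: reuse checksums for files whose path, size, and mtime are unchanged.
# CACHE_PATH is a hypothetical location, not something elodie maintains.
import hashlib
import json
import os

CACHE_PATH = os.path.expanduser('~/.elodie/checksum-cache.json')

def load_cache():
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            return json.load(f)
    return {}

def save_cache(cache):
    with open(CACHE_PATH, 'w') as f:
        json.dump(cache, f)

def checksum(path, cache):
    st = os.stat(path)
    key = '%s|%d|%d' % (path, st.st_size, int(st.st_mtime))
    if key in cache:
        return cache[key]              # unchanged since the last run, skip re-hashing
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(1 << 20), b''):
            h.update(block)
    cache[key] = h.hexdigest()
    return cache[key]
```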
It would seem pretty straightforward to add functionality to pick up where the import left off; I may address this in a PR if I get around to it.