hsuominen opened this issue 5 years ago
I think supporting this would require a --resumable flag passed to the import command. Every import run with --resumable would then store a list of the files it has attempted to import. I believe this means adding a new file to track the progress of resumable imports, but I'm open to other ideas.
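For illustration, here is a minimal sketch of what such a progress file could look like, assuming an append-only log of attempted source paths; the path ~/.elodie/import-progress.log and the import_file() callable are hypothetical placeholders, not part of elodie's current code.

```python
# Sketch only: a --resumable progress log kept as an append-only file of attempted paths.
# PROGRESS_PATH and import_file() are hypothetical, not elodie's real API.
import os

PROGRESS_PATH = os.path.expanduser('~/.elodie/import-progress.log')

def load_progress():
    """Paths already attempted in previous runs, one per line."""
    if not os.path.exists(PROGRESS_PATH):
        return set()
    with open(PROGRESS_PATH) as f:
        return set(line.rstrip('\n') for line in f)

def resumable_import(source_paths, import_file):
    attempted = load_progress()
    with open(PROGRESS_PATH, 'a') as log:
        for path in source_paths:
            if path in attempted:
                continue               # already attempted in an earlier run
            import_file(path)          # stand-in for the real per-file import
            log.write(path + '\n')     # append immediately so a crash loses at most one entry
            log.flush()
```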
Maybe this can be handled by changing the workflow and adding a couple of features (in particular PR https://github.com/jmathai/elodie/pull/297 and issue https://github.com/jmathai/elodie/issues/299). If all your unsorted files live in one folder and everything else in another (not nested inside each other), you can use --move-source to move source files from the unsorted directory into the sorted directory, and also clear duplicates (--delete-duplicates, perhaps) as you go. Then if Elodie crashes (by the way, see PR https://github.com/jmathai/elodie/pull/298, which fixes one nasty crash) or some other interruption occurs, you can simply start over with no overhead from previously read files, because every file is either moved or deleted on import. You see what I mean, @jmathai? That's the approach I'm trying to implement.
I've added a PR for the delete-duplicates functionality: https://github.com/jmathai/elodie/pull/301
@hsuominen if you want to test out my combined approach, it's here in my fork: https://github.com/DZamataev/elodie/tree/feature/move-source-and-separate-media-folders. The full command would look like this:
elodie.py import --debug --delete-duplicates --move-source --destination="G:\LIBRARY\sorted" G:\LIBRARY\unsorted
I'd recommend operating on a sorted list of files. Then, on resume, the script can check the last file written, find it in the list, and continue from there. For me it crashed at around 90%; now I'm waiting on the second run.
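For what it's worth, a rough sketch of that sorted-list resume idea, assuming there is some way (a hypothetical last_imported_source() helper, say) to recover the last source file the previous run handled:

```python
# Sketch of resuming from a sorted source list; last_source would come from
# whatever record exists of the previous run (a log, a destination scan, etc.).

def resume_point(source_paths, last_source):
    """Return the sorted list and the index to resume from."""
    sources = sorted(source_paths)
    if last_source is None:
        return sources, 0
    try:
        return sources, sources.index(last_source) + 1
    except ValueError:
        return sources, 0  # last file not found in the list; fall back to a full pass

# Usage (import_file() is a hypothetical stand-in for the real per-file import):
# sources, start = resume_point(all_source_files, last_imported_source())
# for path in sources[start:]:
#     import_file(path)
```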
I'm running elodie to back up on the order of 150,000+ images scattered across a fairly messy file/folder structure. It has crashed a few times, most recently after 2 days, having processed 50,000 files or so.
It seems that restarting the import requires rereading each file to check whether it has already been imported, which by my estimate will take at least 13 hours (about 1s per file, presumably to calculate the hash). Is there any built-in way to speed this up?
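Not an existing elodie option as far as I know, but one generic way to cut that per-file cost is to cache checksums keyed by path, size, and mtime, so unchanged files are never re-hashed on a second run. The cache location and the sha256 choice below are assumptions for the sketch, not elodie behavior.

```python
# Sketch: reuse checksums for files whose path, size, and mtime are unchanged.
# CACHE_PATH is a hypothetical location, not something elodie maintains.
import hashlib
import json
import os

CACHE_PATH = os.path.expanduser('~/.elodie/checksum-cache.json')

def load_cache():
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            return json.load(f)
    return {}

def save_cache(cache):
    with open(CACHE_PATH, 'w') as f:
        json.dump(cache, f)

def checksum(path, cache):
    st = os.stat(path)
    key = '%s|%d|%d' % (path, st.st_size, int(st.st_mtime))
    if key in cache:
        return cache[key]              # unchanged since the last run, skip re-hashing
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(1 << 20), b''):
            h.update(block)
    cache[key] = h.hexdigest()
    return cache[key]
```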
It would seem pretty straightforward to add functionality to pick up where the import left off; I may address this in a PR if I get around to it.