Closed: smcgivern closed this 9 years ago
but if I do that manually and remove the entries from the journal, is that a safe operation?
Yes. You can extract the files to delete into a new journal and issue purge-vault with that new journal.
Then you can wait 24h and run retrieve-inventory + download-inventory.
https://github.com/vsespb/mt-aws-glacier#restoring-journal
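To make that concrete, here is a rough sketch of splitting a journal that way (my own, not from the mtglacier docs), run on a toy three-line file in the layout shown later in this thread. The field positions and the purge-vault flags in the comment are assumptions, so check them against your real journal and `mtglacier help` first:

```shell
# Toy three-line journal standing in for the real Music.journal
# (same layout as the grep output in this thread: filename from field 8 on).
printf '%s\n' \
  'B 1410599059 CREATED id-one 43807793 1214763861 aaaa a.flac' \
  'B 1410599806 CREATED id-two 43807793 1214763861 aaaa a.flac' \
  'B 1410600114 CREATED id-three 10 20 bbbb b.flac' > Music.journal

# Keep the first CREATED entry per filename in Music.journal.keep and
# route every later duplicate into Music.journal.delete.
awk '$3 == "CREATED" {
       name = $8; for (i = 9; i <= NF; i++) name = name " " $i
       if (name in seen) { print > "Music.journal.delete"; next }
       seen[name] = 1
     }
     { print > "Music.journal.keep" }' Music.journal

cat Music.journal.delete
# After checking Music.journal.delete by hand, something like (flags assumed):
#   mtglacier purge-vault --config glacier.cfg --vault Music --journal Music.journal.delete
```

Only the second a.flac entry (id-two) should land in the delete journal; the first copy of each filename stays in the keep journal.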
I am not sure why you have duplicates - so make sure those are real duplicates on the server side, i.e. they have different archive IDs, and that you keep at least one archive for each filename.
my vault has a lot of duplicate files
Possible ways to run into this:
1) Use always-positive
2) Drop your journal file, then start a backup to a new journal. Then download-inventory. Repeat 7 times.
If you could remember in detail how you worked with the journal, I can try to investigate why this happened.
These definitely appear to be duplicates on the server side (I retrieved a new journal file to check), and the vault size is bigger than expected (although not so big that it's really expensive, just annoying).
I'm sorry that I don't have more details on how this happened - I only noticed once I got my bill. My best guess at the moment is that I set up cron wrongly, and I was running multiple mtglacier instances against the same journal at once. The duplicate files are close to each other in the journal file, which might indicate that (I don't know if mtglacier locks the journal file).
Thanks so much for the quick response, let me know if there's anything else you want me to do to help debug this.
$ grep -n 'Björk/Volta/Björk-02-Wanderlust.flac' Music.journal
10007:B 1410599059 CREATED 0b7GVI-Lxjp_jVyYiMizZZb084aLN-uktjP6IbmG7iLlJxd289C6CHfKWwEP8IF_TDFbHI7KWZPr24paLrKPTzAIH8qYzAUcpDTjanSNBAxhjfcNbst4zMsPPm2edP3i5AODZibJBQ 43807793 1214763861 40206cb83dfdf929900013b839d69c2618b68fa6c47716bd65ec28b6176fcf8e Björk/Volta/Björk-02-Wanderlust.flac
10016:B 1410599806 CREATED QkwdbLDjEeaS8D32x1wk0WIBcXfuSKd6kizW7rNEuAzu85f-eNO52XXQqS7i98RRNB8sDLRoLEnFmwpZ6d9NnYKR-JyfbxciUYbpcW-HKBLrdAtMtrPUtNNCoHJdjpq11L8s__ONNQ 43807793 1214763861 40206cb83dfdf929900013b839d69c2618b68fa6c47716bd65ec28b6176fcf8e Björk/Volta/Björk-02-Wanderlust.flac
10018:B 1410600114 CREATED o3see3QYM4ikqWFEpgZyRT7T0OYAemlk2sxJvCIn7fJtxxGkNwNivg_G30_m4WXF1i81vZoNJzV0uKb_m1INOC42jQM-pfJ1lx17tdMpNol5b6qbqIRjGdiHX5U-O-h1zYA4CyGpdg 43807793 1214763861 40206cb83dfdf929900013b839d69c2618b68fa6c47716bd65ec28b6176fcf8e Björk/Volta/Björk-02-Wanderlust.flac
10023:B 1410600344 CREATED ir7_cl3xLY3NhkLbCx6Ei69lg28kqj6cQeDcrTwQQnqKz-1i2n1aDJ78H8rCnm08g_fDiSD4kjxiL7GfT8sh5RKKt20t9-bopY1g4lMDHn_NrXCtVF6PbK15-hrJi6DtngeafNHUsA 43807793 1214763861 40206cb83dfdf929900013b839d69c2618b68fa6c47716bd65ec28b6176fcf8e Björk/Volta/Björk-02-Wanderlust.flac
10033:B 1410601229 CREATED PVYLZscYQi_5mbPH-xDNklbXzXLHDIG3-SeONzUcGJioychyWWhoidvScZUtVTi3t0cuE-Q79TjQXGjM39TqCjvvt66iXl4ohdeEOQkdk8g3m4Q1KvYiwwGZ8aMCtTflX_MppnokoA 43807793 1214763861 40206cb83dfdf929900013b839d69c2618b68fa6c47716bd65ec28b6176fcf8e Björk/Volta/Björk-02-Wanderlust.flac
10045:B 1410602363 CREATED Ckp7h9pP3oyM-cZLGfdXGyQ9-wY6U0qnDCDhXx0C2nwpvFEsY8vXFJjaLm3XyugENdfkuYWVj1w95Hab3LIKCBoOYK3jKk4ALekYMgS7z5Ez1WcmZpF6hlU0esD1nDFPtKsL67HZ8A 43807793 1214763861 40206cb83dfdf929900013b839d69c2618b68fa6c47716bd65ec28b6176fcf8e Björk/Volta/Björk-02-Wanderlust.flac
10073:B 1410604494 CREATED NbrmfrGWwPkV_lS9_nFPbuzahskcvmoTvkoTGWgAq_K1pqu88SignldoWR-4hEj_m0L1LKJpWJNJjE0i3MkXVZzjR_tNPfS2-9LWa-Zv6gsJL3FLTdI7uHgJj5n2H6RYAAhOzXajXA 43807793 1214763861 40206cb83dfdf929900013b839d69c2618b68fa6c47716bd65ec28b6176fcf8e Björk/Volta/Björk-02-Wanderlust.flac
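For reference, duplicates like these can be tallied across the whole journal with a short awk pass, assuming (as in the lines above) that the filename starts at field 8 of a CREATED line. Shown here on a toy three-line input; on the real file you would run the same awk over Music.journal:

```shell
# Count CREATED entries per filename and print any filename seen more
# than once, most-duplicated first.
printf '%s\n' \
  'B 1410599059 CREATED id-one 43807793 1214763861 aaaa a.flac' \
  'B 1410599806 CREATED id-two 43807793 1214763861 aaaa a.flac' \
  'B 1410600114 CREATED id-three 10 20 bbbb b.flac' |
awk '$3 == "CREATED" {
       name = $8; for (i = 9; i <= NF; i++) name = name " " $i
       count[name]++
     }
     END { for (n in count) if (count[n] > 1) print count[n], n }' |
sort -rn
```

On this toy input it reports that a.flac has two entries and stays silent about b.flac.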
I was running multiple mtglacier instances against the same journal at once
Yes, that could be a reason.
I don't know if mtglacier locks the journal file
No :( I opened issue #96 (enhancement). In the meantime you could use flock if you're running in a situation where you're not sure whether concurrent processes are using the same journal (I use flock for my own backups with mtglacier).
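For example, something along these lines - the lock path, schedule, and the mtglacier flags in the comment are placeholders I'm assuming, not copied from anywhere:

```shell
# Take an exclusive lock before running; -n makes flock fail fast if
# another run already holds it, instead of queueing a second writer
# behind the first against the same journal.
flock -n /tmp/mtglacier-demo.lock echo "got the lock"

# The same idea in a crontab entry (flags and paths assumed):
#   0 3 * * * flock -n /var/lock/mtglacier.lock mtglacier sync --config glacier.cfg --dir /music --vault Music --journal Music.journal
```

If a second copy starts while the first still holds the lock, `flock -n` exits non-zero immediately and the duplicate run never touches the journal.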
The duplicate files are close to each other in the journal file
1410604494 - 1410599059 is a ~90 minute range. That could be true if the whole backup process takes longer than 90 minutes and you started 7 mtglacier instances at about the same time.
let me know if there's anything else you want me to do to help debug this
Probably not - concurrent access to the journal explains this.
Thanks! I'll be more careful in future :smiley:
I'm not sure how, but my vault has a lot of duplicate files (not all files have been duplicated, and not all have been duplicated the same amount of times):
So all seven of these entries have the same size, mtime, treehash, and filename, but different archive IDs. I don't see a way to delete files by archive ID, but if I do that manually and remove the entries from the journal, is that a safe operation?
(I won't be deleting the files for a little while as it's still < 3 months since I put them there.)