vsespb / mt-aws-glacier

Perl Multithreaded Multipart sync to Amazon Glacier
http://mt-aws.com/
GNU General Public License v3.0

purge-vault filtering doesn't work as expected #106

Open rayrapetyan opened 9 years ago

rayrapetyan commented 9 years ago

Presumably I'm doing something wrong, but things are not working as expected. I've uploaded a few individual files into the vault like this:

./glacier/mt-aws/mtglacier upload-file --config ./glacier/mt-aws/glacier.cfg --vault qqq --journal ./glacier/mt-aws/qqq.journal --partsize 256 --filename ./foo.txt --set-rel-filename foo.txt

./glacier/mt-aws/mtglacier upload-file --config ./glacier/mt-aws/glacier.cfg --vault qqq --journal ./glacier/mt-aws/qqq.journal --partsize 256 --filename ./test.txt --set-rel-filename test.txt

Now I want to delete 'test.txt' from the vault. The problem is: whatever I put into "--filter" - it always results in deleting ALL files in the vault:

./glacier/mt-aws/mtglacier purge-vault --config ./glacier/mt-aws/glacier.cfg --vault qqq --journal ./glacier/mt-aws/qqq.journal --filter '+test.txt'

PID 57899 Started worker
PID 57900 Started worker
PID 57901 Started worker
PID 57902 Started worker
PID 57901 Deleted test.txt archive_id [mmAc8t54R3GRQOe5L5nxCFmZlVqjdVsPVGMjua63zgn0siJqrRZ1YDZB1GGxYCskLaMDsdwc5E6fswDQ-XBUZaUGp7eqpfw7jJOpQaTn2yvDU-zCo2IPilr0Ow180t9PnfMnGdC4pA]
PID 57899 Deleted foo.txt archive_id [s_bgL356OdJJgCQ8b2Dfsrqiu09RLpeLfxtTaqIaSbrm-mZBFhkZFRit2OiO5oVmRz6d7gcRIjwyLUVUQi5AsvWWFp93BGNHdA_ShyKsyR9AeawzJBm9ySSM5iEt-8PnDkQx-71Ewg]
OK DONE

I've already lost one of my vaults completely lol, and still have no idea how to delete a single file...

EQXTFL commented 9 years ago

I use the following options, after removing the file locally and from the journal:

sync --delete-removed --config=glacier.conf --vault=Vault --journal=journal.log --dir=/path/to/dir

vsespb commented 9 years ago

Hello. See docs: 2) If no rules matched - file is included (default rule is INCLUDE rule).

vsespb commented 9 years ago

I've already lost one of my vaults completely lol

That's sad. But when making backups the opposite can happen: one can include and exclude several files, and the default rule applies to all the others. If the default rule were EXCLUDE, those other files would not be backed up. So at some point I chose to make the default rule INCLUDE.

vsespb commented 9 years ago

Also, see dry-run; it is good for testing.

rayrapetyan commented 9 years ago

Although the tool is primarily designed for syncing a local directory with a vault, I always use it to upload individual archives from different directories, so "sync" will not work for me.

IMHO a default INCLUDE rule is much more dangerous than a default EXCLUDE: the user can fix a "files not backed up" issue at any moment, while an "all files in the vault have been deleted" issue is unrecoverable.

I think the basic form of the "delete" operation should be as simple as on a local file system, something like delete-file filename_pattern, without +/- and other stuff.

And btw - how to delete a single file by name, if default rule is INCLUDE everything?

vsespb commented 9 years ago

And btw - how to delete a single file by name, if default rule is INCLUDE everything?

9) if PATTERN is empty, it matches anything.

3. PATTERN can be empty (Example: --filter +data/ --filter - excludes everything except any directory with name data, last pattern is empty)

In your case --filter '+test.txt -'
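The behaviour described above (the first matching rule wins, an empty pattern matches anything, and a file that matches no rule is included) can be illustrated with a small model. This is only a sketch, not mtglacier's actual code: the function names `parse_filter`, `is_included` and `selected` are made up for illustration, rules are assumed to be separated by whitespace, and Python's `fnmatch` stands in for the tool's real pattern syntax, which has additional rules for slashes and `**`.

```python
from fnmatch import fnmatchcase


def parse_filter(spec):
    """Split a spec such as '+test.txt -' into (action, pattern) rules."""
    rules = []
    for token in spec.split():
        action, pattern = token[0], token[1:]
        if action not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {token!r}")
        rules.append((action, pattern))
    return rules


def is_included(name, rules):
    """Apply rules in order; the first rule whose pattern matches decides."""
    for action, pattern in rules:
        # Assumption taken from the quoted docs: an empty pattern matches anything.
        if pattern == "" or fnmatchcase(name, pattern):
            return action == "+"
    return True  # no rule matched: the default rule is INCLUDE


def selected(names, spec):
    """Return the names that the filter spec selects."""
    rules = parse_filter(spec)
    return [n for n in names if is_included(n, rules)]


vault = ["foo.txt", "test.txt"]
print(selected(vault, "+test.txt"))    # ['foo.txt', 'test.txt']
print(selected(vault, "+test.txt -"))  # ['test.txt']
```

In this model the lone '+test.txt' rule changes nothing, because foo.txt falls through to the default INCLUDE; only the trailing '-' with its empty pattern excludes everything else.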

I think the basic form of the "delete" operation should be as simple as on a local file system, something like delete-file filename_pattern, without +/- and other stuff.

Yes, there should be commands that work with a single file: upload-file, delete-file, etc. This is currently not implemented. Commands like sync are designed to work with multiple files.

IMHO a default INCLUDE rule is much more dangerous than a default EXCLUDE: the user can fix a "files not backed up" issue at any moment, while an "all files in the vault have been deleted" issue is unrecoverable.

That's questionable. One can say that the tool is usually used for backup, and that operation is automated. Deletion is usually not automated but performed manually (except when rotation is implemented; then deletion is automated too).

Then "all files in the vault have been deleted" can be easily fixed by re-uploading the files, because that is just a backup, not the original copy.

And files missing from an automated backup usually go unnoticed, which makes that case more dangerous.

Anyway, when I designed this I analyzed several tools that work with groups of files, and their filtering options, and came to the conclusion that default INCLUDE is better. I may not remember all the details that led to this decision.

For example, the default in duplicity and rsync is INCLUDE too.

http://duplicity.nongnu.org/duplicity.1.html

Each file selection condition either matches or doesn’t match a given file. A given file is excluded by the file selection system exactly when the first matching file selection condition specifies that the file be excluded; otherwise the file is included. 

http://linux.die.net/man/1/rsync

As the list of files/directories to transfer is built, rsync checks each name to be transferred against the list of include/exclude patterns in turn, and the first matching pattern is acted on: if it is an exclude pattern, then that file is skipped; if it is an include pattern then that filename is not skipped; if no matching pattern is found, then the filename is not skipped. 

Also note that I will be unable to change this without breaking backward compatibility.
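The trade-off between the two defaults argued over in this thread can be made concrete with a toy comparison. This is a hypothetical sketch: `apply_rules` and the file names are invented for illustration, and `fnmatch` only approximates the pattern syntax of mtglacier, rsync or duplicity.

```python
from fnmatch import fnmatchcase


def apply_rules(names, rules, default_include):
    """First matching rule wins; names that match no rule get the default."""
    result = []
    for name in names:
        verdict = default_include
        for include, pattern in rules:
            if fnmatchcase(name, pattern):
                verdict = include
                break
        if verdict:
            result.append(name)
    return result


files = ["photo.jpeg", "notes.txt", "cache.tmp"]

# Backup: the user only excludes temporary files.
backup_rules = [(False, "*.tmp")]
print(apply_rules(files, backup_rules, default_include=True))   # ['photo.jpeg', 'notes.txt']
print(apply_rules(files, backup_rules, default_include=False))  # []

# Purge: the user only names the one file to delete.
purge_rules = [(True, "notes.txt")]
print(apply_rules(files, purge_rules, default_include=True))    # ['photo.jpeg', 'notes.txt', 'cache.tmp']
print(apply_rules(files, purge_rules, default_include=False))   # ['notes.txt']
```

With default INCLUDE the backup behaves as intended but the purge selects everything; with default EXCLUDE the purge is safe but the backup silently selects nothing. Neither default protects both operations.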

rayrapetyan commented 9 years ago

Agree with everything, except how it is covered in docs:

2) If no rules matched - file is included (default rule is INCLUDE rule). (this line is OK, although I would put it at the top of "--filter" option description section).

--filter '+*.jpeg' File file.txt is INCLUDED, as it does not match any rules

That's what led me to make the mistake. Formally the statement is correct, but only once you also know that the expression --filter '+*.jpeg' has no effect at all. Instead, it looks as if someone wants to do something with jpeg files.

I would replace this part of the docs with a group of examples (something like this):

--filter '+': all files will be deleted (the default case, same as when "--filter" is not specified at all)
--filter '-*.jpeg': all files except *.jpeg will be deleted
--filter '+test.txt -': only test.txt will be deleted

UPD: I just realized that the "--filter" part of the docs relates to all commands, not just deletion. Anyway, I think it's worth creating a "Delete files" section and putting the details there.