laktak / chkbit

Check your files for data corruption
MIT License

question: how to match the same `.chkbitignore` files at different depth levels? #8

Status: Closed (spock closed this 8 months ago)

spock commented 10 months ago

I'm struggling with a line definition in .chkbitignore that would, for example, match all Thumbs.db files independent of where they are in the file tree.

I've tried just the name itself, the name with a leading asterisk, and even things like */*/Thumbs.db - but Thumbs.db files still seem to be added.

What is the right way to do this? The fnmatch manual suggests that *Thumbs.db should be correct...
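For reference, here is a minimal sketch of how Python's fnmatch treats the patterns mentioned above. It only exercises the standard library and assumes nothing about how chkbit itself calls it; the example paths are made up, and `fnmatchcase` is used so the result does not depend on the platform's case rules.

```python
from fnmatch import fnmatchcase

# In fnmatch, "*" also matches "/" (unlike shell globbing), so a leading
# asterisk can match a full path at any depth.
deep_path_matches = fnmatchcase("photos/2020/Thumbs.db", "*Thumbs.db")

# A bare file name never matches a pattern that requires directory
# separators.
bare_name_matches = fnmatchcase("Thumbs.db", "*/*/Thumbs.db")

# The plain name only matches when the tested string is exactly that name.
exact_name_matches = fnmatchcase("Thumbs.db", "Thumbs.db")
full_path_vs_name = fnmatchcase("photos/2020/Thumbs.db", "Thumbs.db")

print(deep_path_matches, bare_name_matches, exact_name_matches, full_path_vs_name)
```

Whether a given pattern matches therefore depends on what string is being tested: the bare file name or a path relative to some directory.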

laktak commented 10 months ago

Currently chkbit only uses a very simple mode that allows it to ignore files in the current directory.

To allow what you are asking, the index would need a reference to its parent so that it could also check the parent's ignore rules.

spock commented 10 months ago

In principle fnmatch should also work as described above, but I'm not sure why it doesn't.

What if .chkbitignore entries are treated as "end of path match"?

Practically speaking,

if filename.endswith(ignore_filename):
    # ignore this file
    continue

(An implementation would likely need to rely on re instead, with a single combined pattern, as endswith would require looping through all ignored filenames for each incoming filename.)
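The "end of path match" idea with a single combined pattern could be sketched roughly as follows. This is only an illustration of the suggestion, not chkbit code; `make_ignore_predicate` and `ends_with_ignored` are hypothetical names, and paths are assumed to use `/` as the separator.

```python
import re

def make_ignore_predicate(ignored_names):
    """Return a function that tells whether a path ends in an ignored name."""
    names = list(ignored_names)
    if not names:
        # Nothing to ignore: avoid building an empty alternation.
        return lambda path: False
    alternatives = "|".join(re.escape(name) for name in names)
    # (?:^|/) anchors the match at the start of a path component, so
    # "NotThumbs.db" is not treated as "Thumbs.db".
    pattern = re.compile(rf"(?:^|/)(?:{alternatives})\Z")
    return lambda path: pattern.search(path) is not None

ends_with_ignored = make_ignore_predicate(["Thumbs.db", ".DS_Store"])

for candidate in ["a/b/Thumbs.db", "Thumbs.db", "a/NotThumbs.db"]:
    print(candidate, ends_with_ignored(candidate))
```

As a side note, `str.endswith` also accepts a tuple of suffixes, which avoids the explicit loop, but a plain suffix test would still match a name such as NotThumbs.db.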

Another alternative could be to use glob or pathlib to interpret wildcarded path specifications, which do allow arbitrary directory depth. Both normally iterate over the filesystem, but the same principle should somehow be applicable to ignoring files.
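On the pathlib side, one option that does not touch the filesystem is `PurePath.match`, which matches a relative pattern against the end of a path. A small sketch with made-up paths, not something chkbit is known to use:

```python
from pathlib import PurePosixPath

path = PurePosixPath("photos/2020/Thumbs.db")

# A relative pattern is matched from the right, so the directory depth
# in front of the matched part does not matter.
name_only = path.match("Thumbs.db")
with_parent = path.match("2020/Thumbs.db")
wrong_parent = path.match("2019/Thumbs.db")
wildcard_name = path.match("*.db")

print(name_only, with_parent, wrong_parent, wildcard_name)
```

Unlike fnmatch, the wildcard here is applied per path component, so `*` does not cross a `/`.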

laktak commented 10 months ago

What I meant was that chkbit takes the ignore file from the current directory and applies it to the files in the current directory.

To do what you are asking, it would also have to use the index files from its parents.
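To make the "also look at the parents" idea concrete, here is a rough sketch that lists which ignore files would be relevant for a given directory, from the scan root down. It is an assumption about how such a lookup could work, not a description of chkbit's internals; `candidate_ignore_files` is a hypothetical helper and it only computes paths without reading anything from disk.

```python
from pathlib import PurePosixPath

def candidate_ignore_files(directory, root, name=".chkbitignore"):
    """List ignore-file paths from `root` down to `directory` (inclusive)."""
    current = PurePosixPath(directory)
    top = PurePosixPath(root)
    # Raises ValueError if `directory` is not located under `root`.
    current.relative_to(top)
    chain = []
    while True:
        chain.append(current / name)
        if current == top:
            break
        current = current.parent
    # Reverse so that rules from the root come first.
    return [str(p) for p in reversed(chain)]

for ignore_file in candidate_ignore_files("/data/photos/2020", "/data"):
    print(ignore_file)
```

Rules found in the files closer to the root would then be applied to files further down, in addition to the rules from the current directory.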

laktak commented 10 months ago

I found this interesting and had some time :)

Please try the new version in the ignlvl branch. A line containing only

Thumbs.db

will now ignore Thumbs.db in all subdirectories.

Also possible are:

# only in the root
/Thumbs.db
# in the specified directory
some/dir/Thumbs.db
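One way to read the three forms above is sketched below. This is an interpretation of the described behaviour, not the code from the ignlvl branch; `matches_rule` is a hypothetical function, and `rel_path` is assumed to be relative to the directory that holds the .chkbitignore file, using `/` as the separator.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

def matches_rule(rel_path, rule):
    """Check one ignore rule against a path relative to the ignore file."""
    path = PurePosixPath(rel_path)
    if "/" in rule:
        # "/Thumbs.db" and "some/dir/Thumbs.db" are anchored: the whole
        # relative path has to match (a leading slash is dropped first).
        return fnmatchcase(str(path), rule.lstrip("/"))
    # A bare name applies at any depth: only the file name is compared.
    return fnmatchcase(path.name, rule)

print(matches_rule("a/b/Thumbs.db", "Thumbs.db"))
print(matches_rule("a/Thumbs.db", "/Thumbs.db"))
print(matches_rule("some/dir/Thumbs.db", "some/dir/Thumbs.db"))
```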

There is also --show-ignored-only for testing.

laktak commented 8 months ago

I didn't hear back from you, but IMO this works in the release.