to-mc / checksumdir

Simple package to compute a single deterministic hash of the file contents of a directory.
MIT License

Fix 16: new option to include filename in the hash - consistent across operating systems #17

Closed: gpiccinni closed this 4 years ago.

gpiccinni commented 4 years ago

Issue #16

Description of changes: When the option --include-paths is specified (include_paths=True when calling checksumdir.dirhash from Python), the file names, including their paths relative to the current directory, are included in the hash.

With this change, renaming a file inside the folder produces a different hash.
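To illustrate the effect, here is a minimal, self-contained sketch. toy_dirhash is a hypothetical, simplified stand-in for checksumdir.dirhash, not the library's actual code. It assumes MD5, per-file content digests, an optional per-file path digest built as described in this PR, and an order-independent reduction over the sorted digests. Renaming a file leaves the content-only hash unchanged but changes the path-inclusive one.

```python
import hashlib
import os
import tempfile


def toy_dirhash(dirname: str, include_paths: bool = False) -> str:
    """Hypothetical, simplified stand-in for checksumdir.dirhash (MD5 only)."""
    hashvalues = []
    for root, _dirs, files in os.walk(dirname):
        for fname in files:
            fpath = os.path.join(root, fname)
            # Digest of the file contents.
            with open(fpath, 'rb') as f:
                hashvalues.append(hashlib.md5(f.read()).hexdigest())
            if include_paths:
                # Path relative to the current directory, split on the OS
                # separator and joined with no separator, then hashed.
                key = ''.join(os.path.relpath(fpath).split(os.sep))
                hashvalues.append(hashlib.md5(key.encode('utf-8')).hexdigest())
    # Assumed reduction: sorting makes the result independent of walk order.
    return hashlib.md5(''.join(sorted(hashvalues)).encode('utf-8')).hexdigest()


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, 'a.txt'), 'w') as f:
        f.write('hello')
    before = (toy_dirhash(d), toy_dirhash(d, include_paths=True))
    os.rename(os.path.join(d, 'a.txt'), os.path.join(d, 'b.txt'))
    after = (toy_dirhash(d), toy_dirhash(d, include_paths=True))
    print(before[0] == after[0])  # True: contents are unchanged
    print(before[1] == after[1])  # False: the file name changed
```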

The resulting hash is consistent across operating systems (Windows, Unix) and regardless of whether the directory is given as an absolute or a relative path. For each file, the path is:

  1. normalized to a relative path,
  2. split into its path components,
  3. joined without a separator (for cross-platform compatibility), and
  4. included in the hashvalues array.
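Steps 1 to 3 can be sketched as follows. path_key and path_digest are hypothetical helper names, and hashing the joined key with MD5 before adding it to hashvalues is an assumption made for illustration. The last lines show why dropping the separator makes the key OS-independent: the same relative path written with Windows and POSIX separators yields the same string.

```python
import hashlib
import os
from pathlib import PurePosixPath, PureWindowsPath


def path_key(file_path: str, start: str = os.curdir) -> str:
    """Hypothetical helper implementing steps 1-3."""
    rel = os.path.relpath(file_path, start)  # 1. normalize to a relative path
    parts = rel.split(os.sep)                # 2. split into path components
    return ''.join(parts)                    # 3. join without a separator


def path_digest(file_path: str, start: str = os.curdir) -> str:
    """Step 4 (assumed): hash the key so it can be appended to hashvalues."""
    return hashlib.md5(path_key(file_path, start).encode('utf-8')).hexdigest()


# Same relative path, Windows vs POSIX spelling: identical joined keys.
win = ''.join(PureWindowsPath(r'test\sub\a.txt').parts)
posix = ''.join(PurePosixPath('test/sub/a.txt').parts)
print(win, posix, win == posix)  # testsuba.txt testsuba.txt True
```

One trade-off of joining without any separator: paths such as ab/c.txt and a/bc.txt produce the same key, so a rename that only moves a boundary between directory and file names would not change this part of the hash.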

Example Python test (produces the same hashes on Windows and Unix):

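import os
import checksumdir

# The same directory given three ways. The relative path assumes the script
# runs from inside a "checksumdir" directory that contains a "test" folder.
path = 'test'
rel_path = './../checksumdir/../checksumdir/test'
abs_path = os.path.abspath(path)

# Without paths: only file contents contribute to the hash.
print('Classic checksum')
print(checksumdir.dirhash(path))
print(checksumdir.dirhash(rel_path))
print(checksumdir.dirhash(abs_path))

# With paths: file paths relative to the current directory also contribute.
print('Checksum with file paths')
print(checksumdir.dirhash(path, include_paths=True))
print(checksumdir.dirhash(rel_path, include_paths=True))
print(checksumdir.dirhash(abs_path, include_paths=True))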

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.