gurnec / HashCheck

HashCheck Shell Extension for Windows with added SHA2, SHA3, and multithreading; originally from code.kliu.org

When saving let's say Excel spreadsheets, the hash changes even if the spreadsheet content doesn't change. Why? #52

Open tobefound opened 6 years ago

tobefound commented 6 years ago

I'm thinking this is because HashCheck includes document metadata. Is there any way to tell HashCheck to ignore metadata? Or, in this case, to make sure Excel doesn't store metadata permanently? I've tried "Remove personal properties" on an Excel file, but as soon as you save the file (without changing its contents), the hash changes.

What to do?

cfbao commented 6 years ago

HashCheck only includes file content, not metadata.
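To illustrate what "file content only" means, here is a minimal Python sketch. It is not HashCheck's code; the helper name `sha256_of_file` and the choice of SHA-256 are assumptions made for the example. It hashes a file, changes only the filesystem timestamps, and hashes again.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path):
    """Hash only the bytes stored in the file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "example.bin")
        with open(path, "wb") as handle:
            handle.write(b"same bytes")

        before = sha256_of_file(path)
        # Change the filesystem access/modification times only.
        os.utime(path, (1_000_000_000, 1_000_000_000))
        after = sha256_of_file(path)

        # The digests match because the file's bytes are unchanged.
        print(before == after)
```

Filesystem attributes such as the modification time never enter the digest, so if a hash changes, some byte inside the file changed.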

Excel itself changes the file content on save, even when no visible content has changed. This can happen, for example, when there's a volatile function in your workbook.

tobefound commented 6 years ago

Well, this happens even on empty spreadsheets. Surely somebody must have found a workaround to this?

/T

cfbao commented 6 years ago

This concerns the internal structure of Excel documents. A relevant MS forum would be a more appropriate place to ask. HashCheck is already doing what it's supposed to do. I don't think there's anything else HashCheck can or should do.

tobefound commented 6 years ago

Got it, thx!

/T

LanceUMatthews commented 6 years ago

An Excel workbook keeps metadata for the creation time, last save time, and the last person to save it; this is separate from the similar attributes stored by the filesystem. Even if you have not changed the contents of a workbook, re-saving it will change at least the last save timestamp, which changes the on-disk representation of the file, which changes the resulting hash.
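If you want to see that embedded metadata for yourself, the following Python sketch may help. It assumes the workbook is an `.xlsx` file (a ZIP package) and that the core properties sit at `docProps/core.xml`, which is where Excel typically writes them; the function name `read_save_metadata` is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: Excel usually stores the core properties at this path.
CORE_PROPERTIES = "docProps/core.xml"

NAMESPACES = {
    "dcterms": "http://purl.org/dc/terms/",
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
}


def read_save_metadata(workbook):
    """Return the embedded creation time, last save time, and last author.

    `workbook` is a path or a binary file-like object. A value is None
    when the corresponding element is missing from the core properties.
    """
    with zipfile.ZipFile(workbook) as archive:
        root = ET.fromstring(archive.read(CORE_PROPERTIES))
    return {
        "created": root.findtext("dcterms:created", namespaces=NAMESPACES),
        "modified": root.findtext("dcterms:modified", namespaces=NAMESPACES),
        "last_modified_by": root.findtext("cp:lastModifiedBy", namespaces=NAMESPACES),
    }
```

Calling it on the same workbook before and after a no-change save should show a different `modified` value, which is enough to alter the file's bytes and therefore its hash.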

I think you would be better served by an application that specifically understands and compares Excel workbook files and can ignore that embedded metadata, rather than HashCheck, which, like almost any hashing utility, doesn't care about the format of its input files and simply treats them as opaque byte streams.
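As a rough sketch of what such a format-aware comparison could look like, the Python below hashes the decompressed parts of an `.xlsx` package and skips the document properties. It makes several assumptions: the workbook is a ZIP-based `.xlsx`, the metadata to ignore lives under `docProps/`, and the names `workbook_content_hash` and `IGNORED_PREFIXES` are invented for this example. It is not a HashCheck feature, and Excel may still rewrite other parts on save (for instance cached values of volatile functions), so treat it as a starting point rather than a guarantee.

```python
import hashlib
import zipfile

# Assumption: the embedded save metadata lives under this folder.
IGNORED_PREFIXES = ("docProps/",)


def workbook_content_hash(workbook, ignored_prefixes=IGNORED_PREFIXES):
    """Hash the decompressed parts of an .xlsx package, skipping metadata parts.

    `workbook` is a path or a binary file-like object. Parts are visited in
    sorted name order, and ZIP-level details such as entry timestamps and
    compression settings are never fed into the digest.
    """
    digest = hashlib.sha256()
    with zipfile.ZipFile(workbook) as archive:
        for name in sorted(archive.namelist()):
            if name.startswith(ignored_prefixes):
                continue
            data = archive.read(name)
            # Include the part name and length so part boundaries stay unambiguous.
            digest.update(name.encode("utf-8"))
            digest.update(b"\0")
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()
```

Two saves of the same workbook that differ only in `docProps/` and in ZIP entry timestamps would produce the same value here, while a plain file hash of the two would differ.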