thoughtworks / talisman

Using a pre-commit hook, Talisman validates the outgoing changeset for things that look suspicious — such as tokens, passwords, and private keys.
https://thoughtworks.github.io/talisman/
MIT License

How to detect diverging file checksums before a commit #412

Open weilbith opened 1 year ago

weilbith commented 1 year ago

Hey 👋🏾

In our project, it is quite cumbersome that diverging file checksums are detected very late, only right before pushing the commits. We use both the pre-commit and the pre-push hook. It would be great if we could always update the .talismanrc with the latest checksum values in the same commit that changes the file.

From what I have read in the documentation and the code, and from some manual experiments, Talisman's pre-commit check only verifies that the staged diff for this commit does not include any secrets. If it does not, Talisman passes. But if a file is listed as an exception with a specific checksum in the .talismanrc file, Talisman does not report anything about the changed checksum. Only a few commits later, just before pushing, does Talisman tell us. From the code I read, the pre-push check collects the list of all files changed across all commits being pushed. It then looks at the whole files rather than only the diff, calculates their new checksums, and compares them with the configured ones.
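One way to surface the divergence at commit time, without reproducing Talisman's own checksum, is to flag staged files that already have an entry in .talismanrc: if such a file changes, its recorded checksum is almost certainly stale. This is a minimal Bash sketch, not Talisman functionality. The function names (talismanrc_filenames, stale_checksum_candidates, main) are hypothetical, and it assumes the usual fileignoreconfig layout with one "filename:" key per entry and no glob patterns.

```shell
#!/usr/bin/env bash
# Sketch of a pre-commit check (hypothetical helper names): warn when a staged
# file also has a "filename:" entry in .talismanrc, because the checksum
# recorded there will no longer match once this commit lands.
# It does NOT compute Talisman's checksum; it only flags candidate entries.
# Assumes one "filename:" key per entry; glob patterns are not expanded.

# Print the paths recorded as "filename:" entries in a .talismanrc file.
talismanrc_filenames() {
  local rc="${1:-.talismanrc}"
  [ -f "$rc" ] || return 0
  # Accept "- filename: x" and "  filename: x"; strip quotes and trailing space/CR.
  sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*filename:[[:space:]]*//p' "$rc" |
    tr -d "\"'" |
    sed 's/[[:space:]]*$//'
}

# Print staged paths (deletions excluded) that have an entry in .talismanrc.
stale_checksum_candidates() {
  local rc="${1:-.talismanrc}" listed path
  listed="$(talismanrc_filenames "$rc")"
  [ -n "$listed" ] || return 0
  git diff --cached --name-only --diff-filter=d |
    while IFS= read -r path; do
      # Exact whole-line match of the staged path against the listed paths.
      if printf '%s\n' "$listed" | grep -Fxq -- "$path"; then
        printf '%s\n' "$path"
      fi
    done
}

main() {
  local candidates
  candidates="$(stale_checksum_candidates)"
  if [ -n "$candidates" ]; then
    echo "talismanrc: checksum entries may be stale for:" >&2
    printf '%s\n' "$candidates" | sed 's/^/  /' >&2
  fi
  return 0   # warn only; return 1 here instead to block the commit
}

main "$@"
```

Saved as .git/hooks/pre-commit (or called from an existing pre-commit hook), it only warns. It cannot tell whether .talismanrc was already updated correctly in the same commit, since that would require Talisman's own checksum calculation.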

I'm trying to figure out how to detect checksum changes earlier, before the commit happens. The pre-push check expects a list of commit hashes, but for a pre-commit the commit simply does not exist yet. It would be possible to run a post-commit hook that runs Talisman and feeds it the commit just made, but Git does not pass the new commit hash to that hook. Furthermore, the scenario in which developers stage only some hunks of a file for a commit is quite interesting: to calculate the correct checksum, the file content must be taken as staged, not as it is in the working tree. A simple Bash one-liner like this could be used for that: git diff --cached --name-only | xargs --replace='$file' git show ':$file'. But I fail to combine this with Talisman.
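For the partially staged case, the index holds exactly the content that will be committed, so hashing ":<path>" instead of the working-tree file gives a digest of the staged version. This is a minimal sketch under several assumptions. It uses git cat-file blob rather than git show to get the raw blob. It produces a plain per-file SHA-256, which we do not assume matches the value Talisman writes to .talismanrc, so it can detect change but should not be pasted into the checksum field. It also shows that a post-commit hook can read the new commit as HEAD. The helper names are hypothetical, and piping into talisman --githook pre-push is an untested assumption about Talisman's CLI, so it is left as a comment.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helper names). staged_checksums hashes each staged
# file as it sits in the index (":<path>"), so files staged with `git add -p`
# are hashed with exactly the hunks that will be committed. Plain SHA-256 of
# the blob; NOT assumed to equal the checksum Talisman records.

# Print the hex SHA-256 of stdin using sha256sum (GNU) or shasum (macOS).
sha256_stdin() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

# Print "<sha256>  <path>" for each staged, non-deleted file.
staged_checksums() {
  local path
  git diff --cached --name-only --diff-filter=d |
    while IFS= read -r path; do
      # cat-file prints the raw staged blob, unaffected by unstaged edits.
      printf '%s  %s\n' "$(git cat-file blob ":$path" | sha256_stdin)" "$path"
    done
}

# In a post-commit hook the new commit is simply HEAD. Print one line in the
# format Git writes to a pre-push hook's stdin
# ("<local ref> <local sha> <remote ref> <remote sha>"), using HEAD~1 as the
# "remote" side so the range covers only the new commit.
# Assumes HEAD has a parent (i.e. it is not the root commit).
prepush_line_for_head() {
  local ref
  ref="$(git symbolic-ref -q HEAD || echo HEAD)"
  printf '%s %s %s %s\n' "$ref" "$(git rev-parse HEAD)" "$ref" "$(git rev-parse HEAD~1)"
}

# Hypothetical post-commit usage, ASSUMING talisman accepts `--githook pre-push`
# and reads Git's pre-push stdin format (not verified here):
#   prepush_line_for_head | talisman --githook pre-push
```

staged_checksums can be diffed against an earlier run to see which staged files changed. prepush_line_for_head only formats the line Git would send to a pre-push hook; whether Talisman then scans just that single commit depends on how your Talisman version interprets the range.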

Any insights or help on how to proceed here? Doing an interactive rebase every time a checksum changes, whenever there is more than one commit to push, is very tedious and cumbersome.

Thank you very much for any help in advance! 🙏🏾