skvadrik / re2c

Lexer generator for C, C++, Go and Rust.
https://re2c.org

Use full hashes in changelog #452

Open SuperSandro2000 opened 1 year ago

SuperSandro2000 commented 1 year ago

The latest release notes on https://re2c.org/releases/changelog/changelog.html#id1 contain a lot of short hashes. This is problematic because GitHub shares commit hashes between forks, so a fork could trivially brute-force a colliding short hash and break the links. If GitHub cannot identify which commit a short hash belongs to, it just returns a 404.

For example, https://github.com/nixos/nixpkgs/commit/27250f7 works, but https://github.com/nixos/nixpkgs/commit/27250f already does not.

skvadrik commented 1 year ago

not on

What do you mean? Is this a typo?

skvadrik commented 1 year ago

I didn't use the full hashes because they clutter the changelog too much. On the website it is possible to make them look short but still use the full hash in the link, so that's not a problem.
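To illustrate the idea of a short label with a full-hash target, here is a minimal sketch. It assumes the website source is reStructuredText and that commits are linked under https://github.com/skvadrik/re2c/commit/; the helper name `rst_commit_link` and the 12-digit default are hypothetical, not part of re2c's tooling.

```shell
# Hypothetical helper: print an RST link whose visible label is a short
# prefix of the hash, while the link target uses the full commit hash.
# $1 = full commit hash, $2 = number of digits to show (assumed default: 12)
rst_commit_link() {
    full=$1
    digits=${2:-12}
    # Take the first $digits characters of the hash as the label.
    short=$(printf '%s' "$full" | cut -c1-"$digits")
    printf '`%s <https://github.com/skvadrik/re2c/commit/%s>`_\n' "$short" "$full"
}

# Possible usage inside a checkout (the short hash here is a placeholder):
#   rst_commit_link "$(git rev-parse --verify 'abc1234^{commit}')"
```

The text version of the changelog could keep whatever prefix length reads best, since only the generated link needs the full hash.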

trofi commented 1 year ago

The Linux kernel developers did a bit of back-of-the-napkin math in 2010 on a reasonable abbreviation length for reasonably sized repos: https://lkml.org/lkml/2010/10/28/287

SuperSandro2000 commented 1 year ago

not on

*notes on

I didn't use the full hashes because they clutter the changelog too much. On the website it is possible to make them look short but still use the full hash in the link, so that's not a problem.

that would totally work.

Just as an example how easy it is to generate commits with certain prefixes https://github.com/NixOS/nixpkgs/commit/222222bedb944bf20a678b76e84f512e8e45150a or https://github.com/NixOS/nixpkgs/commit/ddddcff73ede27d3fe7c21b7157447e4eaa5cabd

skvadrik commented 1 year ago

Just as an example how easy it is to generate commits with certain prefixes

I like the idea of 12-digit prefixes as the kernel does (suggested by @trofi above). Note that re2c is smaller than the kernel, and Torvalds' script already finds no duplicate prefixes at 9 digits rather than 11: git rev-list --objects --all | cut -c1-9 | sort | uniq -dc finds nothing.
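The same check can be repeated for increasing lengths to see where duplicates first disappear. This is a rough sketch, not Torvalds' script: the function name `min_unique_prefix` is made up for illustration, and the upper bound of 40 assumes SHA-1 object IDs.

```shell
# Hypothetical helper: read object IDs (one per line) on stdin and print the
# shortest prefix length at which no two IDs share a prefix.
min_unique_prefix() {
    ids=$(cat)
    len=1
    # 40 hex digits is the full length of a SHA-1 object ID (assumption).
    while [ "$len" -le 40 ]; do
        # uniq -d prints only lines that occur more than once.
        dups=$(printf '%s\n' "$ids" | cut -c1-"$len" | sort | uniq -d)
        if [ -z "$dups" ]; then
            echo "$len"
            return 0
        fi
        len=$((len + 1))
    done
    return 1
}

# Possible usage inside a checkout:
#   git rev-list --objects --all | min_unique_prefix
```

Whatever length this prints is only a lower bound for today's repository; a few extra digits leave room for the object count to grow.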

By "easy to generate", do you mean that some evil person could fork re2c and deliberately add commits until they have the desired duplicate checksum? That seems like a tedious and useless process, given that the outcome is a mere 404 page on GitHub.

I think it makes sense to have 12-digit prefixes to keep the text version of CHANGELOG more readable, which seems more important than guarding against the unlikely possibility that someone will generate a duplicate prefix.