The SnowHash algorithm is designed to avoid collisions between vastly different domains, while allowing collisions between similar-sounding ones.
For example, these produce the same output, 879:
tech.lgbt
tegh.lgbt
tech.lcbt
tegh.lcbt
degh.lgbt
degh.lcbt
dech.lcbt
etc...
However, introducing just a little more change, for example replacing the e to get dfch.lcbt, causes a giant jump to 1007.
Any two domains in actual use that you can think of have a 99% chance of hashing to different values.
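To illustrate the general idea of letting similar-sounding domains collide, here is a minimal toy sketch. It is not the SnowHash algorithm and will not reproduce the values 879 or 1007; the sound classes, the `collapse` and `toy_hash` names, and the bucket count are all assumptions made up for this example.

```python
# Hypothetical sound classes: letters that sound alike map to one
# representative. These are NOT the real SnowHash rules.
SOUND_CLASSES = {"d": "t", "g": "k", "c": "k", "b": "p"}


def collapse(domain: str) -> str:
    """Replace each letter with the representative of its sound class."""
    return "".join(SOUND_CLASSES.get(ch, ch) for ch in domain.lower())


def toy_hash(domain: str, buckets: int = 4096) -> int:
    """Polynomial hash of the collapsed domain, reduced modulo `buckets`."""
    value = 0
    for ch in collapse(domain):
        value = (value * 31 + ord(ch)) % buckets
    return value


if __name__ == "__main__":
    for name in ["tech.lgbt", "tegh.lgbt", "degh.lcbt", "dfch.lcbt"]:
        print(name, toy_hash(name))
```

In this toy version, tech.lgbt, tegh.lgbt and degh.lcbt all collapse to the same string and therefore share a hash, while dfch.lcbt collapses to a different string and lands elsewhere.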
I also find it very important to mention that SnowHash is not the only thing planned to keep IDs from colliding. Mastodon, for example, uses the entire post URL as the ID in federation, and its snowflake is only valid instance-wide. I plan to do this differently: the plaintext domain is only included when the ID collides.
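One possible reading of that plan, sketched under assumptions: the first domain seen for a given hash keeps the short form, and any later domain with the same hash falls back to its plaintext name. The `DomainRegistry` class, the `make_id` method and the `hash:local_id` string format are hypothetical and only serve the illustration. The sketch also assumes a single shared view of which domain came first.

```python
class DomainRegistry:
    """Remembers which domain first claimed each hash (hypothetical helper)."""

    def __init__(self) -> None:
        self._owner: dict[int, str] = {}

    def make_id(self, domain: str, domain_hash: int, local_id: int) -> str:
        # The first domain to use a hash becomes its owner.
        owner = self._owner.setdefault(domain_hash, domain)
        if owner == domain:
            # No collision: the short hash identifies the domain.
            return f"{domain_hash}:{local_id}"
        # Collision: include the plaintext domain to stay unambiguous.
        return f"{domain}:{local_id}"


if __name__ == "__main__":
    registry = DomainRegistry()
    print(registry.make_id("tech.lgbt", 879, 1))  # short form
    print(registry.make_id("tegh.lgbt", 879, 1))  # plaintext fallback
```

The hash value 879 is passed in by hand here, taken from the example list above, since the sketch does not implement SnowHash itself.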