yomorun / hashids-java

Hashids algorithm v1.0.0 implementation in Java
http://hashids.org
MIT License

Hashids doesn't respect alphabet #41

Closed: nusco closed this issue 7 years ago

nusco commented 7 years ago

Try this (it takes a while):

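import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;
import org.hashids.Hashids;

public class AlphabetCheck {
  public static void main(String[] args) {
    final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    final String SALT = "Encode all the things";
    final Hashids hashids = new Hashids(SALT, 0, ALPHABET);

    // Collect every distinct character that appears in any generated code.
    final Set<Character> codesCharacters = new HashSet<>();
    for (int i = 0; i <= 10_000_000; i += 1) {
      String code = hashids.encode(i);
      for (char c : code.toCharArray()) {
        codesCharacters.add(c);
      }
    }
    // Print the distinct characters in sorted order.
    final LinkedList<Character> list = new LinkedList<>(codesCharacters);
    Collections.sort(list);
    System.out.println(list);
  }
}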

Output:

[A, B, D, E, J, K, L, M, N, P, Q, R, V, W, X, Y, Z]

Why do 9 characters out of 26 never appear? That looks like a bug.

It might also be a side effect of the algorithm, and these characters might finally appear when you encode extremely large numbers. But doesn't that behavior result in hashes that are generally longer than they could be? I would expect that the algorithm tries to make good use of the entire alphabet.

For the record, this is not a nice-to-have for us - it's a necessity. We'd like to use Hashids to generate user account codes for a system with potentially hundreds of thousands of users. Users will punch these codes into kiosks, read them to an operator, etc. So the candidate alphabet has to be small (no confusing characters like "1", "l", "0" and "O", and no lowercase letters, which make it harder to spell out the code), and at the same time we need to keep codes short to make the system more user-friendly.

nusco commented 7 years ago

I took a look at the code and then the main project page, and I found the answer to my question: separators. 😊

I understand the use case for using these characters as separators. It's a pity that you cannot change them, though. (For example, our system is targeted at the German public, so they have different swear words. 😄) I'll close this issue as it's clear that the current behavior is the intended one, but if there is any plan to make separators configurable, that'd be awesome. For now, we'll have to look elsewhere.
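
The arithmetic behind the missing characters can be sketched as follows. This is a rough illustration, not the library's code: it assumes the default separator set is `cfhistuCFHISTU` (the letters the Hashids project page says are kept apart to avoid common English curse words) and that the reference implementation reserves one guard character per 12 remaining alphabet characters, rounded up. The class and method names are hypothetical. The sketch also ignores any step where the library may move further characters into the separator set, and it does not reproduce the salt-based shuffle, so it cannot say which characters become guards, only how many.

```java
class SeparatorCheck {
    // Assumption: default separator characters of the reference Hashids implementations.
    static final String DEFAULT_SEPS = "cfhistuCFHISTU";
    // Assumption: one guard character is reserved per 12 remaining characters, rounded up.
    static final int GUARD_DIV = 12;

    // Characters of the custom alphabet that are also default separators.
    static String reservedSeparators(String alphabet) {
        StringBuilder sb = new StringBuilder();
        for (char c : alphabet.toCharArray()) {
            if (DEFAULT_SEPS.indexOf(c) >= 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // The alphabet that is left once the separator characters are taken out.
    static String withoutSeparators(String alphabet) {
        StringBuilder sb = new StringBuilder();
        for (char c : alphabet.toCharArray()) {
            if (DEFAULT_SEPS.indexOf(c) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Number of characters additionally set aside as guards (under the assumption above).
    static int guardCount(String alphabet) {
        int remaining = withoutSeparators(alphabet).length();
        return (int) Math.ceil((double) remaining / GUARD_DIV);
    }

    public static void main(String[] args) {
        String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        String remaining = withoutSeparators(alphabet);
        System.out.println("separators:      " + reservedSeparators(alphabet));
        System.out.println("after removal:   " + remaining);
        System.out.println("guard count:     " + guardCount(alphabet));
        System.out.println("left for digits: " + (remaining.length() - guardCount(alphabet)));
    }
}
```

Under these assumptions, 7 of the 26 letters (C, F, H, I, S, T, U) are separators and 2 more are set aside as guards, which leaves 17 letters. That is consistent with the 17 characters in the output above, where G and O are the two additional letters that never appear.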

arcticicestudio commented 7 years ago

I've implemented Hashids for one of my private modularized libraries and released parts of it as public standalone libraries, such as IceCore Hashids, which is also listed as a further Java implementation on the official Hashids website. It is designed to be more OOP-aware: it uses a Hashid class (persistence-friendly) to store the encoded numbers and the resulting character sequence, and it provides a Builder class (Hashids.Builder) that includes a method for setting custom separators.

Please don't get me wrong: I do not want to speak badly of this implementation, which tries to stay as close as possible to the original JavaScript implementation in features and coding style. I only want to point out that there is another implementation that may fit your needs before you go looking for another hash algorithm.

nusco commented 7 years ago

Thank you, @arcticicestudio. I'll give it a look. We rolled out a simple in-house solution in the meantime, but we'll still need a library such as Hashids once we reach more than a few thousand codes in production.