rurban / smhasher

Hash function quality and speed tests
https://rurban.github.io/smhasher/
Other
1.8k stars 174 forks source link

null test #67

Open wangyi-fudan opened 5 years ago

wangyi-fudan commented 5 years ago
bool NullTest ( struct HashInfo *info ){
  pfHash hash = info->hash;
  unsigned long long h0,    h1; 
  hash(NULL,    0,  0,  &h0);   
  hash(NULL,    0,  1,  &h1);
  printf("nullkey_seed0\t%llu\nnullkey_seed1\t%llu\n%s\n",  h0, h1, 
           h0==h1?"FAIL":(h0==0||h1==1?"Imperfect":"PASS"));
  return    h0!=h1;
}  

clhash FAIL

fasthash64 Imperfect

City64 Imperfect

MUM Imperfect

xxh3 Imperfect

t1ha2_atonce Imperfect

rurban commented 5 years ago

Thanks, but I'm not sure if this is a valid assumption. The return value with a NULL key is undefined, it just should not access out of bounds. Several hashes were OOB.

wangyi-fudan commented 5 years ago

It is just an extreme case test to see if these hashes works properly. if they do memory access, they will fail. xxh3 leaks its key if 0 byte string is processed. some hashes returns zero is key is zero at 0 byte, which is not so good.

erthink commented 5 years ago

I think this is valid test, i.e. function should not read data when len = 0. On the other hand "Imperfect" criteria is wrong.

For instance, t1ha2_atonce() pass avalanche test for len=0 (i.e. for seeds only) by https://github.com/demerphq/smhasher. But t1ha2_atonce() intentionally return 0, when both seed and data-length are zero (and I think that is normal behavior).

Cyan4973 commented 5 years ago

xxh3 makes a distinction between its secret key, and the seed. The secret key is hard to generate, while the seed is just a parameter of the function, which could change on a whim.

xxh3 with a length of 0 and a seed of 0 will return 0, and, like @leo-yuriev, I also believe this is correct behavior. At least, I received requests for such a feature.

When length is 0 but seed is not 0, xxh3 returns the seed (not the secret key !). If this is considered too much of a secret to share, I can change the function so that it returns something less obvious. After all, optimizing the return value for len==0 is not terribly high in priority order. Note that, since it's basically a bijection, whatever its complexity, it will always be possible to brute-force the reverse transformation (given enough resource).

rurban commented 3 years ago

I'll add something like that. It is similar to some BadSeeds tests, where a 0 seed with a NULL key (or len 0) returns mostly 0, which is fair, but not ok for len > 0 and only null bytes. Yann made a good point revealing the secret. I think we should print that case.

This is a cheap test for some edge cases.