Closed palimondo closed 5 years ago
@palimondo
Yes, I did not name him in this blog post and the reason is quite simple... he reached out to me with a pseudonym and I was uncertain about his name. I was also uncertain about whether he was the inventor, it took me some time to determine credit (wyhash is closely related to prior approaches). But we can ping him... @wangyi-fudan
I will add his name both to the README file and to my blog post.
I verified that the code, with the constant I am using now, passes Big Crush. I think Wang changed the constant and kept working on the code afterward. We can certainly match the current version.
Credit issue should be resolved now.
Also… I see the code from @wangyi-fudan contains wyrand
, that is even simpler than wyhash64
.
wyhash64(uint64_t A, uint64_t B){ return _wymum(_wymum(A^_wyp0, B^_wyp1), _wyp2); }
…
wyrand(uint64_t *seed){ *seed+=_wyp0; return _wymum(*seed^_wyp1,*seed); }
Do I understand correctly that your wyhash64
seems to be a hybrid of these two?
I made no hybrid. I have no designer credit in any of this. I just took the original code that he had back when he first reached out to me. I tested it. That is all.
I am pretty sure that any difference you see are the results of further work on his side of things.
Of course, we can update accordingly.
(I did tweak his code somewhat, e.g., he was using things like unsigned long long and stuff... but I kept the C logic identical.)
I see. BTW I also found this C# implementation, if anybody is interested: https://github.com/cocowalla/wyhash-dotnet
Here are the constants, March 21st...
https://github.com/wangyi-fudan/wyhash/commit/a9f8f38eb0832aacaa983d99ccc8e7b0c55e8ccb
const unsigned long long _wyp0=0x60bee2bee120fc15ull, _wyp1=0xa3b195354a39b70dull, _wyp2=0x1b03738712fad5c9ull;
Here are the constants after March 21st...
const unsigned long long _wyp0=0xa0761d6478bd642full, _wyp1=0xe7037ed1a0b428dbull, _wyp2=0x8ebc6af09c88c6e3ull;
So the constants changed March 21st...
dear all: i come here as i received your emails. the magic constant changed after i developed a more powerful statistic test https://github.com/rurban/smhasher/issues/65 This test breaks several hashes, and wyhash's statistics is not very good. So I redesigned the primers with following criterion: each primer is random sampled. each prime should have exactly 32 1bit and 32 0bit. the xor of each pair of primes should have 32 1bit and 32 0 bit. So these primer are exactly balanced.
@lemire:
Of course, we can update accordingly.
Updating to match the latest stand of wyhash brings up issues with naming and the implementation.
My thinking in PR #3 was:
I see you refer to this generator as
wyhash64
in your article. It makes sense to me to rename the implementation toWyhash64
. This makes it follow the conventional naming likeSplitMix64
orXoshiro128
.
I see that back in March 13th, the key operation was called wyhashmix64
(later renamed to _wymum
) and PRNG was called wyrng_u64
, so I can see why you called it wyhash64
in the blog post. That codebase was still in flux: at the time of your original publication it was called wyrng
(hence the name WyRng
in the #C and Rust ports, I guess?) ... But now I see few problems with this:
wyhash64
which does something else;wyrand
Based on that, can we rename the Swift struct one last time to WyRand
?
Updating to latest stand is not only about using the new constants, the algorithm was also simplified: the latest version uses just a single MUM operation instead of two, as was the case of your original publication, and it mixes in the prime before multiplication. This should be faster than the old version. I think rewritten to match your style it now looks like this:
static inline uint64_t wyrand_stateless(uint64_t *seed) {
*seed += UINT64_C(0xa0761d6478bd642f);
__uint128_t tmp;
uint64_t m1 = *seed ^ UINT64_C(0xe7037ed1a0b428db);
tmp = (__uint128_t)*seed * m1;
uint64_t m2 = (tmp >> 64) ^ tmp;
return m2;
}
Could you please re-verify this still passes the BigCrunch? (EDIT: I've prepared a PR in testingRNG for you, but I don't have the capacity to run that.)
Given that you've updated your blog post with the mention of this project, you might want to also make it reflect this latest wyhash
development?
The Swift implementation that passes the same test* as the C# port looks like this: * only first generated value, see https://github.com/cocowalla/wyhash-dotnet/issues/1 for discussion of how to correctly modify the generator's state.
public struct WyRand: RandomNumberGenerator {
private var state : UInt64
init(seed : UInt64) {
state = seed
}
public mutating func next() -> UInt64 {
state &+= 0xa0761d6478bd642f
let mul = state.multipliedFullWidth(by: state ^ 0xe7037ed1a0b428db)
return mul.high ^ mul.low
}
}
I would update this project to match, if you give me green light on the passing BigCrush.
@palimondo It passes Big Crunch, but I have not done an in-depth analysis (which takes time and manual labor). I'll need more time to update my sites with the information... However, we can move forward, I think.
Flaws are always a possibility, with any generator.
Did you get a chance to bench it on ARM yet? IIRC your follow up post mentioned the old implementation did not fare that well there due to instruction set differences?
@palimondo
Why don't we include a benchmark in Swift? There we can run it on an ARM server and know. I'll open an issue.
Closing this, since PR #7 has been merged to master.
I see the implementation here matches the
wyhash64
from your blog.I was searching the net for the original author, as your article does not mention him. I think it is Wang Yi and the original code is here, if I'm correct: https://github.com/wangyi-fudan/wyhash
Looking at his original code, I've noticed that the magic constants in your implementation do not match those in original.
It looks like the original 3 constants (
_wyp0
–2
) in order of appearance are:0xa0761d6478bd642f
0xe7037ed1a0b428db
0x8ebc6af09c88c6e3
The current implementation here is:
Is this not an issue for the quality of the output? I'm not sure that using different constants the claim of passing the BigCrush is correct...