Closed phipsgabler closed 3 years ago
Do you have any specific drawbacks in mind with the magic constant approach? If not I would prefer to have one. (Also for ComposedLens). Possible definition for the constant:
const SALT_IndexLens = objectid(IndexLens)
None other than me having no idea whether this is accepted good practice. The only objection I could remotely think of is: is it guaranteed that objectid(IndexLens)
(or even hash(:IndexLens)
) is constant in a distributed setting? Is that ever (considered) a problem?
If it's OK with you, I'll go ahead with it.
None other than me having no idea whether this is accepted good practice. The only objection I could remotely think of is: is it guaranteed that
objectid(IndexLens)
(or evenhash(:IndexLens)
) is constant in a distributed setting? Is that ever (considered) a problem?
I don't know the answer to these either. To be safer, let's just hardcode the following values:
julia> using Setfield
julia> objectid(Setfield.IndexLens)
0x8aa710fd8cb95e7e
julia> objectid(Setfield.ComposedLens)
0x1c9765b995720e93
To be safer, let's just hardcode the following values
If by hardcode you mean const value = objectid(object)
, objectid(object)
is highly likely to change with different uses of the package and also with different runs of some use of the package. In a distributed setting, too. It would be trouble to have hash values for the same item in distinct contexts within a single run differ if they were to be checked for equality:
isequal is the comparison function used by hash tables (Dict). isequal(x,y) must imply that hash(x) == hash(y). ...
Furthermore, isequal is linked with isless, and they work together to define a fixed total ordering, where exactly one of isequal(x, y), isless(x, y), or isless(y, x) must be true (and the other two false).
Thanks a lot @JeffreySarnoff for the clarification. By hardcoding I meant
const value = 0x8aa710fd8cb95e7e # objectid(Setfield.IndexLens)
objectid(object)
is highly likely to change with different uses of the package and also with different runs of some use of the package.
But the default hash
implementation uses objectid
right? So the default Base.hash
would suffer from the same instabilities?
not for all things -- Strings, Tuples of Ints and other bitstypes for example keep their objectids
struct S{T}
value::T
end
function hash_cheat(obj, h::UInt)
hash(obj.value, h)
end
const SALT = 0x8aa710fd8cb95e7e
function hash1(obj, h::UInt)
hash(obj.value, hash(SALT, h))
end
function hash2(obj, h::UInt)
hash(obj.value, h+SALT)
end
obj = S((1,))
seed = UInt(1)
using BenchmarkTools
@btime hash_cheat($(Ref(obj))[], $(Ref(seed))[])
@btime hash1($(Ref(obj))[], $(Ref(seed))[])
@btime hash2($(Ref(obj))[], $(Ref(seed))[])
# 1.833 ns (0 allocations: 0 bytes)
# 1.813 ns (0 allocations: 0 bytes)
# 1.833 ns (0 allocations: 0 bytes)
I think we should incorporate some type specific salt into our hash and based on the above benchmark,
it seems there is no performance difference between using +
or hash
for the mixing.
So I suggest we use the following implementation:
function Base.hash(o::IndexLens, h::UInt)
salt = 0x8aa710fd8cb95e7e #objectid(Setfield.IndexLens)
hash(o.indices, hash(salt, h))
end
it seems there is no performance difference between using + or hash for the mixing.
There is under heavy use (Base knows best); simple benchmarking will not reveal that.
You should define your salt constant outside of and just above the hash function (better practice) and you must provide both a 64bit and a 32bit constant as hash
uses UInt
which may be either one.
it seems there is no performance difference between using + or hash for the mixing.
There is under heavy use (Base knows best); simple benchmarking will not reveal that.
Base operates under many constraints like generality and backward compatibility, so "Base knows best" is not obvious to me in this case. But anyway lets do it the Base way:
function make_salt(s64::UInt64)::UInt
if UInt === UInt64
return s64
else
return UInt32(s64 >> 32) ^ UInt32(s64 & 0x00000000ffffffff)
end
end
const SALT_IndexLens = make_salt(0x8aa710fd8cb95e7e) # objectid(Setfield.IndexLens)
Base.hash(o::IndexLens, h::UInt) = hash(o.indices, SALT_IndexLens + h)
Cool. Point taken. Base knows best about hashing Julia constructs (I should have been more explicit). A great deal of effort has gone into the hashing approach and its most efficient implementation.
Thanks again @JeffreySarnoff for all your patience and explanations, I appreciate them.
Great! I'll update to the latest version in the next couple of days, plus a few more tests.
@phipsgabler thanks again! This is now available in Setfield 0.8.
@jw3126 Sorry if you've already thought through this, but, I think this is actually technically non-breaking. The only documented public interface we need to satisfy is isequal(x,y) implies hash(x)==hash(y)
. So, unless we've already documented that hash(lens)
satisfies some additional property, we can freely re-define hash
even for patch releases (that'd be annoying, but technically OK) provided it satisfies the above invariance. It's even fine to increment the salt for each release or just use hash(::Lens) = UInt(0)
(of course, I'm not suggesting this :slightly_smiling_face:).
(I just thought to ramble this, since I noticed JuliaObjects/Accessors.jl#31)
Thanks @tkf So for this PR we already tagged a breaking release. But in the case of Accessors your advise is to only bump the patch version?
Yeah, I think Setfield.jl is good as-is. But I don't think Accessors.jl needs a breaking release.
Now I'm not an expert in these matters, but is seems the bottleneck is not hashing the tuples, but the types or symbols (since
hash(::Symbol)
falls back toobjectid
, too). In the commit without adding type symbols tohash
(https://github.com/jw3126/Setfield.jl/commit/2ca540462d30de73d3474d80c84fa72aec6115d1):With the symbols (https://github.com/jw3126/Setfield.jl/commit/880daf445f735b2c94fc1b037688f3daf92dfe30):
Without the symbol, an index lens hashes to the same as the underlying tuple, though:
That could be solved by using a magic constant instead, but is that a good idea?
CC @torfjelde