robert-strandh / SICL

A fresh implementation of Common Lisp
Other
1.07k stars 78 forks source link

Consider sxhash as a generic function? #173

Open mseddon opened 4 years ago

mseddon commented 4 years ago

In many situations, a domain object can be more efficiently hashed in some context (e.g. it has a giant blob of irrelevant cache data that a naive hashing algorithm would have to traverse, or there is a particular restriction that we, the designers, understand).

Should sxhash be a generic function, defaulting to the current implementation, but extendable for users of SICL based compilers?

Care must be taken with subclasses, however, so perhaps it should only apply to a particular class via a mechanism outside of generic functions, or compose the default fallback sxhash onto subclass members not implementing sxhash.

It must also always be true that when (equal x y) then (= (sxhash x) (sxhash y)), of course.

mseddon commented 4 years ago

Hm, coming back, I worry, since following this to it's natural conclusion, now (equal x y) would also need to be a generic function. which, to me seems right; if we lived in an ideal world, but at the same time sort of hoses the entire semantics of CL... closing because the consequences are probably far hairier than I can imagine.

With enough time there may yet be gold in them hills though. But at this point it's probably not really CL.

Bike commented 4 years ago

sxhash generally does different things depending on the type of object it's given anyway, so making it generic seems reasonable to me. if a client wants to use its own implementation it can just do that, though, it's not like cleavir depends on sxhash deets

no-defun-allowed commented 4 years ago

In the case of SICL's current SXHASH implementation, SXHASH is implemented as the value of an internal function which passes a hash state through limited depth-first traversal of an object. Here, SXHASH doesn't recursively call itself, and the protocol is a bit more complicated, as the implementor must combine whatever entropy-producing values they hash using the internal hash function (currently FNV-1a).

Hash tables are also a little more complicated; we use a random initial state for the hash function, so that different sets of keys will be stored differently, preventing complexity attacks. (lwn.net's Denial of service via hash collisions describes why we want this pretty well.) That also precludes using SXHASH there, and we use the aforementioned internal functions in hash tables as well.

We could make the internal hash functions generic, and either have the client provide data for the hash function to combine, or perform the combination themselves; but that probably would not manifest in specializing SXHASH itself. It might look more like the murmurhash generic function in cl-murmurhash.

Another less invasive option would be to give a specialised :hash-function to hash tables that have domain objects that should be hashed specially.

mseddon commented 4 years ago

I actually realise after some thought that sxhash as a gf could lead to all manner of subtle bugs, and the hash algorithm MUST be associated with the table. One could, for example, destroy other system hashtables by innocently adding a definition of sxhash for a particular type after it had already been inserted into another hashtable. That or you would have to re-hash all tables every time someone adds a definition to sxhash, at least. Maybe timestamp the latest change, and lazily rehash when needed.

no-defun-allowed commented 4 years ago

Lazy rehashing would increase the minimum complexity of a hash table implementation quite a lot; providing a :hash-function would probably be the easiest and least dangerous solution.