Closed primo-ppcg closed 1 year ago
What do you think about reserving a value to avoid the allocation?
@@ src/include/janet.h, 976 @@
- int32_t *hash;
+ int32_t hash;
@@ src/core/tuple.c, 40 @@
- head->hash = NULL;
+ head->hash = -1;
@@ src/core/tuple.c, 57 @@
int32_t janet_tuple_hash(const Janet *tuple) {
JanetTupleHead *head = janet_tuple_head(tuple);
- if (head->hash == NULL) {
- head->hash = (int32_t*)janet_malloc(sizeof(int32_t));
- *head->hash = janet_array_calchash(tuple, head->length);
+ if (head->hash == -1) {
+ int32_t hash = janet_array_calchash(tuple, head->length);
+ head->hash = hash == -1 ? -2 : hash;
}
return head->hash;
}
What do you think about reserving a value to avoid the allocation?
A hash isn't really a number per se, but rather just some raw bytes that can be more easily handled by pretending it's a number. Internally, hashes are mixed via unsigned integer operations, including bit shifting etc.
It's not impossible to end up with a legitimate hash value that is -1
.
Consequently, having any sort of magic value would mean that those values are no longer representable.
As an alternative, there could be a general metadata bitfield.
Something like an int32 called meta, with a MHASHED = 1 << 0
in it.
It would then be possible to test for additional optimizations later (e.g sorted? anything really) while keeping the fact that it has been / hasn't been hashed on hand.
This would also allow the use of this sort of amortizing technique for real array hashes (i.e ones calculated like they are for tuples -, simply unset that bit whenever the array is modified.
What do you think about reserving a value to avoid the allocation?
I thought of that, but wasn't clever enough to think of this:
head->hash = hash == -1 ? -2 : hash;
-1 and -2 are unfortunate choices, though, because of this: https://github.com/janet-lang/janet/blob/7475362c85831da31430dc91e0c01aba0a80bcf9/src/core/value.c#L325 Perhaps -1 and 1 would be better. I've amended the branch to do this instead.
As an alternative, there could be a general metadata bitfield.
Also seems like it could be a good option, and not only for tuples!
If we are on the topic of micro optimizations, using 0
as the sentinel hash is probably a bit faster on most architectures, and will compile down to a single conditional instruction. I believe this is how the JVM does string hashing.
As an alternative, there could be a general metadata bitfield.
Actually, tuples already have flags, don't they?
int32_t janet_tuple_hash(const Janet *tuple) {
JanetTupleHead *head = janet_tuple_head(tuple);
if (janet_tuple_flag(tuple) & JANET_TUPLE_FLAG_HASHED)
return head->hash;
head->hash = janet_array_calchash(tuple, head->length);
janet_tuple_flag(tuple) |= JANET_TUPLE_FLAG_HASHED;
return head->hash;
}
Seems like a good option: https://github.com/primo-ppcg/janet/commit/51166ece48f6bff1288b995c2a6507936a69a1a9
I don't intend to make a PR about this, I'm merely presenting an idea for consideration.
Tuples are hashed upon creation. At first glance it seems like a good idea, as it speeds up repeated comparisons considerably.
However, this pre-computed hash is only used in two instances:
hash
functionI think that both of the uses are fairly rare, especially if
struct
hashing were also to be deferred (in its entirety, which would avoid hashing its values unless necessary. The keys obviously must be hashed). It seems to me that the vast majority of tuples will complete their life cycle without their pre-computed hashes having been examined even once.I've prepared an experimental branch (version 2) (version 3) which defers tuple hashing until needed. Primarily, I've changed
janet_tuple_hash
from a macro that extracts the hash from the header, to a function that checks if the hash has already been computed. The results seem pretty good:master:
branch:
In particular for the splice form, hashing the tuple accounts for approximately 70% of the time taken. I think this is significant, as tuples are used essentially everywhere. Structs, and possibly also strings could also have their hashing deferred.