dead-claudia opened this issue 3 years ago
I just found this bit about lower-spec VMs, and it only reinforces this suggestion: this really feels like an algorithmic oversight.
The latest spec still seems to have this issue. It would be great to decide on a solution.
Right now, there is no check after the callback is invoked in

https://github.com/tc39/proposal-upsert/blob/8668664a3d2e298d183d3d57fb6a548869ec3dbf/spec.emu#L45

that `mapData` hasn't been updated before we get to

https://github.com/tc39/proposal-upsert/blob/8668664a3d2e298d183d3d57fb6a548869ec3dbf/spec.emu#L47

so the map could end up with two records with matching keys.
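For illustration, here's a minimal sketch of the kind of callback that triggers this, assuming the handler-object form of `emplace` the proposal used at the time (the key and values are arbitrary):

```js
const map = new Map();

map.emplace("key", {
    insert(key, map) {
        // Re-entrant mutation during the insert callback: this appends a
        // record for "key" to mapData...
        map.set("key", 1);
        return 2;
    },
});

// ...and then the step linked above appends a second record for "key"
// without re-checking, leaving mapData with two entries for the same key.
```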
I personally like the idea of throwing a re-entrancy error in `map.set`, though as that's a method that doesn't throw today, I can see how it might be unpopular.
Alternatively, a spec note could recommend a special boolean "has reserved entry" flag to make the added check cheap:
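Something along these lines, sketched in userland JS (the class, flag name, and handler shape are mine; a real engine would keep the flag next to its internal map data):

```js
// Sketch only: models the "has reserved entry" flag in userland JS.
class ReservingMap extends Map {
    #reserved = false;

    set(key, value) {
        this.#reserved = false; // any mutation invalidates the reservation
        return super.set(key, value);
    }

    delete(key) {
        this.#reserved = false; // likewise for deletes (and clear, etc.)
        return super.delete(key);
    }

    emplace(key, handler) {
        // Update path omitted; only the insert path is at issue here.
        this.#reserved = true; // reserve the entry before invoking user code
        const value = handler.insert(key, this);
        if (!this.#reserved) {
            // Slow path: the callback mutated the map, so an engine would
            // have to re-search mapData for `key` before appending. This
            // one boolean test is all the fast path pays.
        }
        this.#reserved = false;
        super.set(key, value); // userland set re-searches either way
        return value;
    }
}
```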
This could be further optimized by keeping the old value (as just a raw integer reference value; it can't be followed by GC) and comparing that for equality instead. Then, the "clear" would be setting it to a sentinel, the "set" would be setting it to the key, and the test would be comparing for equality. And this pattern could then be extended to `has`/`get`/`update` by storing the value of the most recently looked-up entry. (IIRC this is not far off from how WebKit optimized `has`+`get` pairs, but this is much more general.)
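In sketch form, that variant replaces the boolean with the most recently reserved key (in an engine, a raw untraced word; here a symbol sentinel stands in):

```js
// Variant sketch: one word of state instead of a boolean flag.
const NO_ENTRY = Symbol("no reserved entry");
let reservedKey = NO_ENTRY;

// "set" the flag (in emplace, before the callback): reservedKey = key;
// "clear" the flag (in set/delete/clear):           reservedKey = NO_ENTRY;
// the cheap post-callback test:                      reservedKey === key
```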
Essentially, what I'm proposing is that `emplace` should set a flag to block recursive updates to the same map or weak map, whether they come through `set`, `delete`, or `emplace`. This spares implementations nearly all of the bookkeeping involved in managing updates. It also exposes potential bugs to the user: recursive updates are rarely intended, and if one is absolutely desired, you can do something like the following to work around the lock:
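For instance, by deferring the mutation until `emplace` returns (a sketch; `computeValue` and the deferred entry are placeholders):

```js
// Instead of mutating the map inside the callback (which the proposed
// lock would block), record the mutation and replay it after emplace
// returns, once the lock is released.
let pending = null;

const value = map.emplace(key, {
    insert(key, map) {
        pending = ["otherKey", 123]; // defer the recursive update
        return computeValue();
    },
});

if (pending !== null) {
    map.set(pending[0], pending[1]); // safe: emplace has returned
}
```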
*This is only technically the case if the hash table is represented as an array of inline key/value pairs (as in something like `std::vector<HashTable>`) rather than an array of pointers to entries (as in something like `std::vector<std::shared_ptr<HashTable>>`). In practice, nearly every hash table implementation aiming for serious performance does the former, and every engine I'm aware of does it that way, so it's broadly enough applicable for me to say "always" here.