dead-claudia opened this issue 3 years ago
I just found this bit about lower-spec VMs, and it only reinforces this suggestion: this really feels like an algorithmic oversight.
The latest spec still seems to have this issue. It would be great to decide on a solution.
Right now, there is no check after the callback is invoked in

https://github.com/tc39/proposal-upsert/blob/8668664a3d2e298d183d3d57fb6a548869ec3dbf/spec.emu#L45

that `mapData` hasn't been updated before we get to

https://github.com/tc39/proposal-upsert/blob/8668664a3d2e298d183d3d57fb6a548869ec3dbf/spec.emu#L47

so the map could end up with two records with matching keys.
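For illustration, here's a minimal sketch of the kind of callback that triggers this, assuming the handler-object form of `emplace` the proposal used at the time (the key and values are arbitrary):

```js
const map = new Map();

map.emplace("key", {
    insert(key, map) {
        // Re-entrant mutation during the insert callback: this appends a
        // record for "key" to mapData...
        map.set("key", 1);
        return 2;
    },
});

// ...and then the step linked above appends a second record for "key"
// without re-checking, leaving mapData with two entries for the same key.
```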
I personally like the idea of throwing a re-entrancy error in `map.set`, though as that's a method that doesn't throw today, I can see how it might be unpopular.
Alternatively, a spec note could recommend a special boolean "has reserved entry" flag to make the added check cheap:
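Something along these lines, sketched in userland JS (the class, flag name, and handler shape are mine; a real engine would keep the flag next to its internal map data):

```js
// Sketch only: models the "has reserved entry" flag in userland JS.
class ReservingMap extends Map {
    #reserved = false;

    set(key, value) {
        this.#reserved = false; // any mutation invalidates the reservation
        return super.set(key, value);
    }

    delete(key) {
        this.#reserved = false; // likewise for deletes (and clear, etc.)
        return super.delete(key);
    }

    emplace(key, handler) {
        // Update path omitted; only the insert path is at issue here.
        this.#reserved = true; // reserve the entry before invoking user code
        const value = handler.insert(key, this);
        if (!this.#reserved) {
            // Slow path: the callback mutated the map, so an engine would
            // have to re-search mapData for `key` before appending. This
            // one boolean test is all the fast path pays.
        }
        this.#reserved = false;
        super.set(key, value); // userland set re-searches either way
        return value;
    }
}
```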
This could be further optimized by keeping the old value (as just a raw integer reference value; it can't be followed by GC) and comparing that for equality instead. Then, the "clear" would be setting it to a sentinel, the "set" would be setting it to the key, and the test would be comparing for equality. And this pattern could then be extended to `has`/`get`/`update` by storing the value of the most recently looked-up entry. (IIRC this is not far off from how WebKit optimized `has`+`get` pairs, but this is much more general.)
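In sketch form, that variant replaces the boolean with the most recently reserved key (in an engine, a raw untraced word; here a symbol sentinel stands in):

```js
// Variant sketch: one word of state instead of a boolean flag.
const NO_ENTRY = Symbol("no reserved entry");
let reservedKey = NO_ENTRY;

// "set" the flag (in emplace, before the callback): reservedKey = key;
// "clear" the flag (in set/delete/clear):           reservedKey = NO_ENTRY;
// the cheap post-callback test:                      reservedKey === key
```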
Essentially, what I'm proposing is that `emplace` should set a flag to block recursive updates to the same map or weak map, whether they come through `set`, `delete`, or `emplace`. This spares implementations nearly all of the bookkeeping involved in managing updates. It also exposes potential bugs to the user: recursive updates are rarely intended, and if one is absolutely desired, you can do something like the following to work around the lock:
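For instance, by deferring the mutation until `emplace` returns (a sketch; `computeValue` and the deferred entry are placeholders):

```js
// Instead of mutating the map inside the callback (which the proposed
// lock would block), record the mutation and replay it after emplace
// returns, once the lock is released.
let pending = null;

const value = map.emplace(key, {
    insert(key, map) {
        pending = ["otherKey", 123]; // defer the recursive update
        return computeValue();
    },
});

if (pending !== null) {
    map.set(pending[0], pending[1]); // safe: emplace has returned
}
```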
*This is only technically the case if the hash table is represented as an array of inline key/value pairs (as in something like `std::vector<HashTable>`) rather than an array of pointers to entries (as in something like `std::vector<std::shared_ptr<HashTable>>`). In practice, nearly every hash table implementation aiming for serious performance does the former, and every engine I'm aware of does it that way, so it's broadly enough applicable for me to say "always" here.