Closed PMunch closed 1 year ago
`seq` actually makes some code perhaps a bit more efficient once the index is pre-computed:
```nim
let prc = if not isClosure: bb.sym else: bb[0].sym
if prc.offset < -1:
  # it's a callback:
  c.callbacks[-prc.offset-2].value(...)
```
but we can achieve the "best of both worlds" by keeping `callbacks` and adding a `callbacksTable`, so that the following would then be fast too:
```nim
proc procIsCallback(c: PCtx; s: PSym): bool =
  if s.offset < -1: return true
  var i = -2
  for key, value in items(c.callbacks):
    if s.matches(key):
      doAssert s.offset == -1
      s.offset = i
      return true
    dec i
```
I can explain more in case it's unclear, but it seems like a very easy change.
Or use a "BiTable". (Introduced in my araq-crazy branch, but generally useful.)
How would a BiTable solve anything? What @timotheecour pointed out was that indexing into the callbacks sequence is fast, not going from VmCallback to key, which I don't think is done a lot. Or am I missing something here?
how about we add:
```nim
# in tables.nim
proc getByIndex*[K, V](t: Table[K, V], index: int): (lent K, lent V)
  ## returns the key/value pair stored at `s[index]`, where `s` is the
  ## underlying container for `t`
proc getByIndex*[K, V](t: var Table[K, V], index: int): (var K, var V)
```
which retrieves a key/value pair in guaranteed O(1) time and avoids the need to hold an auxiliary table. We just make it clear in the doc comment that mutations on the table could change the key/value for a given index. It's useful in cases (like here) where we need fast access to a table for given pre-computed indexes, and where the user knows what they're doing, so that entries in the table don't move around.
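For illustration, here is a sketch of what these procs might look like inside tables.nim. This is an assumption-laden sketch: `data`, `hcode` and `rawGet` are private implementation details of the stdlib's open-addressing Table, and the proposal's `lent`/`var` return tuples are simplified to plain values here.

```nim
# hypothetical additions to tables.nim; `data`, `hcode` and `rawGet`
# are internal details of Table's open-addressing backing seq
proc getIndex*[K, V](t: Table[K, V], key: K): int =
  ## returns the slot index of `key` in the backing seq, or -1 if absent
  var hc: Hash
  result = rawGet(t, key, hc)   # internal lookup helper: slot index or < 0

proc getByIndex*[K, V](t: Table[K, V], index: int): (K, V) =
  ## O(1) access to the slot at `index`; only valid while the table is
  ## not mutated, since a rehash moves entries to different slots
  ## (the proposal returns `lent` views; plain values shown for simplicity)
  assert t.data[index].hcode != 0  # slot must be filled
  result = (t.data[index].key, t.data[index].val)
```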
How would you implement that though? The sequence held by tables is full of holes (that's how hash tables work...)
holes don't matter; the table is prepopulated in vmops, and after that the indexes don't change.
step 1: populate table entries in vmops
```nim
callbacks*: seq[tuple[key: string, value: VmCallback]]
```
=>
```nim
callbacks*: Table[string, VmCallback]
```
step 2:
```nim
proc procIsCallback(c: PCtx; s: PSym): bool =
  if s.offset < -1: return true
  var i = -2
  for key, value in items(c.callbacks):
    if s.matches(key):
      doAssert s.offset == -1
      s.offset = i
      return true
    dec i
```
=>
```nim
proc procIsCallback(c: PCtx; s: PSym): bool =
  if s.offset < -1: return true
  let index = c.callbacks.getIndex(s.toKey)
  if index != -1:
    doAssert s.offset == -1
    s.offset = -2 - index
    return true
```
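As a sanity check on the `-2 - index` encoding used in step 2: it maps callback index `i` to an offset strictly below `-1` (so the existing `offset < -1` callback test keeps working), and `-offset - 2` recovers the index.

```nim
# round-trip check for the offset <-> callback-index encoding:
# index i is stored as offset = -2 - i, decoded as i = -offset - 2
proc encodeOffset(index: int): int = -2 - index
proc decodeIndex(offset: int): int = -offset - 2

assert encodeOffset(0) == -2                  # first callback
assert decodeIndex(encodeOffset(5)) == 5      # round-trips
assert encodeOffset(0) < -1                   # satisfies `offset < -1`
```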
step 3:
```nim
# it's a callback:
c.callbacks[-prc.offset-2].value(..)
```
=>
```nim
# it's a callback:
c.callbacks.getByIndex(-prc.offset-2).value(..)
```
So all we need is `getIndex` and `getByIndex` in tables.nim. It'll benefit not just vmops.nim but any application that needs fast queries for pre-computed index offsets; the table just needs to not have entries moving while `getByIndex` and `getIndex` are called (and there are easy ways to make this safe, I can explain if needed).
`matches` is:
```nim
proc matches(s: PSym; x: string): bool =
  let y = x.split('.')
  var s = s
  for i in 1..y.len:
    if s == nil or (y[^i].cmpIgnoreStyle(s.name.s) != 0 and y[^i] != "*"):
      return false
    s = s.owner
  result = true
```
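To see what `matches` does, here is a standalone model of the same right-to-left, wildcard-aware comparison, with a `seq` of names standing in for the symbol's `s.owner` chain (the helper name `matchesKey` is made up; `cmpIgnoreStyle` is the stdlib's style-insensitive comparison):

```nim
import std/strutils

# `chain` plays the role of the symbol's owner chain,
# e.g. @["stdlib", "os", "staticWalkDir"]
proc matchesKey(chain: seq[string], key: string): bool =
  let y = key.split('.')
  if y.len > chain.len: return false  # stands in for the `s == nil` check
  for i in 1 .. y.len:
    # compare segments right-to-left; "*" matches any single segment
    if y[^i] != "*" and cmpIgnoreStyle(y[^i], chain[^i]) != 0:
      return false
  result = true

assert matchesKey(@["stdlib", "os", "staticWalkDir"], "stdlib.*.staticWalkDir")
assert not matchesKey(@["stdlib", "os", "walkDir"], "stdlib.*.staticWalkDir")
```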
That's to support the wildcard `*` in `stdlib.*.staticWalkDir`; however, this wildcard really is not needed and it's the only vmops entry that uses it. Without it, `toKey` simply amounts to something like `fully_qualified_normalized_symbol(s)`, which would avoid the current O(N*cost(matches)). Even then, note that we can still support wildcards even with the suggested table implementation.
Even if the speedup won't be significant with a table (as suggested above), we should still do it IMO because it's no more complex and, more importantly, it will benefit other use cases beyond vmops with a similar setting where fast O(1) retrieval is needed and where the performance gain will be noticeable. It also enables other use cases (e.g. random indexing into a table) which aren't currently possible.
Oh I see, it essentially caches the location of the identifier in the structure. Normally what you'd do for that in a hash table is to get a pointer to the value and store that. As long as it isn't removed from the table the GC shouldn't clear it and the pointer should stay valid.
I don't see how it'd help here; the VM uses an integer offset (see `c.callbacks[-prc.offset-2].value(...)`) encoded in the VM instructions; so even if we had a pointer, we'd still need to convert it to an integer so that the VM can store it. For that you'd also need the address of the 0'th element so you can compute the offset as (in pseudocode) `offset = addr(ithElement) - addr(0thElement)`.
On top of that, this won't help when the value type is ptr-like (ref|ptr|pointer|cstring|proc), since the Table API for `get` in this case returns the address of the element, not of the seq cell containing the element (EDIT: check whether this is true); whereas with `getIndex` and `getByIndex` you avoid all those problems (integer indexes instead of pointers have other advantages too).
Oh yeah, it wasn't a solution to this problem in particular, I was just thinking out loud about how I would normally do that when designing a solution from the bottom up (but that might just be my C background talking). And to "convert it to an integer so that the VM can store it" you can just use the value of the pointer, it's just a number. The only problem is that you wouldn't be able to store all pointers if you have special semantics for positive/negative numbers like here (although you could probably work around that with a relative pointer, kind of like what you're mentioning). But I'm all for adding a more Royale way of doing it (after all, pointers in Nim are a bit of a no-no), even though it comes with the implied limitation that you can't use an index after modifying the table (this could of course be safety-checked in the table implementation, but then you're adding more overhead to something that's meant to be an optimisation; at least remove it with -d:danger).
> you can just use the value of the pointer, it's just a number.
You can't retrieve in O(1) unless it's an array, so storing just the pointer is a no-go (it would just amount to using the regular hash table retrieval directly, which is not quite O(1), or at least has a big constant factor); you'd need the pointer offset `addr(p[i]) - addr(p[0])`, and then you'd still need `getByIndex` anyway. I don't see any other approach besides what I'm suggesting to guarantee O(1).
> more overhead to something that's meant to be an optimisation
It's simply checking a boolean flag that gets set on rehashes; easy for branch prediction. If it has any measurable impact, -d:danger can remove that check, but it sounds negligible compared to the other costs involved in hash computation + hash=>index computation.
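The safety scheme being described could be as simple as a counter bumped whenever entries can move; a minimal sketch, with made-up type, field and proc names (a counter rather than a boolean is used so stale handles from any earlier generation are caught):

```nim
# hypothetical safety scheme: the table bumps `moveCount` whenever
# entries can move (rehash/enlarge); an index handle remembers the
# count it was created under and is only valid while they still match
type
  IndexedTable[K, V] = object
    data: seq[(K, V)]
    moveCount: int        # incremented on every rehash

proc getByIndex[K, V](t: IndexedTable[K, V]; index, expectedMoveCount: int): (K, V) =
  when not defined(danger):
    doAssert t.moveCount == expectedMoveCount, "table was rehashed; index is stale"
  t.data[index]
```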
Like I said though, I doubt this will have any performance impact for vmops (only a few registered procs) but it can definitely have performance benefits in other scenarios. For vmops proper, the bigger-impact features I'd like to have (I've already implemented all/most of these in my private branch but no PR) are:
> you can't retrieve in O(1) unless it's an array
I think you misunderstood what I meant. What I do in cases where I want to put things in a table but have faster access to already-found elements is store a pointer or reference to the object in the table. This means I can keep this pointer around after having found it in the table, and it will still point to the same data (and access is of course O(1)). But again, this might not be possible with how the offsets are stored in this specific case.
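The pattern described here (look a value up once, then keep a stable handle to it) can be sketched in Nim by storing `ref` values in the table: the table slots move around on rehash, but the ref still points at the same heap object. The `Callback` type and keys below are made up for illustration:

```nim
import std/tables

type Callback = ref object
  name: string

var t = initTable[string, Callback]()
t["foo"] = Callback(name: "foo")

# look up once and keep the ref; later insertions force the table to
# grow and rehash, moving slots, but the ref stays valid
let cached = t["foo"]
for i in 0 ..< 1000:
  t["key" & $i] = Callback(name: "key" & $i)
assert cached.name == "foo"
```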
Oh yeah, it's not a huge overhead, but sometimes every last bit counts.
Just did a rough count and the NimScript environment used for configuration files appears to have around 90 registered callbacks, so not a massive amount but still not something I'd like to linearly search through too often.
While looking at how the NimScript VM was implemented I couldn't help but notice that `callbacks` in the VM are defined as a sequence of tuples with a string key to identify them: https://github.com/nim-lang/Nim/blob/6969a468ceaab7384d7448bfd88e47a5b24c3a97/compiler/vmdef.nim#L258. I was wondering why this isn't a table? If you declare a lot of callbacks this could potentially be a huge bottleneck (and interestingly it means that you should sort your registrations by assumed usage frequency).