Closed PMunch closed 1 year ago
`seq` actually makes some code perhaps a bit more efficient once the index is pre-computed:
```nim
let prc = if not isClosure: bb.sym else: bb[0].sym
if prc.offset < -1:
  # it's a callback:
  c.callbacks[-prc.offset-2].value(...)
```
but we can achieve the "best of both worlds" by keeping `callbacks` and adding a `callbacksTable`, so that the following would then be fast too:
```nim
proc procIsCallback(c: PCtx; s: PSym): bool =
  if s.offset < -1: return true
  var i = -2
  for key, value in items(c.callbacks):
    if s.matches(key):
      doAssert s.offset == -1
      s.offset = i
      return true
    dec i
```
I can explain more in case it's unclear, but it seems like a very easy change.
Or use a "BiTable". (Introduced in my araq-crazy branch, but generally useful.)
How would a BiTable solve anything? What @timotheecour pointed out was that indexing into the callbacks sequence is fast, not going from VmCallback to key, which I don't think is done a lot. Or am I missing something here?
how about we add:
```nim
# in tables.nim
proc getByIndex*[K, V](t: Table[K, V], index: int): (lent K, lent V)
  ## returns the key/value pair stored at `s[index]`, where `s` is the
  ## underlying container for `t`
proc getByIndex*[K, V](t: var Table[K, V], index: int): (var K, var V)
```
which retrieves a key/value pair in guaranteed O(1) time and avoids the need to hold an auxiliary table. We just make it clear in the doc comment that mutations on the table could change the key/value for a given index. It's useful in cases (like here) where we need fast access to a table for given pre-computed indexes, and where the user knows what they're doing, so that entries in the table don't move around.
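For illustration, here is a sketch of what these procs might look like inside tables.nim. This is an assumption-laden sketch: `data`, `hcode` and `rawGet` are private implementation details of the stdlib's open-addressing Table, and the proposal's `lent`/`var` return tuples are simplified to plain values here.

```nim
# hypothetical additions to tables.nim; `data`, `hcode` and `rawGet`
# are internal details of Table's open-addressing backing seq
proc getIndex*[K, V](t: Table[K, V], key: K): int =
  ## returns the slot index of `key` in the backing seq, or -1 if absent
  var hc: Hash
  result = rawGet(t, key, hc)   # internal lookup helper: slot index or < 0

proc getByIndex*[K, V](t: Table[K, V], index: int): (K, V) =
  ## O(1) access to the slot at `index`; only valid while the table is
  ## not mutated, since a rehash moves entries to different slots
  ## (the proposal returns `lent` views; plain values shown for simplicity)
  assert t.data[index].hcode != 0  # slot must be filled
  result = (t.data[index].key, t.data[index].val)
```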
How would you implement that though? The sequence held by tables is full of holes (that's how hash tables work...)
holes don't matter; the table is prepopulated in vmops, and after that the indexes don't change.
step 1: populate table entries in vmops
```nim
callbacks*: seq[tuple[key: string, value: VmCallback]]
```
=>
```nim
callbacks*: Table[string, VmCallback]
```
step 2:
```nim
proc procIsCallback(c: PCtx; s: PSym): bool =
  if s.offset < -1: return true
  var i = -2
  for key, value in items(c.callbacks):
    if s.matches(key):
      doAssert s.offset == -1
      s.offset = i
      return true
    dec i
```
=>
```nim
proc procIsCallback(c: PCtx; s: PSym): bool =
  if s.offset < -1: return true
  let index = c.callbacks.getIndex(s.toKey)
  if index != -1:
    doAssert s.offset == -1
    s.offset = -2 - index
    return true
```
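As a sanity check on the `-2 - index` encoding used in step 2: it maps callback index `i` to an offset strictly below `-1` (so the existing `offset < -1` callback test keeps working), and `-offset - 2` recovers the index.

```nim
# round-trip check for the offset <-> callback-index encoding:
# index i is stored as offset = -2 - i, decoded as i = -offset - 2
proc encodeOffset(index: int): int = -2 - index
proc decodeIndex(offset: int): int = -offset - 2

assert encodeOffset(0) == -2                  # first callback
assert decodeIndex(encodeOffset(5)) == 5      # round-trips
assert encodeOffset(0) < -1                   # satisfies `offset < -1`
```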
step 3:
```nim
# it's a callback:
c.callbacks[-prc.offset-2].value(..)
```
=>
```nim
# it's a callback:
c.callbacks.getByIndex(-prc.offset-2).value(..)
```
So all we need is `getIndex` and `getByIndex` in tables.nim. It'll benefit not just vmops.nim but any application that needs fast queries for pre-computed index offsets; the table just needs to not have entries moving while `getByIndex` and `getIndex` are called (and there are easy ways to make this safe, I can explain if needed).
`matches` is:
```nim
proc matches(s: PSym; x: string): bool =
  let y = x.split('.')
  var s = s
  for i in 1..y.len:
    if s == nil or (y[^i].cmpIgnoreStyle(s.name.s) != 0 and y[^i] != "*"):
      return false
    s = s.owner
  result = true
```
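To see what `matches` does, here is a standalone model of the same right-to-left, wildcard-aware comparison, with a `seq` of names standing in for the symbol's `s.owner` chain (the helper name `matchesKey` is made up; `cmpIgnoreStyle` is the stdlib's style-insensitive comparison):

```nim
import std/strutils

# `chain` plays the role of the symbol's owner chain,
# e.g. @["stdlib", "os", "staticWalkDir"]
proc matchesKey(chain: seq[string], key: string): bool =
  let y = key.split('.')
  if y.len > chain.len: return false  # stands in for the `s == nil` check
  for i in 1 .. y.len:
    # compare segments right-to-left; "*" matches any single segment
    if y[^i] != "*" and cmpIgnoreStyle(y[^i], chain[^i]) != 0:
      return false
  result = true

assert matchesKey(@["stdlib", "os", "staticWalkDir"], "stdlib.*.staticWalkDir")
assert not matchesKey(@["stdlib", "os", "walkDir"], "stdlib.*.staticWalkDir")
```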
That's to support the wildcard `*` in `stdlib.*.staticWalkDir`; however, this wildcard really is not needed and it's the only vmops entry that uses it. Without it, `toKey` simply amounts to something like `fully_qualified_normalized_symbol(s)`, which would avoid the current O(N*cost(matches)). Even then, note that we can still support wildcards even with the suggested table implementation.
Even if the speedup won't be significant with a table (as suggested above), we should still do it IMO because it's no more complex and, more importantly, it will benefit other use cases beyond vmops with a similar setting where fast O(1) retrieval is needed and where the performance gain will be noticeable. It also enables other use cases (e.g. random indexing into a table) which aren't currently possible.
Oh I see, it essentially caches the location of the identifier in the structure. Normally what you'd do for that in a hash table is to get a pointer to the value and store that. As long as it isn't removed from the table the GC shouldn't clear it and the pointer should stay valid.
I don't see how it'd help here; the VM uses an integer offset (see `c.callbacks[-prc.offset-2].value(...)`) encoded in the VM instructions; so even if we had a pointer, we'd still need to convert it to an integer so that the VM can store it. For that you'd also need the address of the 0'th element so you can compute the offset as (in pseudocode) `offset = addr(ithElement) - addr(0thElement)`.
On top of that, this won't help when the value type is ptr-like (ref|ptr|pointer|cstring|proc), since the Table API for `get` in this case returns the address of the element, not of the seq cell containing the element (EDIT: check whether this is true); whereas with `getIndex` and `getByIndex` you avoid all those problems (integer indexes instead of pointers have other advantages too).
Oh yeah, it wasn't a solution to this problem in particular, I was just thinking out loud about how I would normally do that when designing a solution from the bottom up (but that might just be my C background talking). And to "convert it to an integer so that the VM can store it" you can just use the value of the pointer, it's just a number. The only problem is that you wouldn't be able to store all pointers if you have special semantics for positive/negative numbers like here (although you could probably work around that with a relative pointer, kind of like what you're mentioning). But I'm all for adding a more Royale way of doing it (after all, pointers in Nim are a bit of a no-no), even though it comes with the implied limitation that you can't use an index after modifying the table (this could of course be safety-checked in the table implementation, but then you're adding more overhead to something that's meant to be an optimisation; at least remove it with -d:danger).
> you can just use the value of the pointer, it's just a number.
You can't retrieve in O(1) unless it's an array, so storing just the pointer is a no-go (it would just amount to using the regular hash table retrieval directly, which is not quite O(1), or at least has a big constant factor); you'd need the pointer offset `addr(p[i]) - addr(p[0])`, and then you'd still need `getByIndex` anyway. I don't see any other approach besides what I'm suggesting to guarantee O(1).
> more overhead to something that's meant to be an optimisation
It's simply checking a boolean flag that gets set on rehashes; easy for branch prediction. If it has any measurable impact, -d:danger can remove that check, but it sounds negligible compared to the other costs involved in hash computation + hash=>index computation.
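The safety scheme being described could be as simple as a counter bumped whenever entries can move; a minimal sketch, with made-up type, field and proc names (a counter rather than a boolean is used so stale handles from any earlier generation are caught):

```nim
# hypothetical safety scheme: the table bumps `moveCount` whenever
# entries can move (rehash/enlarge); an index handle remembers the
# count it was created under and is only valid while they still match
type
  IndexedTable[K, V] = object
    data: seq[(K, V)]
    moveCount: int        # incremented on every rehash

proc getByIndex[K, V](t: IndexedTable[K, V]; index, expectedMoveCount: int): (K, V) =
  when not defined(danger):
    doAssert t.moveCount == expectedMoveCount, "table was rehashed; index is stale"
  t.data[index]
```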
Like I said though, I doubt this will have any performance impact for vmops (only a few registered procs) but it can definitely have performance benefits in other scenarios. For vmops proper, the bigger-impact features I'd like to have (I've already implemented all/most of these in my private branch but no PR) are:
> you can't retrieve in O(1) unless it's an array
I think you misunderstood what I meant. What I do in cases where I want to put things in a table but have faster access to already-found elements is store a pointer or reference to the object in the table. This means I can keep this pointer around after having found it in the table, and it will still point to the same data (and access is of course O(1)). But again, this might not be possible with how the offsets are stored in this specific case.
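The pattern described here (look a value up once, then keep a stable handle to it) can be sketched in Nim by storing `ref` values in the table: the table slots move around on rehash, but the ref still points at the same heap object. The `Callback` type and keys below are made up for illustration:

```nim
import std/tables

type Callback = ref object
  name: string

var t = initTable[string, Callback]()
t["foo"] = Callback(name: "foo")

# look up once and keep the ref; later insertions force the table to
# grow and rehash, moving slots, but the ref stays valid
let cached = t["foo"]
for i in 0 ..< 1000:
  t["key" & $i] = Callback(name: "key" & $i)
assert cached.name == "foo"
```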
Oh yeah, it's not a huge overhead, but sometimes every last bit counts.
Just did a rough count and the NimScript environment used for configuration files appears to have around 90 registered callbacks, so not a massive amount but still not something I'd like to linearly search through too often.
While looking at how the NimScript VM was implemented I couldn't help but notice that `callbacks` in the VM are defined as a sequence of tuples with a string key to identify them: https://github.com/nim-lang/Nim/blob/6969a468ceaab7384d7448bfd88e47a5b24c3a97/compiler/vmdef.nim#L258. I was wondering why this isn't a table? If you declare a lot of callbacks this could potentially be a huge bottleneck (and interestingly it means that you should sort your registrations by assumed usage frequency).