Closed aw closed 2 years ago
To follow-up on the above comment, which relates to what I hinted at in #10:
Length
of word plays an important role, in that it's possible to quickly skip a word lookup by simply comparing its length. I'm aware the hashes all have the same length (32-bits), but that now makes it mandatory to first hash the word to find a match, as opposed to using a much quicker/simpler heuristic.Code
field is enter
?OK I think I can answer my own questions here:
Length
is not saved because it's not necessary. If it was used for lookups and a match is found, we would still need to compare either a hash of the words or the words themselves... Hashing is much more efficient.Still not sure about the Code
field... is it just a pointer to the body of the word definition? i.e: always changing?
Sorry for the delay here! I'm considering doing a "fancy" write-up of these questions / answers and making a nice chunk of reference documentation from it (these are solid questions).
But yea, we don't need Length
since DerzForth uses a fixed-size (4 byte) hash value instead. Some old school Forths use a strategy of encoding the length of the word name (1 byte) plus the first three letters of the name (3 bytes) to identify words. IMO, that's really just a terrible string hash. DJB2 is a much better solution that dwarfs its extremely low chance of collision.
Each word requires two bits of metadata: immediate and hidden. I could move these bits into a whole new byte but that'd waste 6 bits. I think that cutting down the hash value range to 2^30
is better tradeoff for this particular embedded-focused Forth implementation. As you mentioned, this likely still exceeds how many words you'd be able to define given the tight memory limit of each target device.
The Code
field holds the address of the machine code for a given word. For builtin / primitive words, it just points to 4 bytes ahead (since the ASM code for a primitive word immediately follows its header definition). For secondary words, the Code
field points to the inner interpreter routine named ENTER
(sometimes known as DOCOLON
). To quote Brad Rodriguez's amazing Moving Forth series:
But, SQUARE is a high-level "colon" definition -- it holds a "thread", a list of addresses. In order
to perform this definition, the Forth interpreter must be re-started at a new location: the Parameter
Field of SQUARE. Of course, the interpreter's old location must be saved, to resume the "other"
Forth word once SQUARE is finished. This is just like a subroutine call! The machine language action
of SQUARE is simply to push the old IP, set IP to a new location, run the interpreter, and when
SQUARE is done pop the IP. (As you can see, the IP is the "program counter" of high-level Forth.)
This is called DOCOLON or ENTER in various Forths:
PUSH IP onto the "return address stack"
W+2 -> IP W still points to the Code Field, so W+2 is
the address of the Body! (Assuming a 2-byte
address -- other Forths may be different.)
JUMP to interpreter ("NEXT")
DerzForth's implementation of ENTER
is functionally equivalent:
sw IP, 0(RSP) # IP -> [RSP]
addi RSP, RSP, 4 # RSP += 4
addi IP, W, 4 # IP = W + 4 (skip code field)
j next
Thank you for the explanation. I think DOCOL
was the missing link here. I read about it but didn't realize your ENTER
has the same role. This makes much more sense now!
I'm trying to understand
derzforth
's word header but I can't seem to grok how you're defining it.The sectorforth's diagram of a word's dictionary entry looks like this:
I made a diagram for
derzforth
:My questions to better understand
derzforth
:enter
?I think with this knowledge I'll be better positioned to contribute here.
Thanks!