hlorenzi / customasm

💻 An assembler for custom, user-defined instruction sets! https://hlorenzi.github.io/customasm/web/
Apache License 2.0
715 stars, 56 forks

Add strlen builtin #187

Closed: MineRobber9000 closed this 11 months ago

MineRobber9000 commented 11 months ago

According to the docs:

If the string's length is needed, we can use a bit of arithmetic to derive it:

helloworld:
    #d "Hello, world!\0"

helloworldLen = $ - helloworld

But sometimes we need the string's length in order to output the string itself, most notably for Pascal-style (length-prefixed) strings. I ran into this while trying to write a Lua bytecode ruledef (yes, I know I'm a weirdo). Lua strings are prefixed with a variable-length integer, so a 126-character string (excessive, but entirely possible in a program) is encoded as 0xff followed by the 126 characters, while a 127-character string is encoded as 0x01 0x80 followed by the 127 characters.
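To make the prefix arithmetic concrete, here is a minimal Python sketch of the encoding described above. The function name `encode_lua_string_size` is hypothetical, and the layout (the stored value is length + 1, split into 7-bit groups with the most significant group first and the high bit set on the final byte) is an assumption based on Lua 5.4's `ldump.c`. It reproduces both examples from the paragraph.

```python
def encode_lua_string_size(length: int) -> bytes:
    """Hypothetical sketch of a Lua 5.4-style string size prefix.

    Assumption: the dumped value is length + 1 (0 is reserved for a NULL
    string), written as 7-bit groups, most significant group first, with
    bit 7 set only on the last byte.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    value = length + 1
    groups = []
    while True:
        groups.append(value & 0x7F)  # collect 7-bit groups, least significant first
        value >>= 7
        if value == 0:
            break
    groups.reverse()      # emit the most significant group first
    groups[-1] |= 0x80    # mark the final byte of the prefix
    return bytes(groups)


print(encode_lua_string_size(126).hex())  # ff
print(encode_lua_string_size(127).hex())  # 0180
```

The single-byte form covers lengths 0 through 126 (stored values 1 through 0x7f), which is why a single-byte `size` rule cannot go past 0x7e.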

This use case can't rely on "a bit of arithmetic": the length has to be emitted before the string, and by the time customasm has emitted the string, it is too late to do anything with its length. Using an asm block together with the "you can refer to a variable before it is defined" feature doesn't work either, because customasm fails when it cannot resolve the variable's value. You can use the "bit of arithmetic" outside of a ruledef, but then it looks sloppy and is more cumbersome to use:

#ruledef {
        size {num} => {
                assert(num>=0)
                assert(num<=0x7e) ; num+1 must still fit in 7 bits
                0b1 @ (num+1)`7
        }
        ; presumably other definitions of size {num} for larger sizes
}

; you have to do this every time you want to emit a string
; (you'd have to do something similar to emit a string containing binary data,
; since strings are stored as String on the backend and not OsString, but that's for another time)
size len ; emits 0x87, i.e. 0x80 | (6 + 1)
old = $
#d "=stdin"
len = $-old

The solution I came up with is to add a builtin `strlen` function that returns the string's length in bytes as an integer. This solves the use case above, since I can simply write the following (compare with the previous code block):

#ruledef
{
        size {num} => {
                assert(num>=0)
                assert(num<=0x7e)
                0b1 @ (num+1)`7
        }
        ; presumably other definitions of size {num} for larger sizes
        str {x} => asm {
                size strlen({x})
        } @ x
}

str "=stdin" ; emits 0x87 followed by the bytes of "=stdin"
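As a sanity check on the byte layout, the Python model below mirrors what the single-byte `str` rule should produce. `emit_str_rule` is a hypothetical name. It assumes customasm emits string literals as UTF-8 (consistent with strings being a Rust `String` on the backend) and that `strlen` counts bytes, as proposed, so a non-ASCII string's prefix reflects its encoded size rather than its character count.

```python
def emit_str_rule(s: str) -> bytes:
    """Hypothetical model of the single-byte `str {x}` rule.

    Assumes the string is emitted as UTF-8 and that strlen returns its
    length in bytes; larger sizes are not modeled.
    """
    data = s.encode("utf-8")
    num = len(data)  # what strlen({x}) is expected to return
    assert 0 <= num <= 0x7E, "only the single-byte size rule is modeled"
    size_byte = 0x80 | (num + 1)  # same as 0b1 @ (num+1)`7
    return bytes([size_byte]) + data


print(emit_str_rule("=stdin"))  # b'\x87=stdin'
```

Note that `emit_str_rule("é")` yields a prefix of 0x83 because "é" is two bytes in UTF-8, even though it is one character.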

hlorenzi commented 11 months ago

This looks good! In the future, I'd like to go even further and add functionality to get the bit-size of any kind of value (#95), or of the data pointed to by a label (as in #167).