hlorenzi / customasm

💻 An assembler for custom, user-defined instruction sets! https://hlorenzi.github.io/customasm/web/
Apache License 2.0

Feature request: allow use of \x in multi byte character string functions #188

Open TheRektafire opened 9 months ago

TheRektafire commented 9 months ago

So I mentioned this on Discord a couple of days ago but decided I'd put it here too. I'm working on a ruledef for vircon32 (http://www.vircon32.com/) and I ran into a compatibility issue between the official vircon32 assembler and customasm regarding strings. The official assembler has a "string" directive that outputs a string as a null-terminated sequence of 32-bit little-endian encoded characters. Since #bits on vircon32 is 32, this is the preferred format for strings. However, unlike the current customasm utf32le() function, the vircon32 "string" directive allows hex escapes using \x in strings, and I've been using that pretty extensively for an escape code parsing feature I've been working on. Unfortunately customasm doesn't seem to allow this: trying to use \x in a utf32le() string leads to an "invalid escape sequence" error.

This is a bit of an issue since it means my strings take way longer to type out. Every time I need to add an escape code, I have to put the part of the string before it in a utf32le(), then add 0x1B and the escape code number as 2 separate le()`32s, THEN add the argument(s) (which also have to be separate le()s, obviously), then write the next part of the string in yet another utf32le(). And if I want the string to have multiple escape codes in it, well, as you can imagine, it becomes really long to type out and hard to read. For example, my escape code for changing text color is 0x01 and the arguments are the blue, green and red values for the new text color, in that order. So take this string for example:

string "hello\x1B\x01\xFF\x00\x00 world"

With my current print function this would print "hello world" at the specified pixel position on the screen with "hello" being white and "world" being blue. However because of the inability to use \x in a utf32le string I would have to write it out in customasm like so...

#d utf32le("hello"), le(ESCAPECODE_START)`32, le(ESCAPECODE_CHANGE_TEXT_COLOR)`32, le(0xFF)`32, 0`32, 0`32, utf32le(" world"), NULL`32

And that's just a simple example. I have more complex stuff and longer strings to work with too, and I'd much rather not have to rewrite them all like that. So it would be nice if I could just use \x in the strings, so I don't have to completely rewrite both my strings and my string parsing code. (Even compressing the arguments into single words instead of a single byte per word wouldn't help much, and it would require rewriting my string code, since it expects a single character byte per 32-bit word.)
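
For reference, here is a rough Python sketch of the byte layout described above: one 32-bit little-endian word per character, with a null word at the end. It is not taken from the official vircon32 assembler. The function name `encode_string_words` is made up for illustration, and it assumes each \xNN escape becomes a single word holding the value NN.

```python
import struct

def encode_string_words(text: str) -> bytes:
    """Encode text as null-terminated 32-bit little-endian words.

    Assumption (not confirmed against the official assembler): each
    \\xNN escape turns into one 32-bit word with the value NN.
    A malformed escape raises ValueError.
    """
    words = []
    i = 0
    while i < len(text):
        if text.startswith("\\x", i):
            # Two hex digits follow the "\x" prefix.
            words.append(int(text[i + 2:i + 4], 16))
            i += 4
        else:
            # Plain character: store its code point in one word.
            words.append(ord(text[i]))
            i += 1
    words.append(0)  # null terminator word
    return struct.pack(f"<{len(words)}I", *words)

# Raw string so Python keeps the backslashes as written.
data = encode_string_words(r"hello\x1B\x01\xFF\x00\x00 world")
print(data.hex(" ", 4))  # one group of 4 bytes per word
```

The example string comes out as 17 words (16 characters and escapes plus the terminator), which is the layout the customasm line above has to reproduce by hand.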

MineRobber9000 commented 8 months ago

We already have escape sequences for that:

#d utf32le("hello\u{1B}\u{01}\u{FF}\u{00}\u{00} world")
customasm v0.13.4-8-g4c543f5 (2023-12-27, x86_64-unknown-linux-gnu)
assembling `tmp.asm`...
writing `/dev/stdout`...
 outp | addr | data (base 16)

  0:0 |    0 | 68 00 00 00 65 00 00 00 6c 00 00 00 6c 00 00 00 6f 00 00 00 1b 00 00 00 01 00 00 00 ff 00 00 00 00 00 00 00 00 00 00 00 20 00 00 00 77 00 00 00 6f 00 00 00 72 00 00 00 6c 00 00 00 64 00 00 00 ; utf32le("hello\u{1B}\u{01}\u{FF}\u{00}\u{00} world")
resolved in 1 iteration

I mean, it's a little clunky (not sure why we need the brackets? I guess so we don't need to add a bunch of zeros for 1-byte codepoints?), but it's there.
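
If there are a lot of existing strings to port, the \x escapes could be rewritten mechanically. Below is a minimal Python sketch of that idea. The helper names `x_to_u_escapes` and `to_customasm` are hypothetical, and it assumes the string body uses only two-digit \xNN escapes plus ordinary backslash escapes. Since the output above shows no terminator after "world", the sketch appends an explicit 0`32 word.

```python
import re

# Match any backslash escape; group 2 is set only for \xNN.
# Consuming escapes pairwise keeps "\\x41" (an escaped backslash
# followed by the text x41) from being rewritten by mistake.
_ESCAPE = re.compile(r"\\(x([0-9A-Fa-f]{2})|.)", re.DOTALL)

def x_to_u_escapes(body: str) -> str:
    """Rewrite \\xNN escapes as \\u{NN}; leave other escapes alone."""
    def repl(match):
        if match.group(2) is not None:
            return "\\u{" + match.group(2).upper() + "}"
        return match.group(0)
    return _ESCAPE.sub(repl, body)

def to_customasm(body: str) -> str:
    """Build a #d line with an explicit 32-bit null terminator."""
    return f'#d utf32le("{x_to_u_escapes(body)}"), 0`32'

print(to_customasm(r"hello\x1B\x01\xFF\x00\x00 world"))
```

This only covers the escape rewrite. Any other differences between the two assemblers' string syntax would still need to be handled by hand.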

TheRektafire commented 8 months ago

Oh, I wasn't aware of that. That certainly does make things simpler. It isn't compatible with the official assembler, which isn't ideal, but oh well. If I'm already not caring about official assembler compatibility, I might as well use single u32s for the args instead of splitting them. That would not only make the strings shorter to write out but should also perform better, since I wouldn't have to grab each byte of a multibyte argument one at a time. So I think I'll stick with that for now and rewrite my string parsing code to be more customasm friendly.