Open rogueeve575 opened 4 months ago
Proper intel syntax hex literals (0nnnnH) are a good idea to support! Never really liked the syntax, so I just always used 0xnnnn as allowed by nasm.
To be honest, I haven't touched the code in a number of years, but the code is maintained in the sense that I've so far responded to every issue :)
Your way seems fine, but just to jog my memory of how things work, I've done it slightly differently and added a test: https://github.com/mras0/sasm/tree/HexSuffix
If that works for you, I'll merge it into master
Cool, I like how you did it and it does assemble my test cases just as well as my version. If I had any nitpicky at all which I really don't, it would be around needing to push/pop si which I was trying to avoid needing to use the stack for speed, but as a result you end up fetching a lot less instructions than I did so I'm sure it's a win.
I may have been being a little overly cautious in mine; I wasn't certain if I could do the "fancy" addressing modes ("cmp byte [bx+si-1]") on an 8086/80286. I don't have an 8086 but I did test your patch on a real 286 and it's fine, plus I see there's a few other similar uses of those modes already in the program so this is absolutely probably a better way to get a pointer to the last char.
Not being fully familiar with the program I also wasn't 100% sure if it was okay to corrupt the dl register in that subroutine (that I used to signal it was the 'h' form so as not to skip two chars for the "0x"). Your version doesn't need to do that though so doesn't really matter anymore. It's great for the original author to do it because even if you don't 100% remember everything you'll probably have a better idea of stuff like that than I would just due to knowing your own coding style, at least from when I've looked at old projects of mine that's how it seemed.
I'll admit I don't fully understand the part in lines 867-873, I thought this was to deal with leading zeroes somehow, but I can't tell any difference any behavior from my version that was missing that part, maybe it's just an optimization for the number "0h" / "00h":
.HexSuffix:
dec cl
cmp byte [bx], '0'
jne .HexCheckLen
inc bx
dec cl
jnz .HexCheckLen
ret
Your C version is much cleaner than the lazy hack I made for that after the OP as well. So overall looks great to me.
BTW I also noticed that sasm doesn't support the "jle" and "jge" instructions. These are just synonyms for the existing "jng" and "jnl" mnemonics so are really easy to just add to the table:
▶ git diff sasm.asm sasm.c
diff --git a/sasm.asm b/sasm.asm
index 0ff60e9..91a920e 100644
--- a/sasm.asm
+++ b/sasm.asm
@@ -3456,8 +3456,12 @@ DispatchList:
dw InstJCC
db 'JL',0,0,0,0, CC_L
dw InstJCC
+ db 'JGE',0,0,0, CC_NL
+ dw InstJCC
db 'JNL',0,0,0, CC_NL
dw InstJCC
+ db 'JLE',0,0,0, CC_NG
+ dw InstJCC
db 'JNG',0,0,0, CC_NG
dw InstJCC
db 'JG',0,0,0,0, CC_G
diff --git a/sasm.c b/sasm.c
index b73deb2..4d6dad6 100644
--- a/sasm.c
+++ b/sasm.c
@@ -1919,8 +1919,10 @@ static const struct {
{ "JPE" , &InstJcc , JPE },
{ "JPO" , &InstJcc , JPO },
{ "JL" , &InstJcc , JL },
+ { "JGE" , &InstJcc , JNL },
{ "JNL" , &InstJcc , JNL },
{ "JNG" , &InstJcc , JNG },
+ { "JLE" , &InstJcc , JNG },
{ "JG" , &InstJcc , JG },
};
I was going to add support for "xlat" as well, but upon starting I noticed that sasm does already implement it as "xlatb", and documentation seems to indicate that's actually correct, so even though I've always seen it written just "xlat" in every assembler and debugger I've used before, I'm unsure whether or not it'd be appropriate to just add "xlat" as a synonym to "xlatb" as well.
From what I understand apparently Intel intended you to give some bogus argument to the "xlat" form for "documentation" that's then completely ignored and bx used instead, maybe something like "xlat [table]". Seems like a terribly dumb idea to me from a maintainability standpoint, when you could just put a comment instead, but yeah. Another source says it allows for an segment override, so I guess something like "xlat [es:bx]", which at least could be potentially useful although I'm not sure if that's true.
However, I'm guessing as to Intel's intended syntax as I can't find any actual complete examples, and I can't get any of these forms to work in nasm, only plain "xlat" with no argument as I've usually seen, which actually appears incorrect from the documentation, so, lol....
Cool, I like how you did it and it does assemble my test cases just as well as my version. If I had any nitpicky at all which I really don't, it would be around needing to push/pop si which I was trying to avoid needing to use the stack for speed, but as a result you end up fetching a lot less instructions than I did so I'm sure it's a win.
I may have been being a little overly cautious in mine; I wasn't certain if I could do the "fancy" addressing modes ("cmp byte [bx+si-1]") on an 8086/80286. I don't have an 8086 but I did test your patch on a real 286 and it's fine, plus I see there's a few other similar uses of those modes already in the program so this is absolutely probably a better way to get a pointer to the last char.
To be honest, I didn't profile it, and your version was fine as well, just took the opportunity to reorganize the code a bit (after spending a bit of time remembering how it works..). In general I wrote the assembler incrementally with ease of implementation in mind rather than speed. The linear scan of the DispatchList is not great for speed (it should really be a hash table).
You can't use fancy addressing modes, but bx+si is one of the non-fancy ones :) (I find this page a pretty convenient reference: http://ref.x86asm.net/coder.html#modrm_byte_16)
Not being fully familiar with the program I also wasn't 100% sure if it was okay to corrupt the dl register in that subroutine (that I used to signal it was the 'h' form so as not to skip two chars for the "0x"). Your version doesn't need to do that though so doesn't really matter anymore. It's great for the original author to do it because even if you don't 100% remember everything you'll probably have a better idea of stuff like that than I would just due to knowing your own coding style, at least from when I've looked at old projects of mine that's how it seemed.
Using DX should be fine there, but I rely on the tests to keep me honest - I'd forgotten I needed to preserve SI, but luckily that was caught.
I'll admit I don't fully understand the part in lines 867-873, I thought this was to deal with leading zeroes somehow, but I can't tell any difference any behavior from my version that was missing that part, maybe it's just an optimization for the number "0h" / "00h":
.HexSuffix: dec cl cmp byte [bx], '0' jne .HexCheckLen inc bx dec cl jnz .HexCheckLen ret
Your C version is much cleaner than the lazy hack I made for that after the OP as well. So overall looks great to me.
It is skipping a (single) leading '0' (and handling '0H' directly). If not there the length check (at .HexCheckLen) will fail for '0abcdH' (cl would be 5).
BTW I also noticed that sasm doesn't support the "jle" and "jge" instructions. These are just synonyms for the existing "jng" and "jnl" mnemonics so are really easy to just add to the table:
▶ git diff sasm.asm sasm.c diff --git a/sasm.asm b/sasm.asm index 0ff60e9..91a920e 100644 --- a/sasm.asm +++ b/sasm.asm @@ -3456,8 +3456,12 @@ DispatchList: dw InstJCC db 'JL',0,0,0,0, CC_L dw InstJCC + db 'JGE',0,0,0, CC_NL + dw InstJCC db 'JNL',0,0,0, CC_NL dw InstJCC + db 'JLE',0,0,0, CC_NG + dw InstJCC db 'JNG',0,0,0, CC_NG dw InstJCC db 'JG',0,0,0,0, CC_G diff --git a/sasm.c b/sasm.c index b73deb2..4d6dad6 100644 --- a/sasm.c +++ b/sasm.c @@ -1919,8 +1919,10 @@ static const struct { { "JPE" , &InstJcc , JPE }, { "JPO" , &InstJcc , JPO }, { "JL" , &InstJcc , JL }, + { "JGE" , &InstJcc , JNL }, { "JNL" , &InstJcc , JNL }, { "JNG" , &InstJcc , JNG }, + { "JLE" , &InstJcc , JNG }, { "JG" , &InstJcc , JG }, };
I've added these (and xlat).
I was going to add support for "xlat" as well, but upon starting I noticed that sasm does already implement it as "xlatb", and documentation seems to indicate that's actually correct, so even though I've always seen it written just "xlat" in every assembler and debugger I've used before, I'm unsure whether or not it'd be appropriate to just add "xlat" as a synonym to "xlatb" as well.
I originally added it as xlatb since that's what nasm disassembles it to, but I agree that "xlat" is more natural.
From what I understand apparently Intel intended you to give some bogus argument to the "xlat" form for "documentation" that's then completely ignored and bx used instead, maybe something like "xlat [table]". Seems like a terribly dumb idea to me from a maintainability standpoint, when you could just put a comment instead, but yeah. Another source says it allows for an segment override, so I guess something like "xlat [es:bx]", which at least could be potentially useful although I'm not sure if that's true.
However, I'm guessing as to Intel's intended syntax as I can't find any actual complete examples, and I can't get any of these forms to work in nasm, only plain "xlat" with no argument as I've usually seen, which actually appears incorrect from the documentation, so, lol....
Yeah, I think xlat allows for segment override. The reason I like nasm syntax is that it isn't clunky, so I don't like a "dummy" argument that always has to be some for of [(seg:)bx]. The nasm way would be es xlat
(https://www.nasm.us/xdoc/2.16.03rc1/html/nasmdoc3.html#section-3.1), but that isn't supported (yet). As a workaround you can use:
db 0x26 ; ES segment override for..
xlat
If you search for "xlat" on this page, NASM apparently just treats xlat as a synonym for xlatb with no argument expected or possible. I've NEVER seen the form "xlat [some label that presumably is also loaded into bx but ain't nobody checking]", so, maybe other people thought it was a dumb idea too and didn't implement it.
I actually kind of like "es xlat". I don't have a use for it right now, but I'll have to try that (or it's db-bodged form).
Nice reference, x86 is such a nonorthogonal mess that whenever I do 8086 assembly there always ends up being a case at some point where I'm just trying different things to see if they assemble and then going, "oh well that's unfortunate, now I have to spill......am I sure there's not some other form that's faster?". The only CPU I've worked with where every supposed "general purpose" register has a caveat of "but you must use it when doing this instruction and oh you can't with that one".
A lot of my existing nasm code is using hex numbers in the format with the trailing 'h' (e.g. "0100h" instead of "0x100").
I don't know if this project is actively maintained, but it would be nice to have support for that alternate number format. I don't want to go through the whole forking process to be able to submit a pull request, but I did put together this little mod to add that in which seems to work fine as a potential starting place (I only bothered with SASM.COM, I'm sure doing this in the C version is fairly trivial):