Open pcordes opened 3 years ago
Hi Peter,
Thanks very much for your detailed comments and analysis!
I've dealt with the first item (the bug in the comment), and noted the origin of this example -- that's what I get for copy&paste!
It has been a long time since I did any Intel assembly coding, and this is actually my first x86_64-specific toy. Most of my practical experience with assembler is from way back on the PDP-11, VAX, 6502, 1802, etc. and ancient x86, so I definitely appreciate your insight!
BTW, I like the idea of storing the length of the string in memory for other purposes, i.e. not just having a constant in the current assembly unit, so I'll probably keep that as an example, but I'll add a comment about avoiding the storage and using a constant instead.
Someone linked https://github.com/robohack/experiments/blob/430b5ea22bc2f4f697c659aeb399e938d09744c1/thello.s for an example of a BSD build command, which is why I'm randomly looking at it.
It has one bug (in a comment): syscall definitely can't take an arg in RCX, because the syscall instruction itself destroys RCX before the kernel gets control (https://stackoverflow.com/questions/32253144/why-is-rcx-not-used-for-passing-parameters-to-system-calls-being-replaced-with). Linux uses R10 instead of RCX, with the rest of the convention matching the function-calling convention. I'd guess most other x86-64 SysV OSes do the same, but I don't know for sure.
Separately from that:
RSP is already aligned by 16 on process entry, as guaranteed by the x86-64 System V ABI.
You can mov $4, %eax to do this more efficiently (implicit zero-extension to 64-bit), especially if you're later trying to optimize by merging a length into the low byte of RDX (which most kernels zero on process entry). Also, you can #include <sys/syscall.h> to get call numbers as CPP macro #defines, so you can mov $SYS_write, %eax. (Call your file .S so gcc will run it through CPP first.)
You can use as -O2 or -Os to do simple optimizations like mov $4, %rax into mov $4, %eax like NASM does, because the architectural effect is identical. (If using GCC, -Wa,-O2, not gcc -O2.)
Using a 32-bit sign-extended immediate for an absolute address is possible but inefficient. Normally you'd use lea hello(%rip), %rsi, or mov $hello, %esi (if 32-bit sign-extended works, so does zero-extended, assuming user-space using the bottom of the virtual address space, not the top). https://stackoverflow.com/questions/57212012/how-to-load-address-of-function-or-label-into-register
Again, 32-bit operand-size is 100% fine, especially for the xor since exit() takes an int arg. See my answer on https://stackoverflow.com/questions/33666617/what-is-the-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and
Putting a constant byte in static storage is just silly; make it an assemble-time constant you can use as an immediate like mov $hello_len, %edx (or %rdx if you want).
So
becomes
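For illustration only, here is a rough sketch of how a file like this might look with the suggestions above applied. It is not the code from thello.s: the string contents, putting it in .rodata, and the use of fd 1 for stdout are assumptions, and it further assumes an x86-64 OS (e.g. Linux) whose <sys/syscall.h> contains only preprocessor definitions and which accepts raw syscall instructions from a static executable.

```asm
#include <sys/syscall.h>        /* SYS_write, SYS_exit as CPP macros */

        .section .rodata
hello:
        .ascii  "Hello, world!\n"       /* assumed message text */
        .equ    hello_len, . - hello    /* assemble-time constant, no storage */

        .text
        .globl  _start
_start:
        mov     $SYS_write, %eax        /* 32-bit write zero-extends into RAX */
        mov     $1, %edi                /* fd 1 = stdout (assumed) */
        lea     hello(%rip), %rsi       /* RIP-relative address of the string */
        mov     $hello_len, %edx        /* length as an immediate */
        syscall                         /* the instruction itself clobbers RCX and R11 */

        mov     $SYS_exit, %eax
        xor     %edi, %edi              /* exit status 0; exit() takes an int */
        syscall
```

Saved as a .S file so gcc runs CPP over it, something like gcc -nostdlib -static thello.S -o thello should build it on Linux. Some BSDs need extra ELF note sections or restrict where syscalls may be issued from, so treat the build step as an assumption too.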