robohack / experiments

A collection of code experiments and tests.

thello.s: sizes of constant strings should use .equ, not loading a byte from data memory, and various other optimizations. #2

Open pcordes opened 3 years ago

pcordes commented 3 years ago

Someone linked https://github.com/robohack/experiments/blob/430b5ea22bc2f4f697c659aeb399e938d09744c1/thello.s for an example of a BSD build command, which is why I'm randomly looking at it.

It has one bug (in a comment): syscall definitely can't take an arg in RCX, because the syscall instruction itself overwrites RCX (with the return address) before the kernel gets control (see https://stackoverflow.com/questions/32253144/why-is-rcx-not-used-for-passing-parameters-to-system-calls-being-replaced-with ). Linux uses R10 instead of RCX, with the rest of the convention matching the function-calling convention. I'd guess most other x86-64 SysV OSes do the same, but I don't know for sure.

# SYSCALL ARGS
# rdi rsi rdx r10 r8 r9
  # wrong original: # rdi rsi rdx rcx r8 r9    # that's the function-calling convention.
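
The RCX clobber can be observed directly. The following is only a minimal sketch, not part of thello.s: it assumes x86-64 Linux, a file named with a `.S` suffix so CPP expands the `SYS_*` macros from `<sys/syscall.h>`, and a static build without libc (something like `gcc -nostdlib -static`). The choice of getpid as a harmless system call is purely illustrative.

```asm
#include <sys/syscall.h>        /* SYS_getpid, SYS_exit as CPP macros */

    .text
    .globl _start
_start:
    xor   %ecx, %ecx            # RCX = 0 before the system call
    mov   $SYS_getpid, %eax     # any harmless call will do
    syscall                     # CPU stores the return RIP in RCX and RFLAGS in R11

    xor   %edi, %edi            # exit status defaults to 0
    test  %rcx, %rcx            # is RCX still zero?
    setnz %dil                  # status 1 if RCX was overwritten
    mov   $SYS_exit, %eax
    syscall
```

Under those assumptions the process should exit with status 1, since RCX comes back holding the address of the instruction after the first syscall rather than the 0 that was put there. Other kernels may behave differently on return, so treat this as a Linux-only illustration.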

Separately from that:

    andq $-16, %rsp     # clear the 4 least significant bits of stack pointer to align it

RSP is already aligned by 16 on process entry, as guaranteed by the x86-64 System V ABI.

    mov $4, %rax        # SYS_write
    mov $1, %rdi

You can mov $4, %eax to do this more efficiently (implicit zero-extension to 64-bit), especially if you're later trying to optimize by merging a length into the low byte of RDX (which most kernels zero on process entry). Also, you can #include <sys/syscall.h> to get call numbers as CPP macro #defines, so you can mov $SYS_write, %eax. (Name your file with a .S suffix so gcc will run it through CPP first.)

You can use as -O2 or -Os to have the assembler do simple optimizations, like turning mov $4, %rax into mov $4, %eax as NASM does, because the architectural effect is identical. (If using GCC, pass -Wa,-O2, not gcc -O2.)

    mov $hello, %rsi

Using a 32-bit sign-extended immediate for an absolute address is possible but inefficient. Normally you'd use lea hello(%rip), %rsi, or mov $hello, %esi (if 32-bit sign-extended works, so does zero-extended, assuming user-space using the bottom of the virtual address space, not the top.) https://stackoverflow.com/questions/57212012/how-to-load-address-of-function-or-label-into-register
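
For comparison, the three ways of getting the address into RSI can be put side by side. This is a sketch meant to be assembled and inspected with a disassembler, not run. The `hello` label mirrors the one in thello.s, and the sizes in the comments are the standard encodings for these forms.

```asm
    .section .rodata
hello:
    .ascii "Hello world!\n"

    .text
    mov  $hello, %rsi        # 7 bytes: REX.W + C7 /0 + imm32, sign-extended; needs a non-PIE link
    lea  hello(%rip), %rsi   # 7 bytes: RIP-relative, works in position-independent code too
    mov  $hello, %esi        # 5 bytes: imm32, zero-extended; needs a non-PIE link and an address below 2^32
```

The lea form is the usual default because it makes no assumption about where the code is loaded. The 5-byte mov is only worth it when a non-PIE executable is guaranteed.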

    mov $1, %rax        # SYS_exit
    xor %rdi, %rdi

Again, 32-bit operand-size is 100% fine, especially for the xor since exit() takes an int arg. See my answer on https://stackoverflow.com/questions/33666617/what-is-the-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and
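
The size difference is easy to see in the machine code. The bytes in the comments below are what the assembler emits for each form when it is not asked to optimize. This is a sketch for disassembly, not a complete program.

```asm
    mov  $1, %rax        # 7 bytes: 48 c7 c0 01 00 00 00
    mov  $1, %eax        # 5 bytes: b8 01 00 00 00, same value ends up in RAX
    xor  %rdi, %rdi      # 3 bytes: 48 31 ff
    xor  %edi, %edi      # 2 bytes: 31 ff, still zeroes all 64 bits of RDI
```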

Putting a constant byte in static storage is just silly; make it an assemble-time constant you can use as an immediate, like mov $hello_len, %edx (or %rdx if you want).

.section .data       # could be .section .rodata

hello:
    .ascii "Hello world!\n"
hello_len = . - hello
# .equ hello_len,  . - hello     # alternative using .equ

    #.byte . - hello

So

    mov hello_len, %dl  # Note: does not clear upper bytes. Use movzxb (move zero extend) for that

becomes

    mov $hello_len, %edx       # zero-extends to fill RDX
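
Putting the suggestions above together, the whole program might look roughly like this. It is a sketch under stated assumptions, not the actual revised thello.s: the file is named with a `.S` suffix so CPP expands `SYS_write` and `SYS_exit` from `<sys/syscall.h>`, the string lives in `.rodata`, and any OS-specific ELF note section or build flags the original file needs are left out.

```asm
#include <sys/syscall.h>            /* SYS_write, SYS_exit as CPP macros */

    .section .rodata
hello:
    .ascii "Hello world!\n"
    .equ hello_len, . - hello       # assemble-time constant, takes no storage

    .text
    .globl _start
_start:
    mov  $SYS_write, %eax           # 32-bit write zero-extends into RAX
    mov  $1, %edi                   # fd 1 = stdout
    lea  hello(%rip), %rsi          # RIP-relative address of the string
    mov  $hello_len, %edx           # length as an immediate, zero-extended into RDX
    syscall                         # clobbers RCX and R11

    mov  $SYS_exit, %eax
    xor  %edi, %edi                 # exit status 0
    syscall
```

The andq $-16, %rsp is dropped because the ABI already guarantees the alignment at process entry, and nothing here calls a function that would depend on it anyway.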

robohack commented 3 years ago

Hi Peter,

Thanks very much for your detailed comments and analysis!

I've dealt with the first item (the bug in the comment), and noted the origin of this example -- that's what I get for copy&paste!

It has been a long time since I did any Intel assembly coding, and this is actually my first x86_64-specific toy. Most of my practical experience with assembler is way back when on pdp11, vax, 6502, 1802, etc. and ancient x86, so I definitely appreciate your insight!

BTW, I like the idea of storing the length of the string in memory for other purposes, i.e. not just having a constant in the current assembly unit, so I'll probably keep that as an example, but I'll add a comment about avoiding the storage and using a constant instead.