thello.s: sizes of constant strings should use .equ, not loading a byte from data memory, and various other optimizations.

Someone linked https://github.com/robohack/experiments/blob/430b5ea22bc2f4f697c659aeb399e938d09744c1/thello.s for an example of a BSD build command, which is why I'm randomly looking at it.

It has one bug (in a comment): syscall definitely can't take an arg in RCX, because the syscall instruction itself overwrites RCX (with the return address) before the kernel gets control. (https://stackoverflow.com/questions/32253144/why-is-rcx-not-used-for-passing-parameters-to-system-calls-being-replaced-with) Linux uses R10 instead of RCX, with the rest of the convention matching the function-calling convention. I'd guess most other x86-64 SysV OSes do the same, but I don't know for sure.

    # SYSCALL ARGS
    # rdi rsi rdx r10 r8 r9
    # wrong original: # rdi rsi rdx rcx r8 r9    # that's the function-calling convention.

Separately from that:

    andq $-16, %rsp     # clear the 4 least significant bits of stack pointer to align it

This is unnecessary: RSP is already aligned by 16 on process entry, as guaranteed by the x86-64 System V ABI.

    mov $4, %rax        # SYS_write
    mov $1, %rdi

You can mov $4, %eax to do this more efficiently (implicit zero-extension to 64-bit), especially if you're later trying to optimize by merging a length into the low byte of RDX (which most kernels zero on process entry). Also, you can #include <sys/syscall.h> to get call numbers as CPP macro #defines, so you can mov $SYS_write, %eax. (Name your file with a .S suffix so gcc will run it through CPP first.)

You can run as -O2 or as -Os to have the assembler do simple optimizations itself, such as turning mov $4, %rax into mov $4, %eax like NASM does, because the architectural effect is identical. (If using GCC, that's -Wa,-O2, not gcc -O2.)

    mov $hello, %rsi

Using a 32-bit sign-extended immediate for an absolute address is possible but inefficient. Normally you'd use lea hello(%rip), %rsi, or mov $hello, %esi. (If a 32-bit sign-extended address works, so does a zero-extended one, assuming user-space lives in the bottom of the virtual address space, not the top.) See https://stackoverflow.com/questions/57212012/how-to-load-address-of-function-or-label-into-register
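For comparison, here are the options side by side. This is only a sketch: hello stands for the label from the file, the byte counts are the standard encodings of each form, and the absolute-address forms are assumed to be linked into a non-PIE executable.

```asm
    .section .rodata
hello:
    .ascii "Hello world!\n"

    .text
    lea    hello(%rip), %rsi    # 7 bytes: RIP-relative, works in PIE and non-PIE
    mov    $hello, %esi         # 5 bytes: zero-extended imm32, non-PIE only
    mov    $hello, %rsi         # 7 bytes: sign-extended imm32, non-PIE only
    movabs $hello, %rsi         # 10 bytes: full 64-bit absolute address
```

The lea form costs the same 7 bytes as the original mov $hello, %rsi but doesn't depend on where the executable is loaded, which is why it's the usual default.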

    mov $1, %rax        # SYS_exit
    xor %rdi, %rdi

Again, 32-bit operand-size is 100% fine, especially for the xor since exit() takes an int arg. See my answer on https://stackoverflow.com/questions/33666617/what-is-the-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and

Putting a constant byte in static storage is just silly; make it an assemble-time constant you can use as an immediate, like mov $hello_len, %edx (or %rdx if you want).

    .section .data       # could be .section .rodata

    hello:
        .ascii "Hello world!\n"
    hello_len = . - hello
    # .equ hello_len,  . - hello     # alternative using .equ

        #.byte . - hello

    mov hello_len, %dl  # Note: does not clear upper bytes. Use movzxb (move zero extend) for that

becomes

    mov $hello_len, %edx       # zero-extends to fill RDX
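Putting the suggestions together, the whole program could look something like the sketch below. It is not the file from the repo. It assumes the source is named with a .S suffix so CPP expands SYS_write and SYS_exit from <sys/syscall.h>, that _start is the entry point, and that the OS takes syscall args in rdi, rsi, rdx, r10. Any ELF note section a particular BSD needs to identify the binary is left out.

```asm
#include <sys/syscall.h>        /* SYS_write, SYS_exit as CPP macros */

    .section .rodata
hello:
    .ascii "Hello world!\n"
hello_len = . - hello           /* assemble-time constant, no byte in memory */

    .text
    .globl _start
_start:
    /* no andq $-16, %rsp: RSP is already 16-byte aligned on entry */
    mov  $SYS_write, %eax       /* call number; writing EAX zero-extends into RAX */
    mov  $1, %edi               /* fd 1 = stdout */
    lea  hello(%rip), %rsi      /* position-independent address of the string */
    mov  $hello_len, %edx       /* length as an immediate, zero-extends into RDX */
    syscall                     /* the instruction itself clobbers RCX and R11 */

    mov  $SYS_exit, %eax
    xor  %edi, %edi             /* exit status 0 */
    syscall
```

On Linux with GCC, something like gcc -nostdlib -static thello.S should build it (thello.S is just an illustrative name); the BSD build command from the linked file isn't reproduced here.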
