jkenda / aback

A stack-oriented language that uses Polish notation which can be reversed using the ;; operator (previously |>)

Cross-platform backend #13

Open jkenda opened 7 months ago

jkenda commented 7 months ago

The backend can be made platform-independent (at least across UNIX-like systems) by wrapping each system call in a function and swapping the set of wrappers depending on the platform. This rules out Windows as a target for now, but perhaps syscalls can somehow be emulated on it, or a higher-level abstraction can be devised down the line.

Make a file with system call functions for every target so you don't have to link with libc (-nostdlib).

https://filippo.io/linux-syscall-table/
https://opensource.apple.com/source/xnu/xnu-1504.3.12/bsd/kern/syscalls.master

core/syscalls/
  arm64-darwin.asm
  arm64-linux.asm
  riscv64-linux.asm
  x86_64-darwin.asm
  x86_64-linux.asm
  x86_64-windows.c??
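
The per-target split can also be pictured from the compiler's side. Here is a minimal sketch in C, assuming the backend keeps one row of syscall numbers per file in core/syscalls/. The names `syscall_table` and `syscall_table_for` are made up for illustration and are not part of aback. The x86_64 Linux and XNU numbers match the two tables linked above; the arm64 and riscv64 Linux numbers come from the kernel's generic syscall table, which those links do not cover.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-target table: one row per file in core/syscalls/. */
struct syscall_table {
    const char *target;  /* file name without the extension */
    long read_nr;
    long write_nr;
    long exit_nr;
};

static const struct syscall_table SYSCALL_TABLES[] = {
    /* x86_64 Linux has its own numbering. */
    { "x86_64-linux",  0,  1,  60 },
    /* arm64 and riscv64 Linux share the generic numbering. */
    { "arm64-linux",   63, 64, 93 },
    { "riscv64-linux", 63, 64, 93 },
    /* XNU numbers from syscalls.master. */
    { "arm64-darwin",  3,  4,  1 },
    /* On x86_64 the BSD syscall class 0x2000000 is added to the number. */
    { "x86_64-darwin", 0x2000003, 0x2000004, 0x2000001 },
};

/* Returns the row for `target`, or NULL when the target is not supported. */
const struct syscall_table *syscall_table_for(const char *target) {
    size_t n = sizeof SYSCALL_TABLES / sizeof SYSCALL_TABLES[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(SYSCALL_TABLES[i].target, target) == 0)
            return &SYSCALL_TABLES[i];
    }
    return NULL;
}
```

A lookup for "x86_64-windows" returns NULL here, which mirrors the open question about Windows above.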

See https://github.com/jkenda/qbe-example

// syscalls-amd64_linux.s
.text
.globl _start
_start:
    # get argc, argv
    mov 0(%rsp), %rdi
    lea 8(%rsp), %rsi

    call main

    mov %rax, %rdi
    jmp exit

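.macro def_syscall name num
.globl \name
\name:
    # syscall overwrites rcx (return address) and r11 (saved rflags)
    push %rcx
    push %r11

    # the kernel expects the 4th argument in r10, not rcx
    mov %rcx, %r10
    mov $\num, %rax
    syscall

    pop %r11
    pop %rcx

    ret
.endm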

def_syscall read,   0
def_syscall write,  1
// ...
def_syscall exit,   60
// ...

// syscalls-arm64_apple.s
.section __TEXT,__text
.globl _start
_start:
    ldr x0, [sp]
    add x1, sp, #8

    bl _main
    b _exit

.macro def_syscall name num
.globl _\name
_\name:
    mov x16, #\num
    svc #0x80

    ret
.endm

def_syscall exit,   1
// ...
def_syscall read,   3
def_syscall write,  4
// ...
jkenda commented 7 months ago

You're right; it's important to optimize assembly code to avoid unnecessary instructions, especially in performance-critical areas like system call wrappers. Here's a more streamlined version of the GAS assembly code for a syscall that takes six arguments and adheres to your feedback:

Optimized GAS Assembly Code for a Syscall with Six Arguments

.section .text
.global syscall_example
.type syscall_example, @function

syscall_example:
    # Save registers that will be modified by the syscall
    push   %rcx      # Save RCX as it will be clobbered
    push   %r11      # R11 is clobbered by syscall

    # Prepare arguments
    # Arguments 1-3 (RDI, RSI, RDX) are already correctly placed
    mov    %rcx, %r10    # Move Arg4 from RCX to R10
    # Arguments 5 and 6 (R8, R9) are already correctly placed

    # Load syscall number into RAX (hypothetical syscall number 400)
    mov    $400, %rax

    # Make the syscall
    syscall

    # Restore clobbered registers
    pop    %r11
    pop    %rcx

    # Return to caller, result from syscall will be in RAX
    ret

Key Changes and Optimizations:

  1. Reduced Register Saving: Only RCX and R11 are pushed to the stack before the syscall. The syscall instruction itself overwrites both: RCX receives the return address and R11 the saved RFLAGS. The Linux kernel preserves every other register except RAX, which carries the return value.

  2. Eliminated Redundant Moves: Removed the redundant mov instructions that simply moved register values into themselves. Only the necessary move remains: RCX to R10, which places the fourth argument where the kernel expects it.

  3. Direct Use of Argument Registers: RDI, RSI, RDX, R8, and R9 already hold the correct arguments when the function is called, so the wrapper does not touch them.

This keeps the wrapper to what the syscall ABI requires: one register move, the syscall number load, and the save/restore of the two registers that syscall clobbers. If the wrapper is only ever called from code that follows the System V calling convention, the push/pop pair can be dropped as well, since RCX and R11 are caller-saved there.
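
The same register mapping can be written from C with extended inline assembly. This is a minimal sketch, assuming Linux and GCC or Clang; `raw_syscall6` and `alloc_page` are hypothetical names chosen for illustration. On anything other than x86_64 Linux it falls back to libc's `syscall()`, so only the first branch shows the RCX to R10 point discussed above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue a syscall with up to six arguments. */
long raw_syscall6(long nr, long a1, long a2, long a3,
                  long a4, long a5, long a6) {
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    /* Pin arguments 4-6 to the registers the kernel reads them from.
       Argument 4 goes in r10, not rcx as in a normal function call. */
    register long r10 __asm__("r10") = a4;
    register long r8  __asm__("r8")  = a5;
    register long r9  __asm__("r9")  = a6;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3),
                        "r"(r10), "r"(r8), "r"(r9)
                      /* syscall itself overwrites rcx and r11 */
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Fallback: let libc do the register shuffling. */
    return syscall(nr, a1, a2, a3, a4, a5, a6);
#endif
}

/* Usage: map one anonymous page through the six-argument mmap syscall.
   Returns NULL on failure. */
void *alloc_page(void) {
    long ret = raw_syscall6(SYS_mmap, 0, 4096, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    /* The raw path reports errors as -errno in [-4095, -1];
       the libc fallback returns -1. Both land in this range. */
    if (ret < 0 && ret >= -4095)
        return NULL;
    return (void *)ret;
}
```

The clobber list plays the role of the push/pop pair in the assembly version: it tells the compiler that RCX and R11 do not survive the instruction, so nothing has to be saved by hand.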

jkenda commented 7 months ago

For syscall wrappers on AArch64 and RV64 the idea is the same: save only what the wrapper itself or the trap instruction (svc on ARM, ecall on RISC-V) actually clobbers. The argument registers need no saving unless the wrapper changes them.

AArch64 Specifics:

On Linux, a syscall takes up to six arguments in x0-x5 and the syscall number in x8; the result comes back in x0. Many syscalls use fewer arguments. The callee-saved registers x19-x29, x30, and the stack pointer only need preserving if the wrapper modifies them, which a bare wrapper does not.

RV64 Specifics:

On Linux, syscall arguments go in a0-a5 and the syscall number in a7; the result comes back in a0.

In your wrapper:

Optimized Example:

Here’s a more optimized version of syscall wrappers where unnecessary saves are avoided, based on typical usage:

AArch64 Optimized:

.global syscall_example
.type syscall_example, %function

syscall_example:
    // Syscall number in x8
    mov x8, #400  // Hypothetical syscall number

    // Perform syscall
    svc 0

    ret  // Return to caller, result will be in x0

RV64 Optimized:

.global syscall_example
.type syscall_example, @function

syscall_example:
    # Syscall number in a7
    li a7, 400  # Hypothetical syscall number

    # Perform syscall
    ecall

    ret  # Return to caller, result will be in a0

In these examples, nothing is saved or restored. On Linux the kernel generally preserves all registers across svc and ecall except the one carrying the return value (x0 or a0), and the only other register the wrapper writes, x8 or a7, is caller-saved under the standard calling conventions. This keeps the wrappers minimal, on the assumption that callers follow those conventions.
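
Since both wrappers have the same three-instruction shape, a backend could generate them from a syscall name and number instead of keeping them by hand. Below is a minimal sketch in C, assuming Linux targets only; `emit_wrapper` is a hypothetical helper, not something aback provides.

```c
#include <stdio.h>
#include <string.h>

/*
 * Writes the assembly text of a minimal Linux syscall wrapper into buf.
 * Returns the number of characters written, or -1 if arch is not
 * handled or buf is too small. Hypothetical helper for illustration.
 */
int emit_wrapper(char *buf, size_t size, const char *arch,
                 const char *name, long nr) {
    int n;
    if (strcmp(arch, "aarch64") == 0) {
        /* mov only encodes small immediates, which syscall numbers are. */
        n = snprintf(buf, size,
                     ".global %s\n%s:\n    mov x8, #%ld\n    svc #0\n    ret\n",
                     name, name, nr);
    } else if (strcmp(arch, "riscv64") == 0) {
        n = snprintf(buf, size,
                     ".global %s\n%s:\n    li a7, %ld\n    ecall\n    ret\n",
                     name, name, nr);
    } else {
        return -1;
    }
    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Calling `emit_wrapper(buf, sizeof buf, "riscv64", "write", 64)` fills `buf` with a wrapper that loads 64 into a7 before the ecall; an unknown architecture name returns -1.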

jkenda commented 7 months ago

Below is a table summarizing target triples for various architectures and operating systems. It covers typical configurations for the architectures and toolchains discussed:

| Architecture | Vendor | Operating System | Environment | Common Triple |
|---|---|---|---|---|
| x86_64 | pc | Windows | MSVC | x86_64-pc-windows-msvc |
| x86_64 | w64 | Windows | MinGW | x86_64-w64-mingw32 |
| x86_64 | pc | Cygwin | GNU | x86_64-pc-cygwin |
| x86_64 | unknown | Linux | GNU | x86_64-unknown-linux-gnu |
| x86_64 | unknown | Linux | musl | x86_64-unknown-linux-musl |
| AArch64 | none | Linux | GNU | aarch64-none-linux-gnu |
| AArch64 | apple | macOS | - | aarch64-apple-darwin |
| ARM | none | Linux | GNU EABI | arm-none-linux-gnueabi |
| ARM | none | Linux | GNU EABI HF | arm-none-linux-gnueabihf |
| RISC-V 64 | unknown | Linux | GNU | riscv64-unknown-linux-gnu |
| RISC-V 64 | unknown | Bare-metal | ELF | riscv64-unknown-elf |

Explanation:

Each triple has the form architecture-vendor-os, optionally followed by an environment field that names the ABI or C library. These target triples tell compilers and toolchains which code to generate for a given environment. They matter most in cross-compilation, where the development and target systems differ.
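
If the backend ends up selecting its syscall file from a triple, the string has to be split into those fields first. Here is a minimal sketch in C, assuming the full arch-vendor-os[-environment] form used in the table; `struct triple` and `parse_triple` are hypothetical names. Shortened forms that omit the vendor, such as x86_64-linux-gnu, are not handled and would be split into the wrong fields.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the fields of a target triple. */
struct triple {
    char arch[32];
    char vendor[32];
    char os[32];
    char env[32];  /* empty when the triple has only three components */
};

/*
 * Splits "arch-vendor-os[-env]" into its fields.
 * Returns 0 on success, -1 if fewer than three components are present.
 * Anything after a fourth component is ignored.
 */
int parse_triple(const char *s, struct triple *out) {
    memset(out, 0, sizeof *out);
    /* %31[^-] reads up to 31 characters that are not a dash. */
    int n = sscanf(s, "%31[^-]-%31[^-]-%31[^-]-%31[^-]",
                   out->arch, out->vendor, out->os, out->env);
    return n >= 3 ? 0 : -1;
}
```

For "arm-none-linux-gnueabihf" this fills in os as "linux" and env as "gnueabihf"; for "aarch64-apple-darwin" the env field stays empty.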