This repository contains a VERY BASIC x86-64 assembler, which is capable of reading assembly-language input and generating a statically linked ELF binary as output.
It is more a proof-of-concept than a useful assembler, but I hope to take it to the state where it can compile the kind of x86-64 assembly I produce in some of my other projects.
Currently the assembler will generate a binary which looks like this:
$ file a.out
a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, no section header
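As a rough illustration of what that `file` output describes, the sketch below builds a 64-byte ELF64 file header in Go with the standard `debug/elf` and `encoding/binary` packages. It is not taken from this repository: the function name `buildELFHeader`, the entry address `0x4000b0` (borrowed from the gdb session later in this README) and the count of two program headers are assumptions made for the example.

```go
package main

import (
	"bytes"
	"debug/elf"
	"encoding/binary"
	"fmt"
)

// buildELFHeader returns the 64-byte ELF64 file header for a statically
// linked x86-64 executable. The name and parameters are hypothetical and
// are not this repository's API.
func buildELFHeader(entry uint64, phnum uint16) []byte {
	var hdr elf.Header64

	copy(hdr.Ident[:], elf.ELFMAG)                   // magic: 0x7f 'E' 'L' 'F'
	hdr.Ident[elf.EI_CLASS] = byte(elf.ELFCLASS64)   // "ELF 64-bit"
	hdr.Ident[elf.EI_DATA] = byte(elf.ELFDATA2LSB)   // "LSB" (little-endian)
	hdr.Ident[elf.EI_VERSION] = byte(elf.EV_CURRENT) // "version 1"
	// Ident[EI_OSABI] stays zero, which file(1) reports as "(SYSV)".

	hdr.Type = uint16(elf.ET_EXEC)      // "executable"
	hdr.Machine = uint16(elf.EM_X86_64) // "x86-64"
	hdr.Version = uint32(elf.EV_CURRENT)
	hdr.Entry = entry
	hdr.Phoff = 64     // program headers follow the file header directly
	hdr.Ehsize = 64    // size of this header
	hdr.Phentsize = 56 // size of one ELF64 program header
	hdr.Phnum = phnum
	// Shoff and Shnum stay zero: "no section header".

	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, &hdr); err != nil {
		panic(err) // cannot happen for a fixed-size struct and an in-memory buffer
	}
	return buf.Bytes()
}

func main() {
	hdr := buildELFHeader(0x4000b0, 2)
	fmt.Printf("%d bytes: % x\n", len(hdr), hdr[:20])
}
```

A real binary also needs the program headers and the code itself after this header; only the first 64 bytes are shown here.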
Why? I've written a couple of toy projects that generate assembly language programs, then pass them through an assembler.
The code in this repository was born out of the process of experimenting with generating an ELF binary directly. A necessary learning-process.
We don't support anywhere near the complete instruction-set which an assembly language programmer would expect. Currently we support only things like this:
add $REG, $REG + add $REG, $NUMBER
call $LABEL
dec $REG
    inc byte ptr [$REG]
    inc word ptr [$REG]
    inc dword ptr [$REG]
    inc qword ptr [$REG]
inc $REG
    inc byte ptr [$REG]
    inc word ptr [$REG]
    inc dword ptr [$REG]
    inc qword ptr [$REG]
jmp $LABEL, je $LABEL, jne $LABEL
mov $REG, $NUMBER
mov $REG, $REG
nop
push $NUMBER, or push $IDENTIFIER
ret
    push - see jmp.asm for an example.
sub $REG, $REG + sub $REG, $NUMBER
xor $REG, $REG
int $NUM
clc, cld, cli, cmc, stc, std, and sti.
Note that we only support the following 64-bit registers (which means rax is supported, but eax, ax, ah, and al are specifically not supported):
rax
rcx
rdx
rbx
rsp
rbp
rsi
rdi
There is some support for the extended registers r8-r15, but this varies on a per-instruction basis and should not be relied upon.
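The order of the register list above matches the numbers the x86-64 instruction encoding assigns to those registers (rax is 0, rdi is 7). The sketch below shows how a register-to-register instruction could be turned into bytes using those numbers: a REX.W prefix (0x48), an opcode, and a ModRM byte. The names `regNum`, `regRegOpcode` and `encodeRegReg` are hypothetical and this is not the repository's compiler code; the extended registers r8-r15 are deliberately left out because they need extra REX bits.

```go
package main

import "fmt"

// regNum maps each supported 64-bit register to its 3-bit hardware number.
var regNum = map[string]byte{
	"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
	"rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7,
}

// regRegOpcode holds the opcode of the "op r/m64, r64" form of each instruction.
var regRegOpcode = map[string]byte{
	"add": 0x01,
	"sub": 0x29,
	"xor": 0x31,
	"mov": 0x89,
}

// encodeRegReg encodes "op dst, src" (Intel operand order) for two 64-bit
// registers. Anything outside the two tables above is rejected.
func encodeRegReg(op, dst, src string) ([]byte, error) {
	opcode, ok := regRegOpcode[op]
	if !ok {
		return nil, fmt.Errorf("unsupported instruction %q", op)
	}
	d, ok := regNum[dst]
	if !ok {
		return nil, fmt.Errorf("unsupported register %q", dst)
	}
	s, ok := regNum[src]
	if !ok {
		return nil, fmt.Errorf("unsupported register %q", src)
	}

	// ModRM: mod=11 (register direct), reg=source, rm=destination.
	modrm := 0xC0 | s<<3 | d

	// 0x48 is the REX.W prefix, selecting 64-bit operand size.
	return []byte{0x48, opcode, modrm}, nil
}

func main() {
	code, err := encodeRegReg("xor", "rax", "rax")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("% x\n", code) // prints "48 31 c0"
}
```

Passing `eax` or any other register missing from the table returns an error, which mirrors the 64-bit-only restriction described above.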
There is support for storing fixed-data within our program, and locating that. See hello.asm for an example of that.
We also have some other (obvious) limitations:
"push" and "ret", see jmp.asm for an example of that.
data section of the generated binary, but must be defined first.
If you have this repository cloned locally you can build the assembler like so:
cd cmd/assembler
go build .
go install .
If you wish to fetch and install via your existing toolchain:
go get -u github.com/skx/assembler/cmd/assembler
You can repeat for the other commands if you wish:
go get -u github.com/skx/assembler/cmd/lexer
go get -u github.com/skx/assembler/cmd/parser
Of course these binary-names are very generic, so perhaps better to work locally!
Build the assembler:
$ cd cmd/assembler
$ go build .
Compile the sample program, and execute it showing the return-code:
$ cmd/assembler/assembler test.asm && ./a.out ; echo $?
9
Or run the hello.asm example:
$ cmd/assembler/assembler hello.asm && ./a.out
Hello, world
Goodbye, world
You'll note that the \n character was correctly expanded into a newline.
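For readers curious what that expansion involves, here is a minimal sketch in Go. It assumes a small set of escapes (`\n`, `\t` and `\\`); the function name `expandEscapes` is made up for this example, and the lexer in this repository may handle a different set or work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// expandEscapes replaces the two-character sequences \n, \t and \\ with
// the single byte each one stands for. Unknown escapes are copied as-is.
func expandEscapes(s string) string {
	var out strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '\\' && i+1 < len(s) {
			switch s[i+1] {
			case 'n':
				out.WriteByte('\n')
				i++ // skip the escaped character as well
				continue
			case 't':
				out.WriteByte('\t')
				i++
				continue
			case '\\':
				out.WriteByte('\\')
				i++
				continue
			}
		}
		out.WriteByte(s[i])
	}
	return out.String()
}

func main() {
	// The raw string literal contains a backslash followed by 'n',
	// just as it would appear in an assembly source file.
	fmt.Print(expandEscapes(`Hello, world\n`))
}
```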
The core of our code consists of a small number of simple packages.
In addition to the package modules we also have a couple of binaries:
cmd/lexer
cmd/parser
cmd/assembler
These commands, located beneath cmd, each operate the same way: they take a single argument, which is a file containing assembly-language instructions.
For example here is how you'd build and test the parser:
cd cmd/parser
go build .
$ ./parser ../../test.asm
&{{INSTRUCTION xor} [{REGISTER rax} {REGISTER rax}]}
&{{INSTRUCTION inc} [{REGISTER rax}]}
&{{INSTRUCTION mov} [{REGISTER rbx} {NUMBER 0x0000}]}
&{{INSTRUCTION mov} [{REGISTER rcx} {NUMBER 0x0007}]}
&{{INSTRUCTION add} [{REGISTER rbx} {REGISTER rcx}]}
&{{INSTRUCTION mov} [{REGISTER rcx} {NUMBER 0x0002}]}
&{{INSTRUCTION add} [{REGISTER rbx} {REGISTER rcx}]}
&{{INSTRUCTION int} [{NUMBER 0x80}]}
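The parsed program above is the same test.asm whose exit status was shown as 9 earlier. To see where that number comes from, the sketch below steps through the instructions with a tiny register simulator written in Go. The `instr` type and `run` function are invented for this walkthrough and are unrelated to the repository's parser or compiler types.

```go
package main

import "fmt"

// instr is a simplified instruction: an operation, a destination register,
// and either a source register (src) or an immediate value (imm).
type instr struct {
	op  string
	dst string
	src string // empty when the operand is an immediate
	imm uint64
}

// run executes the instructions in order and returns the final register
// values. Execution stops at the first "int" instruction.
func run(prog []instr) map[string]uint64 {
	regs := map[string]uint64{}
	for _, in := range prog {
		val := in.imm
		if in.src != "" {
			val = regs[in.src]
		}
		switch in.op {
		case "xor":
			regs[in.dst] ^= val
		case "inc":
			regs[in.dst]++
		case "mov":
			regs[in.dst] = val
		case "add":
			regs[in.dst] += val
		case "int":
			return regs
		}
	}
	return regs
}

func main() {
	prog := []instr{
		{op: "xor", dst: "rax", src: "rax"}, // rax = 0
		{op: "inc", dst: "rax"},             // rax = 1
		{op: "mov", dst: "rbx", imm: 0},     // rbx = 0
		{op: "mov", dst: "rcx", imm: 7},     // rcx = 7
		{op: "add", dst: "rbx", src: "rcx"}, // rbx = 7
		{op: "mov", dst: "rcx", imm: 2},     // rcx = 2
		{op: "add", dst: "rbx", src: "rcx"}, // rbx = 9
		{op: "int", imm: 0x80},              // hand over to the kernel
	}
	regs := run(prog)
	fmt.Println("rax =", regs["rax"], "rbx =", regs["rbx"]) // prints "rax = 1 rbx = 9"
}
```

On Linux, `int 0x80` with 1 in rax requests the exit system call and takes the status from rbx, which is why the shell reports 9.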
This is how you might add a new instruction to the assembler; for example, you might add jmp 0x00000 or some similar instruction:
InstructionLengths map to add the instruction.
compiler/compiler.go, inside the function compileInstruction.
Launch the binary under gdb:
$ gdb ./a.out
Start it:
(gdb) starti
Starting program: /home/skx/Repos/github.com/skx/assembler/a.out
Program stopped.
0x00000000004000b0 in ?? ()
Disassemble:
(gdb) x/5i $pc
Or show string-contents at an address:
(gdb) x/s 0x400000
Feel free to report bugs; as this is more a proof of concept than a robust tool, they are to be expected.
Specifically we're missing support for many instructions, but I hope the code generated for those that are present is correct.
Steve