rozukke / lace

An all-in-one assembler toolchain for the LC3 assembly language.
MIT License

Debugger proposal #43

Open dxrcy opened 1 week ago

dxrcy commented 1 week ago

Lace Debugger Proposal

The debugger can only be run on an assembly file; it cannot run directly on an object file, as it would not have access to predefined breakpoints or label names.

In debugger mode, the original object code is stored and never mutated. It is restored to memory by the reset command, and source lines can be displayed with the source command. Note that the source file cannot be changed while the program is running, whether by the debugger or by the user editing the file; to apply changes, the program must be run again.
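Since the proposal restores memory on reset instead of re-assembling, the idea can be sketched as a snapshot taken right after loading and copied back wholesale. This is a minimal sketch, assuming hypothetical `Machine` and `Snapshot` types and assuming registers start at zero; the real simulator's state layout may differ.

```rust
/// Size of the LC-3 address space, in 16-bit words.
const MEMORY_SIZE: usize = 0x10000;

/// Hypothetical simulator state; field names are illustrative only.
struct Machine {
    memory: Vec<u16>,
    registers: [u16; 8],
    pc: u16,
}

/// Copy of the freshly loaded program, kept untouched for `reset`.
struct Snapshot {
    memory: Vec<u16>,
    pc: u16,
}

impl Machine {
    /// Place `object` at `origin` (words past the end of memory are dropped).
    fn load(origin: u16, object: &[u16]) -> Self {
        let mut memory = vec![0; MEMORY_SIZE];
        for (slot, &word) in memory[origin as usize..].iter_mut().zip(object) {
            *slot = word;
        }
        Self { memory, registers: [0; 8], pc: origin }
    }

    fn snapshot(&self) -> Snapshot {
        Snapshot { memory: self.memory.clone(), pc: self.pc }
    }

    /// Restore all of memory, not just the program's own words, since the
    /// program may have modified itself or any other address.
    fn reset(&mut self, snapshot: &Snapshot) {
        self.memory.copy_from_slice(&snapshot.memory);
        // Assumption: registers start at zero; the real simulator may differ.
        self.registers = [0; 8];
        self.pc = snapshot.pc;
    }
}
```

Copying the whole address space on reset is simple and cheap at 128 KiB, and it undoes self-modifying code without tracking which words changed.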

All values will be printed in hex, signed decimal, unsigned decimal, binary, and ASCII representations (ASCII only if printable; common control characters can use escape sequences).
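A minimal sketch of printing one 16-bit word in every listed representation. The function names, the LC-3 style `x` hex prefix, the column separator, and the chosen set of escaped control characters are all assumptions for illustration.

```rust
/// ASCII column: printable characters as-is, a few common control
/// characters as escape sequences, and nothing for anything else.
fn ascii_repr(word: u16) -> Option<String> {
    let text = match word {
        0x00 => "\\0".to_string(),
        0x09 => "\\t".to_string(),
        0x0A => "\\n".to_string(),
        0x0D => "\\r".to_string(),
        0x20..=0x7E => (word as u8 as char).to_string(),
        _ => return None,
    };
    Some(format!("'{}'", text))
}

/// Hypothetical full (non-minimal) rendering of a value.
fn format_value(word: u16) -> String {
    let mut parts = vec![
        format!("x{:04X}", word),    // hex
        format!("{}", word as i16),  // signed decimal (two's complement)
        format!("{}", word),         // unsigned decimal
        format!("0b{:016b}", word),  // binary, all 16 bits
    ];
    if let Some(ascii) = ascii_repr(word) {
        parts.push(ascii);
    }
    parts.join("  ")
}
```

For example, `format_value(0x0041)` yields `x0041  65  65  0b0000000001000001  'A'`, while `0xFFFF` shows as `-1` signed and `65535` unsigned with no ASCII column.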

All debug output will be printed to stderr.

Command-Line Options

The command-line subcommand lace debug behaves like lace run, except that it enables the debugger for the program.

The -m or --minimal option causes the debugger to print minimal output (no color or fancy formatting), which is useful for blackbox testing. For example, the set command will print only a hex value, rather than multiple representations.

The -d or --debugger-input option, followed by a string, causes the debugger to read its commands from that string instead of stdin. This lets a user separate debugger commands from program input, which is useful for blackbox testing.
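A hand-rolled sketch of how the two options could be collected for `lace debug`. The `DebugOptions` struct and `parse_debug_args` function are hypothetical; the real CLI may well use an argument-parsing library instead.

```rust
/// Hypothetical parsed options for `lace debug`; names are illustrative.
#[derive(Debug, Default)]
struct DebugOptions {
    /// Path to the assembly source file.
    path: Option<String>,
    /// `-m` / `--minimal`: plain, single-representation output.
    minimal: bool,
    /// `-d` / `--debugger-input STRING`: debugger commands to use instead of stdin.
    debugger_input: Option<String>,
}

/// Parse the arguments that follow `lace debug`.
fn parse_debug_args<I: IntoIterator<Item = String>>(args: I) -> Result<DebugOptions, String> {
    let mut opts = DebugOptions::default();
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        if arg == "-m" || arg == "--minimal" {
            opts.minimal = true;
        } else if arg == "-d" || arg == "--debugger-input" {
            // The string that follows is the whole debugger script.
            let input = args
                .next()
                .ok_or_else(|| format!("{} expects a string argument", arg))?;
            opts.debugger_input = Some(input);
        } else if arg.starts_with('-') {
            return Err(format!("unknown option: {}", arg));
        } else if opts.path.replace(arg).is_some() {
            return Err("expected exactly one source file".to_string());
        }
    }
    if opts.path.is_none() {
        return Err("missing source file".to_string());
    }
    Ok(opts)
}
```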

Command Input

Input is taken from stdin by default. Users can use the up/down arrow keys to navigate command history. Stdin can be piped from a command or file to automate debugging. See the --debugger-input command-line option above.

Characters are read from input until a newline, semicolon, or EOF is reached. If a newline or semicolon is reached, the rest of the line is left to be parsed as the next command. Semicolons are used to separate commands because they will not interfere with the syntax of the eval command (in LC3 assembly, a semicolon only starts a comment).

If EOF is reached, the debugger treats this as a quit command, which stops the debugger but continues execution. This prevents the debugger from blocking normal program execution when a piped stdin reaches EOF.
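The splitting rules above can be sketched as a small reader over any character stream, so the same code serves both a `--debugger-input` string and piped stdin. `CommandReader` and `Input` are hypothetical names, and skipping blank commands (e.g. `a;;b`) is an assumption; history and arrow-key editing for interactive stdin would be handled separately.

```rust
/// What one read from debugger input yields.
#[derive(Debug, PartialEq)]
enum Input {
    Command(String),
    /// End of input; the caller treats this as `quit`.
    Eof,
}

/// Splits debugger input into commands at `\n` or `;`. Anything after a
/// separator stays in the stream for the next read.
struct CommandReader<I: Iterator<Item = char>> {
    chars: std::iter::Fuse<I>,
}

impl<I: Iterator<Item = char>> CommandReader<I> {
    fn new(chars: I) -> Self {
        Self { chars: chars.fuse() }
    }

    fn next_command(&mut self) -> Input {
        let mut buf = String::new();
        loop {
            match self.chars.next() {
                // A separator ends the command; blank commands are skipped.
                Some('\n') | Some(';') => {
                    if !buf.trim().is_empty() {
                        return Input::Command(buf.trim().to_string());
                    }
                    buf.clear();
                }
                Some(c) => buf.push(c),
                // EOF after partial text still yields that command;
                // the following read reports Eof.
                None => {
                    return if buf.trim().is_empty() {
                        Input::Eof
                    } else {
                        Input::Command(buf.trim().to_string())
                    };
                }
            }
        }
    }
}
```

For `-d "registers; continue"`, successive reads give `registers`, `continue`, then `Eof`, at which point the debugger quits and the program runs on.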

Command arguments are separated by spaces, without commas. Some commands take optional arguments, and some commands will warn if too many arguments are given. Arguments can be integers, labels, register names, or, in the case of eval, an arbitrary string including spaces.
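A minimal sketch of that tokenizing rule: split off the command name, then split the rest on whitespace, except for `eval`, whose remainder is kept whole. The `tokenize` function is hypothetical.

```rust
/// Split a command line into its name and arguments.
/// Returns `None` for a blank line.
fn tokenize(line: &str) -> Option<(&str, Vec<&str>)> {
    let line = line.trim();
    let (name, rest) = match line.split_once(char::is_whitespace) {
        Some((name, rest)) => (name, rest.trim()),
        None => (line, ""),
    };
    if name.is_empty() {
        return None;
    }
    let args = if name == "eval" {
        // The whole instruction, spaces and commas included, is one argument.
        if rest.is_empty() { Vec::new() } else { vec![rest] }
    } else {
        rest.split_whitespace().collect()
    };
    Some((name, args))
}
```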

Debugger Loop

In debugger mode, the program will wait for a user command before executing each instruction (unless it is currently continuing).

Before a HALT is executed, the debugger will prompt the user for a command. The behaviour of some commands in this state is listed:

Commands

step COUNT=1 (t) Step next instruction or INTO subroutine.

next COUNT=1 (n) Step next instruction or OVER subroutine.

continue (c, cont) Continue until breakpoint or HALT.

finish (f, fin) Continue until end of subroutine, breakpoint, or HALT.

quit (q) Stop debugger and continue execution as normal.

exit (e, ^C) Exit debugger and simulator.

break list (b l, bl) List breakpoints.

break add (ADDRESS|LABEL+)=PC (b a, ba) Add breakpoint at an address/label/PC.

break remove (ADDRESS|LABEL+)=PC (b r, br) Remove breakpoint at an address/label/PC.

get (REGISTER|ADDRESS|LABEL+) (g) Print the value at a register, address, or label.

set (REGISTER|ADDRESS|LABEL+) VALUE (s) Set the value at a register, address, or label.

registers (r, reg) Print the value of all registers.

reset (no alias) Reset all memory and registers.

source COUNT=1 (ADDRESS|LABEL+)=PC (no alias) Print corresponding line and line number of source code from address/label/PC.

eval OPCODE OPERANDS... (no alias) Simulate an instruction. Note that labels cannot be created or modified.

help (h; also any invalid command) Show available commands.
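The command table maps naturally onto an enum plus one `match` over the name and argument slice. This is a sketch under assumptions: the `Command` enum and `parse_command` are hypothetical, arguments stay as raw strings (resolved in a later step), wrong argument counts fall back to help, and `^C` is assumed to arrive as a signal rather than as parsed text.

```rust
#[derive(Debug, PartialEq)]
enum Command<'a> {
    Step(Option<&'a str>),
    Next(Option<&'a str>),
    Continue,
    Finish,
    Quit,
    Exit,
    BreakList,
    BreakAdd(Option<&'a str>),
    BreakRemove(Option<&'a str>),
    Get(&'a str),
    Set(&'a str, &'a str),
    Registers,
    Reset,
    Source(Vec<&'a str>),
    Eval(&'a str),
    Help,
}

fn parse_command<'a>(name: &str, args: &[&'a str]) -> Command<'a> {
    let first = args.first().copied();
    match (name, args) {
        ("step" | "t", _) => Command::Step(first),
        ("next" | "n", _) => Command::Next(first),
        ("continue" | "cont" | "c", _) => Command::Continue,
        ("finish" | "fin" | "f", _) => Command::Finish,
        ("quit" | "q", _) => Command::Quit,
        ("exit" | "e", _) => Command::Exit,
        // `break` takes a subcommand; `bl`, `ba` and `br` are the fused aliases.
        ("break" | "b", &["list" | "l"]) | ("bl", &[]) => Command::BreakList,
        ("break" | "b", &["add" | "a", ..]) => Command::BreakAdd(args.get(1).copied()),
        ("ba", _) => Command::BreakAdd(first),
        ("break" | "b", &["remove" | "r", ..]) => Command::BreakRemove(args.get(1).copied()),
        ("br", _) => Command::BreakRemove(first),
        ("get" | "g", &[location]) => Command::Get(location),
        ("set" | "s", &[location, value]) => Command::Set(location, value),
        ("registers" | "reg" | "r", _) => Command::Registers,
        ("reset", _) => Command::Reset,
        ("source", _) => Command::Source(args.to_vec()),
        ("eval", &[instruction]) => Command::Eval(instruction),
        // Unknown names (and malformed calls) show the help text.
        _ => Command::Help,
    }
}
```

A `None` argument for step, next, or break means "use the default" (COUNT=1 or the PC).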

Argument Types

COUNT, ADDRESS, VALUE An integer literal of any supported base. Signed or unsigned.

LABEL+ A label name with an optional offset. E.g. Foo, Foo+2, Foo-2. Whitespace cannot appear between the label name and the offset; otherwise the offset will be treated as a separate argument.

REGISTER A register name: r0 to r7, pc, or cc.

PC The current program counter value. Used as a default value.

OPCODE OPERANDS... An assembly instruction, with the same form as found in a source file.
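The argument types above can be resolved in a fixed order: register first, then integer literal, then LABEL+. This sketch assumes the prefixes `0x`/`x` (hex), `0b`/`b` (binary), and `#` or none (decimal), which may not match lace's existing number parser; `Reg`, `Location`, and the `parse_*` functions are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Reg {
    R(u8), // R0-R7
    Pc,
    Cc,
}

/// A parsed REGISTER, ADDRESS, or LABEL+ argument.
#[derive(Debug, PartialEq)]
enum Location<'a> {
    Register(Reg),
    Address(u16),
    Label { name: &'a str, offset: i32 },
}

/// Integer literal with assumed prefixes; a sign may follow the prefix.
fn parse_integer(s: &str) -> Option<i32> {
    let (digits, radix) = if let Some(rest) = s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
        (rest, 16)
    } else if let Some(rest) = s.strip_prefix("0b").or_else(|| s.strip_prefix("0B")) {
        (rest, 2)
    } else if let Some(rest) = s.strip_prefix(|c: char| c == 'x' || c == 'X') {
        (rest, 16)
    } else if let Some(rest) = s.strip_prefix(|c: char| c == 'b' || c == 'B') {
        (rest, 2)
    } else if let Some(rest) = s.strip_prefix('#') {
        (rest, 10)
    } else {
        (s, 10)
    };
    i32::from_str_radix(digits, radix).ok()
}

/// A 16-bit word, accepted either signed (-32768..) or unsigned (..65535).
fn parse_word(s: &str) -> Option<u16> {
    let n = parse_integer(s)?;
    (-0x8000..=0xFFFF).contains(&n).then(|| n as u16)
}

fn parse_register(s: &str) -> Option<Reg> {
    match s.to_ascii_lowercase().as_str() {
        "pc" => Some(Reg::Pc),
        "cc" => Some(Reg::Cc),
        other => match other.as_bytes() {
            [b'r', d @ b'0'..=b'7'] => Some(Reg::R(*d - b'0')),
            _ => None,
        },
    }
}

fn parse_location(s: &str) -> Option<Location<'_>> {
    if let Some(reg) = parse_register(s) {
        return Some(Location::Register(reg));
    }
    if let Some(addr) = parse_word(s) {
        return Some(Location::Address(addr));
    }
    // LABEL+: a name with an optional `+N`/`-N` suffix and no whitespace.
    let (name, offset) = match s.find(|c: char| c == '+' || c == '-') {
        Some(i) => {
            let magnitude = parse_integer(&s[i + 1..])?;
            let offset = if s[i..].starts_with('-') { magnitude.checked_neg()? } else { magnitude };
            (&s[..i], offset)
        }
        None => (s, 0),
    };
    let mut chars = name.chars();
    let valid = chars.next().map_or(false, |c| c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_');
    valid.then_some(Location::Label { name, offset })
}
```

Checking registers and numbers before labels means a name like `x10` is read as a hex address, which mirrors the usual LC3 ambiguity rather than resolving it.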

Breakpoints

Breakpoints can be defined with the .BREAK pseudo-op, or at runtime with a command. Both predefined and runtime-defined breakpoints are added to a global list, and the list is checked before executing the next instruction.
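A sketch of the global breakpoint list. It deliberately swaps the proposal's `Vec<u16>` for a `BTreeSet<u16>`, so the per-instruction check is a logarithmic lookup and `break list` comes out sorted without duplicates; the `Breakpoints` type and its methods are hypothetical.

```rust
use std::collections::BTreeSet;

/// Addresses seeded from `.BREAK` at assembly time, then edited at runtime.
struct Breakpoints {
    addresses: BTreeSet<u16>,
}

impl Breakpoints {
    fn from_predefined(predefined: &[u16]) -> Self {
        Self { addresses: predefined.iter().copied().collect() }
    }

    /// `break add`: returns false if the breakpoint already existed.
    fn add(&mut self, addr: u16) -> bool {
        self.addresses.insert(addr)
    }

    /// `break remove`: returns false if there was no breakpoint there.
    fn remove(&mut self, addr: u16) -> bool {
        self.addresses.remove(&addr)
    }

    /// Checked before executing the instruction at `pc`.
    fn should_stop(&self, pc: u16) -> bool {
        self.addresses.contains(&pc)
    }

    /// `break list`, in ascending address order.
    fn list(&self) -> impl Iterator<Item = u16> + '_ {
        self.addresses.iter().copied()
    }
}
```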

Labels

Labels are added to a global list, which can be queried by user commands.
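A sketch of querying that list, keeping the proposal's `Vec<(String, u16)>` shape. The `Labels` type, exact-match lookup, and wrapping offset arithmetic are assumptions; the reverse lookup is one possible use, such as annotating addresses in output.

```rust
/// Label names and their addresses, as in the proposal's state struct.
struct Labels {
    entries: Vec<(String, u16)>,
}

impl Labels {
    /// Address of `name` plus a signed offset (for `Foo+2` / `Foo-2`).
    /// Offsets wrap around the 16-bit address space.
    fn resolve(&self, name: &str, offset: i16) -> Option<u16> {
        self.entries
            .iter()
            .find(|(label, _)| label == name)
            .map(|&(_, addr)| addr.wrapping_add_signed(offset))
    }

    /// Reverse lookup: the label defined at `addr`, if any.
    fn name_at(&self, addr: u16) -> Option<&str> {
        self.entries
            .iter()
            .find(|(_, a)| *a == addr)
            .map(|(label, _)| label.as_str())
    }
}
```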

State

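use std::sync::Mutex;

// Could also be non-static and passed to functions.
// A `Mutex` allows safe global access, unlike `static mut`, which needs `unsafe`.
static DEBUGGER: Mutex<Debugger> = Mutex::new(None);

// `None` if debugger is not enabled
type Debugger = Option<DebuggerState>;

struct DebuggerState {
    status: DebuggerStatus,

    // Memory as loaded, before execution; copied back by `reset`.
    initial_memory: Vec<u16>,

    // Addresses from `.BREAK` and `break add`.
    breakpoints: Vec<u16>,
    // Label names and their addresses.
    labels: Vec<(String, u16)>,

    command_history: Vec<String>,
    cli_input: Option<String>, // TODO: Include cursor
}

enum DebuggerStatus {
    // Prompt before executing each instruction.
    WaitForCommand,
    // `continue`: run until a breakpoint or HALT.
    ContinueUntilBreakpoint,
    // `finish`: run until the current subroutine returns, a breakpoint, or HALT.
    ContinueUntilEndOfSubroutine,
}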
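One way the status could drive the per-instruction loop is sketched below. It is an illustration under assumptions: the simulator is assumed to classify each instruction (the hypothetical `Kind`), and `finish` tracks call depth so that a RET from a nested subroutine does not end it early. The proposal does not specify depth tracking, and `step`/`next` counts are not modeled.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum DebuggerStatus {
    WaitForCommand,
    ContinueUntilBreakpoint,
    ContinueUntilEndOfSubroutine,
}

/// Hypothetical classification of an instruction, supplied by the simulator.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Call,   // JSR / JSRR
    Return, // RET
    Halt,   // TRAP x25
    Other,
}

struct Stepper {
    status: DebuggerStatus,
    /// Calls entered since `finish` was issued and not yet returned from.
    depth: u32,
}

impl Stepper {
    fn new() -> Self {
        Self { status: DebuggerStatus::WaitForCommand, depth: 0 }
    }

    /// `continue`: run until a breakpoint or HALT.
    fn cont(&mut self) {
        self.status = DebuggerStatus::ContinueUntilBreakpoint;
    }

    /// `finish`: run until the current subroutine returns, a breakpoint, or HALT.
    fn finish(&mut self) {
        self.status = DebuggerStatus::ContinueUntilEndOfSubroutine;
        self.depth = 0;
    }

    /// Called before each instruction; `true` means prompt for a command first.
    fn before_instruction(&mut self, pc: u16, kind: Kind, breakpoints: &[u16]) -> bool {
        let stop = self.status == DebuggerStatus::WaitForCommand
            || breakpoints.contains(&pc)
            || kind == Kind::Halt;
        if stop {
            self.status = DebuggerStatus::WaitForCommand;
        }
        stop
    }

    /// Called after each instruction executes, to track nesting for `finish`.
    fn after_instruction(&mut self, kind: Kind) {
        if self.status != DebuggerStatus::ContinueUntilEndOfSubroutine {
            return;
        }
        match kind {
            Kind::Call => self.depth += 1,
            // The outermost RET ends the subroutine: prompt before the next instruction.
            Kind::Return if self.depth == 0 => self.status = DebuggerStatus::WaitForCommand,
            Kind::Return => self.depth -= 1,
            _ => {}
        }
    }
}
```

Counting calls in `after_instruction` rather than `before_instruction` means a JSR that the user stopped on (e.g. at a breakpoint) is still counted once it actually executes.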

Examples

Set some memory addresses, run the program until HALT (or a breakpoint), and print a memory address. The EOF is equivalent to quit; since the debugger quits immediately before the HALT, the program then reaches the HALT and exits.

lace debug hw.asm << EOF
set x3100 #2
set x3101 #4
continue
get x3102
EOF

Use a string argument for debugger input.

lace debug hw.asm -d "registers; continue"

Accommodations

The debugger prompt will always be printed on a new line. This can be guaranteed by remembering if the last printed character was a newline.
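A minimal sketch of that idea: wrap the program's output stream, remember the last byte written, and emit a newline on stderr before the prompt only when needed. `TrackedOutput`, `print_prompt`, and the `(lace) ` prompt text are hypothetical placeholders.

```rust
use std::io::{self, Write};

/// Wraps the simulated program's output and remembers whether the last
/// byte written was a newline.
struct TrackedOutput<W: Write> {
    inner: W,
    at_line_start: bool,
}

impl<W: Write> TrackedOutput<W> {
    fn new(inner: W) -> Self {
        // Nothing has been printed yet, so output starts at a line start.
        Self { inner, at_line_start: true }
    }
}

impl<W: Write> Write for TrackedOutput<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        // Only the bytes actually written count towards the last character.
        if let Some(&last) = buf[..n].last() {
            self.at_line_start = last == b'\n';
        }
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Print the prompt to `err`, first starting a new line if the program
/// left the cursor mid-line.
fn print_prompt<W: Write, E: Write>(out: &mut TrackedOutput<W>, err: &mut E) -> io::Result<()> {
    out.flush()?;
    if !out.at_line_start {
        writeln!(err)?;
        // The terminal is now at a line start, even though the program's
        // own last byte was not a newline.
        out.at_line_start = true;
    }
    write!(err, "(lace) ")?;
    err.flush()
}
```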

Features that can be implemented later

rozukke commented 4 days ago

Just a few thoughts on the proposal:

Overall very well considered and a good featureset to aim for. The main point of consideration is making the most common functions as easy to use as possible, which in this case would be memset and register set.

dxrcy commented 4 days ago

We have the ability to split out subcommands for the cli interface, so using an explicit debug command seems more ergonomic than having it as a flag.

Certainly.

...though we can use lowercase d for the debugger input flag.

How about -i/--input (or -c/--commands), since we have already given the debug subcommand explicitly.

Would reset also recompile?

Not necessarily. The object code can be saved (cloned) before being run, and restored on reset. All of memory must be restored to its original state, though, since the user could write code that modifies itself.

Register and memory setting can have the same syntax that would make it a bit nicer.

Certainly; not sure how I overlooked this. This also allows the display (d) command to be renamed to registers (r).

For memory setting, we might want a way to show a label address to save with an offset.

How about optionally specifying the offset value after the label? E.g. Foo, Foo+2, Foo-2. Whitespace will not be allowed before the +/- symbol, to prevent parsing ambiguity. Perhaps other bases can be supported for the offset; the existing number parsing code can be reused.

rozukke commented 3 days ago

Sounds good, reasonable improvements. I'll have a think about how this would integrate with the current codebase to figure out the first steps going forward.