Open kevinramharak opened 6 years ago
I tried to write out the grammar of the assembly syntax in EBNF. This might help with fixing bugs in the assembler, while also documenting a formal syntax.
I am probably missing some things and also have some notes:
.data MOV A, B MOV C, D DW 100 DUP(0x30) ;; or
Is this perfectly valid syntax? Would enforcing one instruction per line be a bad thing?
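To illustrate why one statement per line might be easier to parse, here is a minimal Python sketch. It assumes a stricter rule than the grammar currently states: each line holds an optional label, at most one instruction or directive, and an optional `;` comment. The names `LINE_RE` and `split_lines` are hypothetical, and the sketch ignores the case of a `;` inside a string literal.

```python
import re

# Hypothetical stricter rule: line = [ label ] , [ statement ] , [ comment ]
# Assumption: ';' starts a comment and never appears inside a string literal.
LINE_RE = re.compile(
    r"""^\s*
        (?:(?P<label>[A-Za-z_][A-Za-z0-9_]*):)?   # optional label
        \s*
        (?P<statement>[^;]*?)                     # mnemonic/directive and operands
        \s*
        (?:;.*)?$                                 # optional comment
    """,
    re.VERBOSE,
)


def split_lines(source):
    """Return (line_number, label, statement) for every line with content."""
    result = []
    for number, raw in enumerate(source.splitlines(), start=1):
        match = LINE_RE.match(raw)
        label = match.group("label")
        statement = match.group("statement")
        if label or statement:
            result.append((number, label, statement))
    return result
```

With this rule every error can be reported against a single line number, and a label or section marker never has to be disambiguated from the instruction that follows it on the same line.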
Should escape sequences be supported, something like C escape sequences?
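As a rough idea of what C-like escape sequences could look like, the Python sketch below decodes the body of a string literal. Only `\"` is defined by the grammar so far; the rest of the escape table (`\n`, `\t`, `\r`, `\0`, `\\`, `\xHH`) and the function name `decode_escapes` are assumptions for illustration, not something the assembler currently implements.

```python
# Hypothetical escape table; the grammar only defines \" at the moment.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "0": "\0", "\\": "\\", '"': '"'}
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_escapes(body):
    """Decode the text between the quotes of a string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of string")
        code = body[i + 1]
        if code == "x":
            # \xHH: exactly two hexadecimal digits
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or any(c not in HEX_DIGITS for c in digits):
                raise ValueError(f"bad \\x escape at offset {i}")
            out.append(chr(int(digits, 16)))
            i += 4
        elif code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 2
        else:
            raise ValueError(f"unknown escape \\{code} at offset {i}")
    return "".join(out)
```

Rejecting unknown escapes rather than passing them through keeps the behaviour easy to document in the grammar: every accepted sequence can be listed as an explicit alternative.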
It would be nice to model the assembler's parser on a documented syntax.
binary = '0' | '1' ;
octal = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;
non_zero_digit = '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
digit = '0' | non_zero_digit ;
hexadecimal = digit | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' ;
lowercase = 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' ;
uppercase = 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' ;
letter = lowercase | uppercase ;
symbol = '`' | '~' | '!' | '@' | '#' | '$' | '%' | '^' | '&' | '*' | '(' | ')' | '-' | '+' | '=' | '{' | '}' | '[' | ']' | '|' | '\' | ':' | ';' | '<' | '>' | '?' | '/' ;
escaped_double_quote = '\"' ;

identifier_start = letter | '_' ;
identifier_character = identifier_start | digit ;
identifier = identifier_start , { identifier_character } ;
label = identifier , ':' ;

(* what about binary/octal literals? *)
decimal_integer_literal = digit , { digit } ;
hexadecimal_integer_literal = '0x' , hexadecimal , { hexadecimal } ;
integer_literal = decimal_integer_literal | hexadecimal_integer_literal ;

(* character literals? escape sequences? *)
string_literal = '"' , { letter | digit | symbol | '_' | "'" | escaped_double_quote } , '"' ;
literal = integer_literal | string_literal ;

equ_directive = identifier , 'EQU' , integer_literal ;

(* string and/or sequence (1,2,3,...) support? *)
dup_operand = integer_literal , 'DUP(' , integer_literal , ')' ;
dw_directive = 'DW' , ( literal | dup_operand ) , { ',' , ( literal | dup_operand ) } ;

text_section = '.' , 'text' ;
data_section = '.' , 'data' ;
section = text_section | data_section ;

memory_reference = '[' , ( integer_literal | identifier ) , ']' ;
mnemonic = (* list of all instructions *) ;
operand = identifier | integer_literal | memory_reference ;
instruction = mnemonic , { operand } ;

(* instructions, sections and labels do not need to be on separate lines? might be easier parsing if they do *)
line = [ label | section ] , { instruction | dw_directive } ;
program = { line } ;
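To show how a rule from the grammar could map onto parser code, here is a small Python sketch that expands the operand list of a `DW` directive, following the `integer_literal` and `dup_operand` rules. It is a sketch under assumptions, not the assembler's actual implementation: the names `parse_int` and `expand_dw` are hypothetical, whitespace handling is guessed (the grammar does not model whitespace), and string operands are deliberately left out.

```python
import re

# Regex equivalents of integer_literal and dup_operand from the grammar.
# Assumption: optional whitespace is allowed around tokens.
INT = r"(?:0x[0-9a-fA-F]+|[0-9]+)"
INT_RE = re.compile(rf"^{INT}$")
DUP_RE = re.compile(rf"^({INT})\s*DUP\(\s*({INT})\s*\)$")


def parse_int(text):
    """Convert a decimal or '0x'-prefixed hexadecimal literal to an int."""
    return int(text[2:], 16) if text.startswith("0x") else int(text, 10)


def expand_dw(operands):
    """Expand the operand text of a DW directive into a list of word values."""
    words = []
    # Splitting on ',' is safe for this subset because no string literals
    # are accepted here.
    for item in operands.split(","):
        item = item.strip()
        dup = DUP_RE.match(item)
        if dup:
            count = parse_int(dup.group(1))
            value = parse_int(dup.group(2))
            words.extend([value] * count)
        elif INT_RE.match(item):
            words.append(parse_int(item))
        else:
            raise ValueError(f"unsupported DW operand: {item!r}")
    return words
```

For example, `expand_dw("100 DUP(0x30)")` yields one hundred copies of `0x30`, matching the `DW 100 DUP(0x30)` line from the example program.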
character literals: 164
binary literal integers: 165
octal literal integers: 166