Output a full memory dump from the assembler?

danya02 commented 2 years ago

I would like to use the assembler to write programs for an Intel 8080-based computer emulator. My emulator accepts 65536-byte files that are loaded into memory, with the top 2048 bytes occupied by the "operating system", which overwrites the first 2048 bytes of the dump.

The assembler program outputs files that are as large as needed to fit the assembled program and no larger. The ORG pseudo-instruction only informs the calculation of label offsets, and does not create blank spaces in the resulting file. If there is more than one ORG line, there is no separation between the parts where different ORGs apply.

How can I get 65536-byte files from the assembler that cover the entire memory space of my emulator, so that I can write code in the parts of memory I need and not bother with the gaps? I have already discovered the DS pseudo-instruction, which inserts a specified number of zero bytes, but then I'd have to count the bytes in my code to choose the right amount of padding. I'd like to do it with ORG syntax, especially since the I8080 Programmer's Guide says on page 40 that DS and ORG can be used for similar purposes.
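
As a stopgap that avoids counting bytes by hand, the assembler's output could be padded after the fact. This is only a minimal sketch, assuming a program with a single ORG whose raw assembled bytes are available as `code`; `pad_to_image` is a hypothetical helper, not part of the assembler.

```python
MEMORY_SIZE = 65536  # full 8080 address space


def pad_to_image(code: bytes, origin: int = 0, size: int = MEMORY_SIZE) -> bytes:
    """Place code at origin inside a zero-filled image of the given size."""
    if origin < 0 or origin + len(code) > size:
        raise ValueError("code does not fit in the image at this origin")
    image = bytearray(size)  # bytearray(n) is zero-filled
    image[origin:origin + len(code)] = code
    return bytes(image)


# Example: MVI A,2Ah / HLT placed at 0800h.
image = pad_to_image(bytes([0x3E, 0x2A, 0x76]), origin=0x0800)
```

This does not help with several ORG lines, because the assembler output carries no record of where each part was meant to go.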

pamoroso commented 2 years ago

Apart from ORG and DS, I can't think of a way of doing it either. But, if I recall correctly, the assembler of the ASM80 emulator lets you download 64K files when exporting assembled binaries, I think with the Download BIN option. Maybe it does what you need.

danya02 commented 2 years ago

I think I got what I need by patching asm80.py: instead of appending to a bytes object, the assembler now writes into a 64K bytearray, and the ORG instruction seeks within that bytearray before writing. Perhaps you could consider adding something like this as a feature?

--- asm80.py    2022-08-14 09:59:06.606471850 +0000
+++ asm80e.py   2022-08-14 10:26:47.959513820 +0000
@@ -13,11 +13,29 @@

 # This is a 2-pass assembler, so keep track of which pass we're in.
 source_pass = 1

 # Assembled machine code.
-output = b''
+output = bytearray(65536)
+written_byte = [False for _ in range(65536)]
+write_address = 0
+def write(data):
+    global write_address
+    #print(f'asm80> writing {data.hex()} at {write_address:04x}')
+    for index in range(len(data)):
+        if not written_byte[write_address]:
+            output[write_address] = data[index]
+            #print(f'asm80> writing {data[index]:02x} at {write_address:04x}')
+            written_byte[write_address] = True
+            write_address += 1
+        else:
+            report_error(f'byte 0x{write_address:04x} has been written to more than once, check your assembly code')
+
+
+def seek(new_address):
+    global write_address
+    write_address = new_address

 # Tokens
 label = ''
 mnemonic = ''
 operand1 = ''
@@ -427,11 +445,11 @@
         address += instruction_size
     else:
         # Pass 2. Output the byte representing the opcode. For instructions with
         # additional arguments or data we'll output that in a separate function.
         if output_byte != b'':
-            output += output_byte
+            write(output_byte)

 def add_label():
     """Add a label to the symbol table."""
     global symbol_table
@@ -1036,21 +1054,21 @@
                     add_label()
                     should_add_label = False
                 address += string_length
             else:
                 # Strip enclosing ' characters when adding to output.
-                output += bytes(argument[1:-1], encoding='utf-8')
+                write(bytes(argument[1:-1], encoding='utf-8'))
                 address += string_length
         # Label.
         else:
             if source_pass == 2:
                 symbol = argument.lower()
                 if symbol not in symbol_table:
                     report_error(f'undefined label "{argument}"')
                 value = symbol_table[symbol]
                 value_size = 1 if (0 <= value <= 255) else 2
-                output += value.to_bytes(value_size, byteorder='little')
+                write(value.to_bytes(value_size, byteorder='little'))
                 address += value_size

 def parse_db_arguments(string):
     """Return a list of ``db`` arguments parsed from string.
@@ -1090,11 +1108,11 @@
                                         else symbol_table.get(operand1.lower(), -1)
     # Label must be defined before use.
     if storage_size < 1:
         report_error(f'invalid "ds" operand or forward reference')
     if source_pass == 2:
-        output += bytes(storage_size)
+        write(bytes(storage_size))
     address += storage_size

 def dw():
     global address
@@ -1165,18 +1183,22 @@

     check_operands(operand1 != '' and (label == operand2 == ''))
     if operand1[0].isdigit():
         if source_pass == 1:
             address = get_number(operand1)
+        else:
+            seek(get_number(operand1))
     # Label, which must be defined before use.
     elif operand1[0].isalpha():
+        value = symbol_table.get(operand1.lower(), -1)
+        if value < 0:
+            report_error(f'invalid "org" address "{operand1}"')
         if source_pass == 1:
-            value = symbol_table.get(operand1.lower(), -1)
-            if value:
-                address = value
-            else:
-                report_error(f'invalid "org" address "{value}')
+            address = value
+        else:
+            seek(value)
+
     else:
         report_error(f'invalid "org" operand "{operand1}"')

 # Skipped.
@@ -1261,11 +1283,11 @@
             report_error(f'undefined label "{operand}"')
         number = symbol_table[operand]

     if source_pass == 2:
         operand_size = 1 if operand_type == IMMEDIATE8  else 2
-        output += number.to_bytes(operand_size, byteorder='little')
+        write(number.to_bytes(operand_size, byteorder='little'))

 # BUG: doesn't work with immediate addresses like ffh; labels aren't added to
 # symbol table as pass 1 isn't handled.
 def address16():
@@ -1280,11 +1302,11 @@
         number = symbol_table.get(operand1.lower(), -1)
         if source_pass == 2 and number < 0:
             report_error(f'undefined label "{operand1}"')

     if source_pass == 2:
-        output += number.to_bytes(2, byteorder='little')
+        write(number.to_bytes(2, byteorder='little'))

 def get_number(input):
     """Return value of hex or decimal numeric input string."""
     if input.endswith(('h', 'H')):
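
Outside the assembler, the idea behind the patch can be sketched on its own. This is a simplified illustration, not the patched asm80.py: the `MemoryImage` class and its method names are hypothetical, and it raises `ValueError` where the patch calls `report_error`.

```python
class MemoryImage:
    """Fixed-size memory image with a write cursor and overlap detection."""

    def __init__(self, size=65536):
        self.data = bytearray(size)    # zero-filled image
        self.written = [False] * size  # which addresses already hold output
        self.address = 0               # current write position

    def seek(self, address):
        """Move the write cursor, as an ORG directive would in pass 2."""
        if not 0 <= address < len(self.data):
            raise ValueError(f"address 0x{address:04x} is outside the image")
        self.address = address

    def write(self, chunk):
        """Write bytes at the cursor, refusing to overwrite earlier output."""
        for byte in chunk:
            if self.address >= len(self.data):
                raise ValueError("write runs past the end of the image")
            if self.written[self.address]:
                raise ValueError(
                    f"byte 0x{self.address:04x} has been written more than once"
                )
            self.data[self.address] = byte
            self.written[self.address] = True
            self.address += 1


image = MemoryImage()
image.seek(0x0800)
image.write(b"\x3e\x2a")  # MVI A,2Ah
image.seek(0x1000)
image.write(b"\x76")      # HLT
```

Tracking written addresses separately from the data is what lets a legitimately emitted zero byte be told apart from an untouched gap.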

pamoroso commented 2 years ago

That's interesting, thanks. However, I won't implement such a feature, as I'm going in the opposite direction, i.e. trimming uninitialized data.

danya02 commented 2 years ago

I see, but doesn't this only affect the zero bytes at the end? I believe my emulator will accept files smaller than 65536 bytes (though I haven't tested that yet), and DS directives inside the file would still result in zero bytes being generated, right?

Personally, I think a more elegant solution would be for every ORG and DS directive to mark the beginning of a segment, with each segment written to a separate file along with a note saying where its contents should be placed in memory. It's then up to the user of these files to decide whether to compose them into a single memory image or load them separately. The reason this should be allowed is that the Programmer's Guide specifically says that you must not rely on the memory contents in areas skipped by ORG or DS. However, I've never worked with CP/M, so I don't know if this is feasible on that platform. In any case it adds a lot of complexity that, for my educational purposes, I would much rather sidestep, which is why I want a full memory image in the first place.
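
To make the segment idea concrete, here is a rough sketch of the composing step. It assumes each segment is represented as a `(load_address, data)` pair; that representation and the `compose` function are made up for illustration and are not an existing feature of the assembler.

```python
def compose(segments, size=65536):
    """Merge (load_address, data) segments into one zero-filled memory image.

    Segments are applied in order, so a later segment silently overwrites
    an earlier one where they overlap.
    """
    image = bytearray(size)
    for address, data in segments:
        if address < 0 or address + len(data) > size:
            raise ValueError(f"segment at 0x{address:04x} does not fit")
        image[address:address + len(data)] = data
    return bytes(image)


# Two segments, as two ORG lines might produce them.
segments = [
    (0x0800, b"\x3e\x2a"),  # MVI A,2Ah
    (0x1000, b"\x76"),      # HLT
]
image = compose(segments)
```

A loader that prefers to keep the gaps untouched could instead copy each segment into existing memory and skip the zero fill.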

pamoroso commented 2 years ago

Yes, the upcoming feature affects only the zero bytes at the end, and DS directives inside the file will still generate zero bytes. What I meant is that I prefer the assembler not to output a full or nearly full memory image. In addition, I plan a major rewrite of the assembler.
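
The difference between trailing and interior zero bytes can be illustrated as follows. This is only a guess at how such trimming might behave, not the planned implementation; `trim_trailing_zeros` is a hypothetical name.

```python
def trim_trailing_zeros(code: bytes) -> bytes:
    """Drop zero bytes at the end of the output; interior zeros are kept.

    Caveat: 00h is also the 8080 NOP opcode, so a program that really ends
    in NOPs would lose them with this naive approach.
    """
    return code.rstrip(b"\x00")


# MVI A,2Ah, then a 4-byte DS gap, then HLT, then an 8-byte DS at the end.
code = b"\x3e\x2a" + bytes(4) + b"\x76" + bytes(8)
trimmed = trim_trailing_zeros(code)
```

Here the four zero bytes between the two instructions survive, while the eight at the end are removed.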

pamoroso / suite8080

Output a full memory dump from the assembler? #2