sy2002 / QNICE-FPGA

QNICE-FPGA is a 16-bit computer system for recreational programming built as a fully-fledged System-on-a-Chip in portable VHDL.
http://qnice-fpga.com
Other
69 stars 16 forks source link

Recent emulator changes broke SD Card, FAT32 and Q-TRIS #53

Closed sy2002 closed 4 years ago

sy2002 commented 4 years ago

In the last few days, beginning from August 7, on the (not in use any more) branch dev-io-addr, Bernd and Michael worked together to improve the quality of the emulator to fit better to the super thorough (and great) test_programs/cpu_test.asm.

But this came at a price, as mentioned in issue #52: We broke the SD Card code, the FAT32 code and Q-TRIS at an unpredictable amount of lines of code. Just for your reference, here is the line count of the affected files:

TITANIUM-M5:QNICE-FPGA sy2002$ wc -l demos/q-tris.asm monitor/sd_library.asm monitor/fat32_library.asm 
    2228 demos/q-tris.asm
     131 monitor/sd_library.asm
    2134 monitor/fat32_library.asm
    4493 total

And: We might have even more affected files, because my smoke test was super short.

One thing is clear: I will not sift alone through 4493+ lines of code in search for obscure bugs. It can take ages. The FAT32 stuff is non-trivial and also Q-TRIS has its complicated parts. This needs to be a team effort or we need to be smart (and/or revert some decisions to change the behaviour and declare them as "this is how the ISA works ;-) )

It is good, that the three of us will meet soon in an online meeting, because the right way forward in this situation is not obvious and also not trivial, so that I think a verbal discussion / video call is much better then a written discussion in this issue.

For finding a stable basis - a point from where we could start our investigations - I rolled back qnice.c to the last point where my smoke test did not fail but work. @bernd-ulmann : Don't worry, none of your code changes since then is lost. :-) At least when we use the Git GUI that I use (SourceTree), there is a very convenient way to roll forward/backward, see differences, etc.

It did cost me quite some time (more than 1hr) just to find this last point of stable, so here is a wish, that I would like to share: Let's agree that any change to the semantics of any opcode any flag or any other ISA detail and (of course) the emulator automatically leads to someone of the three of us running the following smoke test right after the change was done:

cd emulator
rm qnice-vga
./run-vga.bash
<press enter>
<switch to the terminal screen and type> F R qbin/q-tris.out

Some important notes: rm qnice-vga is necessary to force a rebuild when running ./run-vga.bash. The latter one automatically downloads and mounts a FAT32 disk image.

And then: Play at least something like 60 seconds and clear one row, because funnily enough, one of the bugs that occured lead to not being able to clear any rows any more. Another one lead to the bricks just falling through. So just running Q-TRIS does not help. 60 seconds of gameplay are necessary and save hours of debugging later.

Here is what I found out (everything is related to our "ex branch" dev-io-addr):

The last known good version is from August, 1st and the commit #8711d11

Commit #af2552a from August 7th is interessting: Q-TRIS still works, but FAT32 and/or SD Card do not

Status right now: Branch develop is stable: Everything should work (hardware to be tested, because I was not able to), but all changes to the ISA and to the emulator in qnice.c since August, 1st are rolled back (but not lost).

bernd-ulmann commented 4 years ago

Hi Mirko - I think the current state of the Emulator should be our definition of "done" as the CPU-tests are not only pretty comprehensive but also consistent.

I am sure that we do not have to sift through 1000s of lines as the changes in the emulator were centered on the behaviour of R14, so my first guess is that some conditional branches/jumps are affected.

bernd-ulmann commented 4 years ago

These are the differences in the emulator that might give us some clues to what might go wrong now:

101a102,105

define NO_ADD_SUB_INSTRUCTION 0x0

define ADD_INSTRUCTION 0x1

define SUB_INSTRUCTION 0x2

127c131 < *gbl$sr_bits = "1XCZNVIM",

 *gbl$sr_bits = "1XCZNV--",

326a331,332 unsigned int value;

329c335,337 < return gbl$registers[address] | (address == 0xe ? 1 : 0); / The LSB of SR is always 1! /

value = gbl$registers[address] | (address == 0xe ? 1 : 0); /* The LSB of SR is always 1! */

else value = gbl$registers[address | ((read_register(SR) >> 4) & 0xFF0)]; 331c339 < return gbl$registers[address | ((read_register(SR) >> 4) & 0xFF0)];

return value & 0xffff; 697,698c705,732 < void update_status_bits(unsigned int destination, unsigned int source_0, unsigned int source_1, unsigned int control_bitmask) { < unsigned int sr_bits;

void update_status_bits(unsigned int destination, unsigned int source_0, unsigned int source_1, unsigned int control_bitmask, unsigned int operation) { unsigned int x, c, z, n, v, sr_bits;

sr_bits = read_register(SR);

if (!(control_bitmask & DO_NOT_MODIFY_X)) x = (destination & 0xffff) == 0xffff ? 1 : 0; else x = (sr_bits >> 1) & 1; // Retain old X-flag

if (!(control_bitmask & DO_NOT_MODIFY_CARRY)) c = destination & 0x10000 ? 1 : 0; else c = (sr_bits >> 2) & 1; // Retain old C-flag

z = !(destination & 0xffff) ? 1 : 0;

n = destination & 0x8000 ? 1 : 0;

// See http://www.righto.com/2012/12/the-6502-overflow-flag-explained.html for the logic behind this: if (!(control_bitmask & DO_NOT_MODIFY_OVERFLOW) && (operation == ADD_INSTRUCTION || operation == SUB_INSTRUCTION)) { if (operation == ADD_INSTRUCTION) v = ((source_0 & 0xffff) ^ (destination & 0xffff)) & ((source_1 & 0xffff) ^ (destination & 0xffff)) & 0x8000 ? 1 : 0; else if (operation == SUB_INSTRUCTION) v = ((source_0 & 0xffff) ^ (destination & 0xffff)) & (((~source_1) & 0xffff) ^ (destination & 0xffff)) & 0x8000 ? 1 : 0; } else v = (sr_bits >> 5) & 1; // Retain old V-flag 700,711c734,735 < sr_bits = 1; / LSB is always set (for unconditional branches and subroutine calls) / < if (((destination & 0xffff) == 0xffff) & !(control_bitmask & DO_NOT_MODIFY_X)) / X / < sr_bits |= 0x2; < if ((destination & 0x10000) && !(control_bitmask & DO_NOT_MODIFY_CARRY)) / C / < sr_bits |= 0x4; < if (!(destination & 0xffff)) / Z / < sr_bits |= 0x8; < if (destination & 0x8000) / N / < sr_bits |= 0x10; < if (((!(source_0 & 0x8000) && !(source_1 & 0x8000) && (destination & 0x8000)) || < ((source_0 & 0x8000) && (source_1 & 0x8000) && !(destination & 0x8000))) && !(control_bitmask & DO_NOT_MODIFY_OVERFLOW)) < sr_bits |= 0x20;

sr_bits &= 0xffc1; // Keep the upper 10 bits and the LSB. sr_bits |= (x << 1) | (c << 2) | (z << 3) | (n << 4) | (v << 5); // Set/clear all required flags 713c737 < write_register(SR, (read_register(SR) & 0xffc0) | (sr_bits & 0x3f));

write_register(SR, sr_bits); 779c803,804 < update_status_bits(destination, destination, destination, DO_NOT_MODIFY_CARRY | DO_NOT_MODIFY_OVERFLOW);

  update_status_bits(destination, destination, destination, DO_NOT_MODIFY_CARRY | DO_NOT_MODIFY_OVERFLOW, 
                     NO_ADD_SUB_INSTRUCTION);

786c811 < update_status_bits(destination, source_0, source_1, MODIFY_ALL);

  update_status_bits(destination, source_0, source_1, MODIFY_ALL, ADD_INSTRUCTION); 

793c818 < update_status_bits(destination, source_0, source_1, MODIFY_ALL);

  update_status_bits(destination, source_0, source_1, MODIFY_ALL, ADD_INSTRUCTION);

800c825 < update_status_bits(destination, source_0, source_1, MODIFY_ALL);

  update_status_bits(destination, source_0, source_1, MODIFY_ALL, SUB_INSTRUCTION);

807c832 < update_status_bits(destination, source_0, source_1, MODIFY_ALL);

  update_status_bits(destination, source_0, source_1, MODIFY_ALL, SUB_INSTRUCTION);

811,815c836,843 < source_0 = read_source_operand(source_mode, source_regaddr, FALSE); < destination = read_source_operand(destination_mode, destination_regaddr, TRUE); < for (i = 0; i < source_0; i++) { < temp_flag = (destination & 0x8000) >> 13; < destination = (destination << 1) | ((read_register(SR) >> 1) & 1); / Fill with X bit /

  if ((source_0 = read_source_operand(source_mode, source_regaddr, FALSE))) {
    destination = read_source_operand(destination_mode, destination_regaddr, TRUE);
    for (i = 0; i < source_0; i++) {
      temp_flag = (destination & 0x8000) >> 13;
      destination = (destination << 1) | ((read_register(SR) >> 1) & 1);          /* Fill with X bit */
    }
    write_register(SR, (read_register(SR) & 0xfffb) | temp_flag);                 /* Shift into C bit */
    write_destination(destination_mode, destination_regaddr, destination, FALSE);

817,818d844 < write_register(SR, (read_register(SR) & 0xfffb) | temp_flag); / Shift into C bit / < write_destination(destination_mode, destination_regaddr, destination, FALSE); 821,825c847,854 < scratch = source_0 = read_source_operand(source_mode, source_regaddr, FALSE); < destination = read_source_operand(destination_mode, destination_regaddr, TRUE); < for (i = 0; i < source_0; i++) { < temp_flag = (destination & 1) << 1; < destination = ((destination >> 1) & 0xffff) | ((read_register(SR) & 4) << 13); / Fill with C bit /

  if ((scratch = source_0 = read_source_operand(source_mode, source_regaddr, FALSE))) {
    destination = read_source_operand(destination_mode, destination_regaddr, TRUE);
    for (i = 0; i < source_0; i++) {
      temp_flag = (destination & 1) << 1;
      destination = ((destination >> 1) & 0xffff) | ((read_register(SR) & 4) << 13);  /* Fill with C bit */
    }
    write_register(SR, (read_register(SR) & 0xfffd) | temp_flag);                     /* Shift into X bit */
    write_destination(destination_mode, destination_regaddr, destination, FALSE);

827,828d855 < write_register(SR, (read_register(SR) & 0xfffd) | temp_flag); / Shift into X bit / < write_destination(destination_mode, destination_regaddr, destination, FALSE); 833c860 < update_status_bits(destination, source_0, source_0, DO_NOT_MODIFY_CARRY | DO_NOT_MODIFY_OVERFLOW);

  update_status_bits(destination, source_0, source_0, DO_NOT_MODIFY_CARRY | DO_NOT_MODIFY_OVERFLOW, NO_ADD_SUB_INSTRUCTION);

839c866 < update_status_bits(destination, source_0, source_0, DO_NOT_MODIFY_CARRY | DO_NOT_MODIFY_OVERFLOW);

  update_status_bits(destination, source_0, source_0, DO_NOT_MODIFY_CARRY | DO_NOT_MODIFY_OVERFLOW, NO_ADD_SUB_INSTRUCTION);

846c873 < update_status_bits(destination, source_0, source_1, DO_NOT_MODIFY_CARRY | DO_NOT_MODIFY_OVERFLOW);

  update_status_bits(destination, source_0, source_1, DO_NOT_MODIFY_CARRY | DO_NOT_MODIFY_OVERFLOW, NO_ADD_SUB_INSTRUCTION);

853c880 < update_status_bits(destination, source_0, source_1, DO_NOT_MODIFY_CARRY | DO_NOT_MODIFY_OVERFLOW);

  update_status_bits(destination, source_0, source_1, DO_NOT_MODIFY_CARRY | DO_NOT_MODIFY_OVERFLOW, NO_ADD_SUB_INSTRUCTION);

860c887 < update_status_bits(destination, source_0, source_1, DO_NOT_MODIFY_CARRY | DO_NOT_MODIFY_OVERFLOW);

  update_status_bits(destination, source_0, source_1, DO_NOT_MODIFY_CARRY | DO_NOT_MODIFY_OVERFLOW, NO_ADD_SUB_INSTRUCTION);

866,881c893,894 < < / CMP does NOT use the standard logic for setting the SR bits - this is done explicitly in the following: / < sr_bits = 1; / Take care of the LSB of SR which must be 1. / < < if (source_0 == source_1) sr_bits |= 0x0008; < if (source_0 > source_1) sr_bits |= 0x0010; < < / Ugly but it works: Convert the unsigned int source_0/1 to signed ints with possible sign extension: / < cmp_0 = source_0; < cmp_1 = source_1; < < if (source_0 & 0x8000) cmp_0 |= 0xffff0000; < if (source_1 & 0x8000) cmp_1 |= 0xffff0000; < if (cmp_0 > cmp_1) sr_bits |= 0x0020; < < write_register(SR, (read_register(SR) & 0xffc0) | (sr_bits & 0x3f));

  destination = source_0 - source_1;
  update_status_bits(destination, source_0, source_1, MODIFY_ALL, SUB_INSTRUCTION);
MJoergen commented 4 years ago

Here's a data point: I just tried with the latest version of the develop branch on the FPGA, and found no problems. I was able to:

This was using the keyboard input and VGA output.

This was perhaps not surprising since the breaking changes were all in the emulator. But it does show that the merge itself has not introduced any problems.

I'll do some more testing on Saturday, then we'll talk Saturday evening.

bernd-ulmann commented 4 years ago

Mirko and I had a long (and fruitful) debugging session with the emulator tonight and we changed some things "back" - at least for v1.6:

Stable release of the emulator. This version has the following features:

NEW:

A 16 entry ring buffer contains the last 16 addresses from which instructions were read. This buffer is printed in the statistics function.

CHANGED:

The behaviour of CMP has been reverted to the old (!) implementation (it is no longer just a SUB without write back of the result!). To be honest, I have no idea why the SUB variant did not work but will investigate this further.

The routine update_status_bits has also been changed back to the original routine! This results in a "forgetful" behaviour of the update_status_bits routine.

This is something we need to think about, too: Do we want this behaviour? I (Bernd) would opt for changing this behaviour so that flags in the SR will not be touched by an instruction if necessary, i.e. MOVE should leave C and V unchanged.

As "clean" as this implementation variant would be from an architectural point of view, it breaks our code on numerous places, so we need to decide whether we want to declare the "forgetful" behaviour of update_status_bits as being intentional.

bernd-ulmann commented 4 years ago

Mirko and I changed the emulator again tonight:

The emulator now uses the new update_status_bits() function that is no longer "forgetful" (and thus deviates from the current hardware behaviour).

BUT: The routine uses the OLD overflow logic again as the new (more correct) logic as described by Ken Shirriff breaks our existing code base!

The CMP implementation was also changed back to the old behaviour.

I (Bernd) am now fine with this version although we (Mirko, Michael, and I) should discuss the desired overflow logic on Saturday.

We propose to make this the basis of 1.6 and move any changes to the overflow/CMP logic to 1.7.

sy2002 commented 4 years ago

Small addition to Bernd's latest comment: There is a new issue #57 that is supposed to track our progress towards the "clean ISA everywhere" undertaking, so we can close this one. With the compromise described above, we will have the chance to release V1.6 before the end of September, because the software stack seems to work still, even though the no longer "forgetful" mode is implemented in the emulator.

TODO is still to add the "not forgetful" mode to hardware. Right now, the emulator and the hardware are differing in this detail, but still, binary compatibility is being given: Monitor works, FAT32 works, Q-TRIS works, interrupts work on both: emulator and hardware.

And @MJoergen for V1.6 being consistent between the emulator and the hardware: you might want to "correct" the handful or so failing CPU tests (document them with a hint that points to issue #57 on GitHub that this will be fixed later), so that we still have a CPU test in V1.6 that ensures consistent behaviour of FPGA and emulator while ensuring that the existing software stack works and the elephant in the room will be sliced in V1.7 then.

MJoergen commented 4 years ago

Since there is still work to do (change the cpu_test.asm to be consistent with V1.6) I'll re-open this issue and reassign to me.

MJoergen commented 4 years ago

Ok. I've now updated the file cpu_test.asm to match the latest version of the emulator. I assume this will be the target for V1.6.

Changes made to cpu_test.asm for V1.6 are: