stardot / b-em

An opensource BBC Micro emulator for Win32 and Linux
http://stardot.org.uk/forums/viewtopic.php?f=4&t=10823
GNU General Public License v2.0
117 stars 58 forks source link

NS32016: Crash attempting to install Panos #91

Closed SteveFosdick closed 5 years ago

SteveFosdick commented 5 years ago

Attempting a fresh install of Panos I created a blank 10M SCSI hard disc with hdfmt then started b-em so this would be the default hard disc, selected the 32016 processor and used D-Break to select DFS. Then at the * prompt, Panos to start Panos and at the Panos prompt after the Panos 1.4 banner install adfs as per the cambridge processor user guide.

After starting to execute the fcopy command and quite a bit floppy noise the following messages appeared on stdout:

[pc=F00CED] Writing outside of RAM @ FFFFFF = 00
[pc=F00CEB] Writing outside of RAM @ FFFFF9 = 00
[pc=F00CEB] Writing outside of RAM @ FFFFFB = 00
[pc=F00CEB] Writing outside of RAM @ FFFFF5 = 0C
[pc=F00CEB] Writing outside of RAM @ FFFFF7 = 00
[pc=F00CF1] Writing outside of RAM @ FFFFF1 = 0C
[pc=F00CF1] Writing outside of RAM @ FFFFF3 = 00
[pc=F00D07] Writing outside of RAM @ FFFFEC = B2
[pc=F00D07] Writing outside of RAM @ FFFFED = D3
[pc=F00D07] Writing outside of RAM @ FFFFEE = 00
[pc=F00D07] Writing outside of RAM @ FFFFEF = 00
[pc=F00D0C] Bad Read @ fffff1
[pc=F00D0C] Bad Read @ fffff3
[pc=F00D11] Writing outside of RAM @ FFFFE8 = 20
[pc=F00D11] Writing outside of RAM @ FFFFE9 = 00
[pc=F00D11] Writing outside of RAM @ FFFFEA = E6
[pc=F00D11] Writing outside of RAM @ FFFFEB = 0C
[pc=F00D15] Writing outside of RAM @ FFFFE4 = 00
[pc=F00D15] Writing outside of RAM @ FFFFE5 = 00
[pc=F00D15] Writing outside of RAM @ FFFFE6 = 00
[pc=F00D15] Writing outside of RAM @ FFFFE7 = 00
[pc=F00D18] Writing outside of RAM @ FFFFEA = 00
[pc=F00D18] Writing outside of RAM @ FFFFEB = 00
[pc=F00D1C] Writing outside of RAM @ FFFFE4 = 30
[pc=F00D1C] Writing outside of RAM @ FFFFE5 = 00
[pc=F00D1C] Writing outside of RAM @ FFFFE6 = 04
[pc=F00D1C] Writing outside of RAM @ FFFFE7 = 00
[pc=3A] Writing outside of RAM @ FFFFE4 = FA
[pc=3A] Writing outside of RAM @ FFFFE5 = 00
[pc=D413] Writing outside of RAM @ 13D236 = 00

Nothing further happened after this. The b-emlog file contains:

18/09/2019 21:05:13 WARNING [pc=F00CED] Writing outside of RAM @ FFFFFD = 0C
18/09/2019 21:05:13 WARNING [pc=F00CED] Writing outside of RAM @ FFFFFF = 00
18/09/2019 21:05:13 WARNING [pc=F00CEB] Writing outside of RAM @ FFFFF9 = 00
18/09/2019 21:05:13 WARNING [pc=F00CEB] Writing outside of RAM @ FFFFFB = 00
18/09/2019 21:05:13 WARNING [pc=F00CEB] Writing outside of RAM @ FFFFF5 = 0C
18/09/2019 21:05:13 WARNING [pc=F00CEB] Writing outside of RAM @ FFFFF7 = 00
18/09/2019 21:05:13 WARNING [pc=F00CF1] Writing outside of RAM @ FFFFF1 = 0C
18/09/2019 21:05:13 WARNING [pc=F00CF1] Writing outside of RAM @ FFFFF3 = 00
18/09/2019 21:05:13 WARNING [pc=F00D07] Writing outside of RAM @ FFFFEC = B2
18/09/2019 21:05:13 WARNING [pc=F00D07] Writing outside of RAM @ FFFFED = D3
18/09/2019 21:05:13 WARNING [pc=F00D07] Writing outside of RAM @ FFFFEE = 00
18/09/2019 21:05:13 WARNING [pc=F00D07] Writing outside of RAM @ FFFFEF = 00
18/09/2019 21:05:13 WARNING [pc=F00D0C] Bad Read @ fffff1
18/09/2019 21:05:13 WARNING [pc=F00D0C] Bad Read @ fffff3
18/09/2019 21:05:13 WARNING [pc=F00D11] Writing outside of RAM @ FFFFE8 = 20
18/09/2019 21:05:13 WARNING [pc=F00D11] Writing outside of RAM @ FFFFE9 = 00
18/09/2019 21:05:13 WARNING [pc=F00D11] Writing outside of RAM @ FFFFEA = E6
18/09/2019 21:05:13 WARNING [pc=F00D11] Writing outside of RAM @ FFFFEB = 0C
18/09/2019 21:05:13 WARNING [pc=F00D15] Writing outside of RAM @ FFFFE4 = 00
18/09/2019 21:05:13 WARNING [pc=F00D15] Writing outside of RAM @ FFFFE5 = 00
18/09/2019 21:05:13 WARNING [pc=F00D15] Writing outside of RAM @ FFFFE6 = 00
18/09/2019 21:05:13 WARNING [pc=F00D15] Writing outside of RAM @ FFFFE7 = 00
18/09/2019 21:05:13 WARNING [pc=F00D18] Writing outside of RAM @ FFFFEA = 00
18/09/2019 21:05:13 WARNING [pc=F00D18] Writing outside of RAM @ FFFFEB = 00
18/09/2019 21:05:13 WARNING [pc=F00D1C] Writing outside of RAM @ FFFFE4 = 30
18/09/2019 21:05:13 WARNING [pc=F00D1C] Writing outside of RAM @ FFFFE5 = 00
18/09/2019 21:05:13 WARNING [pc=F00D1C] Writing outside of RAM @ FFFFE6 = 04
18/09/2019 21:05:13 WARNING [pc=F00D1C] Writing outside of RAM @ FFFFE7 = 00
18/09/2019 21:05:13 INFO Bad NS32016 gen mode
18/09/2019 21:05:13 INFO R0=00000000 R1=000000FA R2=000FD3B2 R3=00000006
18/09/2019 21:05:13 INFO R4=00039728 R5=00000000 R6=000076B4 R7=00000033
18/09/2019 21:05:13 INFO PC=0000003A SB=00040030 SP=00000000 TRAP=00000000
18/09/2019 21:05:13 INFO FP=000FD236 INTBASE=000FFE64 PSR=0040 MOD=0000
18/09/2019 21:05:13 INFO F0=0.000000 F1=0.000000 F2=0.000000 F3=0.000000
18/09/2019 21:05:13 INFO F4=0.000000 F5=0.000000 F6=0.000000 F7=0.000000
18/09/2019 21:05:13 INFO D0=0.000000 D1=0.000000 D2=0.000000 D3=0.000000
18/09/2019 21:05:13 INFO D4=0.000000 D5=0.000000 D6=0.000000 D7=0.000000
18/09/2019 21:05:13 WARNING [pc=3A] Writing outside of RAM @ FFFFE4 = FA
18/09/2019 21:05:13 WARNING [pc=3A] Writing outside of RAM @ FFFFE5 = 00
18/09/2019 21:05:13 INFO HandleTrap() called
18/09/2019 21:05:13 INFO R0=00000000 R1=000000FA R2=000FD3B2 R3=00000006
18/09/2019 21:05:13 INFO R4=00039728 R5=00000000 R6=000076B4 R7=00000033
18/09/2019 21:05:13 INFO PC=00000040 SB=00040030 SP=00000000 TRAP=00000400
18/09/2019 21:05:13 INFO FP=000FD236 INTBASE=000FFE64 PSR=0040 MOD=0000
18/09/2019 21:05:13 INFO F0=0.000000 F1=0.000000 F2=0.000000 F3=0.000000
18/09/2019 21:05:13 INFO F4=0.000000 F5=0.000000 F6=0.000000 F7=0.000000
18/09/2019 21:05:13 INFO D0=0.000000 D1=0.000000 D2=0.000000 D3=0.000000
18/09/2019 21:05:13 INFO D4=0.000000 D5=0.000000 D6=0.000000 D7=0.000000
18/09/2019 21:05:13 INFO Illegal Writing Immediate
18/09/2019 21:05:13 INFO HandleTrap() called
18/09/2019 21:05:13 INFO R0=00000000 R1=000000FA R2=000FD3B2 R3=00000006
18/09/2019 21:05:13 INFO R4=00039728 R5=00000000 R6=000076B4 R7=00000033
18/09/2019 21:05:13 INFO PC=0000004F SB=00040030 SP=00000000 TRAP=00000400
18/09/2019 21:05:13 INFO FP=000FD236 INTBASE=000FFE64 PSR=0041 MOD=0000
18/09/2019 21:05:13 INFO F0=0.000000 F1=0.000000 F2=0.000000 F3=0.000000
18/09/2019 21:05:13 INFO F4=0.000000 F5=0.000000 F6=0.000000 F7=0.000000
18/09/2019 21:05:13 INFO D0=0.000000 D1=0.000000 D2=0.000000 D3=0.000000
18/09/2019 21:05:13 INFO D4=0.000000 D5=0.000000 D6=0.000000 D7=0.000000
18/09/2019 21:05:13 INFO Illegal Writing Immediate
18/09/2019 21:05:13 INFO Bad NS32016 gen mode
18/09/2019 21:05:13 INFO R0=00000000 R1=000000FA R2=000FD3B2 R3=00000006
18/09/2019 21:05:13 INFO R4=00039728 R5=00000000 R6=000076B4 R7=00000033
18/09/2019 21:05:13 INFO PC=0000005E SB=00040030 SP=00000000 TRAP=00000000
18/09/2019 21:05:13 INFO FP=000FD236 INTBASE=000FFE64 PSR=0040 MOD=0000
18/09/2019 21:05:13 INFO F0=0.000000 F1=0.000000 F2=0.000000 F3=0.000000
18/09/2019 21:05:13 INFO F4=0.000000 F5=0.000000 F6=0.000000 F7=0.000000
18/09/2019 21:05:13 INFO D0=0.000000 D1=0.000000 D2=0.000000 D3=0.000000
18/09/2019 21:05:13 INFO D4=0.000000 D5=0.000000 D6=0.000000 D7=0.000000
18/09/2019 21:05:13 INFO HandleTrap() called
18/09/2019 21:05:13 INFO R0=00000000 R1=000000FA R2=000FD3B2 R3=00000006
18/09/2019 21:05:13 INFO R4=00039728 R5=00000000 R6=000076B4 R7=00000033
18/09/2019 21:05:13 INFO PC=00000064 SB=00040030 SP=00000000 TRAP=00000400
18/09/2019 21:05:13 INFO FP=000FD236 INTBASE=000FFE64 PSR=0040 MOD=0000
18/09/2019 21:05:13 INFO F0=0.000000 F1=0.000000 F2=0.000000 F3=0.000000
18/09/2019 21:05:13 INFO F4=0.000000 F5=0.000000 F6=0.000000 F7=0.000000
18/09/2019 21:05:13 INFO D0=0.000000 D1=0.000000 D2=0.000000 D3=0.000000
18/09/2019 21:05:13 INFO D4=0.000000 D5=0.000000 D6=0.000000 D7=0.000000
18/09/2019 21:05:13 INFO Illegal Writing Immediate
18/09/2019 21:05:13 INFO HandleTrap() called
18/09/2019 21:05:13 INFO R0=00000000 R1=000000FA R2=000FD3B2 R3=00000006
18/09/2019 21:05:13 INFO R4=00039728 R5=00000000 R6=000076B4 R7=00000033
18/09/2019 21:05:13 INFO PC=00000073 SB=00040030 SP=00000000 TRAP=00000400
18/09/2019 21:05:13 INFO FP=000FD236 INTBASE=000FFE64 PSR=0041 MOD=0000
18/09/2019 21:05:13 INFO F0=0.000000 F1=0.000000 F2=0.000000 F3=0.000000
18/09/2019 21:05:13 INFO F4=0.000000 F5=0.000000 F6=0.000000 F7=0.000000
18/09/2019 21:05:13 INFO D0=0.000000 D1=0.000000 D2=0.000000 D3=0.000000
18/09/2019 21:05:13 INFO D4=0.000000 D5=0.000000 D6=0.000000 D7=0.000000
18/09/2019 21:05:13 INFO Illegal Writing Immediate
18/09/2019 21:05:13 INFO Bad NS32016 gen mode
18/09/2019 21:05:13 INFO R0=00000000 R1=000000FA R2=000FD3B2 R3=00000006
18/09/2019 21:05:13 INFO R4=00039728 R5=00000000 R6=000076B4 R7=00000033
18/09/2019 21:05:13 INFO PC=00000082 SB=00040030 SP=00000000 TRAP=00000000
18/09/2019 21:05:13 INFO FP=000FD236 INTBASE=000FFE64 PSR=0040 MOD=0000
18/09/2019 21:05:13 INFO F0=0.000000 F1=0.000000 F2=0.000000 F3=0.000000
18/09/2019 21:05:13 INFO F4=0.000000 F5=0.000000 F6=0.000000 F7=0.000000
18/09/2019 21:05:13 INFO D0=0.000000 D1=0.000000 D2=0.000000 D3=0.000000
18/09/2019 21:05:13 INFO D4=0.000000 D5=0.000000 D6=0.000000 D7=0.000000
18/09/2019 21:05:13 INFO HandleTrap() called
18/09/2019 21:05:13 INFO R0=00000000 R1=000000FA R2=000FD3B2 R3=00000006
18/09/2019 21:05:13 INFO R4=00039728 R5=00000000 R6=000076B4 R7=00000033
18/09/2019 21:05:13 INFO PC=00000088 SB=00040030 SP=00000000 TRAP=00000400
18/09/2019 21:05:13 INFO FP=000FD236 INTBASE=000FFE64 PSR=0040 MOD=0000
18/09/2019 21:05:13 INFO F0=0.000000 F1=0.000000 F2=0.000000 F3=0.000000
18/09/2019 21:05:13 INFO F4=0.000000 F5=0.000000 F6=0.000000 F7=0.000000
18/09/2019 21:05:13 INFO D0=0.000000 D1=0.000000 D2=0.000000 D3=0.000000
18/09/2019 21:05:13 INFO D4=0.000000 D5=0.000000 D6=0.000000 D7=0.000000
18/09/2019 21:05:13 INFO Illegal Writing Immediate

then many more similar lines.

Having eliminated the red-herring of the PDP11 BASIC ROM it does not seem matter which OS version I try with the BBC Master (to take advantage of ADFS) it crashes in the same way.

I have tried reverting to previous versions of b-em as far back as before the port to Allego 5 and again the crash happens there too. I suspect, therefore this may be a 32016 core bug.

hoglet67 commented 5 years ago

Steve,

What version of DFS are you using?

I think Panos needs a file system with large (>64KB) file support.

It also needs the DFS file system to be well-behaved when switching between DFS and ADFS.

I've certainly installed Panos from MMFS floppies in PiTubeDirect, and the core is directly shared with B-em. So I would be surprised if this turned out to be a NS32016 core bug.

All this should be fine on a Master, which I belatedly noticed you are using.

Dave

hoglet67 commented 5 years ago

Steve,

If you like, I can try to reproduce this issue on a real Master.

I have several implementations of the 32016 Co Pro:

It would be informative to know which of these (if any) also exhibit the issue.

Can you attach the .ssd images you are installing from?

Can you also confirm the active ROMs on the Master?

Dave

hoglet67 commented 5 years ago

Steve,

It's just possible I have never tried a Panos install with PiTubeDirect.

Looking through the stardot posts, my original installs of Panos 1.1 and 1.4 were done with the Matchbox Co Pro: https://stardot.org.uk/forums/viewtopic.php?f=44&t=8852&p=132621&hilit=PANOS#p132621

So a NS32016 core bug might be possible.

I will try this tomorrow.

Dave

SteveFosdick commented 5 years ago

Dave, Thanks for picking this up. The issue occurs with:

MOS 3.20/DFS 2.24/ADFS 1.50 MOS 3.50/DFS 2.45/ADFS 2.03

It also happens with 3.22 and 3.52 which have Y2K and other patches but the vanilla 3.20 and 3.50 should be available on the real hardware so are a better test case. The DFS and ADFS versions are the ones that come in the 1Mbit ROM with that OS version.

The floppies I am using are the images you added to the b-em distribution so those of the form Pan14?.img at https://github.com/stardot/b-em/tree/master/discs

*ROMS on 3.20 gives

ROM F TERMINAL 01
ROM E VIEW 04
ROM D Acorn ADFS 50
ROM C BASIC 04
ROM B Edit 01
ROM A ViewSheet 02
ROM 9 DFS 79
ROM 8 B-Em VDFS 06
ROM 7 View Printer Driver  20
ROM 6 ?
ROM 5 ?
ROM 4 ?
ROM 3 ?
ROM 2 ?
ROM 1 ?
ROM 0 ?

So I will try stripping that back to just the OS in case that makes a difference.

SteveFosdick commented 5 years ago

Yes, with the extra ROMS removed to just the 1Mbit OS ROM the crash still happens.

hoglet67 commented 5 years ago

Steve,

I've tried installing your Panos 1.4 disk 1 to ADFS using PiTubeDirect on a real Master, and I've not been able to reproduce the issue. i.e. the install process worked correctly, and there were no errors in the logs.

I've carefully checked the NS32016 code between B-Em and PiTubeDirect, and can't see any significant differences.

My best guess at the moment is this is indeed a NS32016 core bug, but one that only manifests when building a 64-bit version of the code. PiTubeDirect is only 32-bit.

Do we have any way of compiling a 32-bit version of B-Em to test out this theory?

Dave

SteveFosdick commented 5 years ago

I may be able to build a 32 bit version on 64bit Linux. I'll check the compiler options. The cross compiled Windows builds include a 32 bit version.

SteveFosdick commented 5 years ago

There is a 32bit Windows version in this release: https://github.com/stardot/b-em/releases/tag/issue91

SteveFosdick commented 5 years ago

I have just tried the 32bit version on a Windows 10 VM and that crashes in the same place. On the assumption this confirms this as a b-em rather than core bug I wonder if the emulation of the TubeULA and associated interrupt handling is a likely culprit. If you agree, it would be interesting to trace the tube transfers and compare PiTubeDirect with b-em. Does PiTubeDirect have a means of doing that?

hoglet67 commented 5 years ago

Can I withdraw the previous hypothesis (that this is a 64-bit bug)?

I've captured an instruction trace in the debugger of the moment of the where things seem to go wrong:

&F004AB [d4ad7e          ] F4 MOVB    @x'FFFFFFFE,TOS 00f004ab 00000001 00000001
&F004AE [cf0f78          ] F2 ACBD    -1,R1PC x+'-8 00f004ae 00000001 00000001 0
&F004B1 [1c02            ] F2 CMPQB   4,R0 00f004b1 00000001 00000000 000fd3b2 0
&F004B3 [1a08            ] F0 BNE     &F004BB  00f004b3 00000001 00000000 000fd3
&F004BB [57bd30          ] F4 MOVD    TOS,@x'30 00f004bb 00000001 00000000 000fd
&F004BE [34a000          ] F4 TBITB   x'0,R0 00f004be 00000001 00000000 000fd3b2
&F004C1 [9a1e            ] F0 BFC     &F004DF  00f004c1 00000001 00000000 000fd3
&F004C3 [9fd70c          ] F2 CMPQD   -1,12(SB) 00f004c3 00000001 00000000 000fd
&F004C6 [1a06            ] F0 BNE     &F004CC  00f004c6 00000001 00000000 000fd3
&F004C8 [97ae300c        ] F4 MOVD    @x'30,12(SB) 00f004c8 00000001 00000000 00
&F00CE3 [0222            ] F1 BSR     &F00D05  00f00ce3 00000001 00000000 000fd3
&F00D05 [d705            ] F4 MOVD    R0,TOS 00f00d05 00000001 00000000 000fd3b2
&F00D07 [ce18880400      ] F7 MOVZBD  0(4(SP)),R0 00f00d07 00000001 00000000 000
&F00D0C [d7e5900000ced9cd] F4 MOVD    249154818(0(SB))[R0:B],TOS 00f00d0c 000000
&F00D11 [ced9cd02        ] F7 MOVZWD  2(SP),TOS 00f00d11 00000004 00000000 000fd
&F00D15 [5dc806          ] F2 MOVQW   0,6(SP) 00f00d15 00000004 00000000 000fd3b
&F00D18 [c38d0408        ] F4 ADDD    8(4(SP)),TOS 00f00d18 00000004 00000000 00
&F00D1C [17c808          ] F4 MOVD    8(SP),R0 00f00d1c 00000004 00000000 000fd3
&F00D1F [3208            ] F1 RXP      x'8 00f00d1f 00000001 00000000 000fd3b2 0
&F00CE3 [0222            ] F1 BSR     &F00D05  00f00ce3 00000001 00000000 000fd3
&F00D05 [d705            ] F4 MOVD    R0,TOS 00f00d05 00000001 00000000 000fd3b2
&F00D07 [ce18880400      ] F7 MOVZBD  0(4(SP)),R0 00f00d07 00000001 00000000 000
&F00CE3 [0222            ] F1 BSR     &F00D05  00f00ce3 00000004 00000000 000fd3
&F00D05 [d705            ] F4 MOVD    R0,TOS 00f00d05 00000004 00000000 000fd3b2
&F00D07 [ce18880400      ] F7 MOVZBD  0(4(SP)),R0 00f00d07 00000004 00000000 000
&F00D0C [d7e5900000ced9cd] F4 MOVD    249154818(0(SB))[R0:B],TOS 00f00d0c 000000
&F00D11 [ced9cd02        ] F7 MOVZWD  2(SP),TOS 00f00d11 00000004 00000000 000fd
&F00CE3 [0222            ] F1 BSR     &F00D05  00f00ce3 00000004 00000000 000fd3

The start of the fragment above, F004AB is part way through the Pandora ROM code that prepares for a data transfer. You can refer to the disassmbly here: http://mdfs.net/Software/Tube/32016/v200/Pan200.zip

If you look, you'll see an abrupt/unexpected address change from F004C8 to F00CE3. This is the result of an NMI. But F00CE3 is not one of the normal NMI data handlers. So the NMI vector is not set correctly at this point (the NMI vector is changed with each data transfer).

I don't think F00CE3 should ever be reached. And you can see, that code section repeats, because I guess the NMIs keep coming.

At a high level, I think what's happening is that we are processing an OS_FILE, the host has requested a data transfer, but before the client has had time to correctly setup the NMI vector that data transfer has started, and it starts to receive NMIs which are mis-handled.

There is meant to be a synchronisation mechanism at the start of the data transfer, where the host should wait for the sync byte to be read from the R4 FIFO by the client before starting the transfer. For some reason that seems to have failed in this case. So the transfer has started before the client is ready. Why I have no idea....

Dave

SteveFosdick commented 5 years ago

Dave, that's very interesting. I was just looking at what is happening with the Tube ULA and I am getting this in the logs:

19/09/2019 17:12:01 DEBUG tube: parasite write R3=50
19/09/2019 17:12:01 DEBUG tube: parasite write R3=63
19/09/2019 17:12:01 DEBUG tube: host read R3=63
19/09/2019 17:12:01 DEBUG tube: parasite write R3=A0
19/09/2019 17:12:01 DEBUG tube: parasite write R3=00
19/09/2019 17:12:01 DEBUG tube: host read R3=00
19/09/2019 17:12:01 DEBUG tube: parasite write R3=00
19/09/2019 17:12:01 DEBUG tube: parasite write R3=00
19/09/2019 17:12:01 DEBUG tube: host read R3=00
19/09/2019 17:12:01 DEBUG tube: host read R3=00
19/09/2019 17:12:01 DEBUG tube: host read R3=00
19/09/2019 17:12:01 DEBUG tube: host read R3=00
19/09/2019 17:12:01 DEBUG tube: host read R3=00
19/09/2019 17:12:01 DEBUG tube: host read R3=00

so this appears to be a parasite to host transfer but it appears the parasite is writing faster than the host is reading. Then it looks like the host is using a counter to decide how many bytes to read so it keep reading the last value sent until the count is satisfied. I assume there is supposed to some kind of synchronisation here and it has gone wrong. That may be a Tube ULA bug though I had this happening the other way round when a bug had prevented setting the parasite's clock speed high enough.

I have got go out now but I'll have a look at the Pandora ROM when I come back to see how this is supposed to work.

hoglet67 commented 5 years ago

Here's a simpler test case that fails, and doesn't involve PANOS at all. Screenshot from 2019-09-19 18-13-14

hoglet67 commented 5 years ago

And a couple more variants:

DFS Only works fine: Screenshot from 2019-09-19 18-18-34

ADFS Only breaks: Screenshot from 2019-09-19 18-17-19

It seems that the SAVE from ADFS is leaving the Tube FIFO in such a state that the next load is subject to a premature NMI. This is almost certainly a tube bug.

hoglet67 commented 5 years ago

This is the transfer done by ADFS SAVE:

cpu 32016: F0044B: read from FFFFFFFE, value=06
cpu 32016: F0049C: read from FFFFFFFE, value=C4
cpu 32016: F004AB: read from FFFFFFFE, value=00
cpu 32016: F004AB: read from FFFFFFFE, value=00
cpu 32016: F004AB: read from FFFFFFFE, value=6C
cpu 32016: F004AB: read from FFFFFFFE, value=91
cpu 32016: F00505: read from FFFFFFFE, value=20
cpu 32016: F0044B: read from FFFFFFFE, value=05
cpu 32016: F0049C: read from FFFFFFFE, value=C4

What's slightly unusual is it's a type-6 transfer.

SteveFosdick commented 5 years ago

Thanks, David, that's a big step forward. I think what I was seeing in the log was the type 6 transfer already out of sync. Running through this from the start, I get:

19/09/2019 21:19:16 DEBUG tube: parasite write R2=14
19/09/2019 21:19:16 DEBUG tube: host read R2=14
19/09/2019 21:19:16 DEBUG tube: parasite write R2=00
19/09/2019 21:19:16 DEBUG tube: host read R2=00
19/09/2019 21:19:16 DEBUG tube: parasite write R2=00
19/09/2019 21:19:16 DEBUG tube: host read R2=00
19/09/2019 21:19:16 DEBUG tube: parasite write R2=6C
19/09/2019 21:19:16 DEBUG tube: host read R2=6C
19/09/2019 21:19:16 DEBUG tube: parasite write R2=9F
19/09/2019 21:19:16 DEBUG tube: host read R2=9F
19/09/2019 21:19:16 DEBUG tube: parasite write R2=00
19/09/2019 21:19:16 DEBUG tube: host read R2=00
19/09/2019 21:19:16 DEBUG tube: parasite write R2=00
19/09/2019 21:19:16 DEBUG tube: host read R2=00
19/09/2019 21:19:16 DEBUG tube: parasite write R2=6C
19/09/2019 21:19:16 DEBUG tube: host read R2=6C
19/09/2019 21:19:16 DEBUG tube: parasite write R2=91
19/09/2019 21:19:16 DEBUG tube: host read R2=91
19/09/2019 21:19:16 DEBUG tube: parasite write R2=00
19/09/2019 21:19:16 DEBUG tube: host read R2=00
19/09/2019 21:19:16 DEBUG tube: parasite write R2=00
19/09/2019 21:19:16 DEBUG tube: host read R2=00
19/09/2019 21:19:16 DEBUG tube: parasite write R2=21
19/09/2019 21:19:16 DEBUG tube: host read R2=21
19/09/2019 21:19:16 DEBUG tube: parasite write R2=B0
19/09/2019 21:19:16 DEBUG tube: host read R2=B0
19/09/2019 21:19:16 DEBUG tube: parasite write R2=00
19/09/2019 21:19:16 DEBUG tube: host read R2=00
19/09/2019 21:19:16 DEBUG tube: parasite write R2=00
19/09/2019 21:19:16 DEBUG tube: host read R2=00
19/09/2019 21:19:16 DEBUG tube: parasite write R2=6C
19/09/2019 21:19:16 DEBUG tube: host read R2=6C
19/09/2019 21:19:16 DEBUG tube: parasite write R2=91
19/09/2019 21:19:16 DEBUG tube: host read R2=91
19/09/2019 21:19:16 DEBUG tube: parasite write R2=54
19/09/2019 21:19:16 DEBUG tube: host read R2=54
19/09/2019 21:19:16 DEBUG tube: parasite write R2=45
19/09/2019 21:19:16 DEBUG tube: host read R2=45
19/09/2019 21:19:16 DEBUG tube: parasite write R2=53
19/09/2019 21:19:16 DEBUG tube: host read R2=53
19/09/2019 21:19:16 DEBUG tube: parasite write R2=54
19/09/2019 21:19:16 DEBUG tube: host read R2=54
19/09/2019 21:19:16 DEBUG tube: parasite write R2=0D
19/09/2019 21:19:16 DEBUG tube: host read R2=0D
19/09/2019 21:19:16 DEBUG tube: parasite write R2=00
19/09/2019 21:19:16 DEBUG tube: host read R2=00

so perfectly in-sync at this point where the parasite is sending the OSFILE control block, the the host sends the header for the type 6 transfer:

19/09/2019 21:19:16 DEBUG tube: host write R4=06
19/09/2019 21:19:16 DEBUG tube: parasite IRQ raised
19/09/2019 21:19:16 DEBUG tube: parasite read R4=06
19/09/2019 21:19:16 DEBUG tube: parasite IRQ lowered
19/09/2019 21:19:16 DEBUG tube: host write R4=C4
19/09/2019 21:19:16 DEBUG tube: parasite IRQ raised
19/09/2019 21:19:16 DEBUG tube: parasite read R4=C4
19/09/2019 21:19:16 DEBUG tube: parasite IRQ lowered
19/09/2019 21:19:16 DEBUG tube: host write R4=00
19/09/2019 21:19:16 DEBUG tube: parasite IRQ raised
19/09/2019 21:19:16 DEBUG tube: parasite read R4=00
19/09/2019 21:19:16 DEBUG tube: parasite IRQ lowered
19/09/2019 21:19:16 DEBUG tube: host write R4=00
19/09/2019 21:19:16 DEBUG tube: parasite IRQ raised
19/09/2019 21:19:16 DEBUG tube: parasite read R4=00
19/09/2019 21:19:16 DEBUG tube: parasite IRQ lowered
19/09/2019 21:19:16 DEBUG tube: host write R4=6C
19/09/2019 21:19:16 DEBUG tube: parasite IRQ raised
19/09/2019 21:19:16 DEBUG tube: parasite read R4=6C
19/09/2019 21:19:16 DEBUG tube: parasite IRQ lowered
19/09/2019 21:19:16 DEBUG tube: host write R4=91
19/09/2019 21:19:16 DEBUG tube: parasite IRQ raised
19/09/2019 21:19:16 DEBUG tube: parasite read R4=91

So by my reading of the tube protocol that is the end of the source address and it is just the sync byte outstanding then the host sets up R1 status:

19/09/2019 21:19:16 DEBUG tube: host write S1=18->07
19/09/2019 21:19:16 DEBUG tube: host write S1=82->07

then strangely the host reads R3 twice. I wonder if this has already gone wrong.

19/09/2019 21:19:16 DEBUG tube: host read R3=00
19/09/2019 21:19:16 DEBUG tube: host read R3=00

Now the host wrtites what I believe is the sync byte and the parasite responds by writing to R3 but doesn't seem to be waiting for the host to read. I looks like already it has overrun the R3 FIFO before the host has read even one byte:

19/09/2019 21:19:16 DEBUG tube: host write R4=20
19/09/2019 21:19:16 DEBUG tube: parasite IRQ raised
19/09/2019 21:19:16 DEBUG tube: parasite read R4=20
19/09/2019 21:19:16 DEBUG tube: parasite IRQ lowered
19/09/2019 21:19:16 DEBUG tube: parasite write R3=0D
19/09/2019 21:19:16 DEBUG tube: parasite write R3=00
19/09/2019 21:19:16 DEBUG tube: parasite write R3=0A
19/09/2019 21:19:16 DEBUG tube: parasite write R3=0C
19/09/2019 21:19:16 DEBUG tube: parasite write R3=F1
19/09/2019 21:19:16 DEBUG tube: parasite write R3=22
19/09/2019 21:19:16 DEBUG tube: parasite write R3=48
19/09/2019 21:19:16 DEBUG tube: parasite write R3=45
19/09/2019 21:19:16 DEBUG tube: host read R3=45

I have added a debug message to log the raising of NMI in the same was as I have for IRQ and the message has not appeared - it looks like the R1S values being written is disabling NMI generation.

So I'll look at Pandora next to see what it expects to be happenning for flow control here.

hoglet67 commented 5 years ago

That's really useful debug ouput there.

One thing to be aware of: type 6/7 transfers don't use NMI at all, because the block length is fixed at 256 bytes. The code in Pandora is at F0004F7.

hoglet67 commented 5 years ago

JGH's disassembly is a bit confused at this point..... Here's my attempt:

.L00F00500
00F00500 1C A8 7C          .(|     CMPQB  +0,@&FFFFFFFC             ; Wait for sync byte in TUBE R4
00F00503 CA 7D             J}      BLT    L00F00500
00F00505 1C A8 7E          .(~     CMPQB  +0,@&FFFFFFFE             ; read sync byte
.L00F00508
00F00508 1C A8 78          .(x     CMPQB  +0,@&FFFFFFF8             ; wait for space in TUBE R3
00F0050B CA 7D             J}      BLT    L00F00508
00F0050D 54 4D 00 7A       TM.z    MOVB   &00(R1),@&FFFFFFFA        ; Send data byte
.L00F00511
00F00511 1C A8 78          .(x     CMPQB  +0,@&78                   ; wait for space in TUBE R3 (disassmbly broken!)
00F00514 CA 7D             J}      BLT    L00F00511
00F00516 54 4D 01 7A       TM.z    MOVB   &01(R1),@&FFFFFFFA        ; Send data byte
00F0051A 0F 09             ..      ADDQD  +2,R1                     ; Add 2 to address
00F0051C 07 08             ..      CMPD   R1,R0                     ; Compare with end
00F0051E 1A 6A             .j      BNE    L00F00508                 ; Loop until all 256 bytes done

The loop is unrolled twice, but for some reason same instruction is disassembled differently. I'm sure the b-em disassembler will do a better job.

hoglet67 commented 5 years ago

Note, there is (or should be) flow control.

The fact that overrun occurs suggests one of three things is happening:

The N flag (bit 7 of R3 status) needs to work differently to the A flags (bit 7 of R1, R2, R4 status), because it pertains to both directions. I think NMI is generated directly off this flag (if it's enabled).

SteveFosdick commented 5 years ago

I am new to the 32016 instruction set but if the test of whether the tube is ready is to compare the status with zero this looks like negative is not ready, positive is. Looking at the b-em code it appears to be clearing bit 6 of R3 status (as seen from the parasite) once a value has been written by the parasite. It does also set bit 7 but this is the status as seen by the host, i.e. the bit to say a byte is ready, not that the register is full.

hoglet67 commented 5 years ago

Functionally it's very similar to the type 6 transfer in the 6809 Co Pro:

                        ; Transfer 6 - Multiple byte parasite -> host
                        ; -------------------------------------------
fdfc :                  FIRQ6:
fdfc :                  FIRQ6lp:
fdfc : b6fee4               LDA  >TUBE3S    ; Wait for Tube R3 ready
fdff : 2afb                 BPL  FIRQ6lp
fe01 : a680                 LDA  ,X+    ; Get byte from parasite memory at X
fe03 : b7fee5               STA  >TUBE3 ; Send to Tube host via R3
fe06 : 5a                   DECB        ; Decrement counter
fe07 : 26f3                 BNE  FIRQ6lp    ; Loop 256 times
fe09 :                  FIRQ6lp2:
fe09 : b6fee4               LDA  >TUBE3S    ; Wait for Tube R3 ready
fe0c : 2afb                 BPL  FIRQ6lp2
fe0e : b7fee5               STA  >TUBE3 ; Send 257th byte to flush FIFO
fe11 : 203d                 BRA  FIRQ_DONE

(just not unrolled)

SteveFosdick commented 5 years ago

The 6809 doesn't seem to have the problem but that's probably because it is too slow to. Here's the bit of tube_parasite_write for R3:

                if (tubeula.r1stat & 16)
                {
                        if (tubeula.ph3pos < 2)
                           tubeula.ph3[tubeula.ph3pos++] = val;
                        if (tubeula.ph3pos == 2)
                        {
                                tubeula.hstat[2] |=  0x80;
                                tubeula.pstat[2] &= ~0x40;
                        }
                }
                else
                {
                        tubeula.ph3[0] = val;
                        tubeula.ph3pos = 1;
                        tubeula.hstat[2] |=  0x80;
                        tubeula.pstat[2] &= ~0xc0;
                }

so it looks like there are two modes, one where R3 is a FIFO and one where it is a single register but in each case bit 7 is only set on the host side. It seems like that is the bug, then.

SteveFosdick commented 5 years ago

I have just re-read the 6809 example and I think I have this backwards. It looks like bit 7 is clear for busy and set for ready (as seen from the parasite). In that case the single register version looks correct but the FIFO version should probably have &= 0xc0 like its counterpart. I'll try than and see.

hoglet67 commented 5 years ago

Think of pstat[2] bit 7 (the N bit) as meaning "needs attention from the parasite" when set.

There are two causes of this becoming set:

Both of these are conditions caused by the host, so I think it's correct that only the host read/write code should ever set this bit.

And conversly, only the paraside read/write code should clear it.

This seems to be how it's implemented.

The "FIFO version" only comes into play if two-byte transfer mode is enabled, which happens in type 2/3 transfers. ADFS doesn't use any of those, so I think any bugs relating to two-byte mode are not coming into play here.

SteveFosdick commented 5 years ago

Ok, so going back to the debug output the two host reads of R3 immediately before the sync byte is sent via R4 should ensure R3 is empty and thus R3 status bit 7 should be set and the first time the code to send a byte is executed it should not need to wait and can send immediately. Assuming the parasite is faster than the host, next time round the loop it should find bit 7 clear and should thus wait for a host read to set it again.

hoglet67 commented 5 years ago

Yes.

And we know this flow control works, because the 32016 Core in PiTubeDirect is much faster than the host.

Could you add debug on parasite reads of R3 Status?

SteveFosdick commented 5 years ago

Ok, I was dong that and here's the result:

19/09/2019 22:38:43 DEBUG tube: parasite read R4=20
19/09/2019 22:38:43 DEBUG tube: parasite IRQ lowered
19/09/2019 22:38:43 DEBUG tube: parasite read R3 status C0
19/09/2019 22:38:43 DEBUG tube: parasite write R3=0D
19/09/2019 22:38:43 DEBUG tube: R3 single full
19/09/2019 22:38:43 DEBUG tube: parasite read R3 status 00
19/09/2019 22:38:43 DEBUG tube: parasite write R3=00
19/09/2019 22:38:43 DEBUG tube: R3 single full

this is from the sync byte so the first time bit 7 is set the parasite sends which is correct, the the tube code is returning 0 as the status and the parasite writes R3 anyway!

hoglet67 commented 5 years ago

Very interesting.... so the I wonder why CMPQB is apparently doing the wrong thing?

         case CMPQ:
         {
            temp2 = (opcode >> 7) & 0xF;
            NIBBLE_EXTEND(temp2);
            temp = ReadGen(0);
            SIGN_EXTEND(OpSize.Op[0], temp);
            CompareCommon(temp2, temp);
            continue;
         }
SteveFosdick commented 5 years ago

My initial though was to log the values temp and temp2 going into CompareCommon but that would generate a lot of output. I wonder if we can do:

         case CMPQ:
         {
            temp2 = (opcode >> 7) & 0xF;
            NIBBLE_EXTEND(temp2);
            temp = ReadGen(0);
            SIGN_EXTEND(OpSize.Op[0], temp);
            if (pc == ??????)
               log_debug("CMPQ: temp=%d, temp2=%d", temp, temp2);
            CompareCommon(temp2, temp);
            continue;
         }

for whatever is the correct variable for PC and the address of the misbehaving instruction. That would also test that the values as seen by the instruction have not been modified by memory read/write functions.

hoglet67 commented 5 years ago

I'm wondering if the SIGN_EXTEND in the above fragment is wrong, becaise CompareCommon seems to attempt to deal with byte sized objects itself:

   if (OpSize.Op[0] == sz8)
   {
      N_FLAG = TEST(((int8_t) src1) > ((int8_t) src2));
   }

What's a compiler meant to do with a (int8_t) cast if the value is already out of range 0..255?

This is definitely "undefined" teratory (i.e. the compiler can do anything)

SteveFosdick commented 5 years ago

My guess would be truncate to 8 bits. Then in old-style C it would re-sign-extend to do the comparison using the machine's "native" word size but I don't know if later C standards have allowed the compiler more scope. We'd have to dig out the standard. This does raise a possibility though in that when compiling for ARM the compiler may be doing something different than when compiling for X86.

SteveFosdick commented 5 years ago

Commenting out the sign extend doesn't fix it:

         case CMPQ:
         {
            temp2 = (opcode >> 7) & 0xF;
            NIBBLE_EXTEND(temp2);
            temp = ReadGen(0);
            //SIGN_EXTEND(OpSize.Op[0], temp);
            CompareCommon(temp2, temp);
            continue;
         }
hoglet67 commented 5 years ago

I tried as well ;-(

I think adding some debugging around CMPQ is the right way to proceed. Like you outlined above.

I need to sign off now....

SteveFosdick commented 5 years ago

Ok, David, no problem. I added the debugging and here is the result:

19/09/2019 23:13:54 DEBUG tube: parasite read R4=20
19/09/2019 23:13:54 DEBUG tube: parasite IRQ lowered
19/09/2019 23:13:54 DEBUG 32016: CMPQ: temp=20 (32), temp2=00 (0)
19/09/2019 23:13:54 DEBUG tube: parasite read R3 status C0
19/09/2019 23:13:54 DEBUG 32016: CMPQ: temp=C0 (192), temp2=00 (0)
19/09/2019 23:13:54 DEBUG tube: parasite write R3=0D
19/09/2019 23:13:54 DEBUG tube: R3 single full
19/09/2019 23:13:54 DEBUG tube: parasite read R3 status 00
19/09/2019 23:13:54 DEBUG 32016: CMPQ: temp=00 (0), temp2=00 (0)
19/09/2019 23:13:54 DEBUG tube: parasite write R3=00
19/09/2019 23:13:54 DEBUG tube: R3 single full

so it appears the bug is in CompareCommon as you suggest.

hoglet67 commented 5 years ago

Steve,

Here's one last idea...

19/09/2019 22:38:43 DEBUG tube: parasite read R3 status 00

Bits 5..0 of the status registers (undefined) should I think be reading back as '1' not '0'

This could well be confusing the test.

Less Than == Z and N flags clear

In this case, Z will be 1, and the test will incorrectly fall through.

hoglet67 commented 5 years ago

PiTubeDirect initializes them as:

   PSTAT1 = 0x40;
   PSTAT2 = 0x7F;
   PSTAT3 = PSTAT2;
   PSTAT4 = PSTAT2;
SteveFosdick commented 5 years ago

I think that's it! I'll give it a go.

SteveFosdick commented 5 years ago

Dave,

With that change:

19/09/2019 23:34:14 DEBUG tube: parasite read R4=20
19/09/2019 23:34:14 DEBUG tube: parasite IRQ lowered
19/09/2019 23:34:14 DEBUG 32016: CMPQ: temp=20 (32), temp2=00 (0)
19/09/2019 23:34:14 DEBUG 32016: CompareCommon: pc=00F00508, a=00000000 (0), b=00000020 (32), N=0
19/09/2019 23:34:14 DEBUG tube: parasite read R3 status FF
19/09/2019 23:34:14 DEBUG 32016: CMPQ: temp=FF (255), temp2=00 (0)
19/09/2019 23:34:14 DEBUG 32016: CompareCommon: pc=00F0050B, a=00000000 (0), b=FFFFFFFF (-1), N=1
19/09/2019 23:34:14 DEBUG tube: parasite write R3=0D
19/09/2019 23:34:14 DEBUG tube: R3 single full
19/09/2019 23:34:14 DEBUG tube: parasite read R3 status 3F
19/09/2019 23:34:14 DEBUG 32016: CMPQ: temp=3F (63), temp2=00 (0)
19/09/2019 23:34:14 DEBUG 32016: CompareCommon: pc=00F00514, a=00000000 (0), b=0000003F (63), N=0
19/09/2019 23:34:14 DEBUG tube: parasite read R3 status 3F
19/09/2019 23:34:14 DEBUG 32016: CMPQ: temp=3F (63), temp2=00 (0)
19/09/2019 23:34:14 DEBUG 32016: CompareCommon: pc=00F00514, a=00000000 (0), b=0000003F (63), N=0
19/09/2019 23:34:14 DEBUG tube: parasite read R3 status 3F

so now it is looping, waiting for R3 to be empty again, and later:

19/09/2019 23:34:14 DEBUG tube: host read R3=0D
19/09/2019 23:34:14 DEBUG tube: parasite read R3 status FF
19/09/2019 23:34:14 DEBUG 32016: CMPQ: temp=FF (255), temp2=00 (0)
19/09/2019 23:34:14 DEBUG 32016: CompareCommon: pc=00F00514, a=00000000 (0), b=FFFFFFFF (-1), N=1
19/09/2019 23:34:14 DEBUG tube: parasite write R3=00
19/09/2019 23:34:14 DEBUG tube: R3 single full

So this looks to have fixed this, at least for the simple test case. I'll try a save and reload on ADFS now and see if that works.

hoglet67 commented 5 years ago

Rather annoying, this is the second time I've encountered this bug. We hit it with PiTubeDirect in October 2017. Here's a fragment of an email exchange between myself, Dom and Ed:

Phew, sorted....

It was failing because the unused bits in the parasite status registers should be read back as set, and in PiTubeDirect they are read back as clear. We make the same mistake on the Host side, but the code there is better written so it doesn't matter.

(This behavior of unused bits is defined in App Note 004)

What makes use of these unused bits you might ask?

Consider the comparison in the type 6 transfer loop:

00F00508 1C A8 78 .(x CMPQB +0,@&FFFFFFF8 00F0050B CA 7D J} BLT L00F00508 00F0050D 54 4D 00 7A TM.z MOVB &00(R1),@&FFFFFFFA ; Send a byte In PiTubeDirect, the values being returned are 0x00 and 0xC0, neither of which allow the BLT to be taken, so there is no flow control.

In Matchbox (and a real Tube chip I assume), the values being returned are 0x3F and 0xFF, and when it's 0x3F the BLT is taken.

I would consider this a bug in Pandora; they should be using a BLE branch here.

I've just tested a fix to this and it works!

Subtle or what?

Dave

So subtle that I completely forgot ever encountering it... Sorry about that.

SteveFosdick commented 5 years ago

Ok, I can can save and re-load from ADFS from within BAS32 on the 32016.

I agree with you - sloppy programming to rely on the value of unused bits when you can as easily code instead to rely on on the bits you're interested in.

SteveFosdick commented 5 years ago

Installing Panos now works and these does not seem to have affected any other co-pros. Thanks for your help, Dave.

SteveFosdick commented 5 years ago

Fix is merged to master.