hoglet67 / PiTubeDirect

Bare-metal Raspberry Pi project that attaches to the Acorn TUBE interface and emulates many BBC Micro Co Processors
GNU General Public License v3.0

Switching language ROMs crashes machine when started in modes other than 7 #70

Closed mincebert closed 3 years ago

mincebert commented 4 years ago

Hello — thanks for the recent Fer-De-Lance update!

I have a Master with MOS 3.50 and an RPi 3A+ running Fer-De-Lance. I have set the boot co-processor to 2 (65C102 internal) and almost everything seems to work fine, including the Elite co-processor version, Sphere, etc.

However, there's a weird problem. If I start the machine and run EDIT (the Master text editor) in mode 7, it works fine. However, if I change to another mode (say 0) immediately after BASIC starts (or later) and then run EDIT, I get "EDIT" printed and then the machine hangs. Pressing Ctrl+Break recovers things, but the TUBE processor disappears (I get plain "ACORN MOS" and a beep, not the TUBE banner); a further Ctrl+Break brings back the TUBE processor on the PiTubeDirect.

I thought at first this was a problem with Tom Seddon's BASIC Editor: I was running the 1.43 beta of that and had hangs at times, so today I upgraded to 1.44 (and to Fer-De-Lance while I was at it) to see if my crashes had gone away. (I think this problem occurred under Egg-Eater too, but I wasn't checking for it as systematically as today.)

However, after logging an issue with him and recompiling the editor with some OSWRCHs added to find where it hung, I found that the hang occurs after the call to OSBYTE &8E but before the MOS jumps to the language entry point in the ROM: the MOS prints the language name on the screen and then hangs before starting the language. I then checked and found it occurred with EDIT or even BASIC, when in modes other than 7.

I'm afraid I do have a lot of weird stuff in my Master: a RetroClinic MOS selector (which confirms the problem occurs under 3.20, too) and a GoSDC, although I've *UNPLUGged that and powered off the Master to ensure it's not loading any custom filesystems or otherwise fiddling about. I can't easily physically remove those, if that's got something to do with it.

If no-one else gets this problem, there must be something odd about my machine and I'll perhaps have to do some hardware swaps.

mincebert commented 4 years ago

OK — I've done a bit more digging...

I also have two plug-in Acorn ROM cartridges with home-burnt EPROMs on them: one has Tom Seddon's The HiBASIC Editor and The BASIC Editor; the other has MMFS 1.42 and EXMON II 2.02. The problem only occurs if I have both cartridges connected. If I have only one (it doesn't matter which of the two, or which slot it's in), there's no problem.

It makes no difference if I *UNPLUG the ROMs in the cartridge: the crash still occurs. However, power off and remove one and it's fine.

Could this be a problem with power or timing or something? It does seem odd, though, that the PiTubeDirect seems to work absolutely fine in all other situations I've tested — left Sphere running for a long time (taking 1.48s per loop), ELITE co-processor edition works fine and doesn't crash after many minutes of playing; I can use the ARM2 PiTubeDirect core and render a mandelbrot set in ARMBASIC in mode 2.

dp111 commented 4 years ago

Just for completeness, can you tell us about the interface used to connect the Pi?

(Sent by email, quoting Robert Franklin's comment of Sun, 19 Apr 2020 at 23:23.)

mincebert commented 4 years ago

Thanks for the reply — knew I'd forget something!

It's an internal BBC Master one: the PiTubeDirect Master SignalConverter V1.4, Sundby/System 2017, bought about a year ago (April 2019 or so). It's taking its power from the BBC itself.

Running the program that tests for the problem described here (in case it is related), http://www.sundby.com/index.php/2017/12/19/problem-with-pitubedirect-and-some-bbc-masters/, does not give me a snow effect (although I am feeding the RGB output via a SCART cable into an HDMI converter, which might hide it).

I do have a RPi 2B I could swap in and try (instead of the 3A+). I'll do that...

mincebert commented 4 years ago

I've just swapped the RPi 2B onto the same Sundby/System internal card with the same PiTubeDirect Fer-De-Lance SD card and get exactly the same problem: fine in MODE 7, crashes in other MODEs.

hoglet67 commented 4 years ago

Version 1.4 of Kjell's level shifter is the version that fixed the "snow effect" bug: http://www.sundby.com/index.php/2018/01/31/1059/

It does sound like this might be timing related, but could you check a couple of further things:

  1. You mention this is happening with Co Pro 2 (the 3MHz 6502). Can you try each of the following Co Pros: 0, 1, 3 and 16 (on Fer-De-Lance).

  2. Could you also go back and test the Egg-Eater release, just to confirm this isn't a recent regression.

Dave

mincebert commented 4 years ago

Back on the RPi 3A+ again... I get the same result with co-pros 0, 1, 2 and 3 (sorry, I did try that but forgot to mention it; I checked again, just to make sure, though!). 16 is a little weirder: it's roughly 50/50 whether it hangs or not, but sometimes it goes wrong in some other way. For example, *EDIT occasionally starts OK, but sometimes it starts and then crashes, doing odd things like displaying a message about pressing "Shift-F5 D for info".

One other thing to mention: when this happens, I lose the memory contents. If I type OLD and LIST, I get the Sphere program back, so it looks like the RPi is crashing and rebooting (hence the need for a couple of Ctrl+Breaks, I guess). This doesn't happen when *EDIT goes a bit odd under co-pro 16, I think.

I'll try downgrading to Egg-Eater and let you know what happens there...

hoglet67 commented 4 years ago

OK, I've managed to reproduce this with the following setup:

The main symptom is that the language transfer across the Tube is very unreliable, and occasionally the Pi reboots as well. On my system it actually doesn't matter much what screen mode I'm running in. Most of the time the initial transfer of BASIC on Ctrl-BREAK fails.

Things that make it go away are:

The differences with the IFEL level shifter are:

Steve Picton is still selling these on ebay: https://www.ebay.co.uk/itm/ACORN-BBC-MASTER-128-ADAPTER-BOARD-LEVEL-SHIFTER-CO-PROCESSOR-RASPBERRY-PI/323801408266

I'll try to get to the bottom of exactly what's failing here. The big clue is the Pi rebooting, which most likely is due to switching noise being coupled across to the nRST signal.

Dave

mincebert commented 4 years ago

Thanks for the update. I just tried EggEater and got the same issue, with both processor 0 and 1.

I also notice that the green light flashes on the RPi when I press Ctrl+Break after the hang (suggesting SD card access, I think, so a reboot). Also, when it does come back, it resets the processor type to the one in cmdline.txt, so all suggesting a reboot.

Thanks for all your work investigating — I'll order one of the IFEL boards to confirm it resolves the issue for me but I'll be happy to do further tests with Kjell's board, if you want.

hoglet67 commented 4 years ago

Another data point: if I put a scope probe on the Phi2 signal at the Pi, then the problem also seems to go away.

@mincebert could you try setting tube_delay=15 in cmdline.txt?

I found values between 10 and 25 resolve the issue for me.

This is more of a workaround; it just shifts the Tube signal sampling point a bit.

Dave

mincebert commented 4 years ago

Thanks for the suggestion — I've gone back to Fer-de-lance and changed to tube_delay=15 and that appears to resolve the problem (given a quick test, at least for processor type 2).

I've ordered one of Steve Picton's boards, so I'll try that too, when it arrives.

mincebert commented 4 years ago

The IFEL board arrived this morning and, despite my best efforts with a soldering iron, appears to work fine. I tried it first with tube_delay=15 left in cmdline.txt and everything appeared to work fine (well - my standard test of changing to MODE 6 and typing *EDIT). I then changed it to tube_delay=0 and it all still appears to work fine.

So that all fits. I'll let you decide if that's a problem with PiTubeDirect or the board, but let me know if you want me to check anything further.

Thanks very much for all your very speedy help!

hoglet67 commented 4 years ago

This was a clock integrity issue with certain PiTubeDirect level shifters, so closing.

hoglet67 commented 4 years ago

For reference, here are some scope plots of the clock noise issue.

With the Sundby board, which seems to have the issue: [images: IMG_2113, IMG_2114]

With the IFEL board, which doesn't: [images: IMG_2111, IMG_2112]

The purple trace is the Phi2 clock into the Pi, taken with a scope connected to pins 25/26 of the GPIO: [image: IMG_2116]

The pink trace is the RnW signal on the Tube connector.

The scope is an HP Infiniium 54845A, set to an advanced trigger on the falling edge of nTUBE when RnW is low. The probes are the weak link, being Agilent 10074C (150MHz).

Note, with the scope probe connected at this point, I wasn't actually seeing any problems with the language transfer. As soon as I removed the scope probe, about 1 in 3 of the language transfers failed.

The test was done with my issue 1 Master (white silk screen), with a ROM cartridge in the rear slot, Myelin's bus sniffer cartridge in the front slot, and the Master in MODE 1: [image: IMG_2117]

To be honest, the presence or absence of the ROM cartridge doesn't change the symptoms, as the Phi2 signal (A8) is not connected.

Dave

mincebert commented 4 years ago

Thanks for the update and explanation of what you were doing. I've dug out my scope and done the same monitoring and get the same results — I have a Master issue 2 motherboard. I thought I'd similarly save this here, in case it's of interest.

Sundby board with Acorn cartridges with 2x EPROMs, clearly showing the noise on Phi2:

[image: Sundby cart]

Sundby board with no cartridges, same noise:

[image: Sundby no cart]

IFEL board with cartridges, no noise on Phi2:

[image: IFEL cart]

IFEL board with no cartridges, cursors showing the initial peak of Phi2 relative to the cartridge version (see below):

[image: IFEL no cart]

The only difference I can see between the cartridge and non-cartridge versions is Phi2 is about 9ns (~111MHz) later, which I'm sure is not relevant.

I can no longer reliably get my Master to crash with the Sundby board fitted (it seems to work most of the time, although I do get the occasional hang, which is recovered with a Ctrl+Break). I did, however, find that my GoSDC board (which fits in IC27 / ROM 8) was a bit loose a few weeks ago and pressed it down again; I was getting the occasional hang and this seemed to solve it. So maybe that plus the Sundby board plus the ROM cartridges tipped things over the edge.

It all seems to be good now, though, thanks.

hoglet67 commented 4 years ago

Thanks for those scope plots Robert.

mincebert commented 4 years ago

I have realised this morning that I stupidly did those traces in MODE 7 and not MODE 1 (or 0-6), which was one of the conditions for the hang, in my case, although I think you said it didn't make any difference to you and it does confirm the noise.

I'll try them again to see if it still hangs and if there's any difference in the trace, with and without the cartridges.

hoglet67 commented 4 years ago

Out of interest, I've just taken a look at the pattern of corruption during the language transfer: [image: capture0]

What's happening here is that I have saved a version of BASIC4 so it reloads at &2000.

I'm *RUNning it (which enters the good copy at &8000), and running a comparison.

I'm typically seeing just a single byte corrupted, and that byte always seems to be &FF changing to &00.
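The comparison program itself was only shared as an image. As a rough illustration of the same kind of check, here is a Python sketch (the names `diff_images` and `all_ff_to_00` are hypothetical, not part of PiTubeDirect) that compares a reference ROM image with the copy received over the Tube and lists each corrupted byte. It assumes both images are available as `bytes` of equal length and that the language starts at &8000:

```python
from typing import List, Tuple

Diff = Tuple[int, int, int]  # (address, expected byte, received byte)


def diff_images(good: bytes, loaded: bytes, base: int = 0x8000) -> List[Diff]:
    """List every byte where the received copy differs from the reference."""
    if len(good) != len(loaded):
        raise ValueError("images must be the same length")
    return [
        (base + offset, g, b)
        for offset, (g, b) in enumerate(zip(good, loaded))
        if g != b
    ]


def all_ff_to_00(diffs: List[Diff]) -> bool:
    """True when every corrupted byte follows the &FF -> &00 pattern."""
    return all(g == 0xFF and b == 0x00 for _, g, b in diffs)


# Example: a synthetic 16 KB image with one &FF byte cleared to &00.
good = bytes(range(256)) * 64
loaded = bytearray(good)
loaded[0x1FF] = 0x00
diffs = diff_images(good, bytes(loaded))
for addr, g, b in diffs:
    print(f"&{addr:04X}: &{g:02X} -> &{b:02X}")
print("all &FF -> &00:", all_ff_to_00(diffs))
```

Note that `all_ff_to_00` returns True for an empty list, so check that `diffs` is non-empty before reading anything into it.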

I've also compiled PiTubeDirect with DEBUG_TRANSFERS enabled:

For a good transfer I see:

checksum_h = 00004000 9C77B542
checksum_p = 00004004 9C77B542

For a bad transfer I see values that differ from the good transfer, for example:

checksum_h = 00004000 94DC0A0B
checksum_p = 00004004 94DC0A0B

If I swap the cartridges around (so the bus sniffer is in the rear slot), then I see more errors: [image: capture1]

Here's my best guess as to what's occurring.

The problem happens during a tube write (in this case to the R3 FIFO), where all the data bus lines are switching from 0 to 1.

During the first half of the bus cycle, the data bus is carrying video data, and in MODEs 0..6 the most common value in the display RAM will be &00, whereas in MODE 7 it will be &20, so MODE 7 will be less prone to this effect. If the screen memory is filled with &FF in MODE 1, then I no longer see any corruption.
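To make the MODE dependence concrete, this small Python sketch counts how many data lines rise from 0 to 1 when the bus changes from the leftover video byte to the write data. It assumes the write data is &FF (the value seen being corrupted) and treats the number of rising lines as a rough proxy for the switching disturbance; that proxy is an assumption, not a measurement:

```python
def rising_bits(before: int, after: int) -> int:
    """Count data lines that go from 0 to 1 when the bus changes before -> after."""
    return bin(~before & after & 0xFF).count("1")


WRITE_DATA = 0xFF  # the byte that was seen being corrupted to &00

for label, video in (
    ("MODE 0-6, blank screen (&00)", 0x00),
    ("MODE 7, spaces (&20)", 0x20),
    ("MODE 1, screen filled with &FF", 0xFF),
):
    print(f"{label}: {rising_bits(video, WRITE_DATA)} of 8 lines rise")
```

A blank graphics screen gives all 8 lines rising, MODE 7 spaces give 7, and a screen full of &FF gives none, which lines up with the observation that filling screen memory with &FF makes the corruption disappear.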

The Master PiTubeDirect level shifter enables the data bus only during the second half of the bus cycle. The 74LVC245 OE signal is PHI2 & !RnW (this is done with a couple of 74HCT00 gates). I think at the time the driver turns on, the data bus will still be carrying the video data, and then ~50ns later it changes to the CPU write data.
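The enable condition above can be modelled in a few lines of Python. This is only a sketch: it assumes the 74LVC245's active-low /OE pin is driven by one 74HCT00 gate wired as an inverter for RnW, feeding a second gate that NANDs it with PHI2. The actual board wiring may differ, and `nand` and `buffer_noe` are hypothetical names:

```python
def nand(a: int, b: int) -> int:
    """One 74HCT00 gate: output is low only when both inputs are high."""
    return 0 if (a and b) else 1


def buffer_noe(phi2: int, rnw: int) -> int:
    """Active-low /OE for the 74LVC245 (0 = enabled), under the assumed wiring."""
    not_rnw = nand(rnw, rnw)  # first gate used as an inverter: !RnW
    return nand(phi2, not_rnw)  # second gate: low only when PHI2 & !RnW


for phi2 in (0, 1):
    for rnw in (0, 1):
        state = "enabled" if buffer_noe(phi2, rnw) == 0 else "disabled"
        print(f"PHI2={phi2} RnW={rnw}: data buffer {state}")
```

Only the PHI2=1, RnW=0 row enables the buffer, so the drivers switch on at the start of the second half of a write cycle, while the bus may still be holding video data.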

All of the data bits changing from 0 to 1 is causing a dip in the Phi2 clock signal. It's possible this is due to a local power dip, or it could be due to crosstalk. Whatever the mechanism, sometimes this dip is low enough that the GPU code on the Pi sees a 1->0 transition on Phi2, which causes the write cycle to complete prematurely. The data sample for the write is the value just before this transition, which appears to be &00 (in all of the corruption I have seen).

I tried adding additional 100nF decoupling capacitors across the two 74LVC245 chips on the Sundby board, and that doesn't seem to have helped, so this is probably just down to crosstalk. This will be greatly reduced on a 4-layer board like Steve's.

I do have a possible software work around for this issue. I've run it for 1000 cycles now, and there hasn't been any corruption. But before I push it I'd like to have Dominic look over it.

Here's an extended soak test program: [image: capture2]

And for reference, here's the proposed fix to the GPU code:

diff --git a/vidcore/tubevc.s b/vidcore/tubevc.s
index 1db5717..fa38728 100644
--- a/vidcore/tubevc.s
+++ b/vidcore/tubevc.s
@@ -262,6 +262,10 @@ wr_wait_for_clk_high1:
    btst   r7, CLK
    beq    wr_wait_for_clk_high1

+   ld     r7, GPLEV0_offset(r6)
+   btst   r7, CLK
+   beq    wr_wait_for_clk_high1
+
 # spin waiting for clk low
 wr_wait_for_clk_low:
    mov    r8, r7

After seeing the rising edge of Phi2, it does an additional IO read, which takes it past the point where the noise is occurring.
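The effect of the extra read can be illustrated with a toy Python model of the write loop, where each list element is one GPIO read of (CLK, data byte). The model assumes the Phi2 glitch lasts exactly one read and arrives on the read immediately after the rising edge is detected, at the same moment the data bus switches from leftover video data (&00) to the write data (&FF); real timing on the Pi will vary. `capture_write`, `CLEAN` and `GLITCHY` are hypothetical names:

```python
from typing import List, Tuple

# One element per GPIO read: (CLK level, data bus byte). Hypothetical waveforms.
Sample = Tuple[int, int]

# Clean write cycle: video data (&00) early in the high phase, then the CPU's &FF.
CLEAN: List[Sample] = [(0, 0x00)] * 4 + [(1, 0x00)] + [(1, 0xFF)] * 7 + [(0, 0xFF)] * 3
# Same cycle, but the bus switching to &FF makes Phi2 read low once (the glitch).
GLITCHY: List[Sample] = [(0, 0x00)] * 4 + [(1, 0x00)] + [(0, 0xFF)] + [(1, 0xFF)] * 6 + [(0, 0xFF)] * 3


def capture_write(samples: List[Sample], confirm_high: bool) -> int:
    """Return the byte latched for a Tube write, modelling the GPU polling loop."""
    i = 0
    while True:
        # wr_wait_for_clk_high1: spin until a read sees CLK high
        while samples[i][0] == 0:
            i += 1
        last = samples[i]
        i += 1
        if not confirm_high:
            break
        # Proposed fix: read once more; if CLK is low again, treat it as noise
        last = samples[i]
        i += 1
        if last[0] == 1:
            break
    # wr_wait_for_clk_low: keep the previous read (mov r8, r7) until CLK reads low
    while True:
        prev = last
        last = samples[i]
        i += 1
        if last[0] == 0:
            return prev[1]  # data from the read just before the falling edge


for name, wave in (("clean", CLEAN), ("glitchy", GLITCHY)):
    for fix in (False, True):
        print(f"{name:7} fix={fix}: &{capture_write(wave, fix):02X}")
```

Without the confirmation read, the glitch ends the cycle early and the byte latched is &00, matching the &FF to &00 corruption; with it, the glitch sends the loop back to waiting for a high clock and the correct &FF is latched. In this model a glitch landing one read later would still cause corruption, which fits the change being a timing workaround rather than a cure for the crosstalk.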

Dave

hoglet67 commented 4 years ago

Re-opening as there is a possible software workaround for this.

mincebert commented 4 years ago

Although I'm sure you've sussed it from your explanation for now, I thought I'd just add this, if it's of interest...

I tried my test again in MODE 1 (with *CONFIGURE MODE 1), with two cartridges in (one isn't enough!), and the Master hangs on power up with just a black screen and flashing cursor, or after a CTRL-BREAK. Just pressing BREAK, however, it starts up in BASIC OK. Pressing BREAK again or starting a language ROM (e.g. BE, *EDIT) doesn't cause any trouble any more, but CTRL-BREAK hangs again. So maybe reseating the GoSDC board has partially fixed things for me (in that I no longer get a hang starting the language ROM, just on power up).

The same thing occurs in MODE 2, but oddly not in MODE 0 or MODEs 3-7, so I'm not sure what's special about those two.

Another weird thing is that the oscilloscope doesn't trigger when it hangs like this. If I just watch RnW, it still goes low when I press CTRL-BREAK, but nTube doesn't: even if I just use a simple edge trigger on RnW, it doesn't fire. As far as I can tell, everything else is the same: the only other difference I can see is that, for each cartridge I add, the rising edge on Phi2 shifts slightly earlier, as I showed before (if I just use BREAK).

mincebert commented 4 years ago

Actually... perhaps ignore that: I've just swapped the IFEL board back in and it does exactly the same thing as the Sundby board (albeit without the noise on Phi2, still): it hangs on power up or CTRL-BREAK with two cartridges in the slot with MODE 1 or 2 selected. If I disable the Tube with *CONFIGURE NOTUBE, no problems. So maybe this is something different.

I don't get my original problem with either board, though: just starting a new language in MODE 1 (or 0-6, that I've seen) is fine.

hoglet67 commented 3 years ago

Mincebert,

Sorry for the delay, I sort-of forgot about this issue!

There is an RC0 release of Gecko in the releases section that contains the fix to the GPU code.

It would be great if you could test this on your problem systems.

It may not fix all of the issues, but it should improve things.

Dave

mincebert commented 3 years ago

OK, the plot thickens (and perhaps exonerates PiTubeDirect)...

I've tried Gecko-rc0 and I still get the same problem I was getting before: the system hangs with *CONFIGURE MODE 1 or 2 after a Ctrl+Break or power on, but not in any other mode (including 0), nor if the Tube is disabled. Note I have to press Ctrl+Break after power on because I'm running MOS 3.50 and have a GoSDC that copies filing system ROMs into spare SRAM banks. This is all with the IFEL board, as I was no longer getting the problem with the Sundby board, perhaps because I'd reseated my GoSDC in ROM socket 8, as mentioned earlier.

I then wondered if it would hang on power on if I didn't have to press Ctrl+Break for GoSDC, so I did *UNPLUG 8. This stopped it hanging on Ctrl+Break; I then tried a power on (with it still unplugged) and it didn't hang, nor did it after an *INSERT 8 and a Ctrl+Break (albeit this doesn't load the ROMs). Doing *SDCRESET (to force loading the ROMs) just restarts and hangs, without copying the GoSDC filing system SROMs into SRAM.

I then switched to MOS 3.20 on my selector (changing the filing system ROMs GoSDC loads) and tried again on power on and after Ctrl+Break: no hangs, with it inserted or unplugged.

Nothing breaks if I haven't got two cartridges in the slots, or am in MODEs other than 1 or 2, or have the Tube disabled, or have GoSDC *UNPLUGged, or am on a MOS other than 3.50.

So — I don't know what's going on: whether it's GoSDC or PiTubeDirect, or hardware or whatever, but it only happens in a very specific situation! Whatever, Gecko-rc0 hasn't fixed it, but it's not a major issue!

Let me know if you want any more information!

jgharston commented 3 years ago

I think I've tracked down that this is what's been happening to me with my standard BASIC build code(1). I've spent all weekend narrowing it down to the smallest test case. Save a BASIC program:

10REM
SAVE "SRC"

Build a text file:

*BUILD MAKE
*BASIC
P.~PAGE,~!0,~!4,~!&8000
LO."SRC"
SA."DST"
A$="":RUN
P.~PAGE,~!0,~!4,~!&8000

Now, with the PiTube active and either TUBE 0, 1, 2 or 3 (the 6502s), do:

*EXEC MAKE

It will normally display (trimmed a bit):

BASIC
800 8020802 80008000 27F001C9
>LO."SRC"
>SA."DST"
800 8020802 80008000 27F001C9
>A$="":RUN
800 8020802 80008000 27F001C9

but quickly PAGE gets corrupted and you get:

BASIC
800 8020802 80008000 27F001C9
>LO."SRC"
>SA."DST"
800 8020802 80008000 27F001C9
>A$="":RUN
300 3020302 3020302 D0D0D0D

The last number is the start of the BASIC code (the word at &8000), so instead of the CMP #1:BEQ etc. stuff there is &0D loaded multiple times. MDUMPing shows the first byte repeated 40 or 50 times, then the rest of the first 256 bytes of BASIC, and then the code from &8100 onwards is fine (the host sends the language in 256-byte chunks, so it gets back into sync at &8100). So PAGE and the other workspace variables are corrupted because the start of the BASIC code that sets them is trashed. Initial testing suggests it happens with other languages too: sometimes View doesn't start properly, and sometimes doing BASIC from View leaves me in View, showing the language transfer hasn't happened properly.
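The 256-byte chunking explains why the damage stops at &8100. A hypothetical Python helper along these lines (not something from the thread) would report which chunks of a received language image differ from a known-good copy, assuming both are held as `bytes` of equal length and the image starts at &8000:

```python
from typing import List

CHUNK = 256  # the host sends the language across the Tube in 256-byte chunks


def bad_chunks(good: bytes, received: bytes, base: int = 0x8000) -> List[int]:
    """Return the start address of each 256-byte chunk whose contents differ."""
    if len(good) != len(received):
        raise ValueError("images must be the same length")
    return [
        base + start
        for start in range(0, len(good), CHUNK)
        if good[start:start + CHUNK] != received[start:start + CHUNK]
    ]


# Example: the first byte repeated 48 times, like the MDUMP described above.
good = bytes((i * 7) & 0xFF for i in range(1024))
received = bytearray(good)
received[0:48] = bytes([good[0]]) * 48
print([hex(a) for a in bad_chunks(good, bytes(received))])
```

Only the chunk at &8000 is reported, consistent with the transfer falling back into sync at the next 256-byte boundary.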

If the exec file is changed to:

*BASIC
P.~PAGE,~!0,~!4,~!&8000
LO."SRC"
SA."DST"
LO."DST"
A$="":RUN
P.~PAGE,~!0,~!4,~!&8000

that extra LOAD makes it work perfectly every time (for five minutes of testing).

So something is mucking up the initial Host->Client 256-byte data transfer. It seems to be triggered by the Client->Host 1-byte transfer done by SAVE, and gets reset if you do a Host->Client 1-byte transfer with LOAD.

BBC Master, MOS 3.20, all ROMs 0-7 *UNPLUGged, external PiDirect level convertor. I haven't started dismantling it yet to update the SD card, so I'm not sure which Pi hardware or PiTube software I have.

(1) E.g. see http://mdfs.net/Apps/Archivers/BBCZip/MkZip

jgharston commented 3 years ago

(Ignore the odd formatting, treat everything above as plain text)

hoglet67 commented 3 years ago

Hi Jonathan,

The likely cause of the corruption during language transfer being discussed in this issue is a hardware issue (crosstalk-induced clock noise on certain level shifters). Some changes were made in the latest PiTubeDirect that might help.

Which version of PiTubeDirect are you running?

Which level shifter do you have? If you are not sure, post a photo.

Dave

hoglet67 commented 3 years ago

If you think your problem is different, then we should open a new issue.....

jgharston commented 3 years ago

It looks like the same or a very similar issue. I'll tell you what I've got when I've dismantled everything....

hoglet67 commented 3 years ago

You can tell the version of PiTubeDirect by powering on the 6502 Co Pro (Co Pro 0) and doing OLD then LIST.

The version (git commit and build name) are embedded in the REM statements at the top of the SPHERE test program.

Please take a look at this and report back.

jgharston commented 3 years ago

I got everything dismantled and found my card reader/writer. Renamed everything into /2020 and copied everything from Gecko into root. Remantled everything and since about 8 o'clock TUBE 0 (fast 6502) has been running the above tests, and so far no problems. (I forgot to check the date, but it was sometime last year.)

If GeckoFix1 fixes mincebert's problem as well, I think this can be closed.

hoglet67 commented 3 years ago

Closing for now....