janbredenbeek / Minerva4Q68

Minerva operating system for the Q68
GNU General Public License v3.0

Issue with Q68 v1.05 firmware #1

Closed execriez closed 5 months ago

execriez commented 10 months ago

Hello,

Some feedback, as I have a Q68 with v1.05 firmware and experience the issues mentioned in the readme. I have tested Minerva4Q68 versions 1.5 and 1.6.

On both versions pressing F1, F2, F3 or F4 takes you to the BASIC screen with a flashing cursor, which immediately freezes. Pressing SHIFT+F1, SHIFT+F2, SHIFT+F3 or SHIFT+F4 takes you to a 128K BASIC screen with a flashing cursor which does not freeze.

For info, after booting into 128K mode, typing "PRINT PEEKL(163872)" correctly returns "262144". However, typing "dir QUB1" and "dir WIN1_" return "not found" even with a working "QL_BDI.BIN" and "QL_WIN.BIN" on the SD card (although maybe the hard disk driver requires more than 128K RAM).

Changing the value of Q68_RAMT to be 256*1024+1, boots showing 128K RAM on the F1/F2 screen. With this modified maximum RAM setting, shifted and unshifted F1/F2 combinations both work.

I have checked different values for Q68_RAMT, and 256*1024+1 is the only value that works on my Q68. It's a little odd, but I hope this sheds some light on the issue.
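The numbers above fit together as follows. This is a small Python sketch of the arithmetic only. It assumes the standard QL memory map (RAM starting at $20000, system variables at $28000 with the RAMTOP pointer at offset $20); the constant names are illustrative and are not taken from the Minerva sources.

```python
# Assumed standard QL memory map; names are illustrative only.
QL_RAM_BASE = 0x20000      # first byte of RAM (screen memory)
SV_BASE = 0x28000          # base of the system variables
SV_RAMT_OFFSET = 0x20      # RAMTOP pointer within the system variables


def ramtop_for(ram_kib):
    """Address just past the end of RAM for a given RAM size in KiB."""
    return QL_RAM_BASE + ram_kib * 1024


peek_address = SV_BASE + SV_RAMT_OFFSET
print(peek_address)          # the address used in PEEKL(163872)
print(ramtop_for(128))       # RAMTOP reported in 128K mode
print(hex(256 * 1024 + 1))   # the Q68_RAMT value that boots on this Q68
```

Under these assumptions, the 262144 returned by PEEKL is simply the top of a 128K machine, and the working Q68_RAMT value is that same address plus one.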

janbredenbeek commented 10 months ago

Hi execriez,

Are you loading Minerva directly from boot (Q68_ROM.SYS) or from SMSQ/E using the boot loader (LRESPR)?

What happens when you disable the QUB driver in the configuration menu? (the Win/QUB driver is not written by me and I don't have the source code).

execriez commented 10 months ago

Hi Jan, I load Minerva directly from boot via Q68_ROM.SYS

Unfortunately I haven't got a dev setup for Minerva at present. I tested various Q68_RAMT settings by patching the ROM directly. In v1.6 this is the value $01000001 at position $041E.

I haven't tried disabling the QUB driver via the configuration menu. I did try disabling the Win/QUB ROM by patching it out. In v1.6 I replaced the value $4afb0001 at position $D6D0 with $0.
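A patch like the Q68_RAMT one can be scripted rather than done by hand in a hex editor. The following Python sketch is an illustration only: `patch_long` is a hypothetical helper, and the demo runs on a synthetic buffer, not a real ROM image. It checks the existing big-endian long before overwriting it, so a wrong offset is caught instead of silently corrupting the image.

```python
import struct


def patch_long(rom, offset, expected, new):
    """Replace a big-endian 32-bit value, verifying the old value first."""
    (current,) = struct.unpack_from(">I", rom, offset)
    if current != expected:
        raise ValueError(
            f"${offset:04X}: found ${current:08X}, expected ${expected:08X}")
    struct.pack_into(">I", rom, offset, new)


# Synthetic stand-in for a ROM image (not real Minerva data).
rom = bytearray(0x1000)
struct.pack_into(">I", rom, 0x041E, 0x01000001)

# Q68_RAMT: $01000001 -> 256*1024+1
patch_long(rom, 0x041E, 0x01000001, 256 * 1024 + 1)
print(rom[0x041E:0x0422].hex())
```

With a real image, the buffer would be read from and written back to the ROM file; the offsets quoted here apply only to the release they were found in.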

With the Win/QUB ROM disabled, Minerva boots straight into Monitor mode and skips the F1/F2 prompt. It then freezes. So it wasn't wholly successful as a test - as it threw up a different issue (skipping the F1/F2 prompt).

I notice that the extrarom discards a0 and gives it the value of 0, which it often is but sometimes isn't. Could this be why it skipped F1/F2? Extension ROMs should preserve a0 and a3. Are you able to test a Minerva ROM without the Win/QUB driver on non-v1.05 firmware?

Thanks Mark

janbredenbeek commented 10 months ago

Hi Mark,

When I disabled the Win/QUB ROM my Q68 went straight into the Monitor mode but the keyboard did not freeze (I have an older firmware version as my Q68 is from the first batch). This puzzles me as I expected the F1/F2 prompt to appear normally (of course there will be no medium to boot from but that should occur after the prompt).

As for the preservation of A0/A3, as far as I know only A3 should be preserved and A0 will usually be used to print a message to channel #0 (which is the only channel open at the time), but I will take a look at it. (edit: I see now that A0 should be preserved too, this explains the quirk mentioned above. I will fix this asap).

I know about problems with v1.05 firmware and the keyboard; there have been some reports about it, but others stated that versions from v1.3 onwards (which support the external keyboard interrupt) work fine on v1.05 firmware. There is a thread about it on the QL forum at https://qlforum.co.uk/viewtopic.php?p=51725#p51725.

You might try to patch out the instructions which test for the keyboard interrupt or try the older version which doesn't test for this interrupt (and uses only polled interrupt) at https://github.com/janbredenbeek/Minerva4Q68-legacy/tree/11a71cbb3d3c4d4ff309a1cf1823edf7ba53d0f5.

I'll come back to this later today or tomorrow.

Best Regards, Jan

janbredenbeek commented 10 months ago

Hi Mark,

I have released v1.61 which fixes the 'A0' bug on initialisation. I doubt whether it solves the keyboard problem, but I have a number of POKEs which could be applied for testing (all word-sized):

04fc-04fe: 4e71 (this disables the keyboard lock at the Minerva initialisation)

c0fe-c100-c102-c104-c106-c108: 4e71 c10a: 6026 (this disables the external keyboard interrupt test so only polled interrupts will be used).
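For anyone who prefers to apply these to a ROM image file rather than POKE them in, the list above can be written as a table of (address, word) pairs. This is a hedged Python sketch: `apply_word_pokes` and the blank buffer are illustrative, and the addresses are only meaningful for the v1.61 image.

```python
NOP = 0x4E71  # 68000 NOP opcode

# (address, word) pairs taken from the list above.
POKES = [(addr, NOP) for addr in (0x04FC, 0x04FE)]
POKES += [(addr, NOP) for addr in range(0xC0FE, 0xC10A, 2)]
POKES.append((0xC10A, 0x6026))


def apply_word_pokes(rom, pokes):
    """Write each 16-bit word big-endian at its (even) address."""
    for addr, word in pokes:
        if addr % 2:
            raise ValueError(f"odd address ${addr:04X}")
        rom[addr:addr + 2] = word.to_bytes(2, "big")


rom = bytearray(0x10000)  # blank stand-in, not a real ROM image
apply_word_pokes(rom, POKES)
print(rom[0xC0FE:0xC10C].hex())
```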

Please let me know the results.

Best Regards, Jan.

execriez commented 10 months ago

Hi Jan, Thanks.

For info, keyboard interrupts don't crash QDOS Classic. However I'm chasing a really weird bug, and I can't rule out interrupts being the cause.

I'm heading out to friends for the new year now. I'll give your new ROM a go tomorrow morning.

Happy New Year Mark

execriez commented 10 months ago

Hi Jan,

I have just tested.

Firstly, I disabled the Win/QUB driver ROM to exclude that from the issue. Pressing F1 or F2 freezes at the BASIC screen. Pressing SHIFT F1 or SHIFT F2 works fine. So it's not the Win/QUB driver...

Next I tested with your patches.

Original v1.61 ROM

        < 000004f0 02 2b 00 1f 02 08 02 2b 00 7f 01 68 42 2b 01 44
        < 0000c0f0 00 50 70 1d 4e 41 41 fa 0d 4a 29 48 00 10 50 f9
        < 0000c100 00 01 c1 48 10 39 00 01 c1 48 6a 26 43 fa 0b 7e
        < 0000d6d0 46 df 4e 75 00 00 4a fb 00 01 46 34 00 2e 00 24

Patched v1.61 ROM

        > 000004f0 02 2b 00 1f 02 08 02 2b 00 7f 01 68 4e 71 4e 71
        > 0000c0f0 00 50 70 1d 4e 41 41 fa 0d 4a 29 48 00 10 4e 71
        > 0000c100 4e 71 4e 71 4e 71 4e 71 4e 71 60 26 43 fa 0b 7e
        > 0000d6d0 46 df 4e 75 00 00 00 00 00 00 46 34 00 2e 00 24
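To double-check which bytes differ between the two dumps, the rows can be compared mechanically. The Python sketch below is illustrative only (`parse_rows` and `changed_addresses` are hypothetical helpers); it uses just the four rows quoted in this comment, not a full ROM image.

```python
ORIGINAL = """\
000004f0 02 2b 00 1f 02 08 02 2b 00 7f 01 68 42 2b 01 44
0000c0f0 00 50 70 1d 4e 41 41 fa 0d 4a 29 48 00 10 50 f9
0000c100 00 01 c1 48 10 39 00 01 c1 48 6a 26 43 fa 0b 7e
0000d6d0 46 df 4e 75 00 00 4a fb 00 01 46 34 00 2e 00 24
"""

PATCHED = """\
000004f0 02 2b 00 1f 02 08 02 2b 00 7f 01 68 4e 71 4e 71
0000c0f0 00 50 70 1d 4e 41 41 fa 0d 4a 29 48 00 10 4e 71
0000c100 4e 71 4e 71 4e 71 4e 71 4e 71 60 26 43 fa 0b 7e
0000d6d0 46 df 4e 75 00 00 00 00 00 00 46 34 00 2e 00 24
"""


def parse_rows(text):
    """Map each byte address to its value from 'addr b0 b1 ...' rows."""
    image = {}
    for line in text.splitlines():
        fields = line.split()
        base = int(fields[0], 16)
        for i, byte in enumerate(fields[1:]):
            image[base + i] = int(byte, 16)
    return image


def changed_addresses(before, after):
    """Sorted addresses whose byte value differs between the two dumps."""
    return sorted(a for a in before if before[a] != after.get(a))


changed = changed_addresses(parse_rows(ORIGINAL), parse_rows(PATCHED))
print([f"${a:04X}" for a in changed])
```

In these rows the changes fall at $04FC-$04FF and $C0FE-$C10A (the POKEs), plus $D6D6-$D6D9 for the Win/QUB ROM marker, which sits six bytes into the row that starts at $D6D0.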

Same issue, Pressing F1 or F2 freezes at the BASIC screen. Pressing SHIFT F1 or SHIFT F2 works fine.

So it's probably not keyboard interrupts. It feels like it's getting another interrupt that isn't being serviced, although the fact that it works with 128K is odd.

BTW I had a go at getting a system together that can re-assemble the sources. From the readme, it seemed like it might involve quite a few steps.

However, an out-of-the-box install of QPC2 will re-assemble with just two changes to the Minerva sources.

Add this at the start of the file /extrarom/link

        program extrarom_bin

Add this at the start of the file /m/ROM/link [edited for typo]

        program m_ROM_bin

I'll keep tinkering.

janbredenbeek commented 10 months ago

Hi Mark,

I noticed that the hardware probing for keyboard interrupt capability was done in user mode... shame on me, this should really be done in supervisor mode with disabled interrupts since this might generate an interrupt before the initialisation routine (which is called in user mode) has a chance to link in the interrupt service routine. This might explain the lock-ups on newer firmware versions, though I'm a bit confused too why it does work in 128K mode.

So here is release 1.62; I'm looking forward to the test results... If lock-ups still occur, I might have to move the initialisation code from the extrarom section to Minerva itself so it can run from Minerva's hardware initialisation code without going into user mode first. But that's something I would rather avoid.

Best wishes for 2024, Jan

execriez commented 10 months ago

Hi Jan,

Yes it still locks up. I would hang-fire on the move of the initialisation rom... I can still do some testing at this end.

I've booted quite a few times and have a better description of what I see. Choosing F1 or F2 does not always crash immediately. Sometimes there is enough time to type 3 or 4 characters at the flashing cursor before the cursor stops dead.

As a test, I set up a XINT and POLL debug routine (see listing below). The routine flips a different square on screen depending on receiving XINT or POLL interrupts.

Choosing 128K at boot time... you can see the POLL square flickering constantly, and the XINT square flipping on and off as you press a key.

Choosing F1 or F2 at boot time... again you can see the POLL square flickering constantly, and the XINT square flipping on and off as you press a key. But almost immediately the POLL square stops flickering, and the system locks up.

I tested the debug routine again by poking out (disabling) the q68kbd_init call. At boot it automatically drops into monitor mode, the POLL square flickers for a short time then stops. The XINT square never flips.

From this I suspect that the issue is not with the keyboard init routine. The issue feels like interrupts are being disabled somewhere else and are not being re-enabled for some reason.

Regards, Mark

For info in case you want to see what I'm talking about, I include the test routine below:

I added debug init code after the other init calls in q68hw.asm

        bsr.s   q68kbd_init       ; initialise keyboard
        bne.s   initerr
        jsr     ser_init
        bne.s   initerr

        GENIF   Q68_M33 <> 0
        jsr     q68scr_init       ; initialise screen driver
        ENDGEN

        GENIF   DBGMSW = 1
        jsr     DBG_INIT
        ENDGEN

I added the debug routine at the end of q68hw.asm

        GENIF   DBGMSW = 1

DBG_INIT:

;  save register entry values

        movem.l  d1-d3/a0-a3,-(a7)

;  allocate space for linkage block

        moveq   #0,d2           ; owner is superBASIC
        move.l  #$28,d1         ; length
        moveq   #mt.alchp,d0
        trap    #1              ; allocate space
        tst.l   d0
        bne     DBG_EXIT        ; exit if error
        move.l  a0,a3           ; base of linkage block

;  enter supervisor mode and disable interrupts

        trap    #0
        ori.w   #$0700,sr       ; disable interrupts

; link in external interrupt to act on keyboard press

        lea     DBG_XINT(pc),a1 ; address of routine
        lea     SV_LXINT(a3),a0

        move.l  a1,4(a0)
        moveq   #MT.LXINT,d0
        trap    #1

;  link in polled task routine to handle keyboard

        lea     DBG_POLL(pc),a1 ; address of routine
        lea     SV_LPOLL(a3),a0
        move.l  a1,4(a0)        ; address of polled task
        moveq   #MT.LPOLL,d0
        trap    #1

;  enable interrupts and re-enter user mode

        andi.w  #$D8FF,sr
        moveq   #0,d0

DBG_EXIT:

;  restore register entry values and exit

        movem.l (a7)+,d1-d3/a0-a3
        rts

DBG_POLL:
        movem.l d0/d7/a0,-(a7)
        lea     131076,a0
        bsr.s   DBG_FLIP
        movem.l (a7)+,d0/d7/a0
        rts

DBG_XINT:
        movem.l d0/d7/a0,-(a7)
        lea     131072,a0
        bsr.s   DBG_FLIP
        movem.l (a7)+,d0/d7/a0
        rts

;  routine to flip one word on each of 9 screen lines (dbra runs d7+1 times)

DBG_FLIP:
        moveq   #8,d7

DBG_LUP:    
        move.w  #$FFFF,d0
        sub.w   (a0),d0
        move.w  d0,(a0)
        lea     128(a0),a0
        dbra    d7,DBG_LUP
        rts

        ENDGEN

janbredenbeek commented 10 months ago

Hi Mark,

Unfortunately, my Q68 is at firmware 1.00 which doesn't support the keyboard interrupt, so I could only see the polled interrupt. I could see the XINT square when using the serial port, but could not reproduce the lock-up.

Firmware v1.05 is also different in that it sets the XINT bit in the QL's interrupt register at $18021, which earlier versions don't. This is done to facilitate external interrupt handlers on Minerva and classic QDOS. I have changed Minerva's interrupt handler (in m_ss_int2.asm) to mimic SMSQ/E's behaviour (which only tests bit 3 of $18021 and executes the frame interrupt handler when set, and treats it as an external interrupt when not set). The only thing that might be different is the MOVE.B D7,$18021 at $0684 which is supposed to clear the external interrupt (at least on BBQL hardware) which is missing in SMSQ/E. I don't know if v1.05 does anything with it, but you might try to change this instruction to NOPs to see if it makes any difference.

It really puzzles me why this lock-up occurs, and why it doesn't when only 128KB RAM is configured. It would help if we could monitor the SR register when it occurs, so we could determine if it's caused by the SR having interrupts disabled or maybe a hardware issue.

Unfortunately there is no CTRL-ALT-7 debugging facility in the Q68... One thing you could do is typing CTRL-TAB as quickly as possible, which switches to the second screen, so you could see if there is still some activity or the system is totally frozen.

One thing you mentioned is that lock-ups don't occur when using classic QDOS. I suppose you're using the original keyboard driver from the Q60 then since the current version depends on Minerva's vectored keyboard handling routines?

Best Regards, Jan.

execriez commented 10 months ago

Hi Jan,

Yes, I'm using a modified Q60 driver for the keyboard on QDOS Classic.

In Minerva on my Q68, a quick CTRL-TAB into the second screen shows activity which then almost immediately stops. Although frozen there is some colour crawl in a couple of places. This could be bits changing or it could be artefacts created by my VGA to composite converter. I use a normal TV for my VGA, I'll dig out a real monitor to check if I get a better view.

It may be that not locking up on 128K is unrelated. It's likely that the issue is hardware related since it only affects v1.05 firmware. Maybe it's due to a privilege violation.

I may be able to modify the privilege violation trap to confirm one way or another tomorrow. ...or I may be able to setup the trace vector to single step, to check register values and to check where it eventually fails. I'll let you know how it goes.

Regards, Mark

execriez commented 10 months ago

Hi Jan,

I haven't made much progress on the above as I'm having a bit of trouble when re-assembling the Minerva ROM.

If I re-assemble version 1.5, I get the same binary as in the 1.5 release...

        Minerva.bin md5 = 6a9d8fce01b4420efbafc92bf2441123

However, if I re-assemble versions 1.6, 1.6.1 or 1.6.2, I get a different Minerva binary from the one in the release.

        Released ROM 1.6.x      Minerva.bin md5 = a4d83920a40c9a20de341b9a4ff44ae2
        Re-assembled 1.6.x      Minerva.bin md5 = 92ae44d4f4273b5bdebce25a0df6c90a
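A check like this is easy to repeat after each rebuild. The sketch below uses Python's standard `hashlib`; the helper names are hypothetical, the two digests are the ones quoted above, and the file name in the commented usage lines is an assumption about where the build output lives.

```python
import hashlib

RELEASED_16X = "a4d83920a40c9a20de341b9a4ff44ae2"
REASSEMBLED_16X = "92ae44d4f4273b5bdebce25a0df6c90a"


def md5_hex(data):
    """MD5 digest of a bytes object as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def matches_release(data, expected=RELEASED_16X):
    """True if the image hashes to the digest of the released binary."""
    return md5_hex(data) == expected


# Usage with a real build (path is an assumption):
# with open("Minerva.bin", "rb") as f:
#     print(matches_release(f.read()))
print(matches_release(b""))  # an empty image does not match the release
```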

Also, the re-assembled 1.6 ROM crashes (spectacularly) after the F1/F2 screen no matter what option is chosen.

Do you have any ideas as to what might be going on? It's definitely producing different code from the release... see below. Do you think it could be a flag somewhere, or am I missing something obvious?

Thanks, Mark

The re-assembled Minerva ROM appears to begin to deviate from the released ROM at position 00006f50 in the SD_AREA routine.

Version 1.6 ROM RELEASED

00006F62 7000         moveq     #$0,d0
00006F64 246E007C     move.l    $7C(a6),a2
00006F68 082A00010019 btst      #$1,$19(a2)
00006F6E 45FA3D3C     lea       $ACAC(pc),a2
00006F72 6706         beq.s     $6F7A
00006F74 247900019018 move.l    $19018,a2
00006F7A 605C         bra.s     $6FD8
00006F7C 4EBAFF50     jsr       $6ECE(pc)
00006F80 246E007C     move.l    $7C(a6),a2
00006F84 082A00010019 btst      #$1,$19(a2)
00006F8A 45FA3D7C     lea       $AD08(pc),a2
00006F8E 6706         beq.s     $6F96
00006F90 247900019010 move.l    $19010,a2
00006F96 603C         bra.s     $6FD4
00006F98 246E007C     move.l    $7C(a6),a2
00006F9C 082A00010019 btst      #$1,$19(a2)
00006FA2 45FA3D26     lea       $ACCA(pc),a2
00006FA6 6706         beq.s     $6FAE
00006FA8 247900019008 move.l    $19008,a2
00006FAE 6024         bra.s     $6FD4
00006FB0 5700         subq.b    #3,d0
00006FB2 246E007C     move.l    $7C(a6),a2
00006FB6 082A00010019 btst      #$1,$19(a2)
00006FBC 45FA3D56     lea       $AD14(pc),a2
00006FC0 6706         beq.s     $6FC8
00006FC2 24790001900C move.l    $1900C,a2
00006FC8 082E00030034 btst      #$3,$34(a6)
00006FCE 6704         beq.s     $6FD4
00006FD0 08810000     bclr      #$0,d1
00006FD4 43E80036     lea       $36(a0),a1
00006FD8 2F04         move.l    d4,-(a7)
00006FDA 48E7C7C0     movem.l   d0-d1/d5-d7/a0-a1,-(a7)
00006FDE 4CA803DF0018 movem.w   $18(a0),d0-d4/d6-d7/a0-a1
00006FE4 7807         moveq     #$7,d4
00006FE6 C89F         and.l     (a7)+,d4
00006FE8 673E         beq.s     $7028
00006FEA 5504         subq.b    #2,d4
00006FEC 670C         beq.s     $6FFA
00006FEE 6A14         bpl.s     $7004
00006FF0 B647         cmp.w     d7,d3
00006FF2 6F34         ble.s     $7028
00006FF4 3607         move.w    d7,d3
00006FF6 6B2A         bmi.s     $7022
00006FF8 602E         bra.s     $7028
00006FFA DE49         add.w     a1,d7
00006FFC 6B2A         bmi.s     $7028
00006FFE 9687         sub.l     d7,d3
00007000 6320         bls.s     $7022
00007002 6022         bra.s     $7026
00007004 9689         sub.l     a1,d3
00007006 651A         bcs.s     $7022
00007008 B687         cmp.l     d7,d3
0000700A 6516         bcs.s     $7022
0000700C 3609         move.w    a1,d3
0000700E 5504         subq.b    #2,d4
00007010 6B14         bmi.s     $7026
00007012 9446         sub.w     d6,d2
00007014 630C         bls.s     $7022
00007016 D046         add.w     d6,d0
00007018 4A04         tst.b     d4
0000701A 670A         beq.s     $7026
0000701C C588         exg       d2,a0
0000701E B448         cmp.w     a0,d2
00007020 6304         bls.s     $7026
00007022 45FA000E     lea       $7032(pc),a2
00007026 D247         add.w     d7,d1
00007028 4CDF03F0     movem.l   (a7)+,d4-d7/a0-a1
0000702C 4E92         jsr       (a2)
0000702E 281F         move.l    (a7)+,d4
00007030 7000         moveq     #$0,d0
00007032 4E75         rts       

Version 1.6 ROM WHEN ASSEMBLED

00006F62 7000         moveq     #$0,d0
00006F64 45FA3CB6     lea       $AC1C(pc),a2
00006F68 6026         bra.s     $6F90
00006F6A 4EBAFF62     jsr       $6ECE(pc)
00006F6E 45FA3D08     lea       $AC78(pc),a2
00006F72 6018         bra.s     $6F8C
00006F74 45FA3CC4     lea       $AC3A(pc),a2
00006F78 6012         bra.s     $6F8C
00006F7A 5700         subq.b    #3,d0
00006F7C 45FA3D06     lea       $AC84(pc),a2
00006F80 082E00030034 btst      #$3,$34(a6)
00006F86 6704         beq.s     $6F8C
00006F88 08810000     bclr      #$0,d1
00006F8C 43E80036     lea       $36(a0),a1
00006F90 2F04         move.l    d4,-(a7)
00006F92 48E7C7C0     movem.l   d0-d1/d5-d7/a0-a1,-(a7)
00006F96 4CA803DF0018 movem.w   $18(a0),d0-d4/d6-d7/a0-a1
00006F9C 7807         moveq     #$7,d4
00006F9E C89F         and.l     (a7)+,d4
00006FA0 673E         beq.s     $6FE0
00006FA2 5504         subq.b    #2,d4
00006FA4 670C         beq.s     $6FB2
00006FA6 6A14         bpl.s     $6FBC
00006FA8 B647         cmp.w     d7,d3
00006FAA 6F34         ble.s     $6FE0
00006FAC 3607         move.w    d7,d3
00006FAE 6B2A         bmi.s     $6FDA
00006FB0 602E         bra.s     $6FE0
00006FB2 DE49         add.w     a1,d7
00006FB4 6B2A         bmi.s     $6FE0
00006FB6 9687         sub.l     d7,d3
00006FB8 6320         bls.s     $6FDA
00006FBA 6022         bra.s     $6FDE
00006FBC 9689         sub.l     a1,d3
00006FBE 651A         bcs.s     $6FDA
00006FC0 B687         cmp.l     d7,d3
00006FC2 6516         bcs.s     $6FDA
00006FC4 3609         move.w    a1,d3
00006FC6 5504         subq.b    #2,d4
00006FC8 6B14         bmi.s     $6FDE
00006FCA 9446         sub.w     d6,d2
00006FCC 630C         bls.s     $6FDA
00006FCE D046         add.w     d6,d0
00006FD0 4A04         tst.b     d4
00006FD2 670A         beq.s     $6FDE
00006FD4 C588         exg       d2,a0
00006FD6 B448         cmp.w     a0,d2
00006FD8 6304         bls.s     $6FDE
00006FDA 45FA000E     lea       $6FEA(pc),a2
00006FDE D247         add.w     d7,d1
00006FE0 4CDF03F0     movem.l   (a7)+,d4-d7/a0-a1
00006FE4 4E92         jsr       (a2)
00006FE6 281F         move.l    (a7)+,d4
00006FE8 7000         moveq     #$0,d0
00006FEA 4E75         rts       
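
The point where two images part company can be located mechanically rather than by eye. This Python sketch is illustrative only: `first_difference` is a hypothetical helper, and the sample data is just the first six bytes of each listing above, not a full ROM.

```python
def first_difference(a, b):
    """Offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))  # one image is a prefix of the other
    return None


# First two instructions of each listing, starting at $6F62.
released = bytes.fromhex("7000246e007c")  # moveq #0,d0 / move.l $7C(a6),a2
rebuilt = bytes.fromhex("700045fa3cb6")   # moveq #0,d0 / lea $AC1C(pc),a2
base = 0x6F62

print(hex(base + first_difference(released, rebuilt)))
```

Run over two complete images with `base` set to 0, the same function gives the first offset worth disassembling.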

janbredenbeek commented 10 months ago

Hi Mark,

Which branch are you building, the Main or Working branch? You should not build the Working branch as I'm using this for the experimental 16-bit colour feature, which is not finished yet.

Also, I noticed that there is a mincf file in the extrarom directory in the Main branch, which shouldn't really be there as the mincf file in the m_ directory should take precedence. In the extrarom/mincf file the variable Q68_M33 is set to 1, which enables code for the 16-bit colour feature. This is wrong, and should be set to 0. It's probably better to delete the mincf file from extrarom, or copy the mincf from the m directory (which has Q68_M33 set to 0) to extrarom if you experience errors. It hasn't happened to me as I use the make_bas program to build the complete binary, but you might have used a different way to build which caused the wrong mincf to be used.

In case you're interested in the 16-bit colour feature: in the extrarom directory there are two files area16.asm and char16.asm which handle area operations (cls, scrolling, block etc.) and writing characters in 16-bit colour mode (it's actually only 8 colours since it uses the legacy QL palette). Other features like graphics are not implemented yet. As these routines don't fit into the 48K space, they are located in the extrarom area and called from Minerva via RAM vectors at the base of the fast RAM area ($19000). Of course, when these are not initialised properly the system is liable to crash...

(I have updated extrarom_mincf to be the same as m_mincf now).

EDIT: I just did a 'make_clean' on my own repository and also got a different extrarom and cs_area when re-assembling everything... the btst #1,$19(a2) instructions shouldn't be there as they test for 16-bit colour mode (but normally this bit is always reset). In the newly assembled Minerva they aren't there but somehow this makes the system crash. This is weird and I have to sort it out first. I'll come back to you later...

Best Regards, Jan

execriez commented 10 months ago

OK thanks,

The working branch was last updated on 20 September and is on version 1.5 The first version 1.6 I can find is in the main branch. I have been assembling using the main branch.

The latest working branch throws errors when assembling, but the initial commit on 16 August assembles to the Minerva ROM that is included in all the 1.5 releases.

Since I see the same issues with version 1.5, I'll use the 1.5 Minerva rom for the moment with the 1.6 extra ROM to do my testing.

One thing... I notice that "CALL 390," hangs on the F1/F2 screen on my Q68 hardware. CALL 390,262165 also hangs (this is the value of d1 if you choose SHIFT F1 for 128k) ...does this work on your earlier hardware? I'm wondering if this issue is also related.

For info, sometimes call 390... hangs with just the logo, sometimes with logo + F1/F2 message, and sometimes one of the ROM posting messages might appear before the hang...

Thanks Mark

janbredenbeek commented 10 months ago

Hi Mark,

I've found the reason for the difference in builds. I experimented with the 16-bit display driver in September, but since it wasn't ready and I wanted to release a new build with serial support, I disabled the Q68_M33 feature in mincf. However, I forgot to do a make_clean before rebuilding, so some mode-dependent code in cs_ and sd_ remained. This didn't become apparent because this code was never activated. However, after I did a clean rebuild today the code was removed, after which a bug emerged in sd_curw (the cursor printing routine) which crashed the machine.

I have uploaded v1.63 now which is a clean rebuild with the 16-bit display code removed. The size of Minerva itself has shrunk by 160 bytes, so maybe we have enough room now to include some exception debugging code...

The Working branch is not up to date with the main branch and I wouldn't recommend building from it. However if you want to test the 16-bit display driver you may copy the area16.asm and char16.asm files to extrarom in the main branch, assemble these (you must also uncomment the commented lines in the link file) and re-assemble the rest with Q68_M33 set to 1. Be prepared for some strange things happening though (the Pointer Environment doesn't seem to work well in 16-bit mode, even when patched).

As for CALL 390, this works without problems on my machine. However I noticed that booting with 128K and then starting and quitting an application causes the system to freeze, even CTRL-TAB doesn't work anymore. Maybe it's caused by memory corruption, which is hard to debug...

Best Regards, Jan

execriez commented 10 months ago

Hi Jan,

Thanks, I've downloaded your update and will take another look in the morning. I notice that you have updated "make_bas" so that make_clean clears out the "extrarom" directory too.

...Mark

execriez commented 10 months ago

Hi Jan, The version 1.6.3 sources assemble fine for me producing the same binaries as in the release. So I guess I now know my QPC2 setup is OK.

In contrast to previous versions, 1.6.3 consistently freezes after the F1/F2 screen irrespective of what option is chosen. So I have been testing on version 1.5 as it lets me do some testing of the "call 390," issue.

With the help of the addition of some debug screen pokes at various locations in the "ss_int2" routine, I noticed that when "call 390" froze, external interrupts were triggering like crazy and not getting cleared.

I realised that at first boot, Q68 hardware interrupts get enabled. When we Call 390, we enter supervisor mode, disable system interrupts, check the memory, clear all the tables, re-enable system interrupts, then call the external ROM init routines. Unfortunately, as soon as we enable system interrupts an external interrupt is triggered by the Q68 hardware - because Q68 hardware interrupts are still enabled from the initial boot.

I fixed the "call 390" issue by disabling Q68 hardware interrupts at the start of the "ss_ramt" routine. I didn't like adding Q68 specific code outside of the extra ROM, but it doesn't look like there is any option.

I added this...

ss_ramt
        GENIF Q68 <> 0   

        ; Q68 hardware interrupts are enabled by the extra ROM at boot time. 
        ; So at 2nd boot during a "call 390", ints need to be disabled before we exit supervisor mode

q68_ethi        equ $1c040      ; Ethernet int enable address
kbd_status      equ $1c148      ; ps2 key status register
mouse_status    equ $1c168      ; mouse status register
uart_status     equ $1c208      ; ser port status
q68..txstat     equ 6           ; bit set to enable transmit interrupt
q68..rxstat     equ 7           ; bit set to enable receive interrupt

        sf      q68_ethi                    Disable Ethernet ints
        sf      kbd_status                  Disable keyboard ints
        sf      mouse_status                Disable mouse ints
        bclr    #q68..rxstat,uart_status    clear rx interrupt
        bclr    #q68..txstat,uart_status    clear transmit interrupt

        ENDGEN

This addition fixed the call 390 hang. I was hoping that this would also fix the main issue, but it didn't. The system still hangs after choosing F1 or F2.

The screen pokes I mentioned earlier prove that interrupts are not being triggered during the hang. That's about as far as I have got...

I'm back at work Monday, so it's back to weekend coding. This could take some time to track down.

Best Regards, Mark

janbredenbeek commented 10 months ago

Hi Mark,

I have changed the interrupt routine; it will now scan the keyboard only on external interrupts and use polled interrupts only for autorepeat and when external interrupts are not active. I'm not sure if this fixes the problem but I noticed that SMSQ/E does the same. The patched 1.63.1 ROM image is in the extrarom directory (you need of course to combine it with the Minerva image and SD-card driver).

I've also noticed a strange bug in the SHIFT F1/F2 startup option; sometimes it restarts twice and then you'll get 28MB RAM rather than 128K! The Minerva system cannot handle this amount of memory and will crash sooner or later. So be cautious when using this option. I still have to sort out why this happens.

Most of the Q68 hardware initialisation is done in the ss_init module before interrupts are enabled and this should be equivalent to the instructions in ss_ramt you added. The question remains whether the hang occurs before or after the extrarom initialisation. Maybe I have to move the keyboard initialisation code into Minerva itself so it can be done before interrupts are enabled, but this will make things more complicated.

Best Regards, Jan

execriez commented 9 months ago

Hi Jan

I managed to get some testing done yesterday and today, although I still have no answers. I ran out of time, it's frustrating that I only have time to code at the weekends. I've condensed my notes below, as I've been through the assemble>test>debug cycle many times.

I can confirm that the keyboard still works OK after the changes you made. However I'm still getting the lock-ups.

Sorry, I didn't notice the q68 init section in ss_init. So the only thing missing in the existing routine is a "sf kbd_status" to disable keyboard interrupts.

As I mentioned, I have been testing with version 1.5 as version 1.6 was locking up even with SHIFT-F1. (see fix later) With a slight modification to the interrupt initialisation, and by moving the routine out of ss_init.asm and into ramt.asm the "Call 390" lock up during the first ROM posting can be fixed. It is necessary to put a "st pc_intr" before and after disabling ints otherwise "Call 390" on v1.5 locks up immediately after the ROM postings. I don't know why:

      GENIF Q68 <> 0          ; Q68 hardware initialisation
        st      pc_intr                     Clear ints - before
        sf      q68_ethi                    Disable Ethernet ints
        clr.b   kbd_unlock                  no key may be got
        sf      kbd_status                  Disable keyboard ints
        sf      mouse_status                Disable mouse ints
        sf      uart_status                 disable serial interrupts
        clr.b   pc_tctrl                    clear transmit control reg
        st      pc_intr                     Clear ints - after
        move.l  #q68_sramb+4,q68_sramb      first free space in fast sram mem
      ENDGEN * Q68 <> 0 * 

Adding the command "sf q68_reset" to the commands above caused SHIFT-F1 to hang on v1.5 which is the same behaviour I see on version 1.6x - So it seems that q68_reset is affecting behaviour.

If I put in a delay after doing the "sf q68_reset", then SHIFT-F1 works again, although F1 still doesn't.

ss_ramt
      GENIF Q68 <> 0          ; Q68 hardware initialisation (from SMSQ/E) dbgmsw

        sf      q68_reset
        move.w  #100,d0                     wait 2 secs...        
ss_initdelay
        btst    #3,pc_intr                  frame int?
        beq.s   ss_initdelay                no
        move.b  #$08,pc_intr                clear frame int
        dbra    d0,ss_initdelay

        sf      pc_intr                     clear ints
        sf      q68_ethi                    Disable Ethernet ints
        clr.b   kbd_unlock                  no key may be got
        sf      kbd_status                  Disable keyboard ints
        sf      mouse_status                Disable mouse ints
        sf      uart_status                 disable serial interrupts
        clr.b   pc_tctrl                    clear transmit control reg
        sf      pc_intr                     clear ints

        move.l  #q68_sramb+4,q68_sramb      first free space in fast sram mem

      ENDGEN * Q68 <> 0 * 

The above gave me hope of getting SHIFT-F1 working on version 1.6. Which it now does. I added the below to "ss_ramt_asm".

ss_ramt

      GENIF Q68 <> 0          ; Q68 hardware initialisation (from SMSQ/E)

        lea     q68_sramt,a3
        sf      q68_reset-q68_sramt(a3)                 reset Q68 hardware
        move.w  #100,d0                                 wait 2 secs...        
ss_initdelay
        btst    #3,pc_intr-q68_sramt(a3)                frame int?
        beq.s   ss_initdelay                            no
        move.b  #$08,pc_intr-q68_sramt(a3)              clear frame int
        dbra    d0,ss_initdelay

        st      pc_intr-q68_sramt(a3)                   clear any triggered ints
        sf      q68_ethi-q68_sramt(a3)                  block CP2200 interrupts v. 1.03
        clr.b   kbd_unlock-q68_sramt(a3)                no key may be got
        sf      kbd_status-q68_sramt(a3)                Disable keyboard ints
        sf      mouse_status-q68_sramt(a3)              Disable mouse ints
        sf      uart_status-q68_sramt(a3)               Disable serial interrupts
        clr.b   pc_tctrl-q68_sramt(a3)                  clear transmit control reg
        st      pc_intr-q68_sramt(a3)                   clear any triggered ints
;       st      led-q68_sramt(a3)                       LED on
        move.l  #q68_sramb+4,q68_sramb-q68_sramt(a3)    first free space in fast sram mem

      ENDGEN * Q68 <> 0 * 

I removed most of the interrupt init code from "ss_init_asm" but had to leave "st pc_intr" for some reason. If I didn't clear ints, then SHIFT-F1 locked up. So the contents of pc_intr at this point might be significant. I ran out of time so will investigate the value of pc_intr at this point later...

      GENIF Q68 <> 0          ; Q68 hardware initialisation (from SMSQ/E)
        st      pc_intr                 clear any triggered ints
      ENDGEN * Q68 <> 0 * 

All this didn't really get me any closer to getting F1 working. So I tried single-stepping the code in trace mode to see if anything showed up.

I increased the window size of #0, added some trace code to the extrarom, and enabled trace in "sb_start_asm" just before the return to BASIC.

rts4
        trap    #0      ; enter supervisor mode 
        ori.w   #$8000,sr       ; enable trace - dbgmsw
        andi.w  #$DFFF,sr   ; re-enter user mode

The single-step trace got all the way to the trap #3 IO.EDLIN command line input routine without incident. Typing into EDLIN eventually locked up. I attach a screen recording.

MinervaTrace

Again I ran out of time, so it will be next weekend before I look any further.

Best Regards Mark

janbredenbeek commented 9 months ago

Hi Mark,

Thanks for your very extensive debugging effort.

There are issues with some Q68 boards with v1.05 firmware, as mentioned on the QL forum (https://qlforum.co.uk/viewtopic.php?p=54633#p54633). You might want to check with Derek if your Q68 is affected by this.

I've tried to mimic SMSQ/E's initialisation as much as possible, though this might not always work out given the very different way it starts up compared to QDOS/Minerva. The q68_reset instruction was added to allow proper initialisation when 'warm' booting (e.g. from SMSQ/E) but I'm not sure if this is necessary; from your findings it looks like it's doing more harm than good. It's not used by SMSQ/E anyway. You must disable all interrupts (keyboard, serial, mouse, ethernet) though before it gets to sb_init, else it might lock up before the extrarom init is called.

I'm puzzled by the lock-up when typing a command line. Does it occur during typing or after you have pressed ENTER? I might have to check the cursor printing routine which is part of the scheduler task loop...

Best Regards, Jan

execriez commented 9 months ago

Hi, Yes, it's worthwhile me checking with Derek if my Q68 is affected.

The lock-up happened as I was typing. The gif above is real-time and the cursor froze as I was typing PRINT.

Superficially it appears that the longer it takes to process level 2 interrupts, the more likely it is that I get a freeze. So the more debug screen pokes I added, the more it froze - even before pressing F1 or SHIFT F1.

Mark

janbredenbeek commented 9 months ago

Maybe the freeze is caused by the Q68 being too long in the interrupt service code (> 20 ms)?

Anyway, I have the impression that POKEing something in the q68_reset port isn't of any use. When I do it from BASIC, nothing happens. What is your experience?

I'll try to build a new version before the next weekend with the q68_reset instruction removed. I am a bit reluctant to put hardware initialisation code into ss_ramt as ss_init is likely a more appropriate place.

I've seen your post on the QL forum, hope that a firmware update may be helpful in solving the issue. I'm also thinking about sending my Q68 to Derek for an update, however for me it's probably more hassle since I have to send it from abroad.

Best Regards, Jan

execriez commented 9 months ago

Just a quick update...

My Q68 has ID Q68-04-22 so is part of the batch with the new FPGA code programmed in. So the Minerva issue I'm seeing is unlikely to be related to the FPGA issue mentioned in the forum.

I probably agree about not moving code into ss_ramt from ss_init. Moving it did get CALL 390 working for me, but at the moment I have no idea why. It looks like some kind of timing issue, which makes no sense as interrupts are disabled in base_asm at 390 and should stay disabled all the way through ss_ramt into ss_init.

Regards, Mark

janbredenbeek commented 9 months ago

Hi Mark,

I've had a conversation with Peter about the q68_reset port. This can be used for a hardware reset, causing it to disable all interrupts and reboot the OS (whilst preserving RAM contents), but requires a magic word ($BAD0) to be written into location $1C024. So the 'sf q68_reset' instruction is wrong and should be deleted.

I have some doubts about the CALL 390 code. It executes a RESET instruction, which should reset the hardware on a real QL without resetting the processor itself, but I'm not sure how this works out on the FPGA in the Q68. Have asked Peter on the forum for that.

It's probably not a good idea to replace it with the 'magic word' reset for the Q68; I've tested it and my Q68 crashed spectacularly. It could be that some 68000 registers are corrupted by the hardware reset (CALL 390 puts a magic word in A5 and reboots Minerva, which uses the value in D1 for startup options if A5 contains the magic word at startup). Maybe the RESET instruction affects the registers as well, which would explain the strange memory size I sometimes get when booting with shift F1/F2 (the 128K boot is in fact a reboot via the CALL 390 code). I'll have to test this further.

EDIT: My bad, I got the CALL 390 parameter wrong... it does work with the magic word reset so I could replace the RESET with MOVE.W $BAD0,$1C024 and see how this works out. Still have to find out why shift-F1/F2 sometimes get 28MB though...

Best Regards, Jan

execriez commented 9 months ago

Hi Jan,

I've still had no success tracking the freeze down. I did note that SHIFT-F1 does lock up eventually, just not immediately or as quickly as if you choose F1.

I know that interrupts are not triggering at all during a hang, otherwise the debug screen pokes I added would flash on and off. I need to somehow work out how much of the rest of the system is still running during a hang.

So today I have been tracing io_edlin to try and catch the point at which it hangs up. However, after 6 hours single step tracing to screen it still didn't hang. Almost like slowing the system down caused it to be more stable.

Oddly, although I specified #0 for the trace output, it ended up listing to #2 instead, which allowed me to see the cursor slowly drawing and undrawing on #0. During a hang the cursor stops dead.

I'll keep looking

BTW, should that be "MOVE.W #$BAD0,$1C024" rather than "MOVE.W $BAD0,$1C024"?

Best Regards Mark

janbredenbeek commented 9 months ago

Hi Mark,

Of course it should be MOVE.W #$BAD0,... ;)
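
For anyone following along, a minimal sketch of the corrected instruction is shown below. The two symbol names are assumptions made for illustration; only the address $1C024 and the magic word $BAD0 come from the discussion above.

```
q68_rport equ   $1c024          ; assumed name; address quoted in this thread
q68_magic equ   $bad0           ; assumed name; magic word quoted in this thread

        move.w  #q68_magic,q68_rport    ; '#' makes $BAD0 an immediate value
```

Without the '#', the assembler treats $BAD0 as a source address, so the word stored at that address would be written to the port rather than the constant itself.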

I discovered a correlation between ROM version 1.6 and the '28MB shift F1/F2 issue' (which doesn't occur on v1.5). There might be a relationship with the lock-ups you're experiencing. It could be in the Minerva ROM scanning code or in the extrarom itself. I'll carefully take a look at the code, but as usual spare time is limited so it will take some days.

Best Regards, Jan

execriez commented 9 months ago

Hi Jan,

Just a quick update.

I modified my trace routine so that it didn't make use of any system calls when printing to screen so I knew for certain my trace wasn't interfering with any results.
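
A trace handler that avoids system calls could look roughly like the sketch below. This is not the routine I actually used, just an illustration of the idea: it copies the SR and PC from the trace exception frame into a small ring buffer instead of printing. It assumes the plain 68000 exception frame (SR word followed by PC long), and that the index and buffer live in writable RAM. All labels and sizes are hypothetical.

```
trc_slots equ   5               ; hypothetical: number of samples kept
trc_size  equ   6               ; bytes per sample: SR word + PC long

trace_log
        movem.l d0/a0-a1,-(sp)  ; save work registers (12 bytes)
        lea     trc_index(pc),a1
        move.w  (a1),d0         ; next slot to fill, 0..trc_slots-1
        mulu    #trc_size,d0    ; byte offset of that slot
        lea     trc_buffer(pc),a0
        adda.w  d0,a0
        move.w  12(sp),(a0)+    ; SR from the exception frame
        move.l  14(sp),(a0)     ; PC of the instruction about to run
        addq.w  #1,(a1)         ; advance the index...
        cmpi.w  #trc_slots,(a1)
        bcs.s   trc_exit        ; still below trc_slots, no wrap needed
        clr.w   (a1)            ; ...wrapping back to slot 0
trc_exit
        movem.l (sp)+,d0/a0-a1
        rte

trc_index  dc.w 0               ; assumed to be in writable RAM
trc_buffer ds.b trc_slots*trc_size
```

Because the handler only touches its own buffer, anything it changes in system state is limited to the extra time each traced instruction takes.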

I again set trace mode at the beginning of the io_edlin routine. If I enabled trace for the whole routine, it didn't hang. If I limited the number of instructions that were traced, then it did hang. I don't know why this is.

However, when I did limit the number of instructions traced it consistently hung at the same location.

The first screen shot shows the last 5 PC and SR registers after a hang, followed by registers d0-d7 on the left and registers a0-a7 on the right.

trace

You can see that it crashes when the Program Counter becomes "00000000", just after the instruction at $924.

The second screenshot shows a disassembly of the relevant section of code, which shows that $924 contains an RTE instruction.

disassembly

This is the RTE instruction at the end of the routine "ss_rj0" within "reshd.asm":

ss_rj0
        move.l  sv_jbpnt(a6),a0 find address of job in table
        move.l  (a0),a0         thus base of job

        move.l  jb_trapv(a0),sv_trapv(a6) redirect traps
        add.w   #jb_pc,a0
        move.l  (a0),-(sp)      put the program counter on
        move.w  -(a0),-(sp)     and status register
        move.l  -(a0),a1
        move    a1,usp          restore user stack pointer
        assert  jb_d0+8*4,jb_a0
        movem.l jb_d0-jb_a0-7*4(a0),d0-d7/a0-a6 and all other registers

        rte                     and go!

I'm not really familiar with the code but this appears to be a routine for switching jobs.

The routine is running in supervisor mode, and pushes the job's Program Counter and Status Register onto the stack and does a return from exception.

If the result of the RTE is to be believed, then the Program Counter for this job is 0.

I don't yet understand why this routine would act differently on my hardware, why timing has any bearing on stability, nor why the job that is being switched to has 0 as its program counter.

I haven't looked any deeper than this at the moment. It may turn out to be a wild goose chase.

Best Regards Mark

janbredenbeek commented 9 months ago

Hi Mark,

Thanks for your extensive report. What is worrying here is the value of A7 (I assume this is the SSP). It should be around $28480, but here it's near RAMTOP at 16MB. Somewhere the SSP is getting corrupted.

As for the other registers, I noticed A0 pointing to the channel definition block of #0 and A3 pointing to the link of Minerva's CON driver (after adding the $18 offset). Interrupt bit 1 of SR is also set, which means that the Q68 was servicing an interrupt (probably frame interrupt, since that exits via the scheduler when the interrupt occurred during user mode).

I suspect the problem might be in one of the scheduler loop linked list routines which calls code in the CON driver (either the 'waiting I/O' task or the 'flash the cursor' task) so these should be reviewed. Apart from that, I will release a new build shortly to fix some other problems (I noticed a test for 68020 in ss_init which of course is useless on the Q68).

Best Regards, Jan

execriez commented 9 months ago

Thanks Jan.

At the moment I've hit a bit of a brick wall and I'm not sure where to go from here. Simply turning trace mode on makes the code act differently, which makes tracing very difficult and makes you question the results of the trace.

For example, you can consistently get to a stable flashing cursor by defining a trace routine that consists of a single RTE instruction, and by then turning trace mode on at the beginning of the routine io_edlin - i.e.

Define a trace routine at the end of extrarom/q68hw.asm

TRACE_INIT
    movem.l d0/a1,-(a7)

    trap    #0          ; enter supervisor mode
    ori.w   #$0700,sr   ; disable interrupts

    lea     TRACE_BEGIN(PC),a1
    move.l  a1,$24

    andi.w  #$D8FF,sr   ; enable interrupts and re-enter user mode

    movem.l (a7)+,d0/a1
    RTS

TRACE_BEGIN:
    rte                 ; trace routine is a single command that just returns from the exception

Put a call to the init routine just after the M33 screen init code in extrarom/q68hw.asm

    GENIF   Q68_M33 <> 0
    jsr     q68scr_init ; initialise screen driver
    ENDGEN

    jsr     TRACE_INIT

Enable trace mode at the start of the io_edlin routine in od/conx.asm

io_edlin
    ori.w   #$8000,sr   ; enable trace

This allows F1 to get all the way to a stable command line just like SHIFT F1 already does. I'm struggling to see how a software bug could give this kind of symptom.

Best Regards Mark

janbredenbeek commented 9 months ago

After watching your previous video, I conclude that the SSP is not being corrupted. The value near RAMTOP is just that from the USP.

I have released v1.64 today which fixes a number of bugs. You might want to try it though I don't know whether it will fix your freeze problems.

Best Regards, Jan

janbredenbeek commented 8 months ago

Another update...

I have upgraded the firmware of my Q68 to v1.05 using an FPGA programmer.

Using v1.64, I didn't experience lock-ups. However, I noticed that HOTKEYs didn't work as expected (they were effectively ignored, though INKEY$ returned correct values). It appears that the HOTKEY system uses a Polled task which monitors the current keyboard queue for ALT-key combinations. This task appears in the list after all the other polled tasks, which ensures that it can read the keyboard queue before any job does. Unfortunately, this only works when using a polled interrupt for keyboard input. When using external interrupt, the keyboard input will be entered into the queue asynchronously to the polled task, so this input might be 'eaten' by another task or job before the HOTKEY task has a chance to get it.

For this reason, I have reverted to using only polled interrupt for scanning the keyboard like pre-1.3 versions did. I don't expect any performance penalty from this and according to Peter Graf the external interrupt facility was only added because a few users using buggy USB to PS/2 converters had issues with the keyboard.

I've had some occasions when the Q68 crashed or froze when loading Minerva from the SMSQ/E bootloader. It turned out that I had the 40 MHz SD-card clock option in SMSQ/E enabled, leading to data corruption. So it's best to disable this option (unfortunately, in current SMSQ/E distributions it's enabled by default). The SD-card driver supplied with Minerva doesn't use it either.

Best Regards, Jan

execriez commented 8 months ago

Hi Jan, I wonder if disabling interrupts at the start of the LVL2 int routine would allow for external ints and HOTKEY to co-exist. No lockups on your v1.05 firmware does sound promising. However the difference in behaviour seems to indicate that my model may have issues. I'll investigate.

Best Regards, Mark

janbredenbeek commented 8 months ago

Hi Mark,

I know about one other user having this problem, he has sent his Q68 back to Derek to be reprogrammed to an earlier firmware version.

Do you have the keyboard connected straight to the Q68, or are you using a mouse/keyboard splitter? When I use a splitter and connect the keyboard to the corresponding connector on the splitter, the Q68 freezes at the boot loader screen, even if I have no mouse connected. I have to reverse keyboard and mouse plugs to make it work. Also, be sure that the 5V power supply is sufficiently stable since the keyboard is drawing directly from it.

The problem with external interrupts and the HOTKEY system could perhaps be solved by copying the polled task from the HOTKEY system to the external interrupt task. But I don't think there will be a noticeable performance gain in using external interrupt for keyboard (Peter says it was only added because a few users using buggy USB to PS/2 converters had issues with timing).

Apart from that, external interrupts use the same L2 vector as polled interrupts so disabling the L2 interrupt would disable polled interrupts too (and on entry the SR is already set to ignore interrupts from the same or lower level). Only an interrupt from a higher level (L3 to L7) would be able to override this - I wouldn't expect this from the FPGA in the Q68 but I don't have insight into its inner workings.

Finally, an interrupt from the serial port, mouse port or Ethernet port could cause a freeze without a proper external interrupt handler, but these should all be disabled by the ss_init code before the external ROM initialisation is called.

BTW, are you using a mouse and do you load its driver from the BOOT file?

Best Regards, Jan

execriez commented 8 months ago

Hi Jan, I have a Keyboard connected direct to the Q68 with no mouse.

The SD driver fails to read the disk images from SD so it doesn't get to the point of loading a BOOT file. Could this indicate that the 40 MHz SD-card clock option is enabled by default?

I load Minerva direct from Q68_ROM.SYS on SD rather than via SMSQ/E and I have the appropriate images on disk called QLWA.WIN, QL_BDI.BIN and QL_WIN.BIN.

I see your point in not having to disable INTs because the interrupt disable mask value in the SR should prevent further LVL2 interrupts. I wonder if the SR is modified anywhere in the LVL2 routine (i.e. ANDI.W #XYZ,SR) that might momentarily reset the interrupt disable mask causing LVL2 ints to be interrupted and stack up...
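
To illustrate the kind of thing I mean, here is a sketch of the two patterns, assuming the standard 68000 SR layout with the interrupt mask in bits 8 to 10. It is not taken from the Minerva source; the labels are hypothetical, and both fragments would have to run in supervisor mode.

```
; Pattern that would reopen the window: clears the whole mask, so a
; pending level 2 interrupt can nest inside the level 2 handler.
risky
        andi.w  #$f8ff,sr       ; mask = 0, all levels accepted again
        rts

; Pattern that cannot lower the mask: save SR, raise, restore.
safe
        move.w  sr,-(sp)        ; remember the caller's mask
        ori.w   #$0700,sr       ; mask = 7 while in the critical section
        nop                     ; (critical section would go here)
        move.w  (sp)+,sr        ; put back exactly what we had
        rts
```

If anything like the first pattern exists on the LVL2 path, a second interrupt arriving before the first has finished would push another frame onto the supervisor stack.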

My system works fine with SMSQ/E and QDOS Classic so it's all a bit odd.

Best Regards Mark

janbredenbeek commented 8 months ago

Hi Mark,

I've discovered that loading from SMSQ/E (using second fat16 partition) sometimes corrupts the loaded image, even if I have 40MHz SD clock disabled. This is of course undesirable and needs to be sorted out. I've reverted already to SMSQ/E 3.38 but that didn't solve the problem.

I'm now considering reprogramming the FPGA tomorrow or next Monday to version 1.02 to see if this solves the issue. I don't know if this might be related to your problem but I've never experienced this when my Q68 was still at v1.00.

Best Regards, Jan

execriez commented 5 months ago

Hi Jan, I heard recently that this issue may affect a small batch of Q68 due to dodgy FPGA chips supplied from a Chinese PCB company. The problems that I have been experiencing with Minerva indicate that my Q68 is unfortunately one of that batch.

A solution if affected is to get the Q68 reprogrammed to version 1.02.

I have had my Q68 reprogrammed and Minerva now works fine. I am unable to make use of external keyboard interrupts, but at least I now have a working Minerva system.

So I am closing this issue since it appears to be caused by hardware problems rather than the Minerva software. Thanks for your time, much appreciated.

Best Regards Mark

janbredenbeek commented 5 months ago

Hi Mark,

Thanks for the clarification. This was really head-scratching as I couldn't reproduce it with any firmware version.

Do Derek and/or Peter know about this issue?

Be sure to upgrade SMSQ/E to the latest version (3.41) as earlier versions have a bug in the FAT device driver, causing data corruption when using 16-bit SD card access.

I've reverted to using only polled interrupt in the latest version, as using external interrupt causes problems with the Hotkey system (see release notes for v1.65).

Best Regards, Jan

execriez commented 5 months ago

Hi Jan, Yes, the info about the FPGA came from Peter via Derek. It did cause some very strange unpredictable behaviour.

QDOS Classic worked fine for me on 1.05 firmware except for one command... Open#5;"CON_" more often than not gave an "out of range" error. When I single-step traced the command it worked fine. It's not a problem for me now that it's reprogrammed.

It was a bit of a head-scratcher though.

Best Regards, Mark