rcornwell / sims

Burroughs B5500, ICL1900, SEL32, IBM 360/370, IBM 7000 and DEC PDP10 KA10/KI10/KL10/KS10, PDP6 simulators for SimH
http://sky-visions.com

KA10: Halt in SALV #296

Closed larsbrinkhoff closed 1 year ago

larsbrinkhoff commented 1 year ago

I'll make some notes here about the current SALV problem.

Occasionally during the ITS build, the TRAN subroutine in SALV will halt. It can look like this:

SYSENG RFN    13     OK
SYSNET NETWRK 266    OK
SYSTEM BITS   117    OK
SYSTEM CH10   DEFS1  OK
SYSTEM CH11   DEFS1  OK
SYSTEM CHAOS  290    
HALT instruction, PC: 220323 (HALT 220323)
sim> 

The address 220323 can be resolved by checking SALV BIN. It's MREAD3+5, the JRST 4,. at the end of this fragment:

MREAD3: TRNE C,DATREQ
         CONSZ MTC,7            ;SKIP IF TM10A
          JRST MREAD2           ;TM10B OR NO DATA REQUEST
        DATAI MTC,(B)
        SKIPE SHORTL
         JRST 4,.

SHORTL is cleared in two places and set in one. First, in the rewind subroutine. I don't think this is active since SALV is busy reading files from the tape.

REW:    CLEARM EOFCNT
        CLEARM MAGBFP
        CLEARM SHORTL
        CONO MTC,NOOP1  ;CLEAR INTERRRUPT FLAGS
        CONO MTC,REWIND ;INITIATE REWIND

Second, the tape read subroutine. It's likely this is being called when reading files.

MREADA: CONO MTC,REED
MREAD2: CONSO MTS,DATREQ+EOFF+JOBDON+EOTF       ;WAIT FOR NEXT DATA REQUEST
         JRST .-1
        MOVEI C,20.
        SOJG C,.
        CONI MTS,C
        TRNE C,EOTF
         JRST MREOT
        TRNN C,JOBDON
         JRST MREAD3
        TRNN C,EOFF
         JRST MREAD6
        AOS EOFCNT
        SETOM EOFLG
        CLEARM SHORTL
        JRST MREAD9

Finally, it's set further down in MREAD:

MREAD6: SETOM SHORTL
MREAD9: CONSZ MTC,7             ;SKIP IF TM10A
         JRST MREADB
        HLLZS B
        MOVNS B
        ADDM B,MAGBFP
        JRST MREAD4
larsbrinkhoff commented 1 year ago

I hacked the build script to stop and loop around after SALV finishes successfully, or output a trace if it halts.

Apparently, we take the SOJG loop after MREAD2, then JRST MREAD3, and there the SKIPE SHORTL fails to skip. But the same path is also taken before a successful skip.

Checking uses of SHORTL, it's normally just checked by the skip instruction and not updated. Out of 5,000,000 instructions in the trace, SHORTL is only updated seven times. So occasionally it's set in MREAD6 and shortly after (461 instructions) cleared in MREAD2.

MREAD2 clears SHORTL if CONI MTS signals an EOFF condition. I assume that's end of file. When SALV halts, this doesn't happen.

larsbrinkhoff commented 1 year ago

trace.zip

larsbrinkhoff commented 1 year ago

Here's what normally happens.

220300   000000000064  222457     476000222457  777777777777  300004  476000222457  SETOM 0,222457
220301   000000000000  000007     734300000007  000000000000  300004  734300000007  CONSZ 340,7
220325   000000000000  000001     734600000001  000000000001  300004  734600000001  CONO 344,1
220326   000002222571  000100     734740000100  000000000100  300004  734740000100  CONSO 344,100
220330   000000000013  440000     734700440000  000000000000  300004  734700440000  CONSZ 344,440000
220332   000002222571  020600     734740020600  000000000000  300004  734740020600  CONSO 344,20600
220250   000000000000  560200     734200560200  000000560200  300004  734200560200  CONO 340,560200
220251   256011000010  000007     734340000007  000000000000  300004  734340000007  CONSO 340,7
220260   000000000000  562200     734200562200  000000562200  300004  734200562200  CONO 340,562200
220261   000002222571  014101     734740014101  000000000000  300004  734740014101  CONSO 344,14101
220261   000002222571  014101     734740014101  000000010100  300004  734740014101  CONSO 344,14101  <<<
220265   000000000002  000003     734640000003  000170011142  300004  734640000003  CONI 344,3       <<<
220276   000000000064  222457     402000222457  000000000000  300004  402000222457  SETZM 0,222457
220301   000000000000  000007     734300000007  000000000000  300004  734300000007  CONSZ 340,7
220325   000000000000  000001     734600000001  000000000001  300004  734600000001  CONO 344,1
220326   000002222571  000100     734740000100  000000000100  300004  734740000100  CONSO 344,100
220330   000000000000  440000     734700440000  000000000000  300004  734700440000  CONSZ 344,440000
220332   000002222571  020600     734740020600  000000000000  300004  734740020600  CONSO 344,20600
220250   000000000000  560200     734200560200  000000560200  300004  734200560200  CONO 340,560200
220251   256011000010  000007     734340000007  000000000000  300004  734340000007  CONSO 340,7
220260   000000000000  562200     734200562200  000000562200  300004  734200562200  CONO 340,562200
220261   000003222572  014101     734740014101  000000000000  300004  734740014101  CONSO 344,14101
220261   000003222572  014101     734740014101  000000000001  300004  734740014101  CONSO 344,14101
220265   000000000760  000003     734640000003  000175000001  300004  734640000003  CONI 344,3
220317   000000000007  000007     734300000007  000000000000  300004  734300000007  CONSZ 340,7
220322   000000000064  222457     000000000000  000000000000  300004  332000222457  SKIPE 0,222457

But this is what happens before the halt.

220300   000000000064  222457     476000222457  777777777777  300004  476000222457  SETOM 0,222457
220301   000000000000  000007     734300000007  000000000000  300004  734300000007  CONSZ 340,7
220325   000000000000  000001     734600000001  000000000001  300004  734600000001  CONO 344,1
220326   000002222571  000100     734740000100  000000000100  300004  734740000100  CONSO 344,100
220330   000000000000  440000     734700440000  000000000000  300004  734700440000  CONSZ 344,440000
220332   000002222571  020600     734740020600  000000000000  300004  734740020600  CONSO 344,20600
220250   000000000000  560200     734200560200  000000560200  300004  734200560200  CONO 340,560200
220251   256011000010  000007     734340000007  000000000000  300004  734340000007  CONSO 340,7
220260   000000000000  562200     734200562200  000000562200  300004  734200562200  CONO 340,562200
220261   000002222571  014101     734740014101  000000000000  300004  734740014101  CONSO 344,14101
220261   000002222571  014101     734740014101  000000000001  300004  734740014101  CONSO 344,14101
220265   000040372700  000003     734640000003  000175000001  300004  734640000003  CONI 344,3       <<<
220317   000000000000  000007     734300000007  000000000000  300004  734300000007  CONSZ 340,7      <<<
220322   000000000064  222457     777777777777  777777777777  300004  332000222457  SKIPE 0,222457
larsbrinkhoff commented 1 year ago

In the first run, CONI returns with JOBDON and EOFF set. This causes SHORTL to be cleared.

In the second run, CONI returns with DATREQ set. SALV doesn't like that.
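
The dispatch in MREAD2/MREAD3 can be modelled in a few lines to make the two cases easier to compare. This is only a sketch read off the listings above, not SALV itself. The bit values are an assumption inferred from the trace: the CONSO mask 14101 for DATREQ+EOFF+JOBDON+EOTF, the earlier CONSO 344,100, and the two CONI values. The function name `mread_step` and the outcome labels are made up for illustration.

```python
# Status bits for the MTS device, inferred from the trace in this thread
# (assumption, not taken from the SALV source).
DATREQ = 0o000001   # data request
JOBDON = 0o000100   # job done
EOTF = 0o004000     # end of tape
EOFF = 0o010000     # end of file (tape mark seen)


def mread_step(status, shortl, tm10a=True):
    """Model one pass through MREAD2/MREAD3.

    status -- value returned by CONI MTS,C
    shortl -- True if SHORTL is currently nonzero
    tm10a  -- True if CONSZ MTC,7 would skip (TM10A controller)
    Returns (outcome, new_shortl).
    """
    if status & EOTF:
        return ("MREOT", shortl)          # TRNE C,EOTF / JRST MREOT
    if status & JOBDON:
        if status & EOFF:
            return ("EOF", False)         # CLEARM SHORTL, JRST MREAD9
        return ("MREAD6", True)           # SETOM SHORTL
    # MREAD3: JOBDON is clear
    if not (status & DATREQ) or not tm10a:
        return ("MREAD2", shortl)         # back to waiting
    if shortl:
        return ("HALT", shortl)           # SKIPE SHORTL / JRST 4,.
    return ("DATA", shortl)               # DATAI accepted


if __name__ == "__main__":
    # First run: tape mark arrives after the short record.
    print(mread_step(0o000170011142, True))
    # Second run: data arrives while SHORTL is still set.
    print(mread_step(0o000175000001, True))
```

In this model, the first CONI value clears SHORTL through the EOF path, while the second reaches the SKIPE with SHORTL still set, which is the halt at 220323.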

larsbrinkhoff commented 1 year ago

@rcornwell suggested SALV may be working as intended, and that the tape is in error. That turned out to be the case. There's an EOF tape mark missing after a file on the tape. Supposedly that's why SALV halts, although I haven't confirmed this. But it does halt before processing the next file.

So the question now is: why is the tape malformed? It's written by DUMP under ITS. DUMP will happily list all files from the bad tape; apparently it doesn't care about the missing mark. itstar also doesn't detect any problem.
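
Since neither DUMP nor itstar complains, a small independent scanner can help locate the missing mark. The sketch below assumes the image is in the standard SIMH tape container format (a 4-byte little-endian length before and after each record, data padded to an even byte count, a zero word for a tape mark, and 0xFFFFFFFF for end of medium). The function name `scan_simh_tape` is hypothetical, and taking only the low 24 bits as the length is a simplifying assumption for ordinary data records.

```python
import io
import struct

TAPE_MARK = 0x00000000
END_OF_MEDIUM = 0xFFFFFFFF


def scan_simh_tape(stream):
    """Yield ("mark", 0) or ("record", length) for each object in the image."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return                      # physical end of file
        (word,) = struct.unpack("<I", header)
        if word == END_OF_MEDIUM:
            return
        if word == TAPE_MARK:
            yield ("mark", 0)
            continue
        length = word & 0x00FFFFFF      # assumption: low 24 bits are the byte count
        padded = length + (length & 1)  # data is padded to an even length
        stream.seek(padded, io.SEEK_CUR)
        trailer = stream.read(4)
        if len(trailer) < 4 or struct.unpack("<I", trailer)[0] != word:
            raise ValueError("record trailer does not match header")
        yield ("record", length)


if __name__ == "__main__":
    # Tiny synthetic image: 3-byte record, tape mark, 2-byte record.
    image = (
        struct.pack("<I", 3) + b"abc\0" + struct.pack("<I", 3)
        + struct.pack("<I", 0)
        + struct.pack("<I", 2) + b"xy" + struct.pack("<I", 2)
    )
    for kind, length in scan_simh_tape(io.BytesIO(image)):
        print(kind, length)
```

Run against a real image, two files with no "mark" entry between their records would point at the spot where SALV gets data instead of an EOF.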

larsbrinkhoff commented 1 year ago

I ran the build script in a loop to the point where it writes the reboot.tape image and had it break when the tape is missing an EOF mark between files. I had mta debug turned on. I'm attaching the debug output and the tape file.

As you can see, the tape mark is missing between SYSTEM; CH11 DEFS1 and SYSTEM; CHAOS 290. Below are some lines I grepped out from the log file. I grepped for "Record Writ", "XXX", and "WTM". XXX are my own annotations to show the first record from the two files. There is a WTM between the two, so there should be a mark. CH11 DEFS1 is a short file, so it's just one record. CHAOS 290 is a longer file, so the first record is 1024 words, or hex 1400 frames.

DBG(11340477055)> MTA STR: MTA0 Record Write len: 00000B09
XXX SYSTEM; CH11 DEFS1
DBG(11340477055)> MTA STR: MTA0 Record Written len: 00000B0A
XXX SYSTEM; CH11 DEFS1
DBG(11340479792)> MTA DETAIL: MT0 WTM
DBG(11340482155)> MTA STR: MTA0 Record Write len: 00000000
DBG(11342638172)> MTA STR: MTA0 Record Write len: 00001400
XXX SYSTEM; CHAOS 290
DBG(11342638172)> MTA STR: MTA0 Record Written len: 00001400
XXX SYSTEM; CHAOS 290
DBG(11344790550)> MTA STR: MTA0 Record Write len: 00001400

I'm puzzled, because everything seems right, and the WTM log line indicates sim_tape_wrtmk is called.
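
As a sanity check on the lengths in the log, the record sizes can be converted between words and frames. The factor of five frames per 36-bit word is an assumption implied by the figure above (1024 words = hex 1400 frames); the helper names are made up for illustration.

```python
FRAMES_PER_WORD = 5  # assumption: implied by 1024 words == 0x1400 frames


def words_to_frames(words):
    """Number of tape frames needed for the given number of 36-bit words."""
    return words * FRAMES_PER_WORD


def frames_to_words(frames):
    """Return (whole_words, leftover_frames) for a record length in frames."""
    return divmod(frames, FRAMES_PER_WORD)


if __name__ == "__main__":
    print(hex(words_to_frames(1024)))   # full record of CHAOS 290
    print(frames_to_words(0xB09))       # "Record Write len" for CH11 DEFS1
```

Under that assumption the "Record Write len" values are whole numbers of words, and the "Record Written len" of 00000B0A is the same record rounded up to an even byte count.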

larsbrinkhoff commented 1 year ago

Hmm, an oddity here:

DBG(11340479792)> MTA DETAIL: MT0 WTM
DBG(11340482155)> MTA STR: MTA0 Record Write len: 00000000

Nowhere else does it say "Record Write len: 00000000".

larsbrinkhoff commented 1 year ago

Another run has the same anomaly. There's a WTM and then two 0-length records, and then the first (and here, only) record of a file. Yet, the image file has no tape mark here. In fact, the last record of the previous file seems missing. (It's as if writing the 0-length record erases records backwards.)

DBG(11882976663)> MTA STR: MTA0 Record Write len: 0000007D
DBG(11882976663)> MTA STR: MTA0 Record Written len: 0000007E
DBG(11882978258)> MTA DETAIL: MT0 WTM
DBG(11882980625)> MTA STR: MTA0 Record Write len: 00000000
DBG(11882985654)> MTA STR: MTA0 Record Write len: 00000000
DBG(11883687221)> MTA STR: MTA0 Record Write len: 00000672
DBG(11883687221)> MTA STR: MTA0 Record Written len: 00000672
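
To find every occurrence of the anomaly in a long debug log, the zero-length writes can be picked out mechanically. This is a sketch that assumes the log lines look exactly like the ones quoted in this thread; the function name `zero_length_writes` is hypothetical.

```python
import re

# Matches lines such as:
#   DBG(11882980625)> MTA STR: MTA0 Record Write len: 00000000
LINE = re.compile(r"DBG\((\d+)\)> MTA (\w+): (.*)")
WRITE = re.compile(r"Record Write len: ([0-9A-Fa-f]{8})")


def zero_length_writes(lines):
    """Return the DBG timestamps of all 'Record Write len' lines with length 0."""
    hits = []
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue                    # annotation or unrelated line
        w = WRITE.search(m.group(3))    # does not match "Record Written len"
        if w and int(w.group(1), 16) == 0:
            hits.append(int(m.group(1)))
    return hits


if __name__ == "__main__":
    sample = [
        "DBG(11882976663)> MTA STR: MTA0 Record Write len: 0000007D",
        "DBG(11882976663)> MTA STR: MTA0 Record Written len: 0000007E",
        "DBG(11882978258)> MTA DETAIL: MT0 WTM",
        "DBG(11882980625)> MTA STR: MTA0 Record Write len: 00000000",
        "DBG(11882985654)> MTA STR: MTA0 Record Write len: 00000000",
        "DBG(11883687221)> MTA STR: MTA0 Record Write len: 00000672",
        "DBG(11883687221)> MTA STR: MTA0 Record Written len: 00000672",
    ]
    print(zero_length_writes(sample))
```

The returned timestamps can then be used to pull the surrounding CONO/CONI lines out of the full log.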
larsbrinkhoff commented 1 year ago

It would be nice to see the CONO/DATAIO around the write 0

DBG(11882979060)> MTA CONO: MT CONO 347 control 70200 0 476372406424 000000000000
DBG(11882979067)> MTA CONO: MT CONO 343 start 60200 0 21 000000060221 000000000000 PC=030135
DBG(11882979067)> MTA EXP: Setting status 000000000002
DBG(11882979205)> MTA CONO: MT CONO 343 start 64200 0 21 000000064221 000000000000 PC=030272
DBG(11882979212)> MTA CONI: MT CONI 346 status2 000170000040 0 000000000040 PC=030111
DBG(11882979218)> MTA CONI: MT CONI 346 status2 000170000040 0 000000000040 PC=030114
DBG(11882979234)> MTA CONI: MT CONI 346 status2 000170000040 0 000000000040 PC=030155
DBG(11882980205)> MTA EXP: MT0 Init write
DBG(11882980625)> MTA STR: MTA0 Record Write len: 00000000
DBG(11882980625)> MTA DETAIL: MT0 Write 0
larsbrinkhoff commented 1 year ago

Here's the SIMH debug log, and the reboot.tape file that was created.
http://lars.nocrew.org/tmp/debug.tgz

larsbrinkhoff commented 1 year ago

The patch seems to fix it. It held up for 60 runs.