open-simh / simh

The Open SIMH simulators package
https://opensimh.org/

microvax2 can't boot Ultrix-32m V1.2 #409

Closed hbent closed 2 weeks ago

hbent commented 2 months ago

Using a fixed version of the disk set on Bitsavers, microvax2 can't boot Ultrix-32m V1.2, either from the install disk set or from a fully installed disk image. The error is the same from both methods. An installation succeeds with microvax1; V1.2 should have full uvII support and the uvII is specifically referenced in the kernel on the first installation disk.

sim> show version
MicroVAX II (KA630) simulator Open SIMH V4.1-0 Current
    Simulator Framework Capabilities:
        64b data
        64b addresses
        Threaded Ethernet Packet transports:PCAP:TAP:VDE:NAT:UDP
        Idle/Throttling support is available
        Virtual Hard Disk (VHD) support
        RAW disk and CD/DVD ROM support
        Asynchronous I/O support (Lock free asynchronous event queue)
        Asynchronous Clock support
        FrontPanel API Version 12
    Host Platform:
        Compiler: GCC 14.1.1 20240622
        Simulator Compiled as C arch: x64 (Release Build) on Jun 24 2024 at 21:43:18
        Build Tool: simh-makefile
        Memory Access: Little Endian
        Memory Pointer Size: 64 bits
        Large File (>2GB) support
        SDL Video support: SDL Version 2.30.3, PNG Version 1.6.43+apng, zlib: 1.3.1
        PCRE RegEx (Version 8.45 2021-06-15) support for EXPECT commands
        OS clock resolution: 1ms
        Time taken by msleep(1): 1ms
        Ethernet packet info: libpcap version 1.10.4 (with TPACKET_V3)
        OS: Linux aelfric 6.9.8-gentoo #2 SMP PREEMPT_DYNAMIC Thu Jul 11 01:12:51 EDT 2024 x86_64 Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz GenuineIntel GNU/Linux
        Processor Name: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
        tar tool: tar (GNU tar) 1.35
        curl tool: curl 8.8.0 (x86_64-pc-linux-gnu) libcurl/8.8.0 OpenSSL/3.2.2 zlib/1.3.1 zstd/1.5.6 c-ares/1.32.1 libpsl/0.21.5 nghttp2/1.62.1
        git commit id: a944a4bc+uncommitted-changes
        git commit time: 2024-06-18T11:22:16+0200

config file:

set cpu idle=ULTRIX-1.X
set cpu autoboot
att nvr microvax2.nvr
set tti 7b
set tto 7b
set cr dis
set lpt dis
set dz disabled
set rl dis
set rq0 rd52
att rq0 rd52-mv2.dsk
set rq1 rx50
att rq1 new/1.img
set rq2 rx50
att rq2 new/3.img
set rq3 dis
set ts dis
set tq dis
set xq dis
boot

Changing basics like memory size, RQDX2 vs. RQDX3, enabling and disabling basic devices, etc. doesn't seem to change anything. Switching to the other SIMH branch doesn't change anything either.

What happens is this:

KA630-A.V1.3

Performing normal system tests.

  5..4..3..

Tests completed.

>>> boot dua1

  2..1..0..

ra: I/O error
super block read error
Exit called
THE PROCESSOR CAN NOW BE HALTED

HALT instruction, PC: 000B1830 (BRB B1830)

What is expected is a close analog of this, from microvax1:

.2..1..0.
Boot
: ra(1,0)vmunix
211052+99036+52116 start 0x13d0
Ultrix V1.2 System #1: Wed Feb 19 22:16:59 EST 1986
real mem  = 4190208
avail mem = 3168256
using 95 buffers containing 418816 bytes of memory
MicroVAX-I, Dfloat Microcode, level = 5
Q22 bus
rqd0 at csr 172150 vec 774, ipl 17
ra0 at rqd0 slave 0
ra1 at rqd0 slave 1
ra2 at rqd0 slave 2
NO LOOPBACK
WARNING: -- CHECK AND RESET THE DATE!

Starting installation of ULTRIX-32m

Attached are 1.img and 3.img; rd52-mv2.dsk is expected to be a blank image at install time. ult12install.zip

markpizz commented 2 months ago

Well this is somewhat odd. When booting the DUA1 (RX50 disk), the MicroVAX I first reads LBN 1 (for 0x200 bytes). The MicroVAX II does the same. The MicroVAX I then reads LBN 0 (for 0x200 bytes) and then LBN 1 (for 0x1E00), then LBN 10 (for 0x2000 bytes) and then other stuff. The MicroVAX II reads LBN 1 (for 0x200 bytes) and then LBN 1 (for 0x200 bytes) again then LBN 0 (for 0x200 bytes) and then LBN 1 (for 0x1E00) followed by LBN 10 (for 0x2000 bytes). That is all the MicroVAX II does.

The RX50 has an Ultrix partition table (found at LBA 31), but neither system reads this data directly; it is read as a consequence of the LBN 10 read (for 0x2000 bytes).

Is it possible that the MicroVAX II was only installable via tape (TK50)?

Is it possible that there might be another MicroVAX II boot ROM that could work?

Is it possible that with the right boot ROM (not the one we already have) there might be a boot flag value (passed via R5 or BOOT DUA1/{R5:}flags) that might influence this?

hbent commented 2 months ago

There are two older boot ROMs here: https://microvax2.org/wp/rom-images . I believe that the files provided need to be interleaved back into a single image to work with SIMH, but I don't know what the offsets for doing that are.

It looks like the bootstrap might be ignoring R5 entirely. At least none of the values that I tried changed anything, including things like 8 and 9 which you would expect to work.

hbent commented 2 months ago

As to the TK50 booting question: I have no idea. It took quite a bit of trial and error to figure out what disks were expected to be in what locations in order to boot on the microvax1, because as far as I know no hardcopy documentation exists for any of the 32m distributions. Further complicating things, DEC chose to omit manpages and other documentation from the online distribution in order to save space. The installed distribution only has bootra and raboot in /usr/mdec, so there isn't a straightforward way to create a bootable tape.

What makes me think that this should work is that an installed distribution also doesn't boot on the microvax2 sim, and with the same error. At that point there are no differences in what is installed - both platforms use the same bootblocks, load the same generic kernel, etc.

There are Ultrix 2.0 sources for raboot (first stage, in assembler) and bootra (second stage, in c); the revision history indicates that they should be very, very similar to what was in 1.2. Let me know if they would be helpful and I can attach them here.

markpizz commented 2 months ago

There are two older boot ROMs here: https://microvax2.org/wp/rom-images . I believe that the files provided need to be interleaved back into a single image to work with SIMH, but I don't know what the offsets for doing that are.

I was quite sure that we're merely looking at a byte-for-byte interleave between each of the ROM images. This conclusion is supported by the fact that the ends of all of these ROM images contain 0xFFs. I would expect that interleaving them with either one first should produce at least a slightly workable complete 64K image. Doing that should result in a combined image containing some of the text strings we find in the ka630.bin in the repo, but that is not the case when trying to combine any of those ROM images...

Leaving the ROMs aside and, as you point out, looking at later Ultrix versions, does Ultrix 2.0 install on the MicroVAX2 simulator?

hbent commented 2 months ago

I didn't realize that the already interleaved ROM (rev AH) is not the same as the one that SIMH uses (and that I can use that as a template for how to interleave the other ROM files). A quick test with that file, which identifies itself as V1.2:

sim> load -r ka630.bin
sim> boot

KA630-A.V1.2

Performing normal system tests.

  5..4..3..
?0  004FF600  00000500
Failure.

Normal operation not possible.
Infinite loop, PC: 20041E57 (BRB 20041E57)

Ultrix V2.0 does boot on microvax2:

>>> boot dua0

  2..1..0..

Ultrixboot (using VMB version 13)

Loading (a)vmunix ...

Sizes:
text = 498700
data = 182696
bss  = 190068
Starting at 0x3485

Ultrix V2.0-1 System #4: Tue Jun 2 17:48:32 EDT 1987
real mem  = 16769024
avail mem = 15267840
using 20 buffers containing 163840 bytes of memory
MicroVAX-II with an FPU
Q22 bus
...

Note that they switched how the bootloader works in V2.0 - the disk bootloader loads /ultrixboot rather than loading the kernel directly.

hbent commented 2 months ago

With the other two ROMs from that site:

sim> load -r ka630-A1-EF.bin
sim> boot

KA630-A.V1.3

Performing normal system tests.

  5..4..3..
?2
Failure.

Normal operation not possible.
Infinite loop, PC: 20041EE0 (BRB 20041EE0)
sim> load -r ka630-AA-AF.bin
sim> boot

KA630-A.V1.3

Performing normal system tests.

  5..4..3..
?2
Failure.

Normal operation not possible.
Infinite loop, PC: 20041EE0 (BRB 20041EE0)

Attached is a zip file with all three ROMs. uv2roms.zip

markpizz commented 2 months ago

How/where did you get those combined ROM images?

Meanwhile, if you look at VAX/ka630_patch.com, you'll see the change(s) needed to let the ROM pass (skip) the boot diagnostics under the simulator. That would certainly explain some of these failures. Some digging would be required to reproduce the changes in the ROM patch set for each of these ROM images.

hbent commented 2 months ago

They're from the site I mentioned above, https://microvax2.org/wp/rom-images. I interleaved them from the raw files byte by byte with this shell script:

paste -d '\0' <(xxd -p -c1 $1) <(xxd -p -c1 $2) | xxd -p -r > $3

SIMH's ROM patch is fairly straightforward; I'll see what I can do with these new ROMs.
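For readers without `xxd` to hand, the byte-by-byte interleave that the shell one-liner performs can be sketched in Python. This is only an illustration: the function name `interleave` is hypothetical, and which half goes first (low byte vs. high byte) is an assumption that has to be checked against a known-good image such as the already interleaved ROM.

```python
def interleave(first: bytes, second: bytes) -> bytes:
    """Merge two ROM halves byte by byte: first[0], second[0], first[1], ..."""
    if len(first) != len(second):
        # Unlike paste(1), refuse to continue when the halves differ in size.
        raise ValueError("ROM halves differ in length")
    merged = bytearray(len(first) * 2)
    merged[0::2] = first   # even offsets come from the first file
    merged[1::2] = second  # odd offsets come from the second file
    return bytes(merged)
```

Swapping the two arguments gives the other possible ordering, which is a cheap way to try both when it is unclear which chip holds the low bytes.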

hbent commented 2 months ago

Here's what I have for now: The A1-EF and AA-AF ROMs are identical, and also identical to SIMH's VAX/ka630_orig.bin, so no need to worry about them. ka630-AH.bin is V1.2 and needs to be patched. The VMS patch seems to be quite powerful and unfortunately I'm not aware of anything like it on UNIX. What needs to happen, as far as I can tell, is REPLACE/INSTRUCTION 02F3C = 'MOVAL W^00003163,B^04(R11)' 'BRW 03728' and then the checksum needs to be updated; I have no idea how the checksum is determined so I don't know what to change it to. I believe the checksum in the V1.2 ROM is at 0x9598 instead of 0xB888 in V1.3.

markpizz commented 2 months ago

You should be able to overwrite the bytes starting at offset 02F3C with the 3 bytes 31 F5 36 to achieve the patch you mention. BRW 03728 is a PC relative word displacement branch that is the same no matter where it is located and thus 31 F5 36 is that instruction. The VMS patch verifies that what you say you want to replace is really there before it actually does the replace. If you're sure, you can merely replace it. The instruction you're replacing is more than 3 bytes; VMS patch would fill the remainder with NOP instructions, which really don't matter since there would never be a branch into the middle of that instruction.

From the poking around that I did, I noticed that the checksum seems to be easy enough to find by starting at the end of the ROM image and moving backwards, skipping over all of the FFs until you get to the 2 bytes that are the checksum. Once you know that address, I would compute the ROM address that data will be located at and put an OS (not simh) breakpoint where the data at that address is fetched (in the ROM memory access code, maybe adding a simple compare instruction for that specific value to have a simple place for your breakpoint). Once you know the PC that fetched that address, you can look at the instructions that compare the fetched data. Looking at the value it is compared to lets you know what you should be setting the value to...
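The backwards scan described above can be sketched as follows. This is a minimal illustration, not a tool from the SIMH tree: `find_trailing_word` is a hypothetical name, and it assumes the checksum is the last 2 bytes before the 0xFF padding. If the final checksum byte happened to be 0xFF itself, the scan would land one byte short.

```python
def find_trailing_word(rom: bytes, fill: int = 0xFF) -> int:
    """Return the offset of the 2-byte word that ends at the last non-fill byte."""
    end = len(rom)
    # Walk backwards over the padding at the end of the image.
    while end > 0 and rom[end - 1] == fill:
        end -= 1
    if end < 2:
        raise ValueError("no data found before the fill bytes")
    return end - 2
```

The returned value is a file offset; it still has to be translated to the address at which the ROM is mapped before it can be matched against a trace or breakpoint.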

hbent commented 1 month ago

This turned out to be much easier to do, at least for me, with a CPU instruction trace from SIMH. What happens is that the ROM is walked, the bytes that are fetched are rotated, and then at the end the final checksum is compared to what's at 0x9598:

20040083 041F0004| CMPW R1,(R0) 0000EAA7 000019D8

so there's the checksum, and I patched the ROM by changing 0x9598 from D819 to A7EA.
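The checksum change can be sketched as a verify-then-replace step, similar in spirit to the check the VMS patch utility is described as doing. The trace shows the computed value 0xEAA7 compared against the stored word 0x19D8; since the VAX is little-endian, those words sit in the file as D8 19 and A7 EA. The helper name `patch_word` is hypothetical.

```python
import struct

def patch_word(rom: bytearray, offset: int, old: int, new: int) -> None:
    """Replace a 16-bit little-endian word, but only if the expected old value is there."""
    (current,) = struct.unpack_from("<H", rom, offset)
    if current != old:
        raise ValueError(f"expected {old:#06x} at {offset:#x}, found {current:#06x}")
    struct.pack_into("<H", rom, offset, new)

# Stand-in image: only the two checksum bytes matter for this demonstration.
rom = bytearray(b"\xFF" * 0x9598 + b"\xD8\x19")
patch_word(rom, 0x9598, old=0x19D8, new=0xEAA7)
```

Checking the old value first guards against patching the wrong ROM revision, where the checksum lives at a different offset.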

Now I get a different failure:

sim> boot

KA630-A.V1.2

Performing normal system tests.

  5..4..3..
?1
Failure.

Normal operation not possible.
Infinite loop, PC: 20041E57 (BRB 20041E57)

It looks like there is another test that needs to be skipped. Can you give me some pointers on how to do that?

markpizz commented 1 month ago

I have no idea how Matt actually got the details he used while creating his original patches.

Lacking some source code for this version of the ROM, I can only think of a very hard process to detect and avoid the potential failure: capturing a full instruction trace that produces the failure and carefully analyzing where the diagnostic decides that a test failed and then goes off to report it. Having an unmodified ROM run and report its failure, and comparing the back end of those activities with the failure on your modified code, will likely reveal common logic reporting the error. Seeing what happened in the few instructions prior to the error reporting activities will likely show what was found wrong. This will be tedious and painful. Good luck. Anyone else can jump in and make suggestions...

hbent commented 1 month ago

I turned on RQ debugging on both microvax simulators (with the default ROMs) to try and get a better idea of what's happening. Upon booting from an installed 1.2 distribution on an RD52, both sims read LBN 1 for 0x200 (microvax 2 does it twice, maybe to compare them?), LBN 0 for 0x200, LBN 1 for 0x1E00, LBN 10 for 0x2000, and then microvax1 continues on to LBN 38 while microvax2 gets an RQ HBE. This matches what Mark saw from the floppy boot. So I suppose the question is, why is the RQ on the microvax2 getting an error? I checked again and there is no difference between what happens with an RQDX2 and an RQDX3, and no difference whether or not other devices are configured on the RQ controller.

This also matches what happens when booting Ultrix 2.0 - LBN 1, LBN 0, LBN 10, and then on to later portions of the disk. Note that all of this is well before the read of the kernel starts, or even a read of the directory entry for the root of the filesystem.

If it would help I'd be happy to post the installed disk image. Unfortunately I'm not sure how to proceed with debugging here.

markpizz commented 1 month ago

... LBN 10 for 0x2000, and then microvax1 continues on to LBN 38 while microvax2 gets an RQ HBE.

What does HBE mean?

This also matches what happens when booting Ultrix 2.0 - LBN 1, LBN 0, LBN 10, and then on to later portions of the disk. Note that all of this is well before the read of the kernel starts, or even a read of the directory entry for the root of the filesystem.

I'm pretty sure that the logic supporting Ultrix boot in the ROM does not have any knowledge of the Ultrix file system. As I vaguely recall from looking at VMB code long ago, there is support for a "boot block" boot, whereby, when the ROM doesn't find an ODS2 file system with the expected VMS boot pieces, it attempts a boot block boot where, if the boot block has the correct format, it gets loaded from the beginning (or near the beginning) of the disk and the loaded code is a secondary bootstrap which includes knowledge of things like the file system and whatever is needed to get the OS into memory. There is supposed to be a VMB R5 flag bit which explicitly requests a boot block boot, but later VMB versions (ROM based at least) fall back to attempting one in any case.

hbent commented 1 month ago

... LBN 10 for 0x2000, and then microvax1 continues on to LBN 38 while microvax2 gets an RQ HBE.

What does HBE mean?

Host Bus Error. Here's the very end of the RQ debug log:

DBG(22560694)> RQ REQ: 1FE0 1A C6 00 00 00 00 00 00 26 26 00 00 1A C6 00 00 ........&&......
DBG(22560694)> RQ REQ: 1FF0 0E 88 00 00 32 64 00 00 34 AE 00 00 0C 3E 00 00 ....2d..4....>..
DBG(22560694)> RQ TRACE: rq_hbe
DBG(22560694)> RQ TRACE: rq_rw_end
DBG(22560694)> RQ REQ: rsp=20A1, sts=0069

In all other boot scenarios, there is no error:

DBG(6803972)> RQ REQ: 1FE0 1A C6 00 00 00 00 00 00 26 26 00 00 1A C6 00 00 ........&&......
DBG(6803972)> RQ REQ: 1FF0 0E 88 00 00 32 64 00 00 34 AE 00 00 0C 3E 00 00 ....2d..4....>..
DBG(6803972)> RQ TRACE: rq_rw_end
DBG(6803972)> RQ REQ: rsp=00A1, sts=0000

This matches what the console is printing out when the microvax2 boot fails - the controller is giving us a hard I/O error. That's the part of this that I don't understand - why the controller works on the microvax1 and not 2.

I'm pretty sure that the logic supporting Ultrix boot in the ROM does not have any knowledge of the Ultrix file system. As I vaguely recall from looking at VMB code long ago, there is support for a "boot block" boot, whereby, when the ROM doesn't find an ODS2 file system with the expected VMS boot pieces, it attempts a boot block boot where, if the boot block has the correct format, it gets loaded from the beginning (or near the beginning) of the disk and the loaded code is a secondary bootstrap which includes knowledge of things like the file system and whatever is needed to get the OS into memory. There is supposed to be a VMB R5 flag bit which explicitly requests a boot block boot, but later VMB versions (ROM based at least) fall back to attempting one in any case.

What happens on Ultrix 1.x is basically what happens on 4.2BSD. The ROM loads the 512 byte raboot (or whichever boot program is appropriate) from lbn 0, which is a simple loader that loads the next 7.5K from the disk (starting from lbn 1; called bootra in this case). At that point we have a basic disk driver and that's the source of the errors after the next read.
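The layout just described (a 512-byte first stage at lbn 0, followed by 7.5K starting at lbn 1) can be sketched as below. This assumes the disk image is a flat array of 512-byte blocks in LBN order, which is an assumption about the image format rather than something stated in this thread; `split_boot_area` is a hypothetical helper.

```python
SECTOR = 512           # bytes per logical block
BOOTRA_BYTES = 0x1E00  # 7.5K, the size of the second read seen in the RQ trace

def split_boot_area(image: bytes):
    """Return (raboot, bootra) sliced from the first 8K of a raw disk image."""
    raboot = image[0:SECTOR]                      # lbn 0: first-stage loader
    bootra = image[SECTOR:SECTOR + BOOTRA_BYTES]  # lbn 1 onward: second stage
    return raboot, bootra

# 0x1E00 bytes is 15 blocks, so the two stages together fill the first 16 blocks.
blocks_for_bootra = BOOTRA_BYTES // SECTOR
```

Comparing these two slices between the floppy image and an installed disk is one way to confirm that both carry the same boot blocks.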

crwolff commented 1 month ago

This matches what the console is printing out when the microvax2 boot fails - the controller is giving us a hard I/O error. That's the part of this that I don't understand - why the controller works on the microvax1 and not 2.

From the 4.2BSD sources in sys/stand:

uda.c: ubinfo = ubasetup(io, 1);

and ('reg' is the QBus mapping register and 'o' is the offset)

uba.c: return ((bdp << 28) | (reg << 9) | o);

finally

uda.c: char bda_flg = 0;       /* If set then booting from BDA */
uda.c:  mp->mscp_buffer = (bda_flg? (long)io->i_ma: (ubinfo & 0x3fffff) | (((ubinfo>>28)&0xf)<<24));

So, BDP (buffered data path) is being or'd into the memory address at offset 24 and sent to the MSCP device.

When simh attempts to copy the data from the MSCP device it calls rq_writew which calls Map_WriteW(uint32 ba, int32 bc,... Map_WriteW checks and discovers ba is too large (0x010000C8 in this case) and returns an error. The solution is to mask off everything above 22 bits since QBus only has 22 address bits.
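The arithmetic above can be reproduced with a short sketch. It mirrors the two quoted 4.2BSD expressions in Python; the values `reg=0` and `offset=0xC8` are assumptions chosen only because they reproduce the 0x010000C8 address mentioned above, and the final mask illustrates the idea of the fix rather than the code that was merged.

```python
QBUS_ADDR_MASK = 0x3FFFFF  # the Q22 bus carries 22 address bits

def ubasetup_info(bdp: int, reg: int, offset: int) -> int:
    # Mirrors the quoted uba.c return expression.
    return (bdp << 28) | (reg << 9) | offset

def mscp_buffer(ubinfo: int) -> int:
    # Mirrors the non-BDA branch of the quoted uda.c assignment:
    # the BDP number is moved from bits 31:28 down to bits 27:24.
    return (ubinfo & 0x3FFFFF) | (((ubinfo >> 28) & 0xF) << 24)

ubinfo = ubasetup_info(bdp=1, reg=0, offset=0xC8)  # assumed example values
ba = mscp_buffer(ubinfo)        # 0x010000C8: bit 24 is set by the BDP field
masked = ba & QBUS_ADDR_MASK    # 0x0000C8: a valid 22-bit Q-bus address
```

With the upper bits masked off, the address falls back inside the 22-bit range that the bus mapping check accepts.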

hbent commented 1 month ago

Thanks! I can confirm that this patch does fix the issue, I am able to boot from both the installation floppies and an installed RD52.

pkoning2 commented 2 weeks ago

The fix has been merged.