open-simh / simh

The Open SIMH simulators package
https://opensimh.org/

VAX qe interface has problems on macOS, when compiled with -O1 or -O2, works when DEBUG=1 is set #387

Open NCommander opened 3 months ago

NCommander commented 3 months ago
PING 4.2.2.2 (4.2.2.2): 56 data bytes
64 bytes from 4.2.2.2: icmp_seq=0 ttl=255 time=769335973.549 ms
wrong data byte #8 should be 0x8 but was 0xc0
        38 1 f6 52 4 2 2 2 c0 a8 0 65 0 0 57 54 b8 74 0 0 66 63 b7 e5 0 b e6 df 8 9 a b 
        c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 
64 bytes from 4.2.2.2: icmp_seq=0 ttl=255 time=769337983.549 ms (DUP!)
wrong data byte #8 should be 0x8 but was 0xc0
        38 1 f6 51 4 2 2 2 c0 a8 0 65 0 0 30 42 b8 74 0 1 66 63 b7 e6 0 c d ef 8 9 a b 
        c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 
--- 4.2.2.2 ping statistics ---
2 packets transmitted, 1 packets received, 1 duplicates, 50.0% packet loss
round-trip min/avg/max/std-dev = 999999.999/769336978.549/769337983.549/1005.000 ms

# uname -a
OpenBSD vax55.my.domain 5.5 kbuild#7 vax
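
For readers unfamiliar with the "wrong data byte" lines, here is a minimal sketch of the kind of payload check that produces them. It is modelled on the classic BSD ping, where the first 8 payload bytes hold a timestamp and byte i after that is expected to equal i. Whether OpenBSD/vax's ping does exactly this is an assumption. The names `dump`, `first_bad_byte` and `pattern_offset` are hypothetical; the byte values are copied from the first dump above.

```c
#include <stddef.h>

/* The 48 bytes from the first hex dump in the report above. */
static const unsigned char dump[48] = {
    0x38, 0x01, 0xf6, 0x52, 0x04, 0x02, 0x02, 0x02,
    0xc0, 0xa8, 0x00, 0x65, 0x00, 0x00, 0x57, 0x54,
    0xb8, 0x74, 0x00, 0x00, 0x66, 0x63, 0xb7, 0xe5,
    0x00, 0x0b, 0xe6, 0xdf, 0x08, 0x09, 0x0a, 0x0b,
    0x0c, 0x0d, 0x0e, 0x0f, 0x10, 0x11, 0x12, 0x13,
    0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b
};

/* Assumption: the fill pattern starts at payload byte 8, after the timestamp. */
#define PATTERN_START 8

/* Return the first index i >= PATTERN_START where data[i] != i, or -1. */
static int first_bad_byte(const unsigned char *data, size_t len)
{
    for (size_t i = PATTERN_START; i < len; i++)
        if (data[i] != (unsigned char)i)
            return (int)i;
    return -1;
}

/* Return the offset where the run 8, 9, 10, ... begins and then
 * continues unbroken to the end of the buffer, or -1 if there is none. */
static int pattern_offset(const unsigned char *data, size_t len)
{
    for (size_t off = 0; off < len; off++) {
        size_t k = 0;
        while (off + k < len &&
               data[off + k] == (unsigned char)(PATTERN_START + k))
            k++;
        if (k > 0 && off + k == len)
            return (int)off;
    }
    return -1;
}
```

On this dump, `first_bad_byte(dump, sizeof dump)` returns 8, which matches the message, and `pattern_offset(dump, sizeof dump)` returns 28, so the expected pattern is intact but sits 20 bytes later than ping expects. Twenty bytes is the size of an IPv4 header without options, and bytes 4 to 11 of the dump read as 4.2.2.2 and 192.168.0.101, so the payload looks shifted rather than randomly corrupted. That is only an observation from the dump, not a diagnosis.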

This problem was discovered while running distcc from OpenBSD in an effort to rebuild the OS a bit faster, but it shows up in other network-related ways too. I'm using NAT attachment on macOS. I get similar network weirdness in NetBSD, although I haven't had a network deadlock on NetBSD like I have had in multiple versions of OpenBSD.

Compiling Open SIMH with DEBUG=1 causes the network stack to behave.

I tried running SIMH on NetBSD/arm64 as an additional data point, but it kept hanging trying to load the KA655 self-test.

pkoning2 commented 2 weeks ago

I'm confused. The title speaks of Mac OS, but the description keeps talking about OpenBSD.

Could you explain more precisely what OS you're building on, with what tools, and what network option (pcap, vde, etc.)? What simulator are you building, what are you running on that simulator, and what is the test that demonstrates the issue?

NCommander commented 2 weeks ago

I'm building on Mac OS, with the MicroVAX 3900 simulator, running OpenBSD 5.8/vax in the simulator.

The problem is that when SIMH is built with -O2, the qe driver breaks, with the messages from OpenBSD shown above. I'm using distcc in OpenBSD/vax to handle compiling the base system. The errors generally start after a few minutes of network activity, and then networking stops working in the simulator entirely.

The problem largely seems to happen when a large number of TCP/IP connections are opening and closing within a short period, but I'm not sure if that's what's actually triggering it.

Building SIMH with -O0 causes the network within OpenBSD/vax to work correctly, and I can use distcc with multiple simulators to great effect. From the behavior, and messages I'm seeing, this feels like an alignment issue, but it only happens when the system is under load.
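
To make the alignment hypothesis concrete, here is a minimal sketch of alignment-safe field reads from a packet buffer. Nothing here is taken from the SIMH sources, and it is not known whether the qe code does anything like the risky form mentioned in the comment; `load_u32` and `load_be32` are hypothetical helper names used for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Risky form (shown only in this comment, not compiled):
 *     uint32_t v = *(const uint32_t *)(buf + off);
 * If buf + off is not suitably aligned for uint32_t this is undefined
 * behavior in C, so an optimizing build and a -O0 build are free to
 * behave differently.
 */

/* Read 32 bits in host byte order from any offset. memcpy is
 * well-defined regardless of alignment. */
static uint32_t load_u32(const unsigned char *buf, size_t off)
{
    uint32_t v;
    memcpy(&v, buf + off, sizeof v);
    return v;
}

/* Read a big-endian (network order) 32-bit value byte by byte, which
 * is independent of both alignment and host endianness. */
static uint32_t load_be32(const unsigned char *buf, size_t off)
{
    return ((uint32_t)buf[off]     << 24) |
           ((uint32_t)buf[off + 1] << 16) |
           ((uint32_t)buf[off + 2] <<  8) |
            (uint32_t)buf[off + 3];
}
```

For example, with the bytes `00 c0 a8 00 65` in a buffer, `load_be32(buf, 1)` yields 0xc0a80065 even though offset 1 is not 4-byte aligned.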

pkoning2 commented 2 weeks ago

Thanks. One possible answer is that it's an OpenBSD issue related to timing of the emulated device. The emulator runs much faster than the real hardware, so if there are timing bugs they can appear in simulation even if the bug is impossible to reach on the original hardware.

hbent commented 1 week ago

I agree that unless this can be replicated on real hardware, it isn't a SIMH issue. I haven't seen any similar issues with Ultrix, 4.2BSD, NetBSD, etc. and I've used all of them in a fairly heavily loaded way for extended periods of time.

NCommander commented 1 week ago

I've had similar problems with NetBSD, as I said above; the same fix, compiling with DEBUG=1/-O0, resolved it there. So I can reproduce this with different OSes in SIMH.

I'll test more when I get a chance, but from memory, I did have stable networking in SIMH with the same OS image on different host platforms; I only had broken networking with macOS as the host platform.