Open agn453 opened 5 years ago
I managed to replicate this issue with a slightly different setup. I'm running on Debian 9.5 and using Xephyr for the X server. Under VMS I'm using TCP/IP Services rather than Multinet. I seem to be able to get further because I can log in to DECwindows and launch applications. It all appears to be running normally, but when I check Wireshark I can see the duplicate ACKs and retransmissions.
Looking through vax_xs.c I found a number of obvious problems, particularly with chaining of larger packets; however, fixing these doesn't seem to have made any difference. The debug for the LANCE should work (although I found problems there too). Did you remember to specify the destination for the debug messages?
sim> set console debug=xs_debug.log
sim> set xs debug
or you can even use
sim> set console debug=stdout
Thanks Matt.
I looked into this during the weekend too and found similar symptoms when using ftp with large files - so it's not just X11 traffic.
What I think is happening is a burst of large received packets is overrunning the simulated receive buffering faster than the emulation can process them. I've only just started comparing the code logic in vax_xs.c with that in the pdp11_xu.c (DEUNA/DELUA) and pdp11_xq.c (DEQNA/DELQA) modules since these don't cause similar symptoms with the VAXserver 3900 and VAX-11/780 emulators. Perhaps some of the changes in the latter need to be implemented in the vax_xs module too. Hopefully I'll be able to look more into this by the end of the week.
Regarding debugging - I was trying "set xs debug" as I've not done debugging of this kind with SIMH in a very long time (expecting output via stderr to the command console by default). Then I discovered the "set debug debug.log" command and it all sprang to life!
Also, thanks for the pointers you provided on the simh mailing list to the LANCE Am7990 datasheets and the VAXstation 2000 Technical Manual.
sim> set console debug=stdout
Wow. I haven't seen or used that command syntax in forever. "set console debug=XXXX" is the older form of the equivalent "SET DEBUG XXXX" command.
The earlier syntax still works, but it is deprecated.
Looking through the vax_xs code, I see the potential for incoming packet corruption and/or packet loss. Specifically, the loop which fills buffer descriptors in system memory stops filling them when all of them are full. This is good. What is not good is that if only part of a packet has made it into some buffer descriptors, but the whole packet hasn't been completed, the received packet which had partial data is discarded and the partial packet is left in the receive buffer descriptors. How the OS would make sense of the partial packet is one question, and the dropped packet would certainly require TCP to recover it.
Unlike the hardware in early Ethernet devices, there is no rush to empty received buffers from the sim_ether layer. Real hardware could easily miss incoming back-to-back packets quite often. The sim_ether layer will not lose data unless it isn't serviced for at least seconds.
Once a packet has been received via eth_read(), care should be taken to make sure that all of that data makes it into the simulated OS memory without discarding any packets or any part of a packet. If the OS driver isn't capable of picking up part of a packet, then realizing that the receive descriptor list has been completely consumed and that it has to wait for the rest of the partial packet, then the logic which fills receive buffer descriptors needs to look ahead at the available receive buffers to make sure that sufficient space is available before it inserts any part of a packet.
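The look-ahead described above could be sketched roughly as follows. This is only an illustration under stated assumptions: rx_desc, rx_desc_needed() and rx_deliver() are hypothetical names and a simplified descriptor layout, not the structures actually used in vax_xs.c, where the descriptors live in simulated VAX memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified receive descriptor. The real LANCE descriptor
   packs address, flags and byte counts into 16-bit words. */
typedef struct {
    int      own;       /* 1 = owned by the device, i.e. free to be filled */
    size_t   size;      /* capacity of the buffer in bytes */
    size_t   used;      /* bytes written by the device */
    int      stp, enp;  /* start-of-packet / end-of-packet markers */
    uint8_t *buf;       /* buffer memory */
} rx_desc;

/* Look ahead from 'head': how many device-owned descriptors are needed to
   hold 'len' bytes? Returns 0 if the free descriptors cannot hold it. */
static size_t rx_desc_needed(const rx_desc *ring, size_t ring_len,
                             size_t head, size_t len)
{
    size_t space = 0, n = 0;
    while (n < ring_len) {
        const rx_desc *d = &ring[(head + n) % ring_len];
        if (!d->own)
            return 0;               /* ran into a host-owned descriptor */
        space += d->size;
        n++;
        if (space >= len)
            return n;
    }
    return 0;                       /* whole ring is too small */
}

/* Copy the whole packet or nothing at all. Returns the number of
   descriptors consumed; 0 means "keep the packet queued and retry". */
static size_t rx_deliver(rx_desc *ring, size_t ring_len, size_t head,
                         const uint8_t *pkt, size_t len)
{
    size_t need, i, off = 0;

    assert(ring_len > 0 && head < ring_len);
    need = rx_desc_needed(ring, ring_len, head, len);
    if (need == 0)
        return 0;                   /* nothing written, nothing discarded */
    for (i = 0; i < need; i++) {
        rx_desc *d = &ring[(head + i) % ring_len];
        size_t chunk = (len - off < d->size) ? len - off : d->size;
        memcpy(d->buf, pkt + off, chunk);
        d->used = chunk;
        d->stp = (i == 0);
        d->enp = (i == need - 1);
        d->own = 0;                 /* hand the descriptor to the host */
        off += chunk;
    }
    return need;
}
```

The point of the sketch is only that the space check happens before the first byte is copied, so a packet is never left half-written in the ring and never dropped after a partial copy.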
In order to keep packets flowing while they're really available, the scheduling of xs_svc() should be either 1) short duration waits when receive data is available and things are just waiting for a receive buffer descriptor, or 2) clock synchronized polling when data hasn't recently been flowing.
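One way to read the two scheduling cases is as a small decision function. The sketch below is an assumption-laden illustration: xs_next_poll(), XS_SHORT_WAIT and XS_IDLE_TICKS are made-up names and values, and treating "data seen within the last few polls" the same as "data pending" is my interpretation rather than something stated above. In the simulator the two outcomes would presumably map onto sim_activate() with a short delay and sim_clock_coschedule() respectively.

```c
#include <assert.h>

/* Hypothetical tuning values, not taken from vax_xs.c. */
enum {
    XS_SHORT_WAIT = 100,  /* short re-poll delay, in simulator time units */
    XS_IDLE_TICKS = 5     /* polls without data before falling back */
};

typedef enum {
    XS_SCHED_SHORT,       /* re-poll soon with a fixed short delay */
    XS_SCHED_CLOCK        /* poll in step with the simulated clock */
} xs_sched_mode;

typedef struct {
    xs_sched_mode mode;
    int           delay;  /* only meaningful for XS_SCHED_SHORT */
} xs_sched;

/* rx_pending: packets already read but still waiting for a descriptor.
   idle_ticks: number of consecutive polls that found no data. */
static xs_sched xs_next_poll(int rx_pending, int idle_ticks)
{
    xs_sched s;

    assert(rx_pending >= 0 && idle_ticks >= 0);
    if (rx_pending > 0 || idle_ticks < XS_IDLE_TICKS) {
        s.mode  = XS_SCHED_SHORT;   /* case 1: keep packets flowing */
        s.delay = XS_SHORT_WAIT;
    } else {
        s.mode  = XS_SCHED_CLOCK;   /* case 2: quiet link, poll on the clock */
        s.delay = 0;
    }
    return s;
}
```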
Unfortunately I haven't had time to look at this until now.
I can confirm packet corruption on transmit from the simulator by using ftp to transfer a text file from the simulated machine (I compared MULTINET TCPDUMP output on the simulator with Wireshark's capture of the transfer). Transferring files into the simulator using ftp seems to succeed though.
The file I'm transferring is a text file of 80-character lines, each containing its line number and padded with spaces. You can see the line count in the second 1498-byte packet jump from line 23 to 207 in the attached screenshot.
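For anyone repeating this test, a jump like the one from line 23 to 207 can also be found mechanically in the received file. The helper below is a hypothetical sketch (first_line_gap() is not part of SIMH or Multinet) and assumes each line begins with its decimal line number, optionally preceded by spaces; the exact layout of the test file is not given here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Scan NUL-terminated text whose lines start with a decimal line number.
   Returns 1 and stores the two numbers around the first break in the
   sequence (for example 23 and 207), or returns 0 if every numbered line
   follows its predecessor by exactly one. Lines without a leading number
   are ignored. */
static int first_line_gap(const char *text, long *prev, long *next)
{
    long last = -1;
    const char *p = text;

    assert(text != NULL && prev != NULL && next != NULL);
    while (*p != '\0') {
        const char *q = p;
        while (*q == ' ' || *q == '\t')
            q++;                        /* allow right-aligned numbers */
        if (isdigit((unsigned char)*q)) {
            long n = strtol(q, NULL, 10);
            if (last >= 0 && n != last + 1) {
                *prev = last;
                *next = n;
                return 1;
            }
            last = n;
        }
        p = strchr(p, '\n');            /* advance to the next line */
        if (p == NULL)
            break;
        p++;
    }
    return 0;
}
```

This is meant to be run over the reassembled file on the receiving side; a single captured packet payload may start in the middle of a line, which would need extra handling.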
I'll look at the code in vax_xs.c some more - in particular the transmit ring buffer processing.
Similarly, I haven't looked at this for about a week but should be able to get back to it soon. I've been looking closely at the receive side and as far as I can tell from comparing the debug against Wireshark traces, no packets are being dropped and they are all being transferred to VAX memory. I haven't looked so closely at the transmit side yet so maybe there are some problems there.
I've gone through the receive side too and not seen anything amiss (and I've not seen the 32 entry receive ring buffer become full).
The transmit ring buffer is only 8 entries - and as far as I can see it doesn't become full either. My suspicions are around the TXR_OWN bit in the status register and how the VMS Ethernet driver determines it can transmit another frame. Matt - maybe you can investigate this from the VMS sources. Is the transmit happening too fast? Perhaps the write callback routine is where the TXR_OWN bit should be toggled (after a successful frame write).
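To make the suggestion about the write callback concrete, here is a minimal model of releasing a transmit descriptor only once the frame write has completed. Everything in it is hypothetical: TX_OWN, tx_state, tx_start() and tx_done() are illustrative names, and leaving a failed frame device-owned for a retry is my assumption. It is not how vax_xs.c currently handles TXR_OWN. In the simulator, tx_done() would correspond to the callback routine that can be passed to eth_write().

```c
#include <assert.h>
#include <stdint.h>

#define TX_RING_LEN 8        /* matches the 8-entry ring mentioned above */
#define TX_OWN      0x8000u  /* hypothetical "device owns this descriptor" bit */

typedef struct {
    uint16_t flags;
} tx_desc;

typedef struct {
    tx_desc  ring[TX_RING_LEN];
    unsigned head;           /* next descriptor the device will transmit */
    int      in_flight;      /* 1 while a frame write is outstanding */
} tx_state;

/* Begin transmitting the head descriptor if the device owns it and no
   write is outstanding. Returns 1 if a write was started. The descriptor
   stays device-owned, so the driver cannot reuse its buffer yet. */
static int tx_start(tx_state *s)
{
    if (s->in_flight || !(s->ring[s->head].flags & TX_OWN))
        return 0;
    s->in_flight = 1;        /* real code would queue the frame write here */
    return 1;
}

/* Write-complete callback. Only after a successful write is ownership
   handed back to the host and the ring advanced. On failure the
   descriptor stays device-owned so the next tx_start() retries it. */
static void tx_done(tx_state *s, int status_ok)
{
    assert(s->in_flight);
    if (status_ok) {
        s->ring[s->head].flags &= (uint16_t)~TX_OWN;
        s->head = (s->head + 1) % TX_RING_LEN;
    }
    s->in_flight = 0;
}
```

With this ordering the driver never sees a descriptor as free while its frame is still waiting to be written, which is the race the question above is hinting at.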
Context
Hello Mark (and Matt).
This is going to be a bit of a story...
I'm trying to nail down an issue with "spurious dots for displayed space characters" and corrupted display titles from the GPX graphics console using the vaxstation3100m38 emulator (OpenVMS VAX V7.3 with DECwindows MOTIF V1.2-5 and Multinet V5.5).
To convince myself that it's a GPX issue and not something with the DECwindows MOTIF installation, I thought I'd try connecting through Multinet V5.5's XDM session manager from Xcursion V7.2.177 running under Windows-XP in a VirtualBox emulation. I've previously used this environment to access a DECwindows MOTIF environment as a remote display from a SIMH VAX running on a Raspberry Pi.
I am unable to connect to the DECwindows MOTIF login screen on the vaxstation3100m38 emulator from Xcursion. It partially displays the login screen (without the "COMPAQ" logo from DECwindows MOTIF V1.2-5), delays a bit then disconnects.
I have compared the Multinet V5.5 configuration with a working system that I use with a SIMH VAXserver 3900 emulation and it all seems configured correctly.
Examining the Multinet V5.5's "multinet show/conn=proc/send=name" output, I see a large number of unacknowledged SndQ items to the X11 TCP/IP port on the emulated VAXstation. This usually indicates a connectivity issue for TCP.
Firing up Wireshark on my Mac and capturing the packets between the Xcursion system and the emulated VAX, I see TCP duplicate ACKs, TCP Out-of-Order and retransmitted packets (when the packet size is 1498 bytes) to the TCP/IP X11 port (6000). Small packets and other protocols (e.g. DECnet and LAT) don't seem to be affected.
I've built the vaxstation3100m38 emulator under Microsoft Visual Studio 2015 (Community version) and am running it under a debugging instance on a PC running Windows 10 1903. However, the "set xs debug" capability in the vaxstation3100m38 emulator seems not to be implemented, so I'm stuck trying to compare what should have been sent to the Ethernet with the Wireshark trace.
I realise Matt Burke probably has much more on his to-do list for the new emulators - so I don't expect this issue to be remedied soon. I can offer help with testing and debugging and can provide the Wireshark trace if you need to see it too. My level of expertise with the LANCE hardware and emulation is close to nil though!
how you built the simulator or that you're using prebuilt binaries
Debug build under Microsoft Visual Studio Community 2015