inmbolmie / 5250_usb_converter

Converter to connect an IBM 5251 terminal to a Linux PC via USB, emulating a VT52 terminal
GNU General Public License v3.0

Screen gets stuck #6

Open blackbit42 opened 3 years ago

blackbit42 commented 3 years ago

This happens with an IBM 3196-B10, a factory Twinax cable and a hardware termination setting I have verified as correct. The terminal appears to behave normally in test mode (pressing space while powering on). After starting 5250_terminal.py and powering on the terminal, the bash prompt comes up reliably and accepts input. However, after a small number of key presses and the corresponding output on the screen, communication gets stuck and nothing further is displayed. Debug logs are attached; I examined them, but don't really know what to look for. Attached: debug.log, write.log, read.log

inmbolmie commented 3 years ago

I'll try with my 3196 emulator and look at the provided files. The debug log is pretty low-level and difficult to interpret.

inmbolmie commented 3 years ago

I see the problem: after the second "ls" command, it's as if the terminal gets stuck forever in the "busy" state while the command output is being sent. I don't know whether that's because the controller isn't allowing the keystroke to be acknowledged or because of some other condition; it's difficult to know without having the terminal...

I cannot reproduce the problem with the 3196 emulator, though; it never gets stuck. What is your command line?

I will try to think of something to make the keystroke acknowledgement more robust.


inmbolmie commented 3 years ago

Did you try to set DEFAULT_SLOW_POLLING = true?

It would be useful to know if that improves the situation.

inmbolmie commented 3 years ago

Please try this branch. It forces a responseLevel change before sending commands to the terminal. It may help, or it may make things worse...

https://github.com/inmbolmie/5250_usb_converter/tree/hotfix/issue-6

blackbit42 commented 3 years ago

I tried the hotfix/issue-6 branch. No noticeable change. The logs were captured with this command line:

python3 5250_terminal.py -c -i -k -t /dev/ttyACM1 3:5250_US:1

Logs: debug.log, write.log, read.log

inmbolmie commented 3 years ago

It isn't working as intended... it looks like the responseLevel change takes far too long for some reason. I'm still missing one piece. What do you see on the screen? In this last example, did it correctly display the output of the "ls" commands and then hang in the middle of the "steam locomotive", like this?

[image]

I have updated the branch with one more change: the host now forces a responseLevel change before sending output commands, as well as when receiving scancodes. Please try it and see if it improves anything.

blackbit42 commented 3 years ago

Everything is slower now, but it still gets stuck. Logs: debug.log, read.log, write.log. A video of the terminal is at: https://ahuemer.xx.vu/volatile/2020-09-27-0DtTcIgzLTw/5250_1.mp4

inmbolmie commented 3 years ago

It is slower because, for some reason, the terminal keeps taking a long time to update its status, so forcing the host to wait for an update slows everything down. But that seems unrelated to the hang itself, which is still happening... I have no idea what to try now; I'll think about it.

Is this the first time you are trying with a real terminal? I ask because I had the impression you had it working before this problem, as you were contributing to the project code etc., so I assumed it had been working for you at some point.

blackbit42 commented 3 years ago

My setup hasn't worked correctly so far. I held off reporting this particular issue until now because I wasn't sure whether my hacked-together cabling was the cause. Since I've recently switched to a proper Twinax cable, I am confident that isn't it. The contributions I submitted so far were possible with a partly working setup just fine.

I am not sure whether I should suspect a problem with my terminal, the adapter hardware, the adapter firmware or the host-side code. Since I do not have an AS/400 or similar to hook the terminal up to, there is not much I can do to validate that it functions correctly with a different host system.

inmbolmie commented 3 years ago

I'm not completely sure. With a cabling problem I would expect a lot of frame losses and exceptions. Looking for those, I see no exceptions, but I do see some sequences like this:


WRITING POLL:PL
[EOTX]
WRITING POLL:PL

This happens only in your logs, not in mine. The host sends a POLL command to the converter, no response comes back from the terminal, so another POLL is sent. There are a few dozen of these errors. The communication between host and adapter should be OK, as I don't see any lost [EOTX] response at that level: for every POLL sent there is a response from the adapter in your logs; some of them simply lack the attached terminal response.
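To put a number on these unanswered POLLs, the log can be scanned for the three-line pattern shown above. This is a minimal sketch, assuming the log lines have already been read into a vector of strings and that an answered POLL always has at least one line of terminal data between it and the next POLL; countUnansweredPolls is a hypothetical helper, not part of the project.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: counts occurrences of
//   WRITING POLL:...
//   [EOTX]
//   WRITING POLL:...
// i.e. a POLL that the adapter closed with a bare [EOTX] and no terminal data.
// Assumption: the real log format may differ from this three-line pattern.
int countUnansweredPolls(const std::vector<std::string>& lines) {
    int unanswered = 0;
    for (std::size_t i = 0; i + 2 < lines.size(); ++i) {
        bool isPoll = lines[i].rfind("WRITING POLL", 0) == 0;        // prefix check
        bool bareEotx = lines[i + 1] == "[EOTX]";                    // nothing from the terminal
        bool nextIsPoll = lines[i + 2].rfind("WRITING POLL", 0) == 0;
        if (isPoll && bareEotx && nextIsPoll) {
            ++unanswered;
        }
    }
    return unanswered;
}
```

Comparing this count between a log from a working setup and a failing one would show whether the failing setup simply loses more terminal responses.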

The issue with the status updates is also weird, because my 3196 emulator card is just crap and much slower than a real terminal, and even so its status updates are instant and it never hangs. I compared the commands sent in your 3196 logs and in my 3196 logs side by side and they are exactly the same; I cannot spot any difference in that regard, so the same commands are sent in both cases.

If you are able to program the converter from the Arduino software, there are a couple of things we can try, such as modifying the following parameters in the 5250_interface.ino file:

const int WAIT_CYCLES_RX = 30000; //Time we wait for a response frame after transmission
const int WAIT_CYCLES_RX_PENDING_TX = 5000; //Wait time for a response frame if no reception is expected

We can try doubling or tripling them, giving the terminal more time to respond. That covers the case where there are no transmission errors as such, but we are not waiting long enough for terminal responses and so we lose some of them.
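To get a feel for what doubling or tripling means in time, the cycle counts can be converted to microseconds. This sketch assumes the WAIT_CYCLES_* constants count CPU clock cycles and that the clock is about 600 MHz, which is what the "75 cycles (125ns)" comment on WAIT_CYCLES_RX_SAMPLE implies; if the firmware loops count something else, the absolute times are wrong, although the 2x/3x ratios still hold.

```cpp
#include <cassert>
#include <cmath>

// Assumption: one "cycle" is one tick of a ~600 MHz clock
// (125 ns / 75 cycles = 1.67 ns per cycle, per the firmware comment).
constexpr double kAssumedClockHz = 600e6;

// Values quoted from 5250_interface.ino.
constexpr int kWaitCyclesRx = 30000;
constexpr int kWaitCyclesRxPendingTx = 5000;

// Converts a cycle count to microseconds under the assumed clock.
double cyclesToMicroseconds(long cycles) {
    return static_cast<double>(cycles) / kAssumedClockHz * 1e6;
}
```

Under this assumption the default WAIT_CYCLES_RX is about 50 µs and WAIT_CYCLES_RX_PENDING_TX about 8.3 µs; tripling them gives about 150 µs and 25 µs.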

The sampling interval also can be adjusted:

const int WAIT_CYCLES_RX_SAMPLE = 85; //Cycles between signal samples, approx 8Mhz for 75 cycles (125ns) but can be increased slightly to reduce the probability of incorrect sampling due to clock drift

I didn't find this parameter to be very sensitive, but you can try some different values between 75 and 90 if nothing else works.
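The suggested 75 to 90 range can be expressed as samples per bit. This sketch assumes the same ~600 MHz clock implied by the firmware comment and a Twinax bit time of 1 µs (1 Mbit/s); neither figure is stated directly in this thread, so treat the output as an estimate.

```cpp
#include <cassert>
#include <cmath>

// Assumptions: ~600 MHz CPU clock (from "75 cycles (125ns)") and a 1 us bit time.
constexpr double kAssumedClockHz = 600e6;
constexpr double kAssumedBitTimeNs = 1000.0;

// Interval between two signal samples, in nanoseconds, for a WAIT_CYCLES_RX_SAMPLE value.
double sampleIntervalNs(int sampleCycles) {
    return sampleCycles / kAssumedClockHz * 1e9;
}

// Number of samples that fit in one bit time at that interval.
double samplesPerBit(int sampleCycles) {
    return kAssumedBitTimeNs / sampleIntervalNs(sampleCycles);
}
```

Under these assumptions, 75 gives 8 samples per bit, 85 about 7.1 and 90 about 6.7.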

inmbolmie commented 3 years ago

Another hint I forgot to mention: the Teensy has an internal red LED that lights up if there are reception problems on the Twinax link. It can be seen directly through the vent holes, or you can open the converter box (4 screws on the bottom, under the rubber feet) and look at it.

In normal operation the LED should always be off; if it lights up constantly during operation, that could indicate problems at that level.

blackbit42 commented 3 years ago

The red LED doesn't come on. I played with the values you mentioned above, but no cigar: setting WAIT_CYCLES_RX = 90000 and WAIT_CYCLES_RX_PENDING_TX = 15000 did not make a difference. I then also changed WAIT_CYCLES_RX_SAMPLE: 80 and 90 seem to be slightly better than 85, and it didn't work at all with 75. Logs: https://ahuemer.xx.vu/volatile/2020-09-27-LFMmgIR1aPs/

I occasionally observe glitches. For example, in sl, characters sometimes appear on the screen that don't belong there. With seq 100 I saw a blank line once. The cursor also seems to move in a weird way: seq writes consecutive lines, and sometimes the cursor seems to move straight down from the number it just wrote, then the new number miraculously appears to its left, and then the cursor moves to the left. All pretty weird.

inmbolmie commented 3 years ago

I was able to more or less "reproduce" your problem by adding code to the converter firmware that simulates random transmission errors. If I simulate losing one in every 500 messages, the behaviour is similar to yours. So this is most probably a hardware problem rather than a software one. The pattern is the same: some POLL messages go unanswered, and the terminal gets stuck when an "end of command queue" message ("bM" in the logs) is missing, because the commands queued before it never get executed. That is also consistent with what you mention about missing or misplaced characters.
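The fault injection described above can be sketched as follows. This is not the firmware change that was actually made; it assumes each message is dropped independently with probability 1/500, and simulateLosses and probabilityNoLoss are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Drops each of `messages` messages independently with probability 1/lossEvery
// and returns how many were dropped. Seeded so runs are repeatable.
int simulateLosses(int messages, int lossEvery, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(1, lossEvery);
    int lost = 0;
    for (int i = 0; i < messages; ++i) {
        if (dist(rng) == 1) {  // 1 chance in lossEvery
            ++lost;
        }
    }
    return lost;
}

// Probability that `run` consecutive messages all get through.
double probabilityNoLoss(int run, int lossEvery) {
    return std::pow(1.0 - 1.0 / lossEvery, run);
}
```

With one loss in 500, the chance that 1000 consecutive messages all arrive is only about 13.5%, so if a burst of screen output takes on the order of a thousand messages (an assumption), hitting at least one lost frame is the likely outcome.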

You already said you checked for proper termination, but just in case, if you have a multimeter: as seen from the connector at the converter end, you should measure slightly more than 55 ohms between each lead and the connector shield, and slightly more than 110 ohms between the two leads. The same applies as seen from the terminal end (without the auto-termination dongle plugged in). I imagine you have this type of dongle with your terminal:

[image: Twinax auto-termination dongle]
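The expected single-end readings follow from a simple model: each lead goes to the shield through roughly 55 ohms, so the lead-to-lead path is two of those resistors in series. This is a sketch of that model only; the 10% allowance in the check is an arbitrary margin for cable resistance and meter error, not a figure from any spec.

```cpp
#include <cassert>
#include <cmath>

// Simplified single-termination model: lead A -> ~55 ohm -> shield -> ~55 ohm -> lead B.
double expectedLeadToLead(double perLeadOhms) {
    return 2.0 * perLeadOhms;  // series combination
}

// True if a lead-to-lead reading is within `tolerance` (fraction) of the series value.
// The tolerance is an assumed margin, not a spec value.
bool leadToLeadPlausible(double measuredOhms, double perLeadOhms, double tolerance) {
    double expected = expectedLeadToLead(perLeadOhms);
    return std::fabs(measuredOhms - expected) <= tolerance * expected;
}
```

A lead-to-lead reading close to 55 ohms instead of 110 ohms would fail this check and suggests one lead is effectively shorted to the shield, for example by the probe touching it.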

To fine tune transmission rate you can play with the parameter:

const int WAIT_CYCLES_TX = 300; //Half-bit duration for transmission

But I doubt it will help. You can add or subtract time in increments of 5 units to see if it has any effect.
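To see how large a 5-unit step is, WAIT_CYCLES_TX can be turned into a bit rate. This sketch assumes WAIT_CYCLES_TX counts cycles of the same ~600 MHz clock implied by the "75 cycles (125ns)" comment, and that one bit is two half-bits as the parameter's comment says.

```cpp
#include <cassert>
#include <cmath>

// Assumption: ~600 MHz clock, as implied by the WAIT_CYCLES_RX_SAMPLE comment.
constexpr double kAssumedClockHz = 600e6;

// Bit rate produced by a given half-bit duration in cycles (one bit = two half-bits).
double bitRateBitsPerSecond(int halfBitCycles) {
    double bitSeconds = 2.0 * halfBitCycles / kAssumedClockHz;
    return 1.0 / bitSeconds;
}

// Relative change in bit rate when the half-bit count moves by `delta` cycles.
double relativeRateChange(int halfBitCycles, int delta) {
    return bitRateBitsPerSecond(halfBitCycles + delta) / bitRateBitsPerSecond(halfBitCycles) - 1.0;
}
```

Under that assumption the default of 300 gives exactly 1 Mbit/s, and each 5-cycle step moves the rate by roughly 1.6% to 1.7%.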

blackbit42 commented 3 years ago

I did some measurements.

Terminal -> Controller:
59 Ohm from left lead to ground
56.8 Ohm from right lead to ground
59 Ohm between the leads

Controller -> Terminal:
29 kOhm from left lead to ground
300 kOhm from right lead to ground
580 kOhm between the leads

I guess that's a pointer to the problem... The connector on the board is good, and so is the solder connection between the connector and the board. Can you suggest what to focus on?

[image: 5250_terminal_board]

Is the black block behind the connector some kind of magnetics?

inmbolmie commented 3 years ago

Terminal -> Controller:
59 Ohm from left lead to ground: OK
56.8 Ohm from right lead to ground: OK
59 Ohm between the leads: not OK. Are you sure about this? Maybe you are shorting one lead to the connector; it should be 110 Ohm.

Controller -> Terminal: Are you testing with the auto-terminator plugged in? It should be. The input impedance of the terminal itself is irrelevant to the measurement, as it is much higher than the termination resistance, unless there is a short or something, which does not seem to be the case.

Just to clarify the measures:

Terminal -> Controller: without the termination dongle; only the controller and cable.
Controller -> Terminal: with the termination dongle; terminal, with the dongle connected to the terminal and the cable connected to the dongle.

The black box you mention could be a relay, or maybe an optocoupler, who knows...

blackbit42 commented 3 years ago

I repeated the measurement following your guidance, this time with proper hook leads instead of pointy ones.

Terminal -> Controller:
57.6 Ohm from left lead to ground
56.7 Ohm from right lead to ground
115 Ohm between leads

Controller -> Terminal (with the auto-terminating dongle, which I didn't have attached before):
59.3 Ohm from left lead to ground
55.7 Ohm from right lead to ground
114.5 Ohm between leads

inmbolmie commented 3 years ago

Thanks, that should be OK then... The spec says there is a 2% tolerance for the impedance difference between leads. You measure 6%, but I don't think that should be a problem; maybe the spec is too rigorous.
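The 6% figure can be reproduced by expressing the lead imbalance as the absolute difference over the mean of the two readings. The spec's exact definition is not quoted in this thread, so this formula is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Assumed definition: |left - right| / mean(left, right).
double leadImbalance(double leftOhms, double rightOhms) {
    double mean = (leftOhms + rightOhms) / 2.0;
    return std::fabs(leftOhms - rightOhms) / mean;
}

// True if the imbalance is within `tolerance` (e.g. 0.02 for 2%).
bool withinTolerance(double leftOhms, double rightOhms, double tolerance) {
    return leadImbalance(leftOhms, rightOhms) <= tolerance;
}
```

With the dongle readings above (59.3 Ohm and 55.7 Ohm) this gives about 6.3%, in line with the 6% mentioned, while the terminal-to-controller pair (57.6 Ohm and 56.7 Ohm) comes out at about 1.6%.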

Just to be 100% safe, could you please try this measurement?

Terminal with dongle -> Controller (dongle + cable + controller, measured as if you were the terminal). It should be approximately:
30 Ohm from left lead to ground
30 Ohm from right lead to ground
57 Ohm between leads

If that is OK, you can be 100% sure the termination is correct.
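The "approx 30 / 30 / 57 Ohm" figures follow from having a termination at both ends: each lead then sees two ~55 ohm paths to the shield in parallel, and the lead-to-lead reading is two of those parallel combinations in series. This sketch uses that simplified model and ignores cable conductor resistance, which is one reason real readings come out slightly higher.

```cpp
#include <cassert>
#include <cmath>

// Two resistors in parallel.
double parallel(double a, double b) {
    return a * b / (a + b);
}

struct Readings {
    double leadToShield;  // each lead to ground/shield
    double leadToLead;    // between the two leads
};

// Both ends terminated (dongle at the terminal, converter's termination at the other end).
Readings bothEndsTerminated(double perLeadOhms) {
    double leadToShield = parallel(perLeadOhms, perLeadOhms);
    return {leadToShield, 2.0 * leadToShield};
}
```

Nominal 55 ohms per lead gives 27.5 ohms per lead and 55 ohms between leads, close to the approximate figures quoted above.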

Apart from that, did you try playing with the last parameter I mentioned?

const int WAIT_CYCLES_TX = 300; //Half-bit duration for transmission

If that doesn't work, another idea I have is to reduce the command rate, but I don't have much hope for that.
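One generic way to reduce a command rate is to enforce a minimum gap between consecutive commands. This is a hypothetical illustration only: MinIntervalThrottle and its parameter do not exist in the project (whose host side is Python), and times are passed in explicitly so the logic can be checked without a real clock.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical: delays each command so that at least `minGap` separates consecutive sends.
class MinIntervalThrottle {
public:
    explicit MinIntervalThrottle(std::chrono::microseconds minGap) : minGap_(minGap) {}

    // Given when a command is ready, returns the earliest time it may be sent
    // and records that time as the last send.
    std::chrono::microseconds schedule(std::chrono::microseconds readyAt) {
        std::chrono::microseconds sendAt = readyAt;
        if (hasSent_ && sendAt < lastSent_ + minGap_) {
            sendAt = lastSent_ + minGap_;
        }
        lastSent_ = sendAt;
        hasSent_ = true;
        return sendAt;
    }

private:
    std::chrono::microseconds minGap_;
    std::chrono::microseconds lastSent_{0};
    bool hasSent_ = false;
};
```

For example, with a 100 µs gap, commands ready at 0 µs and 10 µs would be sent at 0 µs and 100 µs, while one ready at 500 µs goes out immediately.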

blackbit42 commented 3 years ago

30.9 Ohm from left lead to ground
30.3 Ohm from right lead to ground
56.4 Ohm between leads

I wanted to give WAIT_CYCLES_TX a whirl, but now something else is wrong. :-( Every time I start the terminal, the test screen comes up. The booklet that came with the terminal describes this kind of issue and says to replace the keyboard. Though,

Maybe some component on the board was half-dead before and has now died entirely?

inmbolmie commented 3 years ago

The impedance measured from the terminal is OK.

About the sudden malfunction... well, that is something. I very much doubt it is a keyboard issue. Looking at the board image, and this being 80s IBM hardware, the first thing that comes to mind is the 3-legged tantalum capacitors. In your image I think C2, C5 and C7 are of this kind. If you are able to desolder them, you can test the board perfectly well without them, as they are filtering capacitors and are usually not strictly needed. Those capacitors are well known for blowing up or malfunctioning, especially if the equipment has remained unused for years: https://www.youtube.com/watch?v=RNKSut-C5XE

Capacitors C8 and C9 are also in the path of the power rails, but I'm not sure what kind of capacitors those are; maybe they have other markings I cannot see.

You can also try to find the +5V and -5V power rails and test them with the multimeter to see if the voltages are within spec and stable. I think the power supply is in the monitor and the voltages are fed through the monitor connector. This thick trace should be one of them:

[image]

The other may be on the other side of the board. I'm pretty sure there should be +5V and -5V rails, and the -5V line has to reach the square, metal-canned, IBM-labelled IC, as this is the chip that manages the Twinax link and a negative voltage is required to generate the signals.

For the ground reference you could use the shell of the metal-can components, like the 24 MHz crystal, or a well-known ground pin of any IC, such as pin 7 of a 74LS08.

[image]

Sorry for the vague hints; it is difficult to say anything definitive with only an image of the top of the board.

inmbolmie commented 3 years ago

Another possible reason for that screen to appear is a depleted coin cell battery. I say that because this document http://bitsavers.org/pdf/ibm/3196/GA18-2488-2_IBM_3196_Display_Station_Setup_Instructions_Apr1987.pdf says that the first time you power on a terminal fresh from the factory, that screen should appear, and then you have to configure the terminal address, etc. That's another possibility, though not a certain one: you were already using the terminal configured at address 3, so it would be strange for the battery to die right after that.

blackbit42 commented 3 years ago

Very good suggestions, thank you. I'll take care of the desoldering and testing tomorrow. The battery was already tested; it measures >3V.

inmbolmie commented 3 years ago

If it is of any help, my 5250 emulation card has the same metal-canned IBM IC, and I've checked its connectivity to the ISA power rails (GND, +5V and -5V).

[images]

blackbit42 commented 3 years ago

I just checked the power rails. Coming in from the monitor connector are +12V, -12V and +5V. The voltage regulator below the Twinax connector in the PCB picture derives -5V from the -12V. I don't have a good scope, but judging from the one I have, the power rails are okay: voltage levels are no more than 0.1V off and appear stable. The silver IBM chip gets -5V and +5V as expected. The desoldering of the three-leg caps will follow this evening.
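The "no more than 0.1V off" observation can be expressed as a relative tolerance check per rail. This sketch uses ±5% as an assumed limit, borrowed from the usual 74LS supply range of 4.75V to 5.25V rather than from IBM documentation for this terminal.

```cpp
#include <cassert>
#include <cmath>

// Fractional deviation of a measured rail from nominal (works for negative rails too).
double relativeDeviation(double nominalVolts, double measuredVolts) {
    return std::fabs(measuredVolts - nominalVolts) / std::fabs(nominalVolts);
}

// True if the rail is within `tolerance` of nominal (0.05 = +/-5%, an assumed limit).
bool railWithinTolerance(double nominalVolts, double measuredVolts, double tolerance) {
    return relativeDeviation(nominalVolts, measuredVolts) <= tolerance;
}
```

A 0.1V deviation is 2% of a 5V rail and under 1% of a 12V rail, so readings like the ones described pass this check comfortably.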

blackbit42 commented 3 years ago

Coincidentally I was able to take care of the desoldering now. All three measure about 23uF. The resistance reading starts at 1MOhm, counts down to about 500kOhm, then the measurement resets and starts again at 1MOhm. No change in behavior: the terminal still goes into test mode when powered on. Key presses are still detected.

inmbolmie commented 3 years ago

Do you have a screen capture of the test mode screen? I have no idea how it works; I've found this for the 3197 terminal.

You didn't mention whether any of those specific error codes appear on the test mode screen.

[images]

inmbolmie commented 3 years ago

If the board really thinks there is a problem with the keyboard, a "K" error should appear there. Depending on which error appears, that "problem solving guide" may or may not help isolate it. I cannot find that document online, so I'm not sure whether it will be useful.

Having ruled out the "usual suspects" (power, capacitors, solder joints), if we have no clue what is happening, debugging becomes increasingly difficult, as you would need to go component by component, replacing each one or testing it with the appropriate equipment. For example, if the RAM chip is bad, you will have a hard time detecting it unless the problem isolation procedure clearly points to the RAM.

blackbit42 commented 3 years ago

I'll take a picture of the screen in an hour or so. Regarding the components on the board, what I am happy to do is: replace the tantalum caps, desolder the 7400-series chips, test them (as that is possible with reasonable effort) and refit them in DIP sockets. I agree, everything else will be very challenging.

inmbolmie commented 3 years ago

The problem with checking the components is that, if it's an intermittent issue, a part can test OK and still be defective, so I'm hoping the test screen gives some hint to help isolate the problem.

Then there is also the economic point of view: you could end up spending more on replacing parts than it would cost to replace the whole terminal.

blackbit42 commented 3 years ago

[image: signal-2020-09-30-140212, photo of the test mode screen]

blackbit42 commented 3 years ago

I guess the AD indicates the station address isn't set? Not sure why that would be, but I'll look into setting it.

inmbolmie commented 3 years ago

Thanks. The only other thing remotely resembling an error code is "LC009", and I have no idea what it could be.

If the configuration somehow got reset, that is not entirely a bad thing; it's something that can even be tested explicitly: remove the 3V battery to force a reset to factory settings.

blackbit42 commented 3 years ago

I was able to set the address, which is now 0. When I got the terminal it was 3. After a reboot it returned to the normal 'cursor in upper right corner' state. So, progress! It seems the address was lost somehow, probably as a result of my probing around on the board. However, nothing happens when I start 5250_terminal.py (with the correct station address). That's either due to the missing caps or a different fault. I'll see if I can find suitable caps somewhere and then solder in new ones.

inmbolmie commented 3 years ago

The caps should have no effect on the Twinax link as long as the voltages are OK, so if you are in doubt whether they could be why the converter doesn't work, you can solder them back in.

Address 0 or address 3 should work the same, as long as the address is configured correctly on both sides. Is there anything interesting in the logs, or does it keep showing WRITING POLL:P@ ... [EOTX] all the time?