Open Fish-Git opened 8 years ago
I will be making a series of commits probably later today to resolve this issue.
I found a relatively nice telnet protocol handling library and have been working this past month or so integrating it into Hercules. (That's why I was silent for so long.)
I basically had to almost rewrite console.c in its entirety. Almost. Quite a bit of existing code was for the most part salvaged, but other code (functions) had to be split apart and reworked into the new design. The end result is, I believe, a much cleaner(?) and more straightforward(?) telnet console support in Hercules, and one that is supposedly RFC compliant and as such should (ideally) work well with just about any telnet client out there.
I've tested it with both PuTTY and Tom Brennan's Vista TN3270 on Windows as well as telnet and C3270 on Linux, and everything works. I expect any telnet client will.
Briefly, the changes that needed to be made to console.c was to simply never send()
any data directly to the client yourself. Pass it to libtelnet instead (telnet_send
). Same goes with receiving data: never directly process any recv()
data. Instead, pass it to libtelnet (telnet_recv
) for processing. It will invoke your defined event handler callback function for all data input and output events as well as for telnet events.
As part of my upcoming series of commits there will be a series of "preparatory" commits where I make a little change here, a little change there, to Hercules (including shuffling the code around in our existing console.c source file) so that when the final commit is made, it will be easier to "see" what changes were actually made. I encourage each of you to use a quality visual diff tool to compare the changes side by side rather than relying solely on "diff" output. If you try using "diff" to see what was changed you'll probably have trouble. Use a high quality visual diff tool instead. (I've been using Exam Diff Pro for years on Windows and frankly couldn't imagine living without it.)
Well, that's it I guess. If you have any questions or concerns I guess we can discuss them on hercules-devel.
Documentation update issue created in Html project.
Did you really have to break a TN3270 client that had been taught how to comply with the previous protocol?
HHC02914I 0:001F COMM: client 0 negotiations complete; ttype = 'IBM-3278-5-E'
HHC01018I 0:001F COMM: client 127.0.0.1 devtype 3270: connected
HHC01090I 0:001F COMM: client 127.0.0.1 devtype 3270: connection reset
The last inbound data are:
tn.c.678:Writing 162 bytes for query reply
0000 88000e81 80808184 85868788 99a68f00 *h..a..adefghrw..*
0010 17818101 00009600 3101029f 07700163 *.aa...o.........*
0020 03f10d1d 1cb60008 8184001c b600001b *.1......ad......*
0030 81858200 0d1d0000 00000700 000002b9 *aeb.............*
0040 04170100 f103c301 36001681 86000800 *....1.C....af...*
0050 f4f1f1f2 f2f3f3f4 f4f5f5f6 f6f7f700 *411223344556677.*
0060 0d818704 00f0f1f1 f2f2f4f4 00078188 *.ag..0112244..ah*
0070 00010200 06819900 00001181 a600000b *.....ar....aw...*
0080 01000050 00180096 00310016 818f0000 *...&...o....a...*
0090 91978840 8481a381 8995845b 86899385 *jph dataind$file*
00a0 ffef *.. *
At which point the connexion is either stalled or reset. Is there something in this query reply that displeases you? b.txt
Just to make sure: My TN client works fine with a real VM system on real hardware, but of course that is the TN3270 in VM; and that still works with Hyperion.
The device that does not work is
001F 3270
My host is Fedora 23, latest Hercules 4.0 source, the guest is z/VM 6.1, and I'm having this problem with X3270 and Vista TN3270 (under Wine) clients connecting to Hercules, though I will only refer to X3270.
If an X3270 session is started and the Disconnect and Connect buttons are used but the window is not closed, or the window is closed and a new session is started, the Hercules 3270 devices are left in limbo, and eventually the client cannot connect. Following is an extract from the Hercules log showing client connection using Disconnect/Connect, though the same sequence occurs using window close/new session. The comments are when I carried out an action on the client.
13:26:00 HHC01603I * Start new X3270
13:26:09 HHC02915I client 1 COMM: Connection received
13:26:09 HHC02914I 0:0200 COMM: client 1 negotiations complete; ttype = 'IBM-3279-4-E'
13:26:09 HHC01018I 0:0200 COMM: client 127.0.0.1 devtype 3270: connected
13:26:22 HHC01603I * VM logo displayed
13:26:34 HHC01603I * Click Disconnect
13:26:38 HHC01022I 0:0200 COMM: client 127.0.0.1 devtype 3270: connection closed by client
13:26:49 HHC01603I * Click Connect
13:26:54 HHC02915I client 2 COMM: Connection received
13:26:54 HHC02914I 0:0200 COMM: client 2 negotiations complete; ttype = 'IBM-3279-4-E'
13:26:54 HHC01018I 0:0200 COMM: client 127.0.0.1 devtype 3270: connected
13:27:05 HHC01603I * Hercules logo displayed
13:27:20 HHC01603I * Click Disconnect
13:27:37 HHC01603I * Click Connect
13:27:40 HHC02915I client 3 COMM: Connection received
13:27:40 HHC02914I 0:0201 COMM: client 3 negotiations complete; ttype = 'IBM-3279-4-E'
13:27:40 HHC01018I 0:0201 COMM: client 127.0.0.1 devtype 3270: connected
13:27:52 HHC01603I * VM logo displayed
13:28:19 HHC01603I * Click disconnect
13:28:23 HHC01022I 0:0201 COMM: client 127.0.0.1 devtype 3270: connection closed by client
13:29:37 HHC01603I * Click Connect
13:29:41 HHC02915I client 4 COMM: Connection received
13:29:41 HHC02914I 0:0201 COMM: client 4 negotiations complete; ttype = 'IBM-3279-4-E'
13:29:41 HHC01018I 0:0201 COMM: client 127.0.0.1 devtype 3270: connected
13:29:54 HHC01603I * Hercules logo displayed
13:30:02 HHC01603I * Click Disconnect
13:30:16 HHC01603I * Click Connect
13:30:19 HHC02915I client 5 COMM: Connection received
13:30:19 HHC02914I 0:0202 COMM: client 5 negotiations complete; ttype = 'IBM-3279-4-E'
13:30:19 HHC01018I 0:0202 COMM: client 127.0.0.1 devtype 3270: connected
13:30:33 HHC01603I * VM logo displayed
13:30:42 HHC01603I * Click Disconnect
13:30:47 HHC01022I 0:0202 COMM: client 127.0.0.1 devtype 3270: connection closed by client
13:31:14 HHC01603I * Click Connect
13:31:21 HHC02915I client 6 COMM: Connection received
13:31:21 HHC02914I 0:0202 COMM: client 6 negotiations complete; ttype = 'IBM-3279-4-E'
13:31:21 HHC01018I 0:0202 COMM: client 127.0.0.1 devtype 3270: connected
13:31:34 HHC01603I * Hercules logo displayed
13:31:43 HHC01603I * Click Disconnect
13:32:00 HHC01603I * Click Connect
13:32:04 HHC02915I client 7 COMM: Connection received
13:32:04 HHC02914I 0:0203 COMM: client 7 negotiations complete; ttype = 'IBM-3279-4-E'
13:32:04 HHC01018I 0:0203 COMM: client 127.0.0.1 devtype 3270: connected
13:32:24 HHC01603I * VM logo displayed
Le 11/05/2016 à 14:33, mcisho a écrit :
My host is Fedora 23, latest Hercules 4.0 source, the guest is z/VM 6.1, and I'm having this problem with X3270 and Vista TN3270 (under Wine) clients connecting to Hercules, though I will only refer to X3270.
If an X3270 session is started and the Disconnect and Connect buttons are used but the window is not closed, or the window is closed and a new session is started, the Hercules 3270 devices are left in limbo, and eventually the client cannot connect. Following is an extract from the Hercules log showing client connection using Disconnect/Connect, though the same sequence occurs using window close/new session. The comments are when I carried out an action on the client.
Confirmed.
Did a t+ on this and I am getting the same results.
On initial connect, DE is presented, VM then sends a WSF to enquire the device characteristics - Then ATTN is presented (with the results) and the logo is displayed.
When the device is then disconnected and reconnected, the DE is presented again (normal - it's a NOT-Ready to Ready transition), the WSF is sent - but no ATTN is ever presented.
Attempting to re-ipl the underlying system is not enough. The only possibility is to quit hercules and restart.
My guess is that there is an issue with the device state after a disconnect which prevents the ATTN to be presented for the device.
I could try to do a git bisect (because I only had this problem appear very recently - I'm pretty sure I did connect/disconnect before without issues).
--Ivan
Trial on Windows 10 x3270, using file - disconnect leaves device in limbo. Hercules version 4.0.0.8493-g617419c-modified (mods unrelated to tn3270/telnet support)
HHC02915I client 5 COMM: Connection received
HHC02914I 0:00A0 COMM: client 5 negotiations complete; ttype = 'IBM-3279-2-E'
HHC01018I 0:00A0 COMM: client 127.0.0.1 devtype 3270: connected
File - Disconnect on x3270
HHC01603I devlist a0
HHC02279I 0:00A0 3270 127.0.0.1 IO[0] open
But devinit cuu corrects the issue.
HHC01603I devinit a0
HHC02245I 0:00A0 device initialized
HHC01603I devlist a0
HHC02279I 0:00A0 3270 * IO[0]
HHC02915I client 6 COMM: Connection received
HHC02914I 0:00A0 COMM: client 6 negotiations complete; ttype = 'IBM-3279-2-E'
HHC01018I 0:00A0 COMM: client 127.0.0.1 devtype 3270: connected
Closing the x3270 session window results in (the expected)
HHC01022I 0:00A0 COMM: client 127.0.0.1 devtype 3270: connection closed by client
I have a fuzzy recollection that this kind of thing (devices in limbo) happens with other device types, and that devinit (or detach/attach) often took care of the issue.
Le 11/05/2016 à 16:15, srorso a écrit :
Trial on Windows 10 x3270, using file - disconnect leaves device in limbo. Hercules version 4.0.0.8493-g617419c-modified (mods unrelated to tn3270/telnet support)
|HHC02915I client 5 COMM: Connection received HHC02914I 0:00A0 COMM: client 5 negotiations complete; ttype = 'IBM-3279-2-E' HHC01018I 0:00A0 COMM: client 127.0.0.1 devtype 3270: connected |
File - Disconnect on x3270
|HHC01603I devlist a0 HHC02279I 0:00A0 3270 127.0.0.1 IO[0] open |
But devinit cuu corrects the issue.
|HHC01603I devinit a0 HHC02245I 0:00A0 device initialized HHC01603I devlist a0 HHC02279I 0:00A0 3270 * IO[0] HHC02915I client 6 COMM: Connection received HHC02914I 0:00A0 COMM: client 6 negotiations complete; ttype = 'IBM-3279-2-E' HHC01018I 0:00A0 COMM: client 127.0.0.1 devtype 3270: connected |
Closing the x3270 session window results in (the expected)
|HHC01022I 0:00A0 COMM: client 127.0.0.1 devtype 3270: connection closed by client |
I have a fuzzy recollection that this kind of thing (devices in limbo) happens with other device types, and that devinit (or detach/attach) often took care of the issue.
Here is my git bisect results :
$ git bisect bad d160706f295d8c58c4e605f8b62773fc4528486d is the first bad commit commit d160706f295d8c58c4e605f8b62773fc4528486d Author: Fish (David B. Trout) fish@infidels.org Date: Fri Apr 22 19:31:31 2016 -0700
RFC compliant telnet support via libtelnet package:
New telnet functionality in console.c for display terminals and
console printer keyboard devices. This fixes a long standing issue with certain telnet clients not working with Hercules. The popular "PuTTY" Windows telnet client for example now works flawlessly out-of-the-box.
closes #100.
Description of problem introduced (psesumptive) :
When a tn3270 device connects, disconnects and reconnects - pending data from the tn3270 device is not presented to the guest (no ATTN is sent). For example, when the OS issues a WSF with enquiry, the query results is never presented nor signaled to the OS (using device status ATTN-0x80). However the not-ready to ready signaling is performed (the unsolicited Device End) and the hercules logo is shown (indicating tn3270 telnet negociation is complete).
--Ivan
Thank you for re-opening this, John!
It's obvious something still isn't right. I apologize to everyone for that. I guess I didn't test my changes thoroughly enough.
The hopefully good news is I think I may know what's going on! (maybe). Both @mcisho and @ivan-w gave me the clues.
Let me take a peek at the code to either confirm or deny my suspicion and I'll report back. If I'm right, the fix should be fairly simple and straightforward but may of course take me a day or three to code and test. But first, let me check the code to see if I'm right.
Hang loose, folks! I'm looking into it!
(and sorry about all this!)
CONFIRMED!
A careless mistake on my part. (grumble)
Hang loose folks. A fix is in progress...
My studies show that the culprit might be that the console FD is no longer selected for pselect read after a disconnect/reconnect... So it's obviously just a glitch - not a fundamental issue.
--Ivan
(PS : I like the libtelnet thingy ! It's much cleaner)
(Oops!) I was wrong. :(
Found it. In the terminal disconnect code, dev->console is erroneously reset. dev->console is a property of the DEVBLK, not a status bit.
diff --git a/console.c b/console.c index f0f56bb..7a28791 100644 --- a/console.c +++ b/console.c @@ -748,7 +748,6 @@ static void disconnect_console_device( DEVBLK* dev ) closes the device and marks it available for reuse. */ dev->connected = 0;
dev->console = 0; dev->fd = -1;
disconnect_telnet_client( dev->tn );
--Ivan
Above change pushed. Tried and seems to fix the issue.
Commit : https://github.com/hercules-390/hyperion/commit/c319867281a9c9a3bc37778a86bcbe7d17f8175d
Confirmed, Ivan-w's change has fixed the issue of devices going into limbo.
So what was jphartmann's TN3270 client?
I think John did mention it was the FAL tn3270 VM client, but all clients did exhibit the issue connect/disconnect/reconnect - and the correction is not client specific. However, I'd like John to try again to see if there isn't any further underlying issue.
Fixed it for me too. Thanks.
The client is one I wrote to resemble PCOMM3270.
Thank you, THANK YOU, THANK YOU, Ivan! The problem was reproducible on my Windows system as well and I too can confirm your patch does indeed resolve the problem.
I apologize for not having performed such a simple and obvious test before committing my original changes.
I think I'll keep this issue open for "a while" in case anything other problems turn up.
Fish, it feels solid (again). Let's get it off the books.
If that's okay with everyone that's fine by me. Closing...
UBUNTU threw a new kernel at me and the issue is back. Perhaps it is timing.
I get no failure if I trace the client. Connection is on the loopback interface.
But it is only the first connect after Hercules starts that fails. Uninitialised variable? Some thread not started?
UBUNTU threw a new kernel at me and the issue is back. Perhaps it is timing.
I get no failure if I trace the client. Connection is on the loopback interface.
But it is only the first connect after Hercules starts that fails. Uninitialised variable? Some thread not started?
Interesting. I could insert some PTT tracing statments into the console.c logic to try and help you track down where in the server (i.e. Hyperion) that it fails. What are the symptoms? Connect followed by immediate disconnect? As if the negotiation were failing? I/O error? Other?
Yes, immediate reset. I get three lines on the Hercules console as if I had terminated my session immediately. My client makes no noise at all, so it looks from my perspective as if the server dropped the connexion.
See my May 11 entry above/below.
But this happens only the first logon after starting Hercules. I now know just to do it again, so it is no big deal.
John,
If you can reliably recreate this problem every time, then can you debug it? (either via Eclipse or GDB or whatever debugger you're comfortable with using)
Set a breakpoint where Hercules is disconnecting you and look at the stack to see how you got there.
It has not been bothering me lately.
It has not been bothering me lately.
Makes no never mind. It's a bug. It shouldn't do that. We need to fix it.
Hercules's current handling of telnet devices (3270 display terminals and non-integrated console printer keyboard devices) does not comply with published standards (RFC's).
The popular PuTTY telnet client for example, does not work with Hercules. Other telnet and TN3270 clients have also been reported as having problems working correctly with Hercules.
The telnet handling logic in console.c needs fixed to be RFC compliant. Given the terrible state of console.c in general, it should probably be redesigned and/or rewritten entirely.