hercules-390 / hyperion


Hyperion 'runtest.rexx' tests 'hang' when using Open Object Rexx. #234

Open mhoes opened 6 years ago

mhoes commented 6 years ago

I tried running the 'runtest.rexx' test script that was recently added to the Hyperion code. Although Regina Rexx works, I haven't been able to get Open Object Rexx to work (Fedora 26). Of course, it could be that I'm doing something wrong, or that something is wrong with my setup. Anyway, so far I have gotten these results:

    $ /usr/bin/rexx -v
    Open Object Rexx Version 4.2.0

I first get an error for incorrect usage of the 'uname' command, and then very quickly the processing seems to 'hang' somehow.

    $ /usr/bin/rexx ../hyperion/tests/runtest.rexx -h /home/maarten/src/hyperion-topdir-cmake/build
    uname: extra operand ‘WITH’
    Try 'uname --help' for more information.
    Building test script file 'allTests.testin'...
    Running Hercules to generate test results...

The last lines of output in 'allTests.out' (after which no more output is produced) are these:

    $ tail -20 allTests.out
    HHC01603I r 360=2C2D2E2F
    HHC02290I A:0000000000000000 K:06
    HHC02290I R:0000000000000360 2C2D2E2F ....
    HHC01603I r 368=000FF000
    HHC02290I A:0000000000000000 K:06
    HHC02290I R:0000000000000360 000FF000 ..0.
    HHC02336I Script 1: test: test starting
    HHC02339I Script 1: test: duration limit: 0.100000 seconds
    HHC02228I restart key pressed
    HHC02333I Script 1: test: running...
    Feature code 20 wrap 1 keylen 32 pblen 64
    Feature code 20 wrap 1 keylen 32 pblen 64
    HHC00809I Processor CP00: disabled wait state 0002000180000000 0000000000000000
    HHC02334I Script 1: test: test ended
    HHC02338I Script 1: test: actual duration: 0.000202 seconds
    HHC01603I *Compare
    HHC01603I r 00000240.00000004
    HHC02290I A:0000000000000000 K:06
    HHC02290I R:0000000000000240 00000000 ....
    HHC016

And the following instances of Hercules are running :

    $ ps -ef | grep hercules | grep -v grep
    maarten 3248 3120 0 15:24 pts/1 00:00:00 sh -c /home/maarten/src/hyperion-topdir-cmake/build/hercules -p /home/maarten/src/hyperion-topdir-cmake/build/ -t1 -f /home/maarten/src/hyperion-topdir-cmake/hyperion/tests/tests.conf -r allTests.testin -d > allTests.out 2>&1
    maarten 3249 3248 14 15:24 pts/1 00:00:27 /home/maarten/src/hyperion-topdir-cmake/build/hercules -p /home/maarten/src/hyperion-topdir-cmake/build/ -t1 -f /home/maarten/src/hyperion-topdir-cmake/hyperion/tests/tests.conf -r allTests.testin -d

The first one seems to be in 'wait()' (waiting for the child process, I assume ?) :

    $ strace -p 3248
    strace: Process 3248 attached
    wait4(-1,

And the other one in 'read()', without proceeding :

    $ strace -p 3249
    strace: Process 3249 attached
    read(0,

I have no idea what's going on here. I did an out-of-tree 'cmake' build, with these options :

cmake ../hyperion -DAUTOMATIC-OPERATOR=YES -DCAPABILITIES=YES -DCCKD-BZIP2=YES -DDEBUG=YES -DEXTERNAL-GUI=YES -DGETOPTWRAPPER=YES -DHET-BZIP2=YES -DINTERLOCKED-ACCESS-FACILITY-2=YES -DIPV6=YES -DLARGEFILE=YES -DMULTI-CPU=YES -DOBJECT-REXX=YES -DOPTIMIZATION=NO -DREGINA-REXX=YES

And an (admittedly very) basic test of Open Object Rexx works :

    $ /usr/bin/rexx -e "SAY Hello World"
    HELLO WORLD

PS: I can make the 'uname' error go away by removing the text ('with output stem plat') after the 'uname -s' command at line 197 in runtest.rexx, but this does not solve the 'hang' issue.

jphartmann commented 6 years ago

The offending command is

address command 'uname -s' with output stem plat.

This usage must be a Regina extension, parallel to parse value ... with

Most definitely not ANSI REXX.

srorso commented 6 years ago

Hi Maarten & John:

Maarten: many thanks for trying the CMake build, for trying the REXX version runtest.rexx, and providing such detailed diagnostics.

John: many thanks as well for finding the root cause. I might have found the offending line quickly, but what to do about it would have taken much longer.

I will address.

For what it is worth, make test from a CMake build uses John's stable runtest shell script, not my semi-experimental REXX version.

Best Regards, Steve Orso

srorso commented 6 years ago

Hi Maarten:

Sorry for the delay committing a correction: I found three other issues when testing the correction for the issue you reported. All are corrected, and you have my thanks for reporting your issue and leading me to three more.

Commit c81ad1f replaces runtest.rexx. Please feel free to give it a try and let me know how it works for you.

Thanks again, and thanks to John for the root cause analysis.

Best Regards, Steve Orso

mhoes commented 6 years ago

Hi,

I retried things with your latest commit, and although Regina Rexx works, I am still getting an error with Open Object Rexx about not being able to connect to tcp port 5757, even though it does not seem to influence the actual results of the tests (?) when I look at the output of 'allTests.txt'.

    $ /usr/bin/rexx -v
    Open Object Rexx Version 4.2.0

    $ /usr/bin/rexx ../hyperion/tests/runtest.rexx -h /home/maarten/src/hyperion-topdir-cmake/build
    Error:94.101 - Error connecting to 127.0.0.1 on port 5757: "Connection refused"
    Building test script file 'allTests.testin'...
    Running Hercules to generate test results...
    Performing analysis of test results...
    All tests ran successfully. See allTests.txt for a summary and allTests.out for details.

The error occurs because nothing is listening on tcp port 5757. I did some searching, and Regina Rexx (but not Open Object Rexx, which is what is executed here) seems to come with 'rxstack', which does listen on that port once started.

    $ /usr/local/bin/rxstack -v
    /usr/local/bin/rxstack: REXX-Regina_3.9.1 5.00 5 Apr 2015 (64 bit)

So running that before I start the Hyperion tests, even though it is part of regina and not open object, makes the error go away.

    $ sudo /usr/local/bin/rxstack
    rxstack listening on port: 5757

The offender seems to be 'rxqueue' at line 197.

    $ /usr/bin/rexx -e "address command 'uname -s | rxqueue '"
    Error:94.101 - Error connecting to 127.0.0.1 on port 5757: "Connection refused"

    $ /usr/bin/rexx -e "address command 'uname -s'"
    Linux

So I am assuming that when using Open Object Rexx, 'rxqueue' is interpreted as an external command, which is then found in my $PATH as part of Regina Rexx and executed.

Of course, it also wouldn't surprise me if something is just wrong with my setup/system which leads to this particular behavior.

mhoes commented 6 years ago

Never mind. If I remove '/usr/local/bin/' (where Regina Rexx lives on my system) from my $PATH, I no longer get any error in the output when using Open Object Rexx.

Apparently, I have both '/usr/bin/rxqueue' (Open Object Rexx) AND '/usr/local/bin/rxqueue' (Regina Rexx). The error only occurs because I have the one in my $PATH before the other. So it's a problem with my setup, and there's nothing wrong with your commit. Sorry.
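For readers who run into the same mix-up, a minimal shell sketch for spotting it is shown here. It lists every executable with a given name on $PATH in search order, so two competing rxqueue binaries show up side by side. The function name `list_on_path` is made up for this illustration and is not part of Hercules or either REXX package.

```shell
# List every executable named "$1" found on $PATH, in search order.
# The first line printed is the one a plain "rxqueue" would resolve to.
# The body runs in a subshell so the IFS and globbing changes stay local.
list_on_path() (
    cmd=$1
    IFS=:
    set -f                      # do not glob-expand PATH entries
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]; then
            printf '%s\n' "$dir/$cmd"
        fi
    done
)

list_on_path rxqueue
```

On a system like the one described above, this would be expected to print one path under /usr/local/bin and one under /usr/bin, in whichever order $PATH lists them.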

jphartmann commented 6 years ago

My suggestion, Steve, is that you drop one or the other (or, for that matter, both) REXX implementations. They also have conflicting header files. Or at least publish a health warning about installing both.

mhoes commented 6 years ago

For what it's worth (which admittedly may not be much), my 2 cents: This is literally the first time I have ever experienced any issues with having them both installed, even though I compile Hyperion with support for both versions/implementations. And it was easily resolved by making sure that my $PATH was set up in a way that the correct version was found. Again, just my 2 cents.

srorso commented 6 years ago

Hi Maarten:

I suspect that in your original path statement, /usr/local/bin preceded /usr/bin, and that this issue came up when specifying a full path name to the rexx interpreter instead of just using rexx. I think that if you had just typed rexx ../hyperion/tests[...] Regina Rexx would have been used and the matching rxqueue utility would have run successfully.

Your experiments with rxstack offer a path to a portable solution, I think. Inside runtest.rexx, start rxstack as a daemon and ignore any return code. If address command uname -s | rxqueue finds the Regina Rexx rxqueue but is not a child of a Regina Rexx interpreter, then rxstack will enable correct operation. If it finds the ooRexx rxqueue, no harm done running rxstack ; the ooRexx rxqueue sends stdout to the ooRexx stack. At the end of the section of code, run rxstack -k to end the daemon.

On the Debian 9 system I used to test starting rxstack, sudo was not needed. My Debian id is non-privileged.

The code would look like this:

    address command 'rxstack -d'   /* start Regina stack daemon in case of mixed REXX */
    address command 'uname -s | rxqueue '
    i = 0
    do while queued() > 0          /* queued() returns a count, not a 0/1 logical */
        i = i + 1
        parse pull plat.i
        end
    platform = plat.1
    drop plat.
    address command 'rxstack -k'  

(The above code does not have error checking, nor error message suppression...it will if this becomes the portable solution, and I have not tested it.)

It also seems that the Regina Rexx rxqueue is able to determine that Regina Rexx is the active interpreter and does not require rxstack in that case. It requires rxstack only when invoked from some other environment, which might be ooRexx or some other interpreter entirely (perhaps bash?)

Best Regards, Steve Orso

mhoes commented 6 years ago

Hi,

Yes, you are correct about the PATH setup. The reason I explicitly used full paths for the rexx executables, was to see if everything worked ok with both implementations. I just didn't anticipate the fact that other binaries from the same implementations would be needed, and that the PATH setup would lead to finding the wrong versions of them even though I used a full path for the rexx executable. Classic 'user failure' on my part. ;)

The modified script works, both for Open Object and Regina. (Obviously, you get a rxstack not found error when Regina is not installed).

Just in case, this is what the section looks like for me now :

    if upper(left(host,3)) == 'WIN' then do
        path_sep = '\'
        wrong_path_sep = '/'
        type_or_cat = 'type'
        exe_suffix = '.exe'
        loadmod_suffix = '.dll'
        platform="Windows"
        end
    else do
        path_sep = '/'
        wrong_path_sep = '\'
        type_or_cat = 'cat'
        exe_suffix = ''
        loadmod_suffix = '.so'
        address command 'rxstack -d'   /* start Regina stack daemon in case of mixed REXX */
        address command 'uname -s | rxqueue '
        i = 0
        do while queued()
            i = i + 1
            parse pull plat.i
            end
        platform = plat.1
        drop plat.
        address command 'rxstack -k'
        end

Thanks for taking the time to look into this. I have to admit that if it had been up to me, I think I would have left things the way they are now, without over-complicating things, and just accepted that things will go wrong if the user (me, in this case) makes the error of mixing binaries from different implementations.

--- EDIT ---

Now that I think about this some more: I absolutely have no experience with rexx whatsoever, but it seems to me that allowing the mixing of binaries from different implementations might open a whole can of worms (in the future), and you really might be better off if the error occurs so that it is at least clear that this is happening, instead of things (silently) leading to possible unexpected and/or undefined behavior.

mhoes commented 6 years ago

By the way, just out of curiosity: Why does one need to know the output of 'uname -s' to begin with ? Does it matter which specific *NIX flavor you are running ? The current code (as far as I can tell) only seems to differentiate between 'Windows' and 'Not Windows' (and therefore Unix) ?

srorso commented 6 years ago

Hi John,

Sorry for the delay in returning to your suggestion. The standardization question is good but the answer is difficult. UNIX-like distributions seem to include Regina Rexx more often than ooRexx, and Regina Rexx seems to be less trouble to build on systems that don't include a package in their repositories. Leap 42.2 includes both in its repository, but each package will not install if the other package is already installed. On FreeBSD and Debian 9, ooRexx 4.2.0 fails to build. A daemon is required to use ooRexx's API. So if I were to pick one, it would be Regina Rexx.

But ooRexx 5.0.0, three years into development with an available beta, may change that.

Dropping both would be interesting, as redtest.rexx would need to be re-written or we would need to standardize on a third REXX. This would have an impact on builds that include REXX support, as the target system would need at least one of ooRexx and Regina Rexx and the REXX required for runtest.rexx and redtest.rexx.

For the moment, though, I am happy to craft a portable solution for uname -s. The solution I put together above likely does not work, Maarten's results notwithstanding.

Best Regards, Steve Orso

mhoes commented 6 years ago

Side note: ooRexx 4.2.0 fails to build with GCC 6.x / 7.x when you use GCC's optimization (GCC 5.3.1 works). A fix has been included for the upcoming 5.0, but not for 4.x. Bug report can be found here: https://sourceforge.net/p/oorexx/bugs/1380/

srorso commented 6 years ago

Hi Maarten:

Thanks for the point-out of the gcc 6 bug; I suspected something like that but I don't want to become an ooRexx developer. :-).

The current documentation for runtest specifies that platform is set to the value of uname -s for non-Windows hosts. I have considered asking John to change that to the result of parse source but that introduces compatibility issues between the current runtest and runtest.rexx.

After messing around with rxstack, the Regina Rexx rxqueue, and the ooRexx rxqueue, I suspect the two rxqueue commands are not inter-operable. The proof will be in the file:

<build-dir>/Testing/Temporary/LastTest.log

Look for something like this at around lines 11-12:

Variable ptrsize is set to "8".
Variable platform is set to "Linux".

I suspect when ooRexx uses the Regina Rexx rxqueue, you will find this:

Variable ptrsize is set to "8".
Variable platform is set to "PLAT.1".

There will be no impact on test results because the only testing of the variable $platform in committed test cases is in maintest.tst, and those tests are looking for 'Windows' or not 'Windows'. On a non-Windows platform, a value of 'PLAT.1' is as good as 'Linux'.
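The log check described above can be scripted. The sketch below assumes the log line looks exactly like the excerpts quoted in this comment; the function names `platform_from_log` and `check_platform` are invented for the example.

```shell
# Print the value recorded for the "platform" variable in a test log.
# The line format is assumed from the log excerpts quoted above.
platform_from_log() {
    sed -n 's/.*Variable platform is set to "\([^"]*\)".*/\1/p' "$1"
}

# Flag a value that looks like an unassigned REXX stem element
# (for example PLAT.1) rather than a real `uname -s` result.
check_platform() {
    value=$(platform_from_log "$1")
    case $value in
        PLAT.*) echo "suspect: platform is \"$value\""; return 1 ;;
        '')     echo "platform line not found"; return 2 ;;
        *)      echo "ok: platform is \"$value\"" ;;
    esac
}
```

It would be called as `check_platform <build-dir>/Testing/Temporary/LastTest.log`, with the build directory filled in.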

John's suggestion to publish a warning about having both REXX interpreters installed is excellent, and I will address that.

I enjoy the challenge of writing really portable, really bulletproof code, and that is why I am pursuing this issue. I considered coding to detect and complain if rxqueue and the parent interpreter came from different packages, and I would have been really happy if I could, within REXX, have determined the full path to the interpreter executable, because I could then use that as the target of the uname -s pipe. But no luck on either count.
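Outside REXX, a wrapper that already knows the interpreter path can approximate that check. The sketch below is only a heuristic: it assumes each REXX package installs rexx and rxqueue in the same directory (as with /usr/bin and /usr/local/bin earlier in this thread) and it does not resolve symbolic links. The function name `rxqueue_matches` is made up for the example.

```shell
# Succeed only if the rxqueue found first on $PATH lives in the same
# directory as the given interpreter path.
# Returns 0 on a match, 1 on a mismatch, 2 if no rxqueue is found.
rxqueue_matches() {
    interp=$1
    interp_dir=$(dirname "$interp")
    found=$(command -v rxqueue) || {
        echo "rxqueue not found on PATH" >&2
        return 2
    }
    if [ "$(dirname "$found")" = "$interp_dir" ]; then
        return 0
    fi
    echo "warning: $interp would pipe into $found" >&2
    return 1
}
```

A caller could run `rxqueue_matches /usr/bin/rexx` before starting the tests and stop with a clear message on a mismatch instead of letting the mixed setup proceed silently.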

Again, thanks for reporting this issue and especially for your efforts since then.

Best Regards, Steve Orso

mhoes commented 6 years ago

Alright, point(s) taken.

By the way, I re-tried the modified version of 'runtest.rexx' with both Regina and Open Object (with Regina first in $PATH, and addressing Open Object rexx with an absolute path), and in both cases $platform seems to be set to 'Linux', according to 'Testing/Temporary/LastTest.log'. (Unless, of course, I made a mistake testing this).

But now that you have shown me the error of my ways of having both REXX implementations installed at the same time, is it of any consequence that the current Hyperion build systems (both autotools and cmake) allow you to build in support for both implementations at the same time ? Or should that be changed to an either/or situation, where you can select either the one or the other at build time, but not both simultaneously ?

And now that we're here, I would like to take this moment to repeat my previous remark:

Anyway, thank you for putting up with all this largely irrelevant stuff from me to begin with. Although I fully enjoy doing this, I can also imagine there are bigger issues to address here than how the test suite runs when users purposefully try to find creative ways to break it ;)

srorso commented 6 years ago

Hi Maarten,

Your results with a mis-matched rxqueue are interesting and imply an interoperability that neither REXX development group has documented. Nor, I suspect, would they commit to such interoperability.

Because it isn't documented, I am going to change runtest.rexx to avoid rxqueue. It will write the result to a temporary file and read the file. Maybe not very pretty, but it should be very stable and portable. It only runs once per runtest.rexx execution, so relative efficiency is not a concern. Look for a commit shortly.

With respect to the return of uname -s for $platform, the more I ponder it the more I like it. The current population of test cases mostly test the internal workings of Hercules and are not dependent on the host system. But as that population expands into the interactions between Hercules and the host, knowing a bit about the host could be important. For example, tests that mess with shadow files or networking may need to know. The return values from uname -s are also well defined and easy to test.

It is really helpful to have a user who is purposefully trying to break things, especially when that user is creative, cheerful, and collaborative about the break-fix process. Please keep up the good work.

Best Regards, Steve Orso

jphartmann commented 6 years ago

As for parse source on Regina Rexx on UNIX, Hessling refused to use uname() to obtain the operating system name and instead hardwired the value returned. So you do have to do uname -s to find out which particular variant you are on.

The value returned by uname -s is not in any way architected or mandated by standards, but it does tend to remain constant.

srorso commented 6 years ago

The Linux systems I test on all return "Linux", as does Windows Subsystem for Linux. FreeBSD returns "UNIX", and Solaris 11 returns "SunOS". Useful and relatively constant.

jphartmann commented 6 years ago

Steve, "returns" here is ambiguous. My FreeBSD 10.2:

    [john@FB102 ~]$ uname -s
    FreeBSD

srorso commented 6 years ago

My mistake....doing things from memory is dangerous now that I've reached a certain age.

jphartmann commented 6 years ago

:-) I learnt fifty years ago not to trust memory.

mhoes commented 6 years ago

... And the world makes sense again. 'uname -s' returns 'FreeBSD' on FreeBSD instead of 'UNIX'. Phew. Don't ever do that again; you almost gave me a heart attack ;)
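To illustrate how a host-dependent test might branch on the value, here is a small sketch that uses only the three names confirmed in this thread (Linux, FreeBSD, SunOS). The family labels and the function name `host_family` are illustrative and are not something runtest defines; since the values are not standardized, anything else falls through to a default.

```shell
# Map a `uname -s` value to a coarse host family.
# Only names reported in this thread are listed; the labels are made up.
host_family() {
    case $1 in
        Linux)   echo linux ;;
        FreeBSD) echo bsd ;;
        SunOS)   echo solaris ;;
        *)       echo unknown ;;
    esac
}

host_family "$(uname -s)"
```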

srorso commented 6 years ago

Hi Maarten:

I found the source of my mistake that uname -s returns "UNIX" on FreeBSD: Regina Rexx returns "UNIX" on parse source on that platform.

While starting rxstack dealt with error messages when mixing Regina Rexx and ooRexx, I did not wish to code something that depended on undocumented behavior in the interaction of the two packages.

So I punted...and wrote the output of uname -s to a file that runtest.rexx immediately reads. Simple, portable, familiar to people working with UNIX-like systems, and unlikely to generate a GitHub issue.
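The sequence is simple enough to show in a few lines. This is a shell rendering of the idea only, not the REXX code that was committed: redirect the command output to a temporary file, read the first line back, and remove the file. The function name `read_platform` and the use of `mktemp` are choices made for this sketch.

```shell
# Capture `uname -s` through a temporary file instead of a pipe into
# rxqueue, so no REXX stack utility is involved at all.
read_platform() {
    tmp=$(mktemp) || return 1
    uname -s > "$tmp"
    IFS= read -r platform < "$tmp"   # first line of the file
    rm -f "$tmp"
    printf '%s\n' "$platform"
}

read_platform
```

Because nothing here depends on rxqueue or rxstack, the result is the same whichever REXX package comes first in $PATH.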

Please feel free to give it a try. And thanks again for reporting this issue and collaborating on diagnosis and (I hope) resolution.

Best Regards, Steve Orso

mhoes commented 6 years ago

Hi,

I tried it again with the latest commit, and for both rexx implementations, all tests complete without errors, and $platform is set to "Linux" according to allTests.txt.

Thanks !

srorso commented 6 years ago

Hi Maarten,

Wonderful news! Thanks for taking the time to test.

Now on to John's suggestion of something published...while runtest.rexx no longer trips on a mixed-REXX environment, it is easy enough for someone to create the same issue in REXX code they write and expect to run within Hercules.

Best Regards, Steve Orso