hercules-390 / hyperion

IPL OS/390 2.10 Processor CP00: disabled wait state 000A0000 80009064 #141

clang20 closed this issue 8 years ago

clang20 commented 8 years ago

Hi All,

I used to be able to IPL this system using Hercules 3.10. I'm now using Hercules Hyperion (pulled down source yesterday) and am unable to IPL this system. OSTAILOR is set to OS/390, LPARNUM 10 and CPUVERID 47. Any ideas on what to try next?

Thanks,

Chris

clang20 commented 8 years ago

IEA247I USING IEASYSDP FOR OS/390 02.10.00 HBB7703
IEA304W SYSTEM WAIT STATE CODE 80009064 DURING IEAVNP10 INITIALIZATION

clang20 commented 8 years ago

logfile.txt

clang20 commented 8 years ago

hercules.cfg.txt

gauser317 commented 8 years ago

A work-around for this problem is to issue the following command prior to the IPL:

archlvl disable STFL_EXTENDED

clang20 commented 8 years ago

Thanks. I'll try it and report back.

clang20 commented 8 years ago

No, same issue with STFL_EXTENDED disabled:

IEA247I USING IEASYSDP FOR OS/390 02.10.00 HBB7703
IEA304W SYSTEM WAIT STATE CODE 80009064 DURING IEAVNP10 INITIALIZATION

clang20 commented 8 years ago

Hi, The setting below makes no difference. Thanks, Chris

gauser317 wrote: A work-around for this problem is to issue the following command prior to the IPL: archlvl disable STFL_EXTENDED

BertLindeman commented 8 years ago

Did you already fix this syntax error in your configuration file?

18:43:14 HHC00811I Processor CP00: architecture mode z/Arch
18:43:14 HHC02205E Invalid argument OS/390
18:43:14 HHC01441E Config file[22] conf/hercules.cfg: error processing statement: ARCHMODE OS/390

From hercules 4.00 help:

? archmode

Command      Description
-------      -------------------------------------------------------
archmode     Alias for archlvl

? archlvl

Command      Description
-------      -------------------------------------------------------
archlvl     *Set Architecture Level

Format: archlvl s/370|als0 | esa/390|als1 | esame|als2 | z/arch|als3
                enable|disable <facility> [s/370|esa/390|z/arch]
                query [<facility> | all]
command without any argument simply displays the current architecture
mode. Entering the command with an argument sets the architecture mode
to the specified value.
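
As a rough illustration of why `ARCHMODE OS/390` is rejected, the architecture names listed in the help text above can be checked with a small sketch. The function name `is_valid_archmode` is hypothetical, and treating the names as case-insensitive is an assumption based on the mixed-case spellings used in this thread, not on the Hercules source.

```python
# Architecture names copied from the "Format:" line of the archlvl help text.
ARCH_MODES = {
    "s/370", "als0",
    "esa/390", "als1",
    "esame", "als2",
    "z/arch", "als3",
}


def is_valid_archmode(value: str) -> bool:
    """Return True if value is one of the architecture names in the help text.

    Case-insensitive matching is an assumption made for this sketch.
    """
    return value.strip().lower() in ARCH_MODES


for candidate in ("OS/390", "ESA/390", "z/Arch"):
    print(candidate, is_valid_archmode(candidate))
```

OS/390 is the name of the operating system, not of an architecture mode, so it is not in the list; the OS name belongs on the OSTAILOR statement instead.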

clang20 commented 8 years ago

Hi Bert, I had changed it to that based on another thread about the same wait state on OS/390. I will change it back, but I expect the behavior will be the same.

Thanks

clang20 commented 8 years ago

Hi Bert, I corrected ARCHMODE and see the same behavior. BTW, I am able to IPL using the 3.12 version of Hercules, so the fault is in the 4.x version of Hercules.

Thanks

BertLindeman commented 8 years ago

Just to be sure: the archmode configuration statement is now accepted, so it is no longer reported as an error in your log?

clang20 commented 8 years ago

Yes, it's corrected. The system IPLs fine using the 3.12 version of Hercules.

Fish-Git commented 8 years ago

Please try this:

Change your OSTAILOR statement to OSTAILOR NONE and try again and then post your complete Hercules log file so we can see it.

It won't fix your problem of course, but it might provide us with a clue.

clang20 commented 8 years ago

Hi Fish, logfile with OSTAILOR NONE attached.

Thanks

logfile.txt

BertLindeman commented 8 years ago

Hi Chris, would it be difficult for you to try the Hercules (4.00) included in Juergen Winkelman's TK4-?

http://wotho.ethz.ch/tk4-/tk4-_v1.00_current.zip

Unzip and run something like:

C:\zos\tk4-\hercules\windows\64\hercules.exe -f conf\hercules.cnf

That Hercules is a copy from 2012, if I remember correctly, with just a few updates.

clang20 commented 8 years ago

Hi Bert, Looks like your build works.

@Fish, this version is from:

08:10:00 HHC01413I Hercules version 4.00
08:10:00 HHC01414I (c) Copyright 1999-2012 by Roger Bowler, Jan Jaeger, and others
08:10:00 HHC01415I Built on Feb 6 2015 at 13:49:07
08:10:00 HHC01416I Build information:
08:10:00 HHC01417I Windows (MSVC) build for AMD64
08:10:00 HHC01417I Modes: S/370 ESA/390 z/Arch

I also recall someone on the Yahoo thread having the same issue and saying that it worked using 4.0 back in October.

Thanks,

Chris

Fish-Git commented 8 years ago

Hi Chris!

I would just like to confirm the following:

Is that correct?

If so, please try the (2015-11-20) so we can know where things went wrong.

Thanks.

clang20 commented 8 years ago

Hi Fish,

2016-08-10: fails.
2016-04-23: not sure; per the OP in the Yahoo thread, it perhaps fails. He also mentioned a 4.0 version from October 2015 that he said worked.
2015-11-20: not sure. I don't know whether these last two work.

I'll check, but if you could give me the git command line to pull down 4.0 by date I'm happy to try starting from October 2015 (when it possibly last worked) to present to try to help narrow it down.

Did any of the CP exceptions in the log shed any insight on the failures?

Thanks,

Chris

PS: I also sent you 35 or so for a copy of CTCI.

clang20 commented 8 years ago

Hi Fish, I've tried this

git clone https://github.com/hercules-390/hyperion.git .

git rev-list -1 --before="Oct 10 2016" master
05652fb9d7993d942138196c4e9bad6b1c7a63e1

git checkout 05652fb9d7993d942138196c4e9bad6b1c7a63e1

but still seem to have a set of working files from HEAD. What am I missing?

clang20 commented 8 years ago

Hi Fish, I think I'm still pulling the latest commit. It would help if I corrected the date to 2015, lol. I'll try again.

15:44:03 HHC01413I Hercules version 4.0.0.8598-g05652fb-modified (4.0.0.8598)
15:44:03 HHC01414I (C) Copyright 1999-2016 by Roger Bowler, Jan Jaeger, and others
15:44:03 HHC01415I Build date: Aug 13 2016 at 15:39:23
15:44:03 HHC01417I Built with: Microsoft Visual C 190024213 1
15:44:03 HHC01417I Build type: Windows MSVC AMD64 host architecture build
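
The date-based checkout attempted above can be sketched as a small Python wrapper around the same two git commands. The helper names (`rev_list_before_cmd`, `checkout_cmd`, `checkout_before`) are hypothetical, and the sketch assumes `git` is on the PATH and that the repository has already been cloned. Note that with a 2016 date the newest commit is selected, which matches the g05652fb build shown in the log above.

```python
import subprocess


def rev_list_before_cmd(date: str, branch: str = "master") -> list:
    """Build the command that prints the last commit on branch before date."""
    return ["git", "rev-list", "-1", f"--before={date}", branch]


def checkout_cmd(commit: str) -> list:
    """Build the command that checks out the given commit (detached HEAD)."""
    return ["git", "checkout", commit]


def checkout_before(date: str, repo_dir: str = ".", branch: str = "master") -> str:
    """Look up the last commit before date and check it out in repo_dir."""
    result = subprocess.run(
        rev_list_before_cmd(date, branch),
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    commit = result.stdout.strip()
    subprocess.run(checkout_cmd(commit), cwd=repo_dir, check=True)
    return commit
```

For example, `checkout_before("Oct 10 2015", repo_dir="hyperion")` would select a commit from the period when the build was reported to work, where `hyperion` is a placeholder for the clone directory.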

clang20 commented 8 years ago

Hi Fish, it looks like VS2015 support was not available yet in October 2015. Argh. I can update makefile.bat, though if there were some binaries I could test instead, that would be helpful. Does the project maintain sets of prebuilt binaries?

clang20 commented 8 years ago

Looks like hacking the build is not the way to go. I've worked around VS2015 support, an issue with the VERSION define in version.h, and a redefinition of timespec in fthreads.h, and now I get:

NMAKE : fatal error U1073: don't know how to make 'msvc.AMD64.obj\SoftFloat-specialise.obj'

I only see softfloat.c.

clang20 commented 8 years ago

Hi Fish, I tried all the 4.0 pre-builts back to 2015-07-19 and none of them work.

Thanks

clang20 commented 8 years ago

Hi Fish,

Final outcome of testing all the pre-guilts is that none of them work. Looks like the report about the October binaries working was wrong. The TK4- build I tried (from 2012, I believe) did work, and the version said 4.0, but I guess there is a question as to whether that was actually 4.0.

FAILS: Hercules-4.0.0.8597-g05652fb-x64.zip (2016-08-12)
FAILS: Hercules-4.00.0.8473-gfe2c24f-x64.zip (2016-04-23)
FAILS: Hercules-4.00.0-git-8396-g9b24d74-x64.zip (2015-11-20)
FAILS: Hercules-4.00.0-git-8243-gf4bf8ff-x64.zip (2015-10-06)
FAILS: Hercules-4.00.0.8213-gfce758d-x64.zip (2015-07-19)

Thanks,

Chris

Fish-Git commented 8 years ago

Chris,

Your issue with OS/390 2.10:

IEA247I USING IEASYSDP FOR OS/390 02.10.00 HBB7703
IEA304W SYSTEM WAIT STATE CODE 80009064 DURING IEAVNP10 INITIALIZATION

is very similar to the issue Christian Birr had with OS/390 2.08, as documented in GitHub issue 13:

IEA247I USING IEASYSDB FOR OS/390 02.08.00 HBB6608
IEA304W SYSTEM WAIT STATE CODE 80009064 DURING IEAVNP10 INITIALIZATION

Christian's ultimate problem was a bad (damaged) CCKD DASD image file, but your problem is probably not that. I suspect your dasd image files are probably okay.

However...

I happened to notice that his configuration file starts with:

ARCHMODE  ESA/390
ALRF      ENABLE

whereas your configuration file does not.

Your configuration begins with:

ARCHMODE  OS/390

which of course is wrong. You told Bert that you corrected it:

Hi Bert, I corrected ARCHMODE and see the same behavior.

but you did not tell us what ARCHMODE value you changed it to.

What ARCHMODE value are you now using? Are you using ARCHMODE ESA/390 now?

I did some tests, and due to Hyperion's different (more accurate) implementation of the STFL and STFLE instructions, the Facility List bytes returned by Hyperion when archmode ESAME is used are very different from the Facility List bytes that it returns when archmode z/Arch is used.

Have you tried adding an ARCHLVL ENABLE ASN_LX_REUSE statement (same thing as ALRF ENABLE) to your configuration file immediately following your ARCHMODE ESA/390 statement? (which, technically, for Hyperion, should actually be specified as ARCHLVL ESA/390 instead)

Enabling the ASN and LX Reuse Facility might be the solution to your problem?

IN SUMMARY:

Try removing your ARCHMODE OS/390 statement and replace it instead with:

ARCHLVL  ESA/390
ARCHLVL  ENABLE  ASN_LX_REUSE

and then try your test again using one of the pre-built snapshots from my SoftDevLabs web site.

Please let us know if that helps you.

Thanks!

clang20 commented 8 years ago

Hi Fish The setting is correct. Its esa/390. The same cfg works fine with 3.x. My dads is fine too.

Thanks

Fish-Git commented 8 years ago

Hi Fish The setting is correct. Its esa/390. The same cfg works fine with 3.x.

Okay. Good.

My dads is fine too.

Huh? What does that mean? I don't understand.

What does "My dads is fine" mean?

clang20 commented 8 years ago

Hi Fish. The problem isn't my cfg file.

Thanks

clang20 commented 8 years ago

Sorry stupid auto correct. I'm on my phone.

Thanks

clang20 commented 8 years ago

Hi Fish, please pardon the autocorrect on my phone lol. Thanks

Chris

ivan-w commented 8 years ago

Hi Fish The setting is correct. Its esa/390. The same cfg works fine with 3.x.

I was wondering... Is it possible that ASN and LX reuse is enabled by default on Hyperion and disabled by default on Spinhawk?

There is a known bug in some versions of OS/390 2.10 where the control register explicitly requests use of the ASN and LX reuse feature, but the system then issues the wrong Set Address Space Control instruction, leading to a program interrupt.

On Hyperion, try adding an "ARCHLVL DISABLE ASN_LX_REUSE" statement to your configuration file.

Ivan

g4ugm commented 8 years ago

Spelling correct changing dasd to dads perhaps?

clang20 commented 8 years ago

Hi Ivan, Exactly lol.

Chris

clang20 commented 8 years ago

Hi Dave rather :)

clang20 commented 8 years ago

Hi Ivan. Will try as soon as I can tomorrow and will let you know.

Thanks

Chris

Fish-Git commented 8 years ago

Spelling correct changing dasd to dads perhaps?

(Doh!) Of course. :)

ivan-w commented 8 years ago

Spelling correct changing dasd to dads perhaps?

(Doh!) Of course. :)

Come on Fish... didn't you automatically back-translate that one?

But seriously ....

Don't you think the actual issue looks curiously close to the reason we had an option to disable ASN and LX reuse for OS/390 2.10 ?

Ivan

Fish-Git commented 8 years ago

I was wondering... Is it possible that ASN and LX reuse is enabled by default on Hyperion and disabled by default on Spinhawk?

Good guess, Ivan. I was thinking the same thing (but the opposite), which is why I ran the tests I did. ("stfle.txt" test in the tests subdirectory) The results were, shall we say, quite illuminating. :)

I ran the test on both 3.12 and Hyperion, first using archmode ESAME by itself (the test's default) and then again using archmode ESA/390 + ALRF ENABLE (as well as just archmode z/Arch by itself without any ALRF statement, on Hyperion only), and these were the results:

First 10 bytes of the Facility List (note: the 'EE's you see are an artifact of the test; see "stfle.txt"):

 0  1  2  3  4  5  6  7  8  9
F1 F0 FF FB FC FC 80 00 04 1C  (Spinhawk ESAME)
F1 F0 FF FB F0 70 08 00 EE EE  (Hyperion ESAME)
81 00 CA 82 00 40 00 00 00 0C  (Spinhawk 390+ALRF)
82 00 00 00 00 00 08 00 EE EE  (Hyperion 390+ALRF)
F1 F4 FF FB F8 F5 08 00 20 1C  (Hyperion z/Arch)

Quite interesting, isn't it?

Especially bit 69. ;-)
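
To make the byte dumps above easier to compare, here is a minimal Python sketch that tests individual facility bits. It assumes the usual z/Architecture numbering, where bit 0 is the leftmost bit of byte 0; the helper name `facility_bit` is hypothetical. The trailing `EE EE` bytes are left out of the two rows where they appear, since they are an artifact of the test rather than stored facility bytes.

```python
from typing import Optional


def facility_bit(facility_list: bytes, bit: int) -> Optional[bool]:
    """Return the state of a facility bit, or None if that byte is not present.

    Bits are numbered left to right: bit 0 is the high-order bit of byte 0.
    """
    byte_index, bit_in_byte = divmod(bit, 8)
    if byte_index >= len(facility_list):
        return None
    return bool(facility_list[byte_index] & (0x80 >> bit_in_byte))


# Rows copied from the table above (EE artifact bytes omitted).
ROWS = {
    "Spinhawk ESAME":    "F1 F0 FF FB FC FC 80 00 04 1C",
    "Hyperion ESAME":    "F1 F0 FF FB F0 70 08 00",
    "Spinhawk 390+ALRF": "81 00 CA 82 00 40 00 00 00 0C",
    "Hyperion 390+ALRF": "82 00 00 00 00 00 08 00",
    "Hyperion z/Arch":   "F1 F4 FF FB F8 F5 08 00 20 1C",
}

for name, hex_bytes in ROWS.items():
    data = bytes.fromhex(hex_bytes)
    print(f"{name:18} bit 6: {facility_bit(data, 6)}  bit 69: {facility_bit(data, 69)}")
```

With these inputs, bit 6 is set only in the Hyperion 390+ALRF row, and bit 69 is set only in the Spinhawk ESAME row.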

There is a known bug in some versions of OS/390 2.10 where the control register explicitly requests use of the ASN and LX reuse feature, but the system then issues the wrong Set Address Space Control instruction, leading to a program interrupt.

I'm thinking it has to be something along those lines too, simply because we're running out of possibilities. It seems 3.x is doing something incorrectly that just by coincidence happens to make OS/390 happy, whereas Hyperion is doing it correctly leading to the problem.

That may seem a tad arrogant but we do know for a fact Spinhawk is not setting its Facility List bits correctly whereas Hyperion sets them correctly. We also know other parts of Hyperion are more architecturally compliant as well (such as its LPARNUM and CPUIDFMT handling as well as its Channel Subsystem too) so the problem, as I theorize, is more likely IMO within 3.x than 4.x but it's presenting itself as the complete opposite! (at least in this particular specific case anyway)

On Hyperion, try adding an "ARCHLVL DISABLE ASN_LX_REUSE" statement to your configuration file.

I'm recommending the complete opposite! Enable ASN_LX_REUSE, just like GitHub issue 13 shows Christian has it in his configuration file for OS/390 2.08.

But trying both ways (with and without it) can't hurt either of course, seeing how we're still stumbling in the dark on this issue.

clang20 commented 8 years ago

Hi Fish, I'll try the setting and will let you know.

Thanks,

Chris

Fish-Git commented 8 years ago

Don't you think the actual issue looks curiously close to the reason we had an option to disable ASN and LX reuse for OS/390 2.10 ?

Absolutely!

But since the default on Hyperion is disabled (i.e. ALRF DISABLE or ARCHLVL DISABLE ASN_LX_REUSE is the default unless overridden), I'm suggesting that he try enabling it instead to see if that helps.

Further, since Christian Birr had/has it enabled in his configuration file for OS/390 2.08 (as documented in GitHub issue 13), I'm thinking Chris should try the same thing with his OS/390 2.10.

And since we're on the subject of ASN and LX REUSE, something we should probably discuss in the developers group is Hyperion's technically incorrect handling of the ALRF bit. According to the Principles of Operation the bit should always be on (enabled). That is to say, our default handling is technically wrong:


SA22-7832-10 z/Architecture Principles of Operation:

Facility Indications

(pg. 4-78)

A bit is set to one regardless of the current architectural mode if its meaning is true. A meaning applies to the current architectural mode unless it is said to apply to a specific architectural mode.

(pg. 4-76)

bit 6: The ASN-and-LX reuse facility is installed in the z/Architecture architectural mode.


This would require users who do not want such behavior to specifically disable it (rather than specifically enable it, as is the case today).

But as I said this is something we should probably discuss in the developers group (or at least in a separate GitHub issue), not here.

Fish-Git commented 8 years ago

Come on Fish... didn't you automatically back-translate that one?

Believe it or not, no I didn't. It seems so obvious now of course, but at the time I honestly did not think of it. Go figure. :)

clang20 commented 8 years ago

Lol pre-guilts, auto correct sucks lol

Cheers

Chris

Fish-Git commented 8 years ago

Any luck yet, @clang20?

Have you had a chance to try the following yet?

ARCHLVL   z/Arch
archlvl enable asn_lx_reuse
archlvl enable bit44

and:

LPARNUM 1

Does it help you any? Does it resolve your problem?

It looks like Wayne Bickerdike was able to successfully use my 2016-08-12 Hyperion pre-built snapshot using the above ARCHLVL statements (see group message 79616), and as I explained in an earlier group post (message 79581) Hyperion uses the LPARNUM value to determine which CPUID format to use, which also might be causing your problem.

Please report back to let us know whether your problem has been resolved yet or not!

Thanks!

clang20 commented 8 years ago

Hi Fish, just got home from being out of town. I will try this afternoon and will let you know.

Thanks

Chris

clang20 commented 8 years ago

Hi Fish,

Same wait state using the settings below.

ARCHLVL z/Arch
archlvl enable asn_lx_reuse
archlvl enable bit44

Thanks,

Chris

clang20 commented 8 years ago

I also tried LPARNUM 1 with the same result.

Thanks,

Chris

clang20 commented 8 years ago

Hi Fish,

Let me know if there is anything else you want me to try or if there are any additional diagnostics I can perform.

Thanks,

Chris

Fish-Git commented 8 years ago

Let me know if there is anything else you want me to try or if there are any additional diagnostics I can perform.

Well, I don't know what your skill level is, but what I'd like to do (although I'm not sure how) is an instruction trace of IEAVNP10.

You would need to somehow determine where in storage IEAVNP10 gets loaded, and then issue a t addr-addr (or t addr.len) Hercules command so we can see what it's doing. We need to determine what the heck it's complaining about.
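
Once the load address is known, the two command forms mentioned above could be generated as in this sketch. The helper name `trace_range_commands` and the example address and length are hypothetical, and writing both operands in hexadecimal is an assumption; check the Hercules help for the `t` command before relying on the exact syntax.

```python
from typing import Tuple


def trace_range_commands(start: int, length: int) -> Tuple[str, str]:
    """Return the 't addr-addr' and 't addr.len' forms for one storage range.

    Operands are formatted as hexadecimal, which is an assumption here.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    end = start + length - 1  # inclusive end address of the range
    return (f"t {start:X}-{end:X}", f"t {start:X}.{length:X}")


# Hypothetical load address and module length, for illustration only.
for command in trace_range_commands(0x00FD0000, 0x1000):
    print(command)
```

Both strings describe the same range, so either one can be pasted at the Hercules console.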

Your OSTAILOR NONE log didn't show anything unusual, so it's still a mystery why OS/390 is saying a program check occurred when the evidence shows that's not true. I suspect disabled wait 9064 is simply some type of catch-all "something bad happened" disabled wait.

But as I said, we need to see what IEAVNP10 is actually doing. Where it is taking its wrong turn. How it reached its "I give up! Let's throw a 9064 disabled wait!" decision.

At least that's what I would like to do unless someone else can come up with a better idea.

Can you do that Chris?

clang20 commented 8 years ago

Hi Fish,

Would this involve traces on MVS, traces in Hercules, or both? I am able to IPL the system using 3.10 if needed. It sounds like information from both Hercules and MVS will be needed. It seems I would need to run a system trace, though OS/390-specific documentation seems a bit scarce after a quick Google search, so I would have to dig deeper. I wonder if this module always loads at the same address. If someone could point me in the right direction, I can try to get what you need. In the meantime I'll do some more research. Perhaps a standalone dump might help?

Thanks,

Chris

ivan-w commented 8 years ago

Chris,

Could you try SPECIFICALLY disabling ASN and LX reuse ? Again, I kind of remember that some of the (now deprecated) ASN_LX_REUSE statements, now replaced by ARCHLVL, were especially implemented in order to run the AD/CD version of OS/390 2.10 which had a bug in NIP that prevented an IPL. CR0 bit 44 was set to 1 and DID specify the OS wanted to use ASN LX Reuse but then used a Load Address Space Parameter (LASP) instruction which was incompatible with the facility, leading to a specification exception or a special operation exception.

Try looking at any program interrupt by setting OSTAILOR NONE, then looking at CR0 and the instruction parameters at the time of the program interrupt.

If this is the case, then there is an issue with how we handle how ASN and LX Reuse is enabled/disabled.
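
When reading CR0 from the Hercules log, bit 44 can be checked with a sketch like the following. It assumes the 64-bit z/Architecture numbering, where bit 0 is the leftmost bit, so bit 44 corresponds to mask 0x00080000 in the low-order word. The helper name `cr_bit_is_set` and the sample register values are hypothetical.

```python
ALRF_CONTROL_BIT = 44  # CR0 bit discussed above (ASN and LX reuse control)


def cr_bit_is_set(cr_value: int, bit: int) -> bool:
    """Test one bit of a 64-bit control register, numbering bits 0..63 left to right."""
    if not 0 <= bit <= 63:
        raise ValueError("bit must be in the range 0..63")
    return bool((cr_value >> (63 - bit)) & 1)


# Hypothetical CR0 values, for illustration only.
for cr0 in (0x0000000000080000, 0x0000000000000000):
    print(f"CR0={cr0:016X} bit 44 set: {cr_bit_is_set(cr0, ALRF_CONTROL_BIT)}")
```

If the bit is on when the program interrupt occurs while the facility is disabled in Hercules (or the other way around), that would point at the enable/disable handling described above.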

Ivan