hercules-390 / hyperion

Hercules 390

IPL OS/390 2.10 Processor CP00: disabled wait state 000A0000 80009064 #141

Closed clang20 closed 8 years ago

clang20 commented 8 years ago

Hi All,

I used to be able to IPL this system using Hercules 3.10. I'm now using Hercules Hyperion (pulled down source yesterday) and am unable to IPL this system. OSTAILOR is set to OS/390, LPARNUM 10 and CPUVERID 47. Any ideas on what to try next?

Thanks,

Chris

clang20 commented 8 years ago

Hi Ivan,

Here are the exceptions logged. I set ASN_LX_REUSE to disabled.

18:51:53 HHC01603I ipl 01c0
18:51:53 HHC00801I Processor CP00: Special-operation exception code 0013  ilc 4
18:51:53 HHC02324I PSW=0008000080000606 INST=B2790000     SACF  0(0)                   set_address_space_control_fast
18:51:53 HHC02326I R:00000000:K:06=00080000 800005EC 40404040 F0F04040  ........    00
18:51:53 HHC02269I GR00=00000000 GR01=00000000 GR02=00000000 GR03=00000000
18:51:53 HHC02269I GR04=00000000 GR05=00000000 GR06=00000000 GR07=00000000
18:51:53 HHC02269I GR08=00000000 GR09=00000000 GR10=00000000 GR11=00000000
18:51:53 HHC02269I GR12=00000000 GR13=00000000 GR14=00000000 GR15=00000000
18:51:53 HHC02271I CR00=01B00200 CR01=00000000 CR02=00000000 CR03=00000000
18:51:53 HHC02271I CR04=00000000 CR05=00000000 CR06=FE000000 CR07=00000000
18:51:53 HHC02271I CR08=00000000 CR09=00000000 CR10=00000000 CR11=00000000
18:51:53 HHC02271I CR12=00000000 CR13=00000000 CR14=C2000000 CR15=00000000
18:51:54 HHC00107I Starting thread cckd_ra(), active=0, started=0, max=2
18:51:54 HHC00100I Thread id 00002890, prio  0, name Read-ahead thread-1 started
18:51:54 HHC00107I Starting thread cckd_ra() from cckd_ra(), active=1, started=1, max=2
18:51:54 HHC00100I Thread id 00004634, prio  0, name Read-ahead thread-2 started
18:51:57 HHC00801I Processor CP00: Operation exception code 0001  ilc 4
18:51:57 HHC02324I PSW=040C000080004B2C INST=B2650000     SVS   0,0                    set_vector_summary
18:51:57 HHC02326I V:0000479C:K:06=5840C8D8 92004013 58B00010 9602B178  . HQk. .....o...
18:51:57 HHC02326I V:0000479C:K:06=5840C8D8 92004013 58B00010 9602B178  . HQk. .....o...
18:51:57 HHC02269I GR00=8000479C GR01=00000001 GR02=00000000 GR03=00000000
18:51:57 HHC02269I GR04=027627B0 GR05=027627A0 GR06=0003F32E GR07=00F4C008
18:51:57 HHC02269I GR08=0000397C GR09=00FCE480 GR10=00004B30 GR11=00FCD488
18:51:57 HHC02269I GR12=80004490 GR13=00004C0C GR14=8000480E GR15=02762530
18:51:57 HHC02271I CR00=4FB1EE40 CR01=3FFFE07F CR02=3F3C2380 CR03=80000001
18:51:57 HHC02271I CR04=00000001 CR05=3F3C2740 CR06=FE000000 CR07=3FFFE07F
18:51:57 HHC02271I CR08=00000000 CR09=00000000 CR10=00000000 CR11=00000000
18:51:57 HHC02271I CR12=00000000 CR13=3FFFE07F CR14=C00BF32E CR15=00000000
18:51:57 HHC00801I Processor CP00: Operation exception code 0001  ilc 4
18:51:57 HHC02324I PSW=040C20008000481C INST=B25C0040     ????? ,                      ?
18:51:57 HHC02326I V:00000000:K:0E=040E0000 8126CB90 00000000 00000000  ....a...........
18:51:57 HHC02326I V:0000479C:K:06=5840C8D8 92004013 58B00010 9602B178  . HQk. .....o...
18:51:57 HHC02269I GR00=8000479C GR01=7FFFF000 GR02=00000000 GR03=00000000
18:51:57 HHC02269I GR04=00000000 GR05=027627A0 GR06=0003F32E GR07=00F4C008
18:51:57 HHC02269I GR08=0000397C GR09=00FCE480 GR10=00004824 GR11=00FCD488
18:51:57 HHC02269I GR12=80004490 GR13=00004C0C GR14=80004812 GR15=02762530
18:51:57 HHC02271I CR00=4FB1EE40 CR01=3FFFE07F CR02=3F3C2380 CR03=80000001
18:51:57 HHC02271I CR04=00000001 CR05=3F3C2740 CR06=FE000000 CR07=3FFFE07F
18:51:57 HHC02271I CR08=00000000 CR09=00000000 CR10=00000000 CR11=00000000
18:51:57 HHC02271I CR12=00000000 CR13=3FFFE07F CR14=C00BF32E CR15=00000000
18:51:57 HHC00006I SCLP console interface active

Would a stand-alone dump be useful? Roughly, I would need to find a job to create the dump program, then create a couple of new volumes: one to IPL the stand-alone dump program from and one to hold the dump. After the wait state I would have to IPL the dump program and take the dump. This might take some time. Would it even work under Hercules?

Just for the sake of it I tried to IPL the system again (without restarting Hercules) and it looks like Hercules just stops executing instructions and it sits there. There is one exception in the log which may be normal.

19:05:53 HHC01603I ipl 01c0
19:05:53 HHC00801I Processor CP00: Special-operation exception code 0013  ilc 4
19:05:53 HHC02324I PSW=0008000080000606 INST=B2790000     SACF  0(0)                   set_address_space_control_fast
19:05:53 HHC02326I R:00000000:K:06=00080000 800005EC 40404040 F0F04040  ........    00
19:05:53 HHC02269I GR00=00000000 GR01=0286B3E0 GR02=00FCD488 GR03=00F4C2D0
19:05:53 HHC02269I GR04=02858AE0 GR05=00000000 GR06=02762530 GR07=00FCD488
19:05:53 HHC02269I GR08=00000000 GR09=02865EB0 GR10=0286B160 GR11=0120E487
19:05:53 HHC02269I GR12=8120D488 GR13=0286B428 GR14=02865EB0 GR15=02865EB0
19:05:53 HHC02271I CR00=01B00200 CR01=00000000 CR02=00000000 CR03=00000000
19:05:53 HHC02271I CR04=00000000 CR05=00000000 CR06=FE000000 CR07=00000000
19:05:53 HHC02271I CR08=00000000 CR09=00000000 CR10=00000000 CR11=00000000
19:05:53 HHC02271I CR12=00000000 CR13=00000000 CR14=C2000000 CR15=00000000
19:09:44 HHC01603I quit

Hope this information helps.

Thanks,

Chris

Fish-Git commented 8 years ago

Chris,

Could you try SPECIFICALLY disabling ASN and LX reuse ?

(sigh) Ivan, this is a waste of time!

We already know for a fact that Hyperion defaults to ALRF being disabled, so specifically disabling it is a no-op since it's already disabled to begin with. I proved this fact (and you can prove it for yourself too if you don't believe me) with my stfle.txt test that I posted four days ago:

0  1  2  3  4  5  6  7  8  9
F1 F0 FF FB FC FC 80 00 04 1C (Spinhawk ESAME)
F1 F0 FF FB F0 70 08 00 EE EE (Hyperion ESAME)
81 00 CA 82 00 40 00 00 00 0C (Spinhawk 390+ALRF)
82 00 00 00 00 00 08 00 EE EE (Hyperion 390+ALRF)
F1 F4 FF FB F8 F5 08 00 20 1C (Hyperion z/Arch)

As you can clearly see, only when ALRF is specifically enabled is the facility bit turned on (byte 0 bit 6 = x'02'). When ALRF is not specifically enabled, it defaults to off.
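To make the bit arithmetic concrete, here is a minimal Python sketch that tests facility bit 6 in the three Hyperion rows of the table above. It assumes the usual facility numbering, where bit 0 is the leftmost bit of byte 0, so bit 6 corresponds to mask x'02' in byte 0. The names `facility_bit` and `HYPERION_ROWS` are illustrative only and are not part of Hercules.

```python
# STFLE bytes 0-9 for the Hyperion rows, copied from the table above.
HYPERION_ROWS = {
    "Hyperion ESAME":    "F1 F0 FF FB F0 70 08 00 EE EE",
    "Hyperion 390+ALRF": "82 00 00 00 00 00 08 00 EE EE",
    "Hyperion z/Arch":   "F1 F4 FF FB F8 F5 08 00 20 1C",
}


def facility_bit(stfle_hex: str, bit: int) -> bool:
    """Return True if facility `bit` is set.

    Assumes bit 0 is the leftmost (most significant) bit of byte 0.
    """
    data = bytes.fromhex(stfle_hex)          # fromhex skips the spaces
    byte_index, bit_in_byte = divmod(bit, 8)
    mask = 0x80 >> bit_in_byte               # bit 6 -> 0x02
    return bool(data[byte_index] & mask)


for label, row in HYPERION_ROWS.items():
    print(f"{label:<18} facility bit 6 set: {facility_bit(row, 6)}")
```

Of these three rows, only "Hyperion 390+ALRF" reports the bit as set, which matches the statement that Hyperion leaves ALRF off unless it is specifically enabled.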

So asking him to specifically disable it is a waste of time since we know it will always already be off unless he specifically enables it.

And he has already tried enabling it and it didn't help so we know ALRF is not the problem!

It's something else.

ivan-w commented 8 years ago

Ok,

I checked the code (I was afraid ARCHLVL was only setting the STFL bit and not disabling the actual facility - I checked and the code correctly enables/disables the ASN_LX_REUSE path of control instructions depending on the STFL bit).

But now I am wondering : Why is it disabled by default ? The only point of disabling it was especially to circumvent a bug in a specific version of an AD/CD version of OS/390 (basically a workaround to allow things to move forward). ASN and LX Reuse is now pretty much a standard so I think it should be ENABLED by default, and DISABLED when attempting to run a buggy version of some specific version of some specific OS.

Just asking...

Ivan

Fish-Git commented 8 years ago

But now I am wondering : Why is it disabled by default ?

Precisely the question I asked and precisely the topic currently under discussion in GitHub issue #143: "MISSING: "runtest" regression test for STFL/STFLE instructions"

But how issue #143 is eventually ultimately resolved will unfortunately get us no closer to resolving this issue: Chris's inability to IPL his OS/390 2.10 system with Hyperion. We still need to stay focused on that.

The avenue of approach currently being pursued is to trace the execution of IEAVNP10 to try and see if we can determine where (and thus hopefully why) it's taking its wrong turn.

If anyone reading this thread has any better idea(s), I'm all ears!

ivan-w commented 8 years ago

On 8/19/2016 6:26 PM, Fish-Git wrote:

But now I am wondering : Why is it disabled by default ?

Precisely the question I asked and precisely the topic currently under discussion in GitHub issue #143 https://github.com/hercules-390/hyperion/issues/143: "MISSING: "runtest" regression test for STFL/STFLE instructions"

But how issue #143 https://github.com/hercules-390/hyperion/issues/143 is eventually ultimately resolved will unfortunately get us no closer to resolving /this/ issue: Chris's inability to IPL his OS/390 2.10 system with Hyperion. We still need to stay focused on that.

The avenue of approach currently being pursued is to trace the execution of |IEAVNP10| to try and see if we can determine where (and thus hopefully /why/) it's taking its wrong turn.

If anyone reading this thread has any better idea(s), I'm all ears!


I have proposed all the options I have regarding this issue. But since I no longer have any version of OS/390 2.10, I cannot help much.

The only time I have seen OS/390 2.10 fail in NIP was because of ASN and LX Reuse, which is why I was trying to explore this possibility.

But you have closed this path, so I can no longer help.

--Ivan

clang20 commented 8 years ago

Hi Ivan,

Can I contact you offline?

Chris

clang20 commented 8 years ago

Hi Ivan, I was going to ask if it would help if I made my system available for further troubleshooting. Let me know.

Thanks

clang20 commented 8 years ago

Hi Fish, Contrary to what was said on the yahoo group this is 4.00 from Feb 2015 and it works.

22:50:42 HHC01413I Hercules version 4.00
22:50:42 HHC01414I (c) Copyright 1999-2012 by Roger Bowler, Jan Jaeger, and others
22:50:42 HHC01415I Built on Feb 6 2015 at 13:49:07
22:50:42 HHC01416I Build information:
22:50:42 HHC01417I Windows (MSVC) build for AMD64
22:50:42 HHC01417I Modes: S/370 ESA/390 z/Arch
22:50:42 HHC01417I Max CPU Engines: 8

clang20 commented 8 years ago

PS I can pull down the 6 Feb 2015 source and try it.

BertLindeman commented 8 years ago

Copied from my hercules-390 post at hercules-390@yahoogroups.com:
IEA304W SYSTEM WAIT STATE CODE 80009064 DURING IEAVNP10 INITIALIZATION

Thanks for the attention @Fish-Git

= = = Start of copy = = =

Hi Chris,

Maybe a longshot, but . . . .

Default for maxcpu on Hercules 3 is 8
and on Hercules 4 it changed to 32.

Could you please try to add e.g.
maxcpu 8
to your config and re-IPL with a Hyperion snapshot?

It could well be that OS/390 cannot handle so many CPUs.

More recent versions of IEAVNP10 *do* check the number of CPUs.

Regards,
Bert 

= = = end of copy = = =

@clang20 There is no NEED to update your configuration. After the wait state 064-009, issue the command maxcpu 8 at the Hercules console and re-IPL. Hopefully this IPL succeeds. (Curious... and hopeful)
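For readers wondering how "064-009" relates to the PSW in the issue title (000A0000 80009064), the following Python sketch splits that PSW into its parts. The mask bits in the first word follow the ESA/390 PSW layout (bits 6 and 7 are the I/O and external masks, bit 14 is the wait bit). The split of the second word into a wait code (low 12 bits) and a reason code (next 12 bits) is an assumption chosen to match the 064-009 reading used in this thread; the function name `decode_wait_psw` is hypothetical.

```python
def decode_wait_psw(psw_hex: str) -> dict:
    """Decode an 8-byte ESA/390 PSW given as hex, e.g. '000A0000 80009064'."""
    digits = "".join(psw_hex.split())
    if len(digits) != 16:
        raise ValueError("expected 16 hex digits (8 bytes)")
    high = int(digits[:8], 16)   # PSW bits 0-31
    low = int(digits[8:], 16)    # PSW bits 32-63
    return {
        "wait": bool(high & 0x00020000),         # PSW bit 14
        "io_enabled": bool(high & 0x02000000),   # PSW bit 6
        "ext_enabled": bool(high & 0x01000000),  # PSW bit 7
        # Assumed layout of the second word for this wait state:
        "wait_code": low & 0xFFF,                # low 12 bits
        "reason_code": (low >> 12) & 0xFFF,      # next 12 bits
    }


info = decode_wait_psw("000A0000 80009064")
print(f"wait={info['wait']} io={info['io_enabled']} ext={info['ext_enabled']}")
print(f"wait state {info['wait_code']:03X}-{info['reason_code']:03X}")
```

With the wait bit on and both interrupt masks off, the CPU cannot be woken by an I/O or external interrupt, which is why Hercules reports a disabled wait state.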

clang20 commented 8 years ago

Hi Bert,

You won't believe it but that did it! maxcpu 8 and it IPL's now.

Thanks,

Chris

clang20 commented 8 years ago

Hi Bert, Do you know what the equivalent config file setting is?

Thanks,

Chris

ivan-w commented 8 years ago

On 8/24/2016 8:02 PM, clang20 wrote:

Hi Bert, Do you know what the equivalent config file setting is?

Thanks,

Chris

The same.

--Ivan

BertLindeman commented 8 years ago

That is good news :ok_hand: Thanks @clang20. The config setting is the same as the Hercules console command, as @ivan-w stated above.
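As a small illustration, the Python sketch below scans configuration text for a MAXCPU statement, so a missing or oversized value can be spotted before an IPL. The sample configuration is hypothetical and uses only statements mentioned in this thread. The parsing rules (one statement per line, comment lines starting with # or *, last statement wins) are simplifying assumptions, and `find_maxcpu` is not a Hercules function.

```python
from typing import Optional


def find_maxcpu(config_text: str) -> Optional[int]:
    """Return the value of the last MAXCPU statement, or None if absent."""
    value = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#*":   # skip blank and comment lines
            continue
        fields = line.split()
        if (fields[0].upper() == "MAXCPU"
                and len(fields) > 1 and fields[1].isdigit()):
            value = int(fields[1])        # assumption: last statement wins
    return value


# Hypothetical fragment built from the settings quoted in this thread.
SAMPLE_CONFIG = """\
# OS/390 2.10 test system
OSTAILOR  OS/390
LPARNUM   10
CPUVERID  47
MAXCPU    8
"""

maxcpu = find_maxcpu(SAMPLE_CONFIG)
if maxcpu is None:
    print("No MAXCPU statement: the build's default applies")
else:
    print(f"MAXCPU is set to {maxcpu}")
```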

General question from me to the developers: would this be a reason to set the default maxcpu depending on the archlvl setting? That would complicate things a bit.

The user would need to keep an eye on the order of configuration settings.

Just my 'simple hercules user' point of view.

clang20 commented 8 years ago

Hi Bert, Ivan, Thanks.

Bert I think your idea is a good one to set maxcpu based on the archlvl and the user can override if they choose.

Thanks,

Chris

ivan-w commented 8 years ago

On 8/24/2016 9:01 PM, clang20 wrote:

Hi Bert, Ivan, Thanks.

Bert I think your idea is a good one to set maxcpu based on the archlvl and the user can override if they choose.

Thanks,

Chris

That'd be a good idea if...

... there were actually any architecture defined limit on the number of CPUs depending on the architecture !

--Ivan

clang20 commented 8 years ago

Hi Ivan,

Right. Though wouldn’t a more reasonable value be appropriate as a default? In order to avoid the issue I ran into.

Thanks,

Chris

ivan-w commented 8 years ago

On 8/24/2016 9:11 PM, clang20 wrote:

Hi Ivan,

Right. Though wouldn’t a more reasonable value be appropriate as a default? In order to avoid the issue I ran into.

Thanks,

Chris

Chris,

Not an unreasonable request ;)

It's just not ARCHLVL based. The issue is simply that MAXCPU defaults to the maximum number of CPUs supported by the engine (64 for Windows and 128 for x86_64 linux).

However, defaulting MAXCPU to 8 might be a sensible choice, since the default for the maximum number of CPUs supported in the engine used to be 8 (it's a backward compatibility issue).

I'll patch Hyperion to that effect (unless someone objects).

--Ivan

clang20 commented 8 years ago

Thanks Ivan.

Cheers

Chris

BertLindeman commented 8 years ago

No objection from me @ivan-w Thanks

Fish-Git commented 8 years ago

You won't believe it but that did it! maxcpu 8 and it IPL's now.

Fantastic news, Chris! Wow. All this time that's all it was!

THANK YOU BERT! :))

Closing this issue as RESOLVED!!