Closed hockeyrob closed 1 year ago
I added your example value for the HEAP64 parameter, which was: HEAP64(512M,4M,KEEP,256M,4M,KEEP,OK,FREE) and got:
CEE3792I The following messages pertain to the DD:CEEOPTS dataset run-time options.
CEE3614I An invalid character occurred in the numeric string 'OK' of the run-time option HEAP64.
CEE3614I An invalid character occurred in the numeric string 'FREE' of the run-time option HEAP64.
And, no change, still the same ABEND.
Ok, not the same result....it got the same ABEND, but with:
CEE3204S The system detected a protection exception (System Completion Code=0C4).
From compile unit ZZOW04:/ZOWE/tmp/pax-packaging-launcher-1673380536536/content/build/../deps/launcher/common/c/logging.c at
entry point logConfigureDestination at statement 426 at compile unit offset +0000000013CACC2A at entry offset
+00000000000000F2 at address 0000000013CACC2A.
@1000TurquoisePogs Can you please take a look at this problem?
It looks like the example has a typo in the 7th parameter (the '0K' one): it has 'OK' (letter O) where '0K' (digit zero followed by K) is probably needed. See: https://www.ibm.com/docs/en/zos/2.4.0?topic=options-heap64-amode-64-only
Thanks, pinpan....fixed that. It did have an "O" instead of "0", and there needs to be another numeric parameter after it, so where you have "OK" you need "0K,0K". No real change; this is the first output:
CEE3204S The system detected a protection exception (System Completion Code=0C4).
From compile unit ZZOW04:/ZOWE/tmp/pax-packaging-launcher-1673380536536/content/build/../deps/launcher/common/c/logging.c at
entry point logConfigureDestination at statement 426 at compile unit offset +0000000013CACC2A at entry offset
+00000000000000F2 at address 0000000013CACC2A.
So, same result. But you're right, the HEAP64 spec needs to be changed.
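For anyone hitting the same CEE3614I messages, here is a minimal Python sketch of a pre-check for a HEAP64 string before it goes into CEEOPTS. It is not part of Zowe or LE: check_heap64 is a hypothetical helper, and it assumes nine sub-options in three groups of (initial size, increment, KEEP or FREE), with each size written as digits followed by an optional K, M or G. The real LE parser may accept forms that this sketch rejects.

```python
import re

# Assumption: a size is a run of digits with an optional K, M or G suffix.
SIZE = re.compile(r"\d+[KMG]?")
DISPOSITIONS = {"KEEP", "FREE"}


def check_heap64(option: str) -> list[str]:
    """Return a list of problems found in a HEAP64(...) string (hypothetical helper)."""
    match = re.fullmatch(r"HEAP64\((.*)\)", option.strip())
    if not match:
        return ["not of the form HEAP64(...)"]
    tokens = [token.strip() for token in match.group(1).split(",")]
    problems = []
    # Assumption: all nine positional sub-options are spelled out.
    if len(tokens) != 9:
        problems.append(f"expected 9 sub-options, found {len(tokens)}")
    for position, token in enumerate(tokens, start=1):
        if token in DISPOSITIONS:
            continue
        if not SIZE.fullmatch(token):
            problems.append(f"sub-option {position}: invalid size {token!r}")
    return problems


if __name__ == "__main__":
    # The value from the first attempt in this thread: letter O, and one value short.
    for problem in check_heap64("HEAP64(512M,4M,KEEP,256M,4M,KEEP,OK,FREE)"):
        print(problem)
```

Run against the first value tried in this thread, it reports both the short sub-option count and the 'OK' token; a value such as HEAP64(512M,4M,KEEP,32M,4M,KEEP,0K,0K,FREE) comes back with no problems.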
As a shot in the dark I started ZWESISTC so the cross-memory server was running when I start ZWESLSTC; the cross-memory server starts ok, and has been running now for several days, but I still get the listed crash from ZWESLSTC.
What else can I send you?
While working with a customer today, @ifakhrutdinov suggested that the error occurred because all of the components starting in the same address space were colliding on storage access.
The fix that worked for the customer was to change KEEP,256M -> KEEP,32M.
We may need to update our documentation chapter https://docs.zowe.org/stable/user-guide/configure-uss/#language-environment. @samanthasusu
If I understand correctly, there was not enough room for memory allocation. That means the method makeLocalLoggingContext called safeMalloc31 to allocate the memory but the response was NULL. What about improving safeMalloc31 or adding detection of a LoggingContext that was not created (or rather both)? I guess that once a C application asks for memory and it is not allocated, it should at least write a log message about it, and the missing LoggingContext should end with an ABEND.
Missing verification (+exit) if it is not null: https://github.com/zowe/launcher/blob/2be472b794647198f756b86cc0ba10223edb094a/src/main.c#L1474
Location to add a new log message about missing resources: https://github.com/zowe/zowe-common-c/blob/541462f70ceff3ca0066aacc203b03df50cdd3d4/c/alloc.c#L491 https://github.com/zowe/zowe-common-c/blob/541462f70ceff3ca0066aacc203b03df50cdd3d4/c/alloc.c#L530 https://github.com/zowe/zowe-common-c/blob/541462f70ceff3ca0066aacc203b03df50cdd3d4/c/alloc.c#L567 https://github.com/zowe/zowe-common-c/blob/541462f70ceff3ca0066aacc203b03df50cdd3d4/c/alloc.c#L844 https://github.com/zowe/zowe-common-c/blob/541462f70ceff3ca0066aacc203b03df50cdd3d4/c/alloc.c#L1053
The recommendation to lower the memory values to avoid an out-of-memory situation was merged 2 weeks ago; it just needs to be propagated throughout the website: https://github.com/zowe/docs-site/pull/2580
Related to https://github.com/zowe/docs-site/pull/2664.
@1000TurquoisePogs and @ifakhrutdinov. In the PR https://github.com/zowe/docs-site/pull/2580 the recommendation is
`HEAP64(4M,4M,KEEP,1M,1M,KEEP,0K,0K,FREE)`
whereas when working with the customer Irek suggested
`HEAP64(512M,4M,KEEP,32M,4M,KEEP,0K,FREE)`
Both are much less than the 256M that was causing the abend. Which version would you like as the "single version of truth" going forward? If the smaller values in Sean's 2580 work then we could go with that; however, the larger values also seem to work and give us more headroom.
I don't think it's a HEAP64 issue; I've tried:
HEAP64(512M,4M,KEEP,128M,4M,KEEP,0K,FREE)
HEAP64(512M,4M,KEEP,256M,4M,KEEP,0K,0K,FREE)
HEAP64(512M,4M,KEEP,128M,4M,KEEP,0K,0K,FREE)
HEAP64(512M,4M,KEEP,64M,4M,KEEP,0K,0K,FREE)
HEAP64(512M,4M,KEEP,32M,1M,KEEP,0K,0K,FREE)
HEAP64(512M,4M,KEEP,1M,1M,KEEP,0K,0K,FREE)
HEAP64(512M,4M,KEEP,1M,256K,KEEP,0K,0K,FREE)
And then tried reducing the 64-bit heap:
HEAP64(256M,4M,KEEP,1M,1M,KEEP,0K,0K,FREE)
HEAP64(64M,4M,KEEP,1M,1M,KEEP,0K,0K,FREE)
One of these gave me a slightly different result...
CEE3204S The system detected a protection exception (System Completion Code=0C4).
From compile unit ZZOW04:/ZOWE/tmp/pax-packaging-launcher-1673380536536/content/build/../deps/launcher/common/c/logging.c at
entry point logConfigureDestination at statement 436 at compile unit offset +0000000013CACD06 at entry offset
+00000000000001CE at address 0000000013CACD06.
But the rest gave me the same result as before:
CEE3204S The system detected a protection exception (System Completion Code=0C4).
From compile unit ZZOW04:/ZOWE/tmp/pax-packaging-launcher-1673380536536/content/build/../deps/launcher/common/c/logging.c at
entry point logConfigureDestination at statement 426 at compile unit offset +0000000013CACC2A at entry offset
+00000000000000F2 at address 0000000013CACC2A.
So, I don't think it's a HEAP64 problem, unless I need to give it a lot more than 512M above the bar.
Going forward, you need to make sure to specify 0K,0K for the below-the-line storage; LE will complain if you omit the secondary value.
While helping with this problem, one more change was needed: in zowe.yaml, set the following key to false.
zowe.launcher.shareAs: false
The info about setting it is here: https://docs.zowe.org/stable/appendix/zowe-yaml-configuration/#launcher-and-launch-scripts
I tried adding it before the components section as zowe.launcher.shareAS: false and by adding launcher: shareAs: false, and both give me this same result when trying to start ZWESLSTC:
2023-02-23 15:41:14
So, where and how do YOU specify it? The doc at the specified link didn’t show either.
@hockeyrob.
... where and how do YOU specify it? The doc at the specified link didn’t show either
In the zowe.yaml file add two lines. The first is launcher: starting at column 2 (if you don't already have this present) and, beneath that, shareAs: false starting at column 4.
Looking at the error you're getting, maybe this is because you have shareAS and not shareAs?
I put it after the job: specification, under the zowe: specification:
. . . # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
  job:
    name: ZWESLSTC
    # Prefix of component address space
    prefix: ZOWE
  launcher:
    shareAs: false
  rbacProfileIdentifier: "1"
. . .
This is what it gave me:
2023-02-23 16:09:44
Yep, noticed the capitalization error…..retried with it corrected, and got the same result.
So, the launcher still wasn't starting the gateway. I added a few items into my zowe.yaml that were new (since my last install) in example-zowe.yaml, including using the configmgr with validation set to STRICT, so I can see what is failing and when. I added the onComponentConfigureFail: warn, and set debug to "true" for the gateway. This was all so I could see why the gateway won't start for me.
And, I see that there are some protection exceptions (more 0C4s) in logging.c at entry point makeLocalLoggingContext. There are two of these errors, as the gateway and zss fail to start.
The problem appears to have been with HEAPPOOLS. I turned on RPTSTG and it recommended a different setting, which also didn’t work, and recommended HEAPP=(OFF), which let me get past that problem. Messages in the launcher output say HEAP64 is an invalid runtime option or is not supported in this release of LE.
Bottom line, Zowe still isn’t working, but I got past this ABEND. I’m going to stop the launcher and restart without RPTSTG.
Thank you for that. It's a problem known to me but I thought we had fixed it. It's just another occurrence and we hadn't covered them all, so I hope we'll be better off with this change https://github.com/zowe/launcher/pull/64
Regarding zowe.launcher.shareAs, the schema documents that its values can be "yes" or "no", not true/false: https://github.com/zowe/zowe-install-packaging/blob/v2.x/staging/schemas/zowe-yaml-schema.json#L476 You may not need it at all, but you can give it a try.
Pretty bizarre.... I added this to zowe.yaml, right after the launchScript group:
  launcher:
    shareAs: no
and......
2023-02-24 17:32:03
2023-02-24 17:32:03
2023-02-24 17:32:03
2023-02-24 17:32:03
Validity Exceptions(s) with object at
Validity Exceptions(s) with object at /zowe
Validity Exceptions(s) with object at /zowe/launcher
unspecified additional property not allowed: 'shareAs' at '/zowe/launcher/shareAs'
Taking it back out and retrying. There are still a number of protection exceptions when it starts, mostly from configmgr, which keeps other things from starting. Presuming the SMP/E update was good enough to get the software to 2.6.1, I'm guessing I have problems in the zowe.yaml configuration, so I'm going to write up what I have in mine that's different from the example-zowe.yaml that was distributed. I'll get back to you as soon as I have that.
Okay, these are the configuration items in zowe.yaml that I have changed from the example:
zowe.setup.dataset: prefix, proclib, parmlib, jcllib, loadlib, authLoadlib, authPluginLib
zowe.setup.security: product, groups, users, stcs
zowe.setup.certificate.pkcs12: directory, lock, etc...
zowe.runtimeDirectory
zowe.logDirectory
zowe.workspaceDirectory
zowe.extensionDirectory
zowe.configmgr.validation: "STRICT"
zowe.job: name, prefix
zowe.cookieIdentifier
zowe.externalDomains
zowe.certificate: keystore values, truststore values, pem values
zowe.verifyCertificates: DISABLED
java.home
node.home
zOSMF: host, port
components.gateway.debug: true
components.caching-service.enabled: false
components.zss.tls: false
The caching service is disabled; should we want it, it will be VSAM, so the infinispan configuration item(s) are removed. Other than these items, everything matches what's in example-zowe.yaml.
None of the certificate items should matter, since I've specified verifyCertificates as DISABLED. The launcher can find node and java. Still getting 0C4 protection exceptions which keep the gateway, api-catalog and discovery from starting. What else should I change to get this thing working?
@hockeyrob, did you indent so it looks like
  launcher:
    shareAs: no
The "shareAs" violation is due to a typo discovered last month in that schema file, which isn't fixed until the to-be-released Zowe 2.7. It specifically affects "zowe.launcher.shareAs" but does not affect "components.componentname.launcher.shareAs", which should work. It's the difference between global and per-component. The fix for the global one is basically indentation: the closing brace of "restartIntervals" is misplaced, so "minUptime" and "shareAs" end up nested inside it. Within zowe/schemas/zowe-yaml-schema.json,
change
"launcher": {
  "type": "object",
  "description": "Set default behaviors of how the Zowe launcher will handle components",
  "additionalProperties": false,
  "properties": {
    "restartIntervals": {
      "type": "array",
      "description": "Intervals of seconds to wait before restarting a component if it fails before the minUptime value.",
      "items": {
        "type": "integer"
      },
      "minUptime": {
        "type": "integer",
        "default": 90,
        "description": "The minimum amount of seconds before a component is considered running and the restart counter is reset."
      },
      "shareAs": {
        "type": "string",
        "description": "Determines which SHAREAS mode should be used when starting a component",
        "enum": ["no", "yes", "must", ""],
        "default": "yes"
      }
    }
  }
},
to
"launcher": {
  "type": "object",
  "description": "Set default behaviors of how the Zowe launcher will handle components",
  "additionalProperties": false,
  "properties": {
    "restartIntervals": {
      "type": "array",
      "description": "Intervals of seconds to wait before restarting a component if it fails before the minUptime value.",
      "items": {
        "type": "integer"
      }
    },
    "minUptime": {
      "type": "integer",
      "default": 90,
      "description": "The minimum amount of seconds before a component is considered running and the restart counter is reset."
    },
    "shareAs": {
      "type": "string",
      "description": "Determines which SHAREAS mode should be used when starting a component",
      "enum": ["no", "yes", "must", ""],
      "default": "yes"
    }
  }
},
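To see why the first shape rejects shareAs, here is a rough Python sketch that mimics only the "additionalProperties": false rule on trimmed copies of the two schema shapes above. It is an illustration, not how configmgr validates: unexpected_keys is a hypothetical helper, and the descriptions are dropped for brevity.

```python
def unexpected_keys(schema: dict, config: dict) -> list[str]:
    """Keys in config that a schema with additionalProperties=false would reject."""
    if schema.get("additionalProperties", True):
        return []
    allowed = schema.get("properties", {})
    return sorted(key for key in config if key not in allowed)


share_as = {"type": "string", "enum": ["no", "yes", "must", ""], "default": "yes"}
min_uptime = {"type": "integer", "default": 90}

# Shape before the change: minUptime and shareAs are nested inside restartIntervals,
# so they are not properties of "launcher" at all.
broken = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "restartIntervals": {
            "type": "array",
            "items": {"type": "integer"},
            "minUptime": min_uptime,
            "shareAs": share_as,
        }
    },
}

# Shape after the change: all three are siblings under "properties".
fixed = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "restartIntervals": {"type": "array", "items": {"type": "integer"}},
        "minUptime": min_uptime,
        "shareAs": share_as,
    },
}

launcher_config = {"shareAs": "no"}
print(unexpected_keys(broken, launcher_config))  # ['shareAs']
print(unexpected_keys(fixed, launcher_config))   # []
```

With the first shape, shareAs is reported as an additional property, which matches the "unspecified additional property not allowed" message earlier in the thread; with the second shape nothing is reported.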
@hockeyrob , did you indent so it looks like
launcher: shareAs: no
Yes, I did the proper indentation...2 spaces before launcher and 4 before shareAs.
@1000TurquoisePogs : Made that change. It starts up faster. Still not accepting incoming connections. Gateway is configured to use port 7554, but there is nothing listening on that port. The only port on which anything is listening is 7557, ZSS. I'm doing more searching through the sysprint and other logs to see what else may have happened. Film at 11.
@1000TurquoisePogs Okay, it's starting up quickly now, which gives me more opportunities to kill it, change something, restart it, and iterate. I've set two of the debugging flags; don't want to set too many to keep down the noise. Still not working.
At this point the original problem is resolved; since I updated that schema it hasn't gotten any protection exceptions, so we can close this case, if you want. It's still not working; 7557 is the only port on which some part of the application is listening, so...the gateway isn't listening, even though the launcher seems to say it's up.
When I stop ZWESLSTC, whether I use zwe stop or the STOP operator command, 8 or 10 processes remain running, and I have to cancel most individually by ASID. When I do, I get messages about the task not being undubbed...so I started socket/sockapi traces to see whether there were any tasks connecting to TCP/IP and just not creating a socket for bind/listen...to no avail. There were several things (~100) in the trace that I think are from shell scripts checking on available/required ports...but no other connections to TCP/IP by the ZWESLSTC job.
I've specified verifyCertificates: DISABLED, because I don't have all the certificate pieces figured out from your descriptions, so all I want to do is get the thing running, and then I'll worry about certificates when they matter. Looks like it is still complaining about the pieces/parts that are only partly complete; certificate.pem.key just had a file name and no associated file, which it then couldn't read at all, no surprise. I copied some data to that file, and now I'm getting a complaint about "Unparsed DER bytes remain after ASN.1 parsing" of my CA cert.....I thought when I said not to verify certificates it would, you know, not verify certificates. Is this the next problem to resolve? I've created a .kdb with gskkyman, created/imported a CA cert, generated a server cert....but I can't tell whether Zowe can read the .kdb, or if I need to extract all the certs in the .kdb, and make up some keys for those certs, some of which won't have any...and then figure out....OK, it's getting late. I'll beat on this again tomorrow.
I'd appreciate if this issue ticket was closed and a new one opened up, because for the sake of an open source project, other users with the same problems will want to search for them, and wouldn't find your latest issue at the bottom of this conversation.
Can you make a new issue ticket focusing on your lack of ports listening? As far as I know, STOP is the right command to issue, but STOP sends a SIGTERM to unix processes, and if they are stuck, they may not end without a SIGKILL. Perhaps the fact you cannot STOP correctly is related to why the servers are also not listening.
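As a generic illustration of the SIGTERM versus SIGKILL point, and not a description of how the Zowe launcher itself stops components, here is a small Python sketch for a POSIX system. stop_process and spawn_stubborn_child are hypothetical names; the stubborn child simply ignores SIGTERM to stand in for a stuck process.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> str:
    """Ask a child to exit with SIGTERM, then fall back to SIGKILL (POSIX semantics)."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        outcome = "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX; cannot be caught or ignored
        proc.wait()
        outcome = "killed"
    if proc.stdout is not None:
        proc.stdout.close()
    return outcome


def spawn_stubborn_child() -> subprocess.Popen:
    """Hypothetical demo child that ignores SIGTERM, standing in for a stuck process."""
    code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    child = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    child.stdout.readline()  # wait until the child has installed its handler
    return child


if __name__ == "__main__":
    print(stop_process(spawn_stubborn_child(), grace_seconds=1.0))
```

A process that ignores or never gets around to handling SIGTERM survives the first step and only goes away at the second, which is the behaviour described above for address spaces left behind after STOP.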
I thought when I said not to verify certificates it would, you know, not verify certificates.
That setting controls verification of certificates on network traffic, such as whether the certificates have valid claims like expiration date and hostname. But servers still need certificates to present to browsers, so a certificate parsing error is still an issue that must be solved.
zwe init certificate can make you a simple keystore, though there is a known issue where the newest versions of Java create a keystore that can't be read by System SSL (https://github.com/zowe/docs-site/issues/2459), which ZSS (port 7557) uses; the rest of Zowe would be unaffected. So basically, give zwe init certificate a try to make progress; certificates can be further configured later, as that's often the most time-consuming last step.
Closing this issue. Will open another after spending more time trying to figure out the certs.
Describe the bug
Pretty simple; Zowe 2.6.1 fails with ABENDS0C4-4 when starting.
The CEEDUMP from the ABEND starts with:
CEE3204S The system detected a protection exception (System Completion Code=0C4).
From compile unit ZZOW04:/ZOWE/tmp/pax-packaging-launcher-1673380536536/content/build/../deps/launcher/common/c/logging.c at
entry point logConfigureDestination at statement 436 at compile unit offset +0000000013CACD06 at entry offset
+00000000000001CE at address 0000000013CACD06.
Steps to Reproduce
Fails every time when I try to start ZWESLSTC.
Details
I just applied the PTFs to bring Zowe up to 2.6.1 from 2.2.0; created the SZWELOAD data set...but can't see where to put that into any JCL.
The first time I tried it after the upgrade it failed, and included messages about some obsolete LE options I had in my CEEOPTS data set. I commented those out (ALL31, ANYHEAP, HEAP) and tried again; still failed, but this time it didn't complain about any obsolete LE parameter specifications.
This is occurring under z/OS V2R5. Zowe V2.6.1, UO02064, UO02065
logs:
Information for enclave main
Information for thread 1449780000000000
Traceback:
DSA  Entry                             E Offset   Statement  Load Mod  Program Unit  Service  Status
1    CEEHDSP                           +00003FD8             CELQLIB   CEEHDSP       HLE77D0  Call
2    CEEOSIGJ                          +0000095C             CELQLIB   CEEOSIGJ      HLE77D0  Call
3    CELQHROD                          +00000266             CELQLIB   CELQHROD      HLE77D0  Call
4    CEEOSIGG                          -09D5A350             CELQLIB   CEEOSIGG      HLE77D0  Call
5    CELQHROD                          +00000266             CELQLIB   CELQHROD      HLE77D0  Call
6    logConfigureDestination           +000001CE  436        ZWELNCH   logging.c              Exception
7    logConfigureStandardDestinations  +0000005E  480        ZWELNCH   logging.c              Call
8    main                              +00000120  1246       ZWELNCH   main.c                 Call
9    CELQINIT                          +00001ACA             CELQLIB   CELQINIT      HLE77D0  Call
Condition Information for Active Routines
Condition Information for (see message CEE3843I below) (DSA address 00000050082FE600)
CIB Address: 00000050082FA9C8
Current Condition:
CEE0198S The termination of a thread was signaled due to an unhandled condition.
Original Condition:
CEE3204S The system detected a protection exception (System Completion Code=0C4).
Location:
Program Unit: (see message CEE3843I below)
Entry: logConfigureDestination
Statement: 436 Offset: +000001CE
CEE3843I The program unit name is too long to be displayed. See the Fully Qualified Names section for the complete name.
Machine State:
ILC..... 0000 Interruption Code..... 0004
PSW..... 0785040180000000 0000000013CACD06
CEE3DMP V2 R5.0: Condition processing resulted in the unhandled condition. Thu Feb 16 20:07:37 2023 Page: 2 ASID: 00C0 Job ID: STC04769 Job name: ZWESLSTC Step name: ZWELNCH PID: 67109054 Parent PID: 1 User name: ZWESLSTC