wls-eng / arm-oraclelinux-wls

Microsoft Azure ARM Templates to create an Oracle Linux VM with pre-installed WebLogic Server
Apache License 2.0

For Cluster & Dynamic Cluster Offers, Coherence Cache Server is not starting when Custom SSL is configured on Admin Server #275

Closed gnsuryan closed 3 years ago

gnsuryan commented 3 years ago

For Cluster & Dynamic Cluster Offers, Coherence Cache Server is not starting when Custom SSL is configured on Admin Server

<[severity-value: 16] [rid: 0] [partition-id: 0] [partition-name: DOMAIN] >
####<Feb 27, 2021 10:40:25,267 AM UTC> <<[ACTIVE] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <1614422425267> <[[severity-value: 8] [rid: 0] [partition-id: 0] [partition-name: DOMAIN] > <Unable to start the server mspStorage1 : javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target>
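A PKIX path building failure like the one above generally means the JVM that starts mspStorage1 does not trust the certificate presented during the SSL handshake. A minimal sketch, assuming one remedy is to point the storage server's JVM at the custom trust store through the standard JSSE `javax.net.ssl.trustStore*` system properties. The function name, the path, and the store type are hypothetical and are not taken from the PRs linked in this issue:

```python
def trust_store_jvm_args(path, store_type="JKS", password=None):
    """Build JVM arguments that point a server at a custom trust store.

    The property names are the standard JSSE system properties. The path,
    store type, and password are illustrative placeholders (assumptions).
    """
    if not path:
        raise ValueError("trust store path must not be empty")
    args = [
        f"-Djavax.net.ssl.trustStore={path}",
        f"-Djavax.net.ssl.trustStoreType={store_type}",
    ]
    # Only add the password property when one is supplied.
    if password is not None:
        args.append(f"-Djavax.net.ssl.trustStorePassword={password}")
    return " ".join(args)


if __name__ == "__main__":
    # Hypothetical path; substitute the trust store that holds the
    # certificate chain used by the Admin Server.
    print(trust_store_jvm_args("/path/to/trust.jks"))
```

The resulting string could be passed as start arguments for the Coherence storage servers. Note that a password given on the command line is visible in the process list, so a real deployment may prefer another mechanism.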

gnsuryan commented 3 years ago

I have fixed the issue for both the Cluster and Dynamic Cluster offers and created PRs.

https://github.com/wls-eng/arm-oraclelinux-wls-cluster/pull/128
https://github.com/wls-eng/arm-oraclelinux-wls-dynamic-cluster/pull/115

gnsuryan commented 3 years ago

The Coherence Cache Server fails to start when Custom SSL is configured on Admin Server and also when the default HTTP Listen Port (7001) is disabled on the Admin Server.

Exception Logs:

2021-03-17 16:11:49.037/51.529 Oracle Coherence 12.2.1.3.0 (thread=[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): Loaded operational configuration from "jar:file:/u01/app/wls/install/oracle/middleware/oracle_home/coherence/lib/coherence.jar!/tangosol-coherence.xml"
2021-03-17 16:11:49.405/51.897 Oracle Coherence 12.2.1.3.0 (thread=[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): Loaded operational overrides from "jar:file:/u01/app/wls/install/oracle/middleware/oracle_home/coherence/lib/coherence.jar!/tangosol-coherence-override-prod.xml"
2021-03-17 16:11:49.437/51.929 Oracle Coherence 12.2.1.3.0 (thread=[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified
2021-03-17 16:11:49.484/51.976 Oracle Coherence 12.2.1.3.0 (thread=[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): Optional configuration override "cache-factory-config.xml" is not specified
2021-03-17 16:11:49.486/51.978 Oracle Coherence 12.2.1.3.0 (thread=[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): Optional configuration override "cache-factory-builder-config.xml" is not specified
2021-03-17 16:11:49.504/51.996 Oracle Coherence 12.2.1.3.0 (thread=[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified

Oracle Coherence Version 12.2.1.3.0 Build 68243
Grid Edition: Production mode
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

<Mar 17, 2021 4:11:50,254 PM UTC> <The migrator (the Administration Server for manual JTA migration policy or the Singleton Master for automatic JTA migration policy) is not available. Will skip JTA TRS failback because isStrictOwnershipCheck is [false]. This may lead to potencial TLOG corruption if the TRS of mspStorage2 has been migrated to a backup server and the backup server is accessing the TLOG of mspStorage2. More safety can be achieved by setting isStrictOwnershipCheck to [true].>
<Mar 17, 2021 4:11:54,844 PM UTC> <2021-03-17 16:11:54.844/57.335 Oracle Coherence GE 12.2.1.3.0 (thread=Cluster, member=n/a): This member is configured with a compatible but different WKA list then the senior Member(Id=1, Timestamp=2021-03-17 16:05:02.013,Address=10.0.0.7:42000, MachineId=6358, Location=site:internal.cloudapp.net,machine:machine-mspVM1,process:6190,member:msp2, Role=cluster1). It is strongly recommended to use the same WKA list for all cluster members.>
<Mar 17, 2021 4:12:06,245 PM UTC>
<Mar 17, 2021 4:12:06,327 PM UTC>
<Mar 17, 2021 4:12:06,756 PM UTC> <The Logging monitoring service timer has started to check for logged message counts every 30 seconds.>
Mar 17, 2021 4:12:10 PM weblogic.wsee.WseeCoreMessages logWseeServiceHalting
INFO: The Wsee Service is halting
<Mar 17, 2021 4:12:10,523 PM UTC>
<Mar 17, 2021 4:12:10,540 PM UTC> <Server failed. Reason:

There are 1 nested errors:

weblogic.rmi.extensions.DisconnectMonitorUnavailableException: Could not register a DisconnectListener for [null]
    at weblogic.rmi.extensions.DisconnectMonitorListImpl.addDisconnectListener(DisconnectMonitorListImpl.java:83)
    at weblogic.security.utils.AdminServerListener.startDisconnectListener(AdminServerListener.java:120)
    at weblogic.security.utils.AdminServerListener.startListening(AdminServerListener.java:102)
    at weblogic.security.utils.AdminServerListener.start(AdminServerListener.java:76)
    at weblogic.server.AbstractServerService.postConstruct(AbstractServerService.java:76)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.hk2.utilities.reflection.ReflectionHelper.invoke(ReflectionHelper.java:1287)
    at org.jvnet.hk2.internal.ClazzCreator.postConstructMe(ClazzCreator.java:333)
    at org.jvnet.hk2.internal.ClazzCreator.create(ClazzCreator.java:375)
    at org.jvnet.hk2.internal.SystemDescriptor.create(SystemDescriptor.java:487)
    at org.glassfish.hk2.runlevel.internal.AsyncRunLevelContext.findOrCreate(AsyncRunLevelContext.java:305)
    at org.glassfish.hk2.runlevel.RunLevelContext.findOrCreate(RunLevelContext.java:85)
    at org.jvnet.hk2.internal.Utilities.createService(Utilities.java:2126)
    at org.jvnet.hk2.internal.ServiceHandleImpl.getService(ServiceHandleImpl.java:116)
    at org.jvnet.hk2.internal.ServiceHandleImpl.getService(ServiceHandleImpl.java:90)
    at org.glassfish.hk2.runlevel.internal.CurrentTaskFuture$QueueRunner.oneJob(CurrentTaskFuture.java:1237)
    at org.glassfish.hk2.runlevel.internal.CurrentTaskFuture$QueueRunner.run(CurrentTaskFuture.java:1168)
    at org.glassfish.hk2.runlevel.internal.CurrentTaskFuture$UpOneLevel.run(CurrentTaskFuture.java:786)
    at weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:678)
    at weblogic.invocation.ComponentInvocationContextManager._runAs(ComponentInvocationContextManager.java:352)
    at weblogic.invocation.ComponentInvocationContextManager.runAs(ComponentInvocationContextManager.java:337)
    at weblogic.work.LivePartitionUtility.doRunWorkUnderContext(LivePartitionUtility.java:57)
    at weblogic.work.PartitionUtility.runWorkUnderContext(PartitionUtility.java:41)
    at weblogic.work.SelfTuningWorkManagerImpl.runWorkUnderContext(SelfTuningWorkManagerImpl.java:652)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:420)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:360)

<Mar 17, 2021 4:12:10,740 PM UTC>
<Mar 17, 2021 4:12:10,740 PM UTC>
<Mar 17, 2021 4:12:10,977 PM UTC>
<Mar 17, 2021 4:12:11,487 PM UTC> <JMX Connector Server stopped at service:jmx:iiop://mspStorageVM2:7501/jndi/weblogic.management.mbeanservers.runtime.>
Stopping Derby server...
Derby server stopped.
<Mar 17, 2021 4:12:12 PM UTC> <The server 'mspStorage2' with process id 5420 is no longer alive; waiting for the process to die.>
<Mar 17, 2021 4:12:12 PM UTC>
<Mar 17, 2021 4:12:12 PM UTC>
<Mar 17, 2021 4:12:12 PM UTC> <get latest startup configuration before deciding/trying to restart the server>
<Mar 17, 2021 4:12:12 PM UTC>
<Mar 17, 2021 4:12:12 PM UTC> <runMonitor returned, setting finished=true and notifying waiters>
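The `DisconnectMonitorUnavailableException` for `[null]` in the log above may indicate that the storage server could not reach the Admin Server at the URL it was started with. A minimal sketch, assuming the start logic has to switch to the SSL channel when the plain listen port (7001) is disabled. The function, its parameter names, the host name, and the SSL port default of 7002 are assumptions for illustration and are not details from the PRs:

```python
def admin_url(host, ssl_enabled, plain_port_enabled,
              plain_port=7001, ssl_port=7002):
    """Pick the Admin Server URL a managed or storage server should use.

    Assumption: the plain t3 channel is preferred while it is still
    enabled; otherwise the t3s channel is used if SSL is configured.
    """
    if plain_port_enabled:
        return f"t3://{host}:{plain_port}"
    if ssl_enabled:
        return f"t3s://{host}:{ssl_port}"
    # Neither channel is available, so no usable URL can be built.
    raise ValueError("Admin Server has no reachable listen port")


if __name__ == "__main__":
    # Hypothetical host name; plain port disabled, custom SSL configured.
    print(admin_url("adminVM", ssl_enabled=True, plain_port_enabled=False))
```

When the `t3s` URL is selected, the connecting JVM still needs to trust the Admin Server's certificate, otherwise the handshake fails with a PKIX error.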

gnsuryan commented 3 years ago

I have fixed and tested the Coherence Server startup issue, and the fix works. I have created PRs for both the Configured Cluster and Dynamic Cluster offers.

https://github.com/wls-eng/arm-oraclelinux-wls-cluster/pull/141
https://github.com/wls-eng/arm-oraclelinux-wls-dynamic-cluster/pull/129

gnsuryan commented 3 years ago

Closing this issue as it is now resolved.