wls-eng / arm-oraclelinux-wls

Microsoft Azure ARM templates to create an Oracle Linux VM with pre-installed WebLogic Server
Apache License 2.0

Intermittent issue: On Coherence Server VM Restart, Admin Server VM is not able to reach the Coherence Server VM and therefore shows the Coherence Server in SHUTDOWN State even when the Coherence server is in RUNNING state #304

Open gnsuryan opened 3 years ago

gnsuryan commented 3 years ago

Intermittent issue: after a Coherence Server VM is restarted, the Admin Server VM cannot reach it, so the Admin Server shows the Coherence Server in SHUTDOWN state even though the Coherence Server is actually in RUNNING state.

image

Server Logs:

<Mar 22, 2021 9:25:17,341 AM UTC>
<Mar 22, 2021 9:25:19,759 AM UTC> <Loading trusted certificates from the PKCS12 keystore file /u01/domains/clusterDomain/keystores/trust.keystore.>
<Mar 22, 2021 9:25:20,783 AM UTC> <The LDAP authentication provider named "AzureActiveDirectoryProvider" failed to make a connection to LDAP server at ldaps://ldaps.wls-security.com:636, the error cause is: ldaps.wls-security.com: Name or service not known.>
<Mar 22, 2021 9:25:21,234 AM UTC>
<Mar 22, 2021 9:25:30,036 AM UTC>
<Mar 22, 2021 9:25:31,814 AM UTC> <JMX Connector Server started at service:jmx:iiop://mspStorageVM2:7501/jndi/weblogic.management.mbeanservers.runtime.>
<Mar 22, 2021 9:25:32,656 AM UTC> <Loading the identity certificate and private key stored under the alias servercert from the PKCS12 keystore file /u01/domains/clusterDomain/keystores/identity.keystore.>
2021-03-22 09:25:36.887/51.706 Oracle Coherence 12.2.1.3.0 (thread=[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): Loaded operational configuration from "jar:file:/u01/app/wls/install/oracle/middleware/oracle_home/coherence/lib/coherence.jar!/tangosol-coherence.xml"
2021-03-22 09:25:37.273/52.091 Oracle Coherence 12.2.1.3.0 (thread=[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): Loaded operational overrides from "jar:file:/u01/app/wls/install/oracle/middleware/oracle_home/coherence/lib/coherence.jar!/tangosol-coherence-override-prod.xml"
2021-03-22 09:25:37.283/52.102 Oracle Coherence 12.2.1.3.0 (thread=[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified
2021-03-22 09:25:37.340/52.159 Oracle Coherence 12.2.1.3.0 (thread=[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): Optional configuration override "cache-factory-config.xml" is not specified
2021-03-22 09:25:37.360/52.179 Oracle Coherence 12.2.1.3.0 (thread=[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): Optional configuration override "cache-factory-builder-config.xml" is not specified
2021-03-22 09:25:37.378/52.197 Oracle Coherence 12.2.1.3.0 (thread=[STANDBY] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified

Oracle Coherence Version 12.2.1.3.0 Build 68243
Grid Edition: Production mode
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

<Mar 22, 2021 9:25:45,399 AM UTC> <Failed to send JVMMessage from: '-23294638395981347S:mspStorageVM2:clusterDomain:mspStorage2'to: '7483463705868452771S:10.0.0.5:[-1,-1,7002,-1,-1,-1,-1]:clusterDomain:admin' cmd: 'CMD_REQUEST', QOS: '101', responseId: '1', invokableId: '9', flags: 'JVMIDs Not Sent, TX Context Not Sent, 0x18', abbrev offset: '835'.
java.io.IOException: Attempt to send message on closed socket
java.io.IOException: Attempt to send message on closed socket
    at weblogic.rjvm.t3.MuxableSocketT3$T3MsgAbbrevJVMConnection.sendMsg(MuxableSocketT3.java:852)
    at weblogic.rjvm.MsgAbbrevJVMConnection.sendOutMsg(MsgAbbrevJVMConnection.java:390)
    at weblogic.rjvm.MsgAbbrevJVMConnection.sendMsg(MsgAbbrevJVMConnection.java:224)
    at weblogic.rjvm.MsgAbbrevJVMConnection.sendMsg(MsgAbbrevJVMConnection.java:174)
    at weblogic.rjvm.ConnectionManager.sendMsg(ConnectionManager.java:682)
Truncated. see log file for complete stacktrace

<Mar 22, 2021 9:25:45,431 AM UTC> <The migrator (the Administration Server for manual JTA migration policy or the Singleton Master for automatic JTA migration policy) is not available. Will skip JTA TRS failback because isStrictOwnershipCheck is [false]. This may lead to potencial TLOG corruption if the TRS of mspStorage2 has been migrated to a backup server and the backup server is accessing the TLOG of mspStorage2. More safety can be achieved by setting isStrictOwnershipCheck to [true].>
<Mar 22, 2021 9:25:53,540 AM UTC>
<Mar 22, 2021 9:25:53,640 AM UTC>
<Mar 22, 2021 9:25:54,415 AM UTC> <The Logging monitoring service timer has started to check for logged message counts every 30 seconds.>
<Mar 22, 2021 9:25:58,630 AM UTC>
<Mar 22, 2021 9:26:02,157 AM UTC>
<Mar 22, 2021 9:26:03,141 AM UTC>
<Mar 22, 2021 9:26:04,225 AM UTC>
<Mar 22, 2021 9:26:04,591 AM UTC> <Starting "async" replication service with remote cluster address "null">
<Mar 22, 2021 9:26:05,569 AM UTC> <Channel "Default" is now listening on 10.0.0.8:7501 for protocols iiop, t3, CLUSTER-BROADCAST, ldap, snmp, http.>
<Mar 22, 2021 9:26:05,570 AM UTC> <Started the WebLogic Server Managed Server "mspStorage2" for domain "clusterDomain" running in production mode.>
<Mar 22, 2021 9:26:05,571 AM UTC> <Channel "Default" is now listening on 10.0.0.8:7501 for protocols iiop, t3, CLUSTER-BROADCAST, ldap, snmp, http.>
<Mar 22, 2021 9:26:06,221 AM UTC>
<Mar 22, 2021 9:26:06,695 AM UTC>
<Mar 22, 2021 9:29:38,786 AM UTC> <Failed to send JVMMessage from: '-23294638395981347S:mspStorageVM2:clusterDomain:mspStorage2'to: '7483463705868452771S:10.0.0.5:[-1,-1,7002,-1,-1,-1,-1]:clusterDomain:admin' cmd: 'CMD_REQUEST', QOS: '101', responseId: '1', invokableId: '34', flags: 'JVMIDs Not Sent, TX Context Not Sent, 0x8', abbrev offset: '22'.
java.io.IOException: Attempt to send message on closed socket
java.io.IOException: Attempt to send message on closed socket
    at weblogic.rjvm.t3.MuxableSocketT3$T3MsgAbbrevJVMConnection.sendMsg(MuxableSocketT3.java:852)
    at weblogic.rjvm.MsgAbbrevJVMConnection.sendOutMsg(MsgAbbrevJVMConnection.java:390)
    at weblogic.rjvm.MsgAbbrevJVMConnection.sendMsg(MsgAbbrevJVMConnection.java:224)
    at weblogic.rjvm.MsgAbbrevJVMConnection.sendMsg(MsgAbbrevJVMConnection.java:174)
    at weblogic.rjvm.ConnectionManager.sendMsg(ConnectionManager.java:682)
Truncated. see log file for complete stacktrace

From the Admin VM, Coherence Server VM2 (which was restarted) is not reachable by hostname, while Coherence Server VM1 (which was not restarted) is reachable:

[root@adminVM weblogic]# ping mspStorageVM2
ping: mspStorageVM2: Name or service not known

[root@adminVM weblogic]# ping mspStorageVm1
PING mspStorageVm1.i4ye0k4z5rpuleyw5pjpkibqdc.bx.internal.cloudapp.net (10.0.0.9) 56(84) bytes of data.
64 bytes from mspstoragevm1.internal.cloudapp.net (10.0.0.9): icmp_seq=1 ttl=64 time=0.987 ms
64 bytes from mspstoragevm1.internal.cloudapp.net (10.0.0.9): icmp_seq=2 ttl=64 time=0.945 ms

The issue goes away when the Admin VM is restarted.

After the Admin VM restart:

[oracle@adminvm weblogic]$ ping mspStorageVM2
PING mspStorageVM2.internal.cloudapp.net (10.0.0.8) 56(84) bytes of data.
64 bytes from mspstoragevm2.internal.cloudapp.net (10.0.0.8): icmp_seq=1 ttl=64 time=1.14 ms
64 bytes from mspstoragevm2.internal.cloudapp.net (10.0.0.8): icmp_seq=2 ttl=64 time=1.17 ms
64 bytes from mspstoragevm2.internal.cloudapp.net (10.0.0.8): icmp_seq=3 ttl=64 time=3.06 ms
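
The `Name or service not known` message from `ping` is a name-resolution error, which suggests the Admin VM failed to resolve `mspStorageVM2` rather than failed to route packets to 10.0.0.8. To tell the two cases apart the next time the issue occurs, something like the following could be run on the Admin VM. This is only a minimal sketch: the helper name `diagnose` is hypothetical, and port 7501 is taken from the "Default" channel in the server log above (it is an assumption that the other storage VM listens on the same port).

```python
import socket


def diagnose(host, port, timeout=3.0):
    """Classify reachability of host:port (hypothetical helper).

    Returns "dns-failure" if the hostname cannot be resolved,
    "tcp-failure" if it resolves but no TCP connection succeeds,
    and "ok" if a TCP connection is established.
    """
    try:
        # Same resolver path that ping uses for the hostname.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"

    # Try each resolved address until one accepts a connection.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue
    return "tcp-failure"
```

For example, `diagnose("mspStorageVM2", 7501)` returning `"dns-failure"` while `diagnose("10.0.0.8", 7501)` returns `"ok"` would point to name resolution on the Admin VM rather than to the Coherence server or the network path.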

gnsuryan commented 3 years ago

Node Manager status on Coherence Server VM2 after the Admin VM is restarted:

image