@hprop
Please provide the following logs. Log in to racnode2 and run:
sudo /bin/bash
cd /u01/app/grid
tar -cvzf racnode2_gridlogs.tgz *
Then upload the resulting file.
Also, paste the output of the following command:
systemctl status | grep running
Thanks for your quick reply @psaini79.
And systemctl status from racnode2:
$ systemctl status|grep running
State: running
├─18234 grep --color=auto running
This is the full systemctl status output for racnode2 and the docker host, just in case:
@hprop
It seems racnode1 was not reachable from racnode2, so the operation failed. Please provide the output of the following from racnode1:
docker exec -i -t racnode1 /bin/bash
ping racnode2
ssh racnode2
cat /etc/hosts
Try the same checks from racnode2 to racnode1. Also, paste the output of the following from racnode1:
crsctl check cluster
crsctl check crs
olsnodes
Thanks again for your help @psaini79.
From racnode1:
[grid@racnode1 ~]$ ping racnode2
PING racnode2.example.com (172.16.1.151) 56(84) bytes of data.
64 bytes from racnode2.example.com (172.16.1.151): icmp_seq=1 ttl=64 time=0.077 ms
64 bytes from racnode2.example.com (172.16.1.151): icmp_seq=2 ttl=64 time=0.066 ms
64 bytes from racnode2.example.com (172.16.1.151): icmp_seq=3 ttl=64 time=0.060 ms
64 bytes from racnode2.example.com (172.16.1.151): icmp_seq=4 ttl=64 time=0.063 ms
^C
--- racnode2.example.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3075ms
rtt min/avg/max/mdev = 0.060/0.066/0.077/0.010 ms
[grid@racnode1 ~]$ ssh racnode2
Last login: Wed Apr 22 17:54:21 2020
[grid@racnode2 ~]$ hostname
racnode2
[grid@racnode1 ~]$ cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
172.16.1.150 racnode1.example.com racnode1
192.168.17.150 racnode1-priv.example.com racnode1-priv
172.16.1.160 racnode1-vip.example.com racnode1-vip
172.16.1.70 racnode-scan.example.com racnode-scan
172.16.1.15 racnode-cman1.example.com racnode-cman1
172.16.1.151 racnode2.example.com racnode2
192.168.17.151 racnode2-priv.example.com racnode2-priv
172.16.1.161 racnode2-vip.example.com racnode2-vip
[grid@racnode1 ~]$ crsctl check cluster
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[grid@racnode1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[grid@racnode1 ~]$ olsnodes
racnode1
racnode2
From racnode2:
[grid@racnode2 ~]$ ping racnode1
PING racnode1.example.com (172.16.1.150) 56(84) bytes of data.
64 bytes from racnode1.example.com (172.16.1.150): icmp_seq=1 ttl=64 time=0.066 ms
64 bytes from racnode1.example.com (172.16.1.150): icmp_seq=2 ttl=64 time=0.080 ms
64 bytes from racnode1.example.com (172.16.1.150): icmp_seq=3 ttl=64 time=0.060 ms
64 bytes from racnode1.example.com (172.16.1.150): icmp_seq=4 ttl=64 time=0.046 ms
^C
--- racnode1.example.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3090ms
rtt min/avg/max/mdev = 0.046/0.063/0.080/0.012 ms
[grid@racnode2 ~]$ ssh racnode1
Last login: Mon Apr 27 09:13:42 2020
[grid@racnode1 ~]$ hostname
racnode1
[grid@racnode2 ~]$ cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
172.16.1.150 racnode1.example.com racnode1
192.168.17.150 racnode1-priv.example.com racnode1-priv
172.16.1.160 racnode1-vip.example.com racnode1-vip
172.16.1.70 racnode-scan.example.com racnode-scan
172.16.1.15 racnode-cman1.example.com racnode-cman1
172.16.1.151 racnode2.example.com racnode2
192.168.17.151 racnode2-priv.example.com racnode2-priv
172.16.1.161 racnode2-vip.example.com racnode2-vip
@hprop
I looked at the logs and it seems racnode2 is unable to communicate with racnode1 over the private interconnect. I found the following errors:
gipcd.trc
2020-04-23 01:18:06.383 :GIPCHALO:3033978624: gipchaLowerProcessNode: no valid interfaces found to node for 4294967286 ms, node 0x7f138c2a63f0 { host 'racnode1', haName 'gipcd_ha_name', srcLuid 3719e9da-39e601dd, dstLuid 60fc3aaa-2bfe43fc numInf 1, sentRegister 1, localMonitor 1, baseStream 0x7f138c2a0fe0 type gipchaNodeType12001 (20), nodeIncarnation 3f82d948-042c614a, incarnation 0, cssIncarnation 1, negDigest 4294967295, roundTripTime 368 lastSeenPingAck 885 nextPingId 887 latencySrc 293 latencyDst 75 flags 0x860080c}
cssd.log
clssnmvDHBValidateNCopy: node 1, racnode1, has a disk HB, but no network HB, DHB has rcfg 483023861, wrtcnt, 20868, LATS 90856264, lastSeqNo 20867, uniqueness 1587557846, timestamp 1587578679/90855434
Please check the following and provide the details:
Docker Host
systemctl status firewalld
getenforce
Log in to racnode1 and paste the output of the following from inside the container:
ifconfig
ping -I eth0 192.168.17.151
ping -S 192.168.17.150 192.168.17.151
Note: make sure the 192.168.17.0/24 subnet is on eth0; if not, use the network interface that carries that subnet for the ping.
Log in to racnode2 and paste the output of the following from inside the container:
ifconfig
ping -I eth0 192.168.17.150
ping -S 192.168.17.151 192.168.17.150
Note: make sure the 192.168.17.0/24 subnet is on eth0; if not, use the network interface that carries that subnet for the ping.
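For reference, a quick way to confirm which interface actually carries the private subnet inside each container (just a sketch; the 192.168.17.0/24 subnet is taken from the /etc/hosts output above, adjust if yours differs):
# list IPv4 addresses per interface and keep only the private-interconnect subnet
ip -o -4 addr show | awk '$4 ~ /^192\.168\.17\./ {print $2, $4}'
# cross-check which interface Grid Infrastructure registered as the cluster interconnect
$GRID_HOME/bin/oifcfg getif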
@psaini79 please find below the requested info:
Docker host:
$ systemctl status firewalld
Unit firewalld.service could not be found.
$ getenforce
Permissive
From racnode1:
[grid@racnode1 ~]$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.17.150 netmask 255.255.255.0 broadcast 192.168.17.255
ether 02:42:c0:a8:11:96 txqueuelen 0 (Ethernet)
RX packets 424002 bytes 59752360 (56.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 424341 bytes 59885802 (57.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 169.254.7.39 netmask 255.255.224.0 broadcast 169.254.31.255
ether 02:42:c0:a8:11:96 txqueuelen 0 (Ethernet)
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.16.1.150 netmask 255.255.255.0 broadcast 172.16.1.255
ether 02:42:ac:10:01:96 txqueuelen 0 (Ethernet)
RX packets 217038 bytes 57680347 (55.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 235780 bytes 102696763 (97.9 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.16.1.160 netmask 255.255.255.0 broadcast 172.16.1.255
ether 02:42:ac:10:01:96 txqueuelen 0 (Ethernet)
eth1:2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.16.1.70 netmask 255.255.255.0 broadcast 172.16.1.255
ether 02:42:ac:10:01:96 txqueuelen 0 (Ethernet)
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 7462241 bytes 24503423536 (22.8 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 7462241 bytes 24503423536 (22.8 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[grid@racnode1 ~]$ ping -I eth0 192.168.17.151
PING 192.168.17.151 (192.168.17.151) from 192.168.17.150 eth0: 56(84) bytes of data.
64 bytes from 192.168.17.151: icmp_seq=1 ttl=64 time=0.054 ms
64 bytes from 192.168.17.151: icmp_seq=2 ttl=64 time=0.045 ms
64 bytes from 192.168.17.151: icmp_seq=3 ttl=64 time=0.056 ms
64 bytes from 192.168.17.151: icmp_seq=4 ttl=64 time=0.061 ms
64 bytes from 192.168.17.151: icmp_seq=5 ttl=64 time=0.057 ms
^C
--- 192.168.17.151 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4092ms
rtt min/avg/max/mdev = 0.045/0.054/0.061/0.009 ms
[grid@racnode1 ~]$ ping -S 192.168.17.150 192.168.17.151
PING 192.168.17.151 (192.168.17.151) 56(84) bytes of data.
64 bytes from 192.168.17.151: icmp_seq=1 ttl=64 time=0.070 ms
64 bytes from 192.168.17.151: icmp_seq=2 ttl=64 time=0.057 ms
64 bytes from 192.168.17.151: icmp_seq=3 ttl=64 time=0.046 ms
64 bytes from 192.168.17.151: icmp_seq=4 ttl=64 time=0.045 ms
64 bytes from 192.168.17.151: icmp_seq=5 ttl=64 time=0.080 ms
64 bytes from 192.168.17.151: icmp_seq=6 ttl=64 time=0.055 ms
^C
--- 192.168.17.151 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5146ms
rtt min/avg/max/mdev = 0.045/0.058/0.080/0.016 ms
From racnode2:
[grid@racnode2 ~]$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.17.151 netmask 255.255.255.0 broadcast 192.168.17.255
ether 02:42:c0:a8:11:97 txqueuelen 0 (Ethernet)
RX packets 403234 bytes 56877333 (54.2 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 402794 bytes 56647857 (54.0 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.16.1.151 netmask 255.255.255.0 broadcast 172.16.1.255
ether 02:42:ac:10:01:97 txqueuelen 0 (Ethernet)
RX packets 216643 bytes 68650401 (65.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 198824 bytes 53836522 (51.3 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 1571788 bytes 250479733 (238.8 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1571788 bytes 250479733 (238.8 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[grid@racnode2 ~]$ ping -I eth0 192.168.17.150
PING 192.168.17.150 (192.168.17.150) from 192.168.17.151 eth0: 56(84) bytes of data.
64 bytes from 192.168.17.150: icmp_seq=1 ttl=64 time=0.060 ms
64 bytes from 192.168.17.150: icmp_seq=2 ttl=64 time=0.065 ms
64 bytes from 192.168.17.150: icmp_seq=3 ttl=64 time=0.079 ms
64 bytes from 192.168.17.150: icmp_seq=4 ttl=64 time=0.058 ms
64 bytes from 192.168.17.150: icmp_seq=5 ttl=64 time=0.059 ms
^C
--- 192.168.17.150 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4111ms
rtt min/avg/max/mdev = 0.058/0.064/0.079/0.009 ms
[grid@racnode2 ~]$ ping -S 192.168.17.151 192.168.17.150
PING 192.168.17.150 (192.168.17.150) 56(84) bytes of data.
64 bytes from 192.168.17.150: icmp_seq=1 ttl=64 time=0.067 ms
64 bytes from 192.168.17.150: icmp_seq=2 ttl=64 time=0.072 ms
64 bytes from 192.168.17.150: icmp_seq=3 ttl=64 time=0.051 ms
64 bytes from 192.168.17.150: icmp_seq=4 ttl=64 time=0.067 ms
64 bytes from 192.168.17.150: icmp_seq=5 ttl=64 time=0.052 ms
^C
--- 192.168.17.150 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4104ms
rtt min/avg/max/mdev = 0.051/0.061/0.072/0.013 ms
Thanks!
@hprop
Your network setup seems to be right, but I am not sure why the network heartbeat error is reported during node addition.
Please do the following. Log in to racnode1 and racnode2, capture the following, and paste the output:
route -n
Then log in to racnode1 and run:
$GRID_HOME/bin/crsctl stat res -t
sudo /bin/bash
$GRID_HOME/bin/crsctl stop crs -f
$GRID_HOME/bin/crsctl start crs
Also, re-run root.sh from racnode2 as root user:
$GRID_HOME/root.sh
Let me know if it is still failing with the network heartbeat error.
Since you are trying to bring up a 2-node RAC, did you try bringing up RAC on Docker using a response file? Please check the following:
I am trying to understand whether the error occurs only during node addition (AddNode) or also for a 2-node RAC setup using a response file in your environment.
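For what it's worth, the container scripts fall back to a standard response file when GRID_RESPONSE_FILE is empty (this shows up in the environment dump later in the thread). A hypothetical way to supply your own response file would look roughly like this; the host path, file name and image name are placeholders, not values from this setup:
# mount a directory containing a custom grid response file and point the container at it
docker run ... \
  -v /opt/rac/common_scripts:/common_scripts \
  -e GRID_RESPONSE_FILE=/common_scripts/my_grid_setup.rsp \
  ... <rac-image>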
Thanks @psaini79, please find below the info.
Route tables for racnode1 and racnode2:
[grid@racnode1 ~]$ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.17.1 0.0.0.0 UG 0 0 0 eth0
169.254.0.0 0.0.0.0 255.255.224.0 U 0 0 0 eth0
172.16.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
192.168.17.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
[grid@racnode2 ~]$ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.17.1 0.0.0.0 UG 0 0 0 eth0
172.16.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
192.168.17.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
Clusterware resources info from racnode1:
[grid@racnode1 ~]$ $GRID_HOME/bin/crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
ONLINE ONLINE racnode1 STABLE
ora.chad
ONLINE ONLINE racnode1 STABLE
ora.net1.network
ONLINE ONLINE racnode1 STABLE
ora.ons
ONLINE ONLINE racnode1 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
1 ONLINE ONLINE racnode1 STABLE
2 OFFLINE OFFLINE STABLE
ora.DATA.dg(ora.asmgroup)
1 ONLINE ONLINE racnode1 STABLE
2 OFFLINE OFFLINE STABLE
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE racnode1 STABLE
ora.asm(ora.asmgroup)
1 ONLINE ONLINE racnode1 Started,STABLE
2 OFFLINE OFFLINE STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
1 ONLINE ONLINE racnode1 STABLE
2 OFFLINE OFFLINE STABLE
ora.cvu
1 ONLINE ONLINE racnode1 STABLE
ora.orclcdb.db
1 ONLINE ONLINE racnode1 Open,HOME=/u01/app/o
racle/product/19.3.0
/dbhome_1,STABLE
ora.qosmserver
1 ONLINE ONLINE racnode1 STABLE
ora.racnode1.vip
1 ONLINE ONLINE racnode1 STABLE
ora.scan1.vip
1 ONLINE ONLINE racnode1 STABLE
--------------------------------------------------------------------------------
bash-4.2# /u01/app/19.3.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'racnode1'
CRS-2673: Attempting to stop 'ora.crsd' on 'racnode1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on server 'racnode1'
CRS-2673: Attempting to stop 'ora.qosmserver' on 'racnode1'
CRS-2673: Attempting to stop 'ora.chad' on 'racnode1'
CRS-2673: Attempting to stop 'ora.orclcdb.db' on 'racnode1'
CRS-2677: Stop of 'ora.qosmserver' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.orclcdb.db' on 'racnode1' succeeded
CRS-33673: Attempting to stop resource group 'ora.asmgroup' on server 'racnode1'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'racnode1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'racnode1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'racnode1'
CRS-2673: Attempting to stop 'ora.cvu' on 'racnode1'
CRS-2677: Stop of 'ora.DATA.dg' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'racnode1'
CRS-2677: Stop of 'ora.cvu' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.racnode1.vip' on 'racnode1'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'racnode1'
CRS-2677: Stop of 'ora.racnode1.vip' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.scan1.vip' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.asm' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.ASMNET1LSNR_ASM.lsnr' on 'racnode1'
CRS-2677: Stop of 'ora.chad' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.ASMNET1LSNR_ASM.lsnr' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.asmnet1.asmnetwork' on 'racnode1'
CRS-2677: Stop of 'ora.asmnet1.asmnetwork' on 'racnode1' succeeded
CRS-33677: Stop of resource group 'ora.asmgroup' on server 'racnode1' succeeded.
CRS-2673: Attempting to stop 'ora.ons' on 'racnode1'
CRS-2677: Stop of 'ora.ons' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'racnode1'
CRS-2677: Stop of 'ora.net1.network' on 'racnode1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'racnode1' has completed
CRS-2677: Stop of 'ora.crsd' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'racnode1'
CRS-2673: Attempting to stop 'ora.crf' on 'racnode1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'racnode1'
CRS-2677: Stop of 'ora.crf' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.storage' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'racnode1'
CRS-2677: Stop of 'ora.mdnsd' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.asm' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'racnode1'
CRS-2673: Attempting to stop 'ora.evmd' on 'racnode1'
CRS-2677: Stop of 'ora.ctssd' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'racnode1'
CRS-2677: Stop of 'ora.cssd' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'racnode1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'racnode1'
CRS-2677: Stop of 'ora.gipcd' on 'racnode1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'racnode1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'racnode1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
bash-4.2# /u01/app/19.3.0/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
Re-running root.sh from racnode2:
[grid@racnode2 ~]$ sudo /bin/bash
bash-4.2# /u01/app/19.3.0/grid/root.sh
Check /u01/app/19.3.0/grid/install/root_racnode2_2020-05-01_09-48-54-942804239.log for the output of root script
[grid@racnode2 ~]$ cat /u01/app/19.3.0/grid/install/root_racnode2_2020-05-01_09-48-54-942804239.log
Performing root user operation.
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/19.3.0/grid
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/19.3.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
/u01/app/grid/crsdata/racnode2/crsconfig/rootcrs_racnode2_2020-05-01_09-48-55AM.log
2020/05/01 09:49:00 CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2020/05/01 09:49:00 CLSRSC-594: Executing installation step 2 of 19: 'ValidateEnv'.
2020/05/01 09:49:01 CLSRSC-363: User ignored prerequisites during installation
2020/05/01 09:49:01 CLSRSC-594: Executing installation step 3 of 19: 'CheckFirstNode'.
2020/05/01 09:49:01 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2020/05/01 09:49:01 CLSRSC-594: Executing installation step 4 of 19: 'GenSiteGUIDs'.
2020/05/01 09:49:02 CLSRSC-594: Executing installation step 5 of 19: 'SetupOSD'.
2020/05/01 09:49:02 CLSRSC-594: Executing installation step 6 of 19: 'CheckCRSConfig'.
2020/05/01 09:49:03 CLSRSC-594: Executing installation step 7 of 19: 'SetupLocalGPNP'.
2020/05/01 09:49:04 CLSRSC-594: Executing installation step 8 of 19: 'CreateRootCert'.
2020/05/01 09:49:04 CLSRSC-594: Executing installation step 9 of 19: 'ConfigOLR'.
2020/05/01 09:49:05 CLSRSC-594: Executing installation step 10 of 19: 'ConfigCHMOS'.
2020/05/01 09:49:36 CLSRSC-594: Executing installation step 11 of 19: 'CreateOHASD'.
2020/05/01 09:49:37 CLSRSC-594: Executing installation step 12 of 19: 'ConfigOHASD'.
2020/05/01 09:49:40 CLSRSC-594: Executing installation step 13 of 19: 'InstallAFD'.
2020/05/01 09:49:41 CLSRSC-594: Executing installation step 14 of 19: 'InstallACFS'.
2020/05/01 09:49:43 CLSRSC-594: Executing installation step 15 of 19: 'InstallKA'.
2020/05/01 09:49:44 CLSRSC-594: Executing installation step 16 of 19: 'InitConfig'.
2020/05/01 09:49:49 CLSRSC-594: Executing installation step 17 of 19: 'StartCluster'.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'racnode2'
CRS-2672: Attempting to start 'ora.evmd' on 'racnode2'
CRS-2676: Start of 'ora.mdnsd' on 'racnode2' succeeded
CRS-2676: Start of 'ora.evmd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'racnode2'
CRS-2676: Start of 'ora.gpnpd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'racnode2'
CRS-2676: Start of 'ora.gipcd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'racnode2'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'racnode2'
CRS-2676: Start of 'ora.cssdmonitor' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'racnode2'
CRS-2672: Attempting to start 'ora.diskmon' on 'racnode2'
CRS-2676: Start of 'ora.diskmon' on 'racnode2' succeeded
CRS-2676: Start of 'ora.crf' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'racnode2'
CRS-2676: Start of 'ora.cssdmonitor' on 'racnode2' succeeded
CRS-1722: Cluster Synchronization Service daemon encountered an internal error.
CRS-2883: Resource 'ora.cssd' failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-41053: checking Oracle Grid Infrastructure for file permission issues
PRVH-0116 : Path "/u01/app/19.3.0/grid/crs/install/cmdllroot.sh" with permissions "rw-r--r--" does not have execute permissions for the owner, file's group, and others on node "racnode2".
PRVG-2031 : Owner of file "/u01/app/19.3.0/grid/crs/install/cmdllroot.sh" did not match the expected value on node "racnode2". [Expected = "grid(54332)" ; Found = "root(0)"]
PRVG-2032 : Group of file "/u01/app/19.3.0/grid/crs/install/cmdllroot.sh" did not match the expected value on node "racnode2". [Expected = "oinstall(54321)" ; Found = "root(0)"]
CRS-4000: Command Start failed, or completed with errors.
2020/05/01 10:00:21 CLSRSC-117: Failed to start Oracle Clusterware stack
Died at /u01/app/19.3.0/grid/crs/install/crsinstall.pm line 1970.
I tried setting the correct permissions for cmdllroot.sh and re-running root.sh, but it died at the same point:
[grid@racnode2 ~]$ ls -l /u01/app/19.3.0/grid/crs/install/cmdllroot.sh
-rw-r--r--. 1 root root 1276 Apr 22 13:11 /u01/app/19.3.0/grid/crs/install/cmdllroot.sh
[grid@racnode2 ~]$ sudo /bin/bash
bash-4.2# chmod 755 /u01/app/19.3.0/grid/crs/install/cmdllroot.sh
bash-4.2# chown grid:oinstall /u01/app/19.3.0/grid/crs/install/cmdllroot.sh
bash-4.2# ls -l /u01/app/19.3.0/grid/crs/install/cmdllroot.sh
-rwxr-xr-x. 1 grid oinstall 1276 Apr 22 13:11 /u01/app/19.3.0/grid/crs/install/cmdllroot.sh
bash-4.2# /u01/app/19.3.0/grid/root.sh
Check /u01/app/19.3.0/grid/install/root_racnode2_2020-05-01_10-22-40-969725920.log for the output of root script
bash-4.2# cat /u01/app/19.3.0/grid/install/root_racnode2_2020-05-01_10-22-40-969725920.log
Performing root user operation.
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/19.3.0/grid
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/19.3.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
/u01/app/grid/crsdata/racnode2/crsconfig/rootcrs_racnode2_2020-05-01_10-22-41AM.log
2020/05/01 10:22:45 CLSRSC-594: Executing installation step 1 of 19: 'SetupTFA'.
2020/05/01 10:22:46 CLSRSC-594: Executing installation step 2 of 19: 'ValidateEnv'.
2020/05/01 10:22:46 CLSRSC-363: User ignored prerequisites during installation
2020/05/01 10:22:46 CLSRSC-594: Executing installation step 3 of 19: 'CheckFirstNode'.
2020/05/01 10:22:46 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2020/05/01 10:22:46 CLSRSC-594: Executing installation step 4 of 19: 'GenSiteGUIDs'.
2020/05/01 10:22:47 CLSRSC-594: Executing installation step 5 of 19: 'SetupOSD'.
2020/05/01 10:22:47 CLSRSC-594: Executing installation step 6 of 19: 'CheckCRSConfig'.
2020/05/01 10:22:48 CLSRSC-594: Executing installation step 7 of 19: 'SetupLocalGPNP'.
2020/05/01 10:22:49 CLSRSC-594: Executing installation step 8 of 19: 'CreateRootCert'.
2020/05/01 10:22:49 CLSRSC-594: Executing installation step 9 of 19: 'ConfigOLR'.
2020/05/01 10:22:50 CLSRSC-594: Executing installation step 10 of 19: 'ConfigCHMOS'.
2020/05/01 10:23:21 CLSRSC-594: Executing installation step 11 of 19: 'CreateOHASD'.
2020/05/01 10:23:22 CLSRSC-594: Executing installation step 12 of 19: 'ConfigOHASD'.
2020/05/01 10:23:25 CLSRSC-594: Executing installation step 13 of 19: 'InstallAFD'.
2020/05/01 10:23:26 CLSRSC-594: Executing installation step 14 of 19: 'InstallACFS'.
2020/05/01 10:23:27 CLSRSC-594: Executing installation step 15 of 19: 'InstallKA'.
2020/05/01 10:23:28 CLSRSC-594: Executing installation step 16 of 19: 'InitConfig'.
2020/05/01 10:23:32 CLSRSC-594: Executing installation step 17 of 19: 'StartCluster'.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.evmd' on 'racnode2'
CRS-2672: Attempting to start 'ora.mdnsd' on 'racnode2'
CRS-2676: Start of 'ora.mdnsd' on 'racnode2' succeeded
CRS-2676: Start of 'ora.evmd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'racnode2'
CRS-2676: Start of 'ora.gpnpd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'racnode2'
CRS-2676: Start of 'ora.gipcd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'racnode2'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'racnode2'
CRS-2676: Start of 'ora.cssdmonitor' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'racnode2'
CRS-2672: Attempting to start 'ora.diskmon' on 'racnode2'
CRS-2676: Start of 'ora.diskmon' on 'racnode2' succeeded
CRS-2676: Start of 'ora.crf' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'racnode2'
CRS-2676: Start of 'ora.cssdmonitor' on 'racnode2' succeeded
CRS-1722: Cluster Synchronization Service daemon encountered an internal error.
CRS-2883: Resource 'ora.cssd' failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-41053: checking Oracle Grid Infrastructure for file permission issues
CRS-4000: Command Start failed, or completed with errors.
2020/05/01 10:34:04 CLSRSC-117: Failed to start Oracle Clusterware stack
Died at /u01/app/19.3.0/grid/crs/install/crsinstall.pm line 1970.
Fragment of the offending script:
bash-4.2# nl -ba /u01/app/19.3.0/grid/crs/install/crsinstall.pm | sed -n '1949,1974p'
1949 sub start_cluster
1950 {
1951 trace(sprintf("Startup level is %d", $CFG->stackStartLevel));
1952
1953 # start the entire stack in shiphome
1954 if (START_STACK_ALL == $CFG->stackStartLevel)
1955 {
1956 trace("Attempt to start the whole CRS stack");
1957 my $rc = startHasStack($CFG->params('ORACLE_HOME'));
1958
1959 if (WARNING == $rc)
1960 {
1961 # maximum number of hub nodes reached, try this as a rim node.
1962 my $role = NODE_ROLE_RIM;
1963 setNodeRole($role);
1964 stopFullStack("force") || die(dieformat(349));
1965 $rc = startHasStack($CFG->params('ORACLE_HOME'), $role);
1966 }
1967
1968 if ( SUCCESS != $rc )
1969 {
1970 die(dieformat(117));
1971 }
1972
1973 print_info(343);
1974
I still have to take a look at the 2-node RAC example you pointed out. Also, any guidance on how to continue the diagnosis above would be really appreciated.
Thanks!
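One place to look next is the CSS daemon's own trace on racnode2, which usually records why 'ora.cssd' gave up; a sketch, assuming the Grid base /u01/app/grid used throughout this setup (the exact trace directory may differ):
# inside the racnode2 container, as the grid user
tail -200 /u01/app/grid/diag/crs/racnode2/crs/trace/ocssd.trc
tail -200 /u01/app/grid/diag/crs/racnode2/crs/trace/alert.log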
@hprop
Sure, I will assist you. Can you please upload the logs of the recent failure:
Log in to racnode2 and run:
sudo /bin/bash
cd /u01/app/grid
tar -cvzf racnode2_gridlogs.tgz *
Then upload the resulting file.
Please upload the logs from racnode1 as well. In addition, from the Docker host, please provide the output of:
route -n
Note: the route -n command needs to be run on the Docker host.
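Since host-side filtering is a common reason for a missing network heartbeat between containers, the filter chains on the Docker host are also worth a look; a quick sketch of what to check:
# on the Docker host: rules that could drop traffic between the RAC bridge networks
sudo iptables -L FORWARD -n -v --line-numbers
sudo iptables -L INPUT -n -v --line-numbers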
@hprop
Any update on this? Also, before doing the above tasks (if you have not already), please try the following. Check whether iptables is running on your machine (the Docker host); if yes, execute these steps:
systemctl stop iptables
systemctl disable iptables
Log in to the racnode1 container:
sudo crsctl stop crs -f
Log in to the racnode2 container:
sudo crsctl stop crs -f
Stop the racnode2 and racnode1 containers, then start the racnode2 container first and check whether grid comes up. If yes, start the racnode1 container and check whether grid comes up.
I am asking for this because racnode2 is already part of the cluster, and I want to see whether grid comes up on it while racnode1 is not yet running.
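If disabling iptables outright is not an option on the host, a narrower alternative would be to allow the two subnets this setup uses through the filter chains; a sketch based on the subnets shown in /etc/hosts earlier, and the rules would still need to be persisted separately:
# on the Docker host
sudo iptables -I FORWARD -s 192.168.17.0/24 -d 192.168.17.0/24 -j ACCEPT
sudo iptables -I FORWARD -s 172.16.1.0/24 -d 172.16.1.0/24 -j ACCEPT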
@psaini79
Iptables was up on the docker host, so I stopped the service as suggested:
$ systemctl status iptables
● iptables.service - IPv4 firewall with iptables
Loaded: loaded (/usr/lib/systemd/system/iptables.service; enabled; vendor preset: disabled)
Active: active (exited) since Tue 2020-04-21 16:45:28 UTC; 1 weeks 6 days ago
Main PID: 534 (code=exited, status=0/SUCCESS)
Tasks: 0
Memory: 0B
CGroup: /system.slice/iptables.service
Apr 21 16:45:27 ip-172-31-2-173.eu-west-1.compute.internal systemd[1]: Starting IPv4 firewall with iptables...
Apr 21 16:45:28 ip-172-31-2-173.eu-west-1.compute.internal iptables.init[534]: iptables: Applying firewall rules:...]
Apr 21 16:45:28 ip-172-31-2-173.eu-west-1.compute.internal systemd[1]: Started IPv4 firewall with iptables.
Hint: Some lines were ellipsized, use -l to show in full.
/scp:ol7:/home/ec2-user/ #$ sudo systemctl stop iptables
/scp:ol7:/home/ec2-user/ #$ sudo systemctl disable iptables
Removed symlink /etc/systemd/system/basic.target.wants/iptables.service.
Then I followed the steps to stop CRS and the containers in the specified order. Now, after starting racnode2 and then racnode1, I see:
[grid@racnode1 ~]$ crsctl check cluster -all
**************************************************************
racnode1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
racnode2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
[grid@racnode1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
ONLINE ONLINE racnode1 STABLE
OFFLINE OFFLINE racnode2 STABLE
ora.chad
ONLINE ONLINE racnode1 STABLE
OFFLINE OFFLINE racnode2 STABLE
ora.net1.network
ONLINE ONLINE racnode1 STABLE
ONLINE ONLINE racnode2 STABLE
ora.ons
ONLINE ONLINE racnode1 STABLE
ONLINE ONLINE racnode2 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
1 ONLINE ONLINE racnode1 STABLE
2 ONLINE ONLINE racnode2 STABLE
ora.DATA.dg(ora.asmgroup)
1 ONLINE ONLINE racnode1 STABLE
2 ONLINE ONLINE racnode2 STABLE
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE racnode2 STABLE
ora.asm(ora.asmgroup)
1 ONLINE ONLINE racnode1 Started,STABLE
2 ONLINE ONLINE racnode2 Started,STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
1 ONLINE ONLINE racnode1 STABLE
2 ONLINE ONLINE racnode2 STABLE
ora.cvu
1 ONLINE ONLINE racnode2 STABLE
ora.orclcdb.db
1 ONLINE ONLINE racnode1 Open,HOME=/u01/app/o
racle/product/19.3.0
/dbhome_1,STABLE
ora.qosmserver
1 ONLINE ONLINE racnode2 STABLE
ora.racnode1.vip
1 ONLINE ONLINE racnode1 STABLE
ora.scan1.vip
1 ONLINE ONLINE racnode2 STABLE
--------------------------------------------------------------------------------
I tried to start the listener on racnode2:
[grid@racnode1 ~]$ srvctl start listener -node racnode2
PRCR-1013 : Failed to start resource ora.LISTENER.lsnr
PRCR-1064 : Failed to start resource ora.LISTENER.lsnr on node racnode2
CRS-2805: Unable to start 'ora.LISTENER.lsnr' because it has a 'hard' dependency on resource type 'ora.cluster_vip_net1.type' and no resource of that type can satisfy the dependency
CRS-2525: All instances of the resource 'ora.racnode1.vip' are already running; relocate is not allowed because the force option was not specified
Also, please find the new grid logs below:
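The CRS-2805 message points at a missing VIP resource for racnode2, so one quick confirmation is whether that VIP was ever created; a sketch, run as the grid user on either node:
srvctl config vip -node racnode2
srvctl status vip -node racnode2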
@hprop
This is because your RAC setup on node 2 did not complete. I would request that you re-run root.sh on node 2 or recreate the setup. It seems iptables was causing the issue, as I can see the CSSD, CRSD and EVMD processes came up fine on node 2.
Please close this thread if the issue is resolved.
My apologies @psaini79 -- I was working on other things and had no chance to come back to this until now. The issue still persists: I have tried to recreate the whole setup after disabling iptables and encountered the same error when bringing up node 2:
$ docker logs -f racnode2
PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=racnode2
TERM=xterm
SCAN_IP=172.16.1.70
ASM_DISCOVERY_DIR=/oradata
ASM_DEVICE_LIST=/oradata/asm_disk01.img,/oradata/asm_disk02.img,/oradata/asm_disk03.img,/oradata/asm_disk04.img,/oradata/asm_disk05.img
DOMAIN=example.com
PUBLIC_IP=172.16.1.151
PUBLIC_HOSTNAME=racnode2
EXISTING_CLS_NODES=racnode1
PRIV_IP=192.168.17.151
SCAN_NAME=racnode-scan
COMMON_OS_PWD_FILE=common_os_pwdfile.enc
VIP_HOSTNAME=racnode2-vip
PRIV_HOSTNAME=racnode2-priv
ORACLE_SID=ORCLCDB
OP_TYPE=ADDNODE
PWD_KEY=pwd.key
NODE_VIP=172.16.1.161
SETUP_LINUX_FILE=setupLinuxEnv.sh
INSTALL_DIR=/opt/scripts
GRID_BASE=/u01/app/grid
GRID_HOME=/u01/app/19.3.0/grid
INSTALL_FILE_1=LINUX.X64_193000_grid_home.zip
GRID_INSTALL_RSP=gridsetup_19c.rsp
GRID_SW_INSTALL_RSP=grid_sw_install_19c.rsp
GRID_SETUP_FILE=setupGrid.sh
FIXUP_PREQ_FILE=fixupPreq.sh
INSTALL_GRID_BINARIES_FILE=installGridBinaries.sh
INSTALL_GRID_PATCH=applyGridPatch.sh
INVENTORY=/u01/app/oraInventory
CONFIGGRID=configGrid.sh
ADDNODE=AddNode.sh
DELNODE=DelNode.sh
ADDNODE_RSP=grid_addnode.rsp
SETUPSSH=setupSSH.expect
DOCKERORACLEINIT=dockeroracleinit
GRID_USER_HOME=/home/grid
SETUPGRIDENV=setupGridEnv.sh
RESET_OS_PASSWORD=resetOSPassword.sh
MULTI_NODE_INSTALL=MultiNodeInstall.py
DB_BASE=/u01/app/oracle
DB_HOME=/u01/app/oracle/product/19.3.0/dbhome_1
INSTALL_FILE_2=LINUX.X64_193000_db_home.zip
DB_INSTALL_RSP=db_sw_install_19c.rsp
DBCA_RSP=dbca_19c.rsp
DB_SETUP_FILE=setupDB.sh
PWD_FILE=setPassword.sh
RUN_FILE=runOracle.sh
STOP_FILE=stopOracle.sh
ENABLE_RAC_FILE=enableRAC.sh
CHECK_DB_FILE=checkDBStatus.sh
USER_SCRIPTS_FILE=runUserScripts.sh
REMOTE_LISTENER_FILE=remoteListener.sh
INSTALL_DB_BINARIES_FILE=installDBBinaries.sh
GRID_HOME_CLEANUP=GridHomeCleanup.sh
ORACLE_HOME_CLEANUP=OracleHomeCleanup.sh
DB_USER=oracle
GRID_USER=grid
FUNCTIONS=functions.sh
COMMON_SCRIPTS=/common_scripts
CHECK_SPACE_FILE=checkSpace.sh
RESET_FAILED_UNITS=resetFailedUnits.sh
SET_CRONTAB=setCrontab.sh
CRONTAB_ENTRY=crontabEntry
EXPECT=/usr/bin/expect
BIN=/usr/sbin
container=true
INSTALL_SCRIPTS=/opt/scripts/install
SCRIPT_DIR=/opt/scripts/startup
GRID_PATH=/u01/app/19.3.0/grid/bin:/u01/app/19.3.0/grid/OPatch/:/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
DB_PATH=/u01/app/oracle/product/19.3.0/dbhome_1/bin:/u01/app/oracle/product/19.3.0/dbhome_1/OPatch/:/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
GRID_LD_LIBRARY_PATH=/u01/app/19.3.0/grid/lib:/usr/lib:/lib
DB_LD_LIBRARY_PATH=/u01/app/oracle/product/19.3.0/dbhome_1/lib:/usr/lib:/lib
HOME=/home/grid
Failed to parse kernel command line, ignoring: No such file or directory
systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
Detected virtualization other.
Detected architecture x86-64.
Welcome to Oracle Linux Server 7.8!
Set hostname to <racnode2>.
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
/usr/lib/systemd/system-generators/systemd-fstab-generator failed with error code 1.
[/usr/lib/systemd/system/systemd-pstore.service:22] Unknown lvalue 'StateDirectory' in section 'Service'
Cannot add dependency job for unit display-manager.service, ignoring: Unit not found.
[ OK ] Reached target Local Encrypted Volumes.
[ OK ] Reached target Swap.
[ OK ] Created slice Root Slice.
[ OK ] Listening on /dev/initctl Compatibility Named Pipe.
[ OK ] Created slice System Slice.
[ OK ] Created slice system-getty.slice.
[ OK ] Listening on Journal Socket.
Couldn't determine result for ConditionKernelCommandLine=|rd.modules-load for systemd-modules-load.service, assuming failed: No such file or directory
Couldn't determine result for ConditionKernelCommandLine=|modules-load for systemd-modules-load.service, assuming failed: No such file or directory
[ OK ] Created slice User and Session Slice.
[ OK ] Reached target Slices.
[ OK ] Reached target RPC Port Mapper.
Starting Read and set NIS domainname from /etc/sysconfig/network...
Starting Journal Service...
[ OK ] Started Dispatch Password Requests to Console Directory Watch.
[ OK ] Listening on Delayed Shutdown Socket.
Starting Rebuild Hardware Database...
[ OK ] Reached target Local File Systems (Pre).
Starting Configure read-only root support...
[ OK ] Started Forward Password Requests to Wall Directory Watch.
[ OK ] Started Read and set NIS domainname from /etc/sysconfig/network.
[ OK ] Started Journal Service.
Starting Flush Journal to Persistent Storage...
[ OK ] Started Configure read-only root support.
Starting Load/Save Random Seed...
[ OK ] Reached target Local File Systems.
Starting Rebuild Journal Catalog...
Starting Mark the need to relabel after reboot...
Starting Preprocess NFS configuration...
[ OK ] Started Mark the need to relabel after reboot.
[ OK ] Started Load/Save Random Seed.
[ OK ] Started Rebuild Journal Catalog.
[ OK ] Started Flush Journal to Persistent Storage.
Starting Create Volatile Files and Directories...
[ OK ] Started Preprocess NFS configuration.
[ OK ] Started Create Volatile Files and Directories.
Mounting RPC Pipe File System...
Starting Update UTMP about System Boot/Shutdown...
[FAILED] Failed to mount RPC Pipe File System.
See 'systemctl status var-lib-nfs-rpc_pipefs.mount' for details.
[DEPEND] Dependency failed for rpc_pipefs.target.
[DEPEND] Dependency failed for RPC security service for NFS client and server.
[ OK ] Started Update UTMP about System Boot/Shutdown.
[ OK ] Started Rebuild Hardware Database.
Starting Update is Completed...
[ OK ] Started Update is Completed.
[ OK ] Reached target System Initialization.
[ OK ] Listening on RPCbind Server Activation Socket.
Starting RPC bind service...
[ OK ] Started Daily Cleanup of Temporary Directories.
[ OK ] Reached target Timers.
[ OK ] Listening on D-Bus System Message Bus Socket.
[ OK ] Reached target Sockets.
[ OK ] Started Flexible branding.
[ OK ] Reached target Paths.
[ OK ] Reached target Basic System.
Starting Login Service...
Starting GSSAPI Proxy Daemon...
Starting LSB: Bring up/down networking...
Starting Resets System Activity Logs...
Starting Self Monitoring and Reporting Technology (SMART) Daemon...
[ OK ] Started D-Bus System Message Bus.
Starting OpenSSH Server Key Generation...
[ OK ] Started RPC bind service.
[ OK ] Started GSSAPI Proxy Daemon.
[ OK ] Started Resets System Activity Logs.
Starting Cleanup of Temporary Directories...
[ OK ] Reached target NFS client services.
[ OK ] Reached target Remote File Systems (Pre).
[ OK ] Reached target Remote File Systems.
Starting Permit User Sessions...
[ OK ] Started Login Service.
[ OK ] Started Permit User Sessions.
[ OK ] Started Command Scheduler.
[ OK ] Started Cleanup of Temporary Directories.
[ OK ] Started LSB: Bring up/down networking.
[ OK ] Reached target Network.
Starting /etc/rc.d/rc.local Compatibility...
[ OK ] Reached target Network is Online.
Starting Notify NFS peers of a restart...
[ OK ] Started /etc/rc.d/rc.local Compatibility.
[ OK ] Started Console Getty.
[ OK ] Reached target Login Prompts.
[ OK ] Started Notify NFS peers of a restart.
09-29-2020 13:02:02 UTC : : Process id of the program :
09-29-2020 13:02:02 UTC : : #################################################
09-29-2020 13:02:02 UTC : : Starting Grid Installation
09-29-2020 13:02:02 UTC : : #################################################
09-29-2020 13:02:02 UTC : : Pre-Grid Setup steps are in process
09-29-2020 13:02:02 UTC : : Process id of the program :
[ OK ] Started OpenSSH Server Key Generation.
Starting OpenSSH server daemon...
09-29-2020 13:02:02 UTC : : Disable failed service var-lib-nfs-rpc_pipefs.mount
[ OK ] Started OpenSSH server daemon.
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
09-29-2020 13:02:02 UTC : : Resetting Failed Services
09-29-2020 13:02:02 UTC : : Sleeping for 60 seconds
[ OK ] Started Self Monitoring and Reporting Technology (SMART) Daemon.
[ OK ] Reached target Multi-User System.
[ OK ] Reached target Graphical Interface.
Starting Update UTMP about System Runlevel Changes...
[ OK ] Started Update UTMP about System Runlevel Changes.
Oracle Linux Server 7.8
Kernel 4.14.35-1902.301.1.el7uek.x86_64 on an x86_64
racnode2 login: 09-29-2020 13:03:02 UTC : : Systemctl state is running!
09-29-2020 13:03:02 UTC : : Setting correct permissions for /bin/ping
09-29-2020 13:03:02 UTC : : Public IP is set to 172.16.1.151
09-29-2020 13:03:02 UTC : : RAC Node PUBLIC Hostname is set to racnode2
09-29-2020 13:03:02 UTC : : Preparing host line for racnode2
09-29-2020 13:03:02 UTC : : Adding \n172.16.1.151\tracnode2.example.com\tracnode2 to /etc/hosts
09-29-2020 13:03:02 UTC : : Preparing host line for racnode2-priv
09-29-2020 13:03:02 UTC : : Adding \n192.168.17.151\tracnode2-priv.example.com\tracnode2-priv to /etc/hosts
09-29-2020 13:03:02 UTC : : Preparing host line for racnode2-vip
09-29-2020 13:03:02 UTC : : Adding \n172.16.1.161\tracnode2-vip.example.com\tracnode2-vip to /etc/hosts
09-29-2020 13:03:02 UTC : : racnode-scan already exists : 172.16.1.70 racnode-scan.example.com racnode-scan, no update required
09-29-2020 13:03:02 UTC : : Preapring Device list
09-29-2020 13:03:02 UTC : : Changing Disk permission and ownership /oradata/asm_disk01.img
09-29-2020 13:03:02 UTC : : Changing Disk permission and ownership /oradata/asm_disk02.img
09-29-2020 13:03:02 UTC : : Changing Disk permission and ownership /oradata/asm_disk03.img
09-29-2020 13:03:02 UTC : : Changing Disk permission and ownership /oradata/asm_disk04.img
09-29-2020 13:03:02 UTC : : Changing Disk permission and ownership /oradata/asm_disk05.img
09-29-2020 13:03:02 UTC : : DNS_SERVERS is set to empty. /etc/resolv.conf will use default dns docker embedded server.
09-29-2020 13:03:02 UTC : : #####################################################################
09-29-2020 13:03:02 UTC : : RAC setup will begin in 2 minutes
09-29-2020 13:03:02 UTC : : ####################################################################
09-29-2020 13:03:04 UTC : : ###################################################
09-29-2020 13:03:04 UTC : : Pre-Grid Setup steps completed
09-29-2020 13:03:04 UTC : : ###################################################
09-29-2020 13:03:04 UTC : : Checking if grid is already configured
09-29-2020 13:03:04 UTC : : Public IP is set to 172.16.1.151
09-29-2020 13:03:04 UTC : : RAC Node PUBLIC Hostname is set to racnode2
09-29-2020 13:03:04 UTC : : Domain is defined to example.com
09-29-2020 13:03:04 UTC : : Setting Existing Cluster Node for node addition operation. This will be retrieved from racnode1
09-29-2020 13:03:04 UTC : : Existing Node Name of the cluster is set to racnode1
09-29-2020 13:03:05 UTC : : 172.16.1.150
09-29-2020 13:03:05 UTC : : Existing Cluster node resolved to IP. Check passed
09-29-2020 13:03:05 UTC : : Default setting of AUTO GNS VIP set to false. If you want to use AUTO GNS VIP, please pass DHCP_CONF as an env parameter set to true
09-29-2020 13:03:05 UTC : : RAC VIP set to 172.16.1.161
09-29-2020 13:03:05 UTC : : RAC Node VIP hostname is set to racnode2-vip
09-29-2020 13:03:05 UTC : : SCAN_NAME name is racnode-scan
09-29-2020 13:03:05 UTC : : 172.16.1.70
09-29-2020 13:03:05 UTC : : SCAN Name resolving to IP. Check Passed!
09-29-2020 13:03:05 UTC : : SCAN_IP name is 172.16.1.70
09-29-2020 13:03:05 UTC : : RAC Node PRIV IP is set to 192.168.17.151
09-29-2020 13:03:05 UTC : : RAC Node private hostname is set to racnode2-priv
09-29-2020 13:03:05 UTC : : CMAN_NAME set to the empty string
09-29-2020 13:03:05 UTC : : CMAN_IP set to the empty string
09-29-2020 13:03:05 UTC : : Password file generated
09-29-2020 13:03:05 UTC : : Common OS Password string is set for Grid user
09-29-2020 13:03:05 UTC : : Common OS Password string is set for Oracle user
09-29-2020 13:03:05 UTC : : GRID_RESPONSE_FILE env variable set to empty. AddNode.sh will use standard cluster responsefile
09-29-2020 13:03:05 UTC : : Location for User script SCRIPT_ROOT set to /common_scripts
09-29-2020 13:03:05 UTC : : ORACLE_SID is set to ORCLCDB
09-29-2020 13:03:05 UTC : : Setting random password for root/grid/oracle user
09-29-2020 13:03:05 UTC : : Setting random password for grid user
09-29-2020 13:03:05 UTC : : Setting random password for oracle user
09-29-2020 13:03:05 UTC : : Setting random password for root user
09-29-2020 13:03:05 UTC : : Cluster Nodes are racnode1 racnode2
09-29-2020 13:03:05 UTC : : Running SSH setup for grid user between nodes racnode1 racnode2
09-29-2020 13:03:17 UTC : : Running SSH setup for oracle user between nodes racnode1 racnode2
09-29-2020 13:03:29 UTC : : SSH check fine for the racnode1
09-29-2020 13:03:29 UTC : : SSH check fine for the racnode2
09-29-2020 13:03:29 UTC : : SSH check fine for the racnode2
09-29-2020 13:03:29 UTC : : SSH check fine for the oracle@racnode1
09-29-2020 13:03:29 UTC : : SSH check fine for the oracle@racnode2
09-29-2020 13:03:29 UTC : : SSH check fine for the oracle@racnode2
09-29-2020 13:03:29 UTC : : Setting Device permission to grid and asmadmin on all the cluster nodes
09-29-2020 13:03:29 UTC : : Nodes in the cluster racnode2
09-29-2020 13:03:29 UTC : : Setting Device permissions for RAC Install on racnode2
09-29-2020 13:03:29 UTC : : Preapring ASM Device list
09-29-2020 13:03:29 UTC : : Changing Disk permission and ownership
09-29-2020 13:03:29 UTC : : Command : su - $GRID_USER -c "ssh $node sudo chown $GRID_USER:asmadmin $device" execute on racnode2
09-29-2020 13:03:30 UTC : : Command : su - $GRID_USER -c "ssh $node sudo chmod 660 $device" execute on racnode2
09-29-2020 13:03:30 UTC : : Populate Rac Env Vars on Remote Hosts
09-29-2020 13:03:30 UTC : : Command : su - $GRID_USER -c "ssh $node sudo echo \"export ASM_DEVICE_LIST=${ASM_DEVICE_LIST}\" >> /etc/rac_env_vars" execute on racnode2
09-29-2020 13:03:30 UTC : : Changing Disk permission and ownership
09-29-2020 13:03:30 UTC : : Command : su - $GRID_USER -c "ssh $node sudo chown $GRID_USER:asmadmin $device" execute on racnode2
09-29-2020 13:03:30 UTC : : Command : su - $GRID_USER -c "ssh $node sudo chmod 660 $device" execute on racnode2
09-29-2020 13:03:30 UTC : : Populate Rac Env Vars on Remote Hosts
09-29-2020 13:03:30 UTC : : Command : su - $GRID_USER -c "ssh $node sudo echo \"export ASM_DEVICE_LIST=${ASM_DEVICE_LIST}\" >> /etc/rac_env_vars" execute on racnode2
09-29-2020 13:03:30 UTC : : Changing Disk permission and ownership
09-29-2020 13:03:30 UTC : : Command : su - $GRID_USER -c "ssh $node sudo chown $GRID_USER:asmadmin $device" execute on racnode2
09-29-2020 13:03:30 UTC : : Command : su - $GRID_USER -c "ssh $node sudo chmod 660 $device" execute on racnode2
09-29-2020 13:03:30 UTC : : Populate Rac Env Vars on Remote Hosts
09-29-2020 13:03:30 UTC : : Command : su - $GRID_USER -c "ssh $node sudo echo \"export ASM_DEVICE_LIST=${ASM_DEVICE_LIST}\" >> /etc/rac_env_vars" execute on racnode2
09-29-2020 13:03:30 UTC : : Changing Disk permission and ownership
09-29-2020 13:03:30 UTC : : Command : su - $GRID_USER -c "ssh $node sudo chown $GRID_USER:asmadmin $device" execute on racnode2
09-29-2020 13:03:31 UTC : : Command : su - $GRID_USER -c "ssh $node sudo chmod 660 $device" execute on racnode2
09-29-2020 13:03:31 UTC : : Populate Rac Env Vars on Remote Hosts
09-29-2020 13:03:31 UTC : : Command : su - $GRID_USER -c "ssh $node sudo echo \"export ASM_DEVICE_LIST=${ASM_DEVICE_LIST}\" >> /etc/rac_env_vars" execute on racnode2
09-29-2020 13:03:31 UTC : : Changing Disk permission and ownership
09-29-2020 13:03:31 UTC : : Command : su - $GRID_USER -c "ssh $node sudo chown $GRID_USER:asmadmin $device" execute on racnode2
09-29-2020 13:03:31 UTC : : Command : su - $GRID_USER -c "ssh $node sudo chmod 660 $device" execute on racnode2
09-29-2020 13:03:31 UTC : : Populate Rac Env Vars on Remote Hosts
09-29-2020 13:03:31 UTC : : Command : su - $GRID_USER -c "ssh $node sudo echo \"export ASM_DEVICE_LIST=${ASM_DEVICE_LIST}\" >> /etc/rac_env_vars" execute on racnode2
09-29-2020 13:03:31 UTC : : Checking Cluster Status on racnode1
09-29-2020 13:03:31 UTC : : Checking Cluster
09-29-2020 13:03:31 UTC : : Cluster Check on remote node passed
09-29-2020 13:03:32 UTC : : Cluster Check went fine
09-29-2020 13:03:32 UTC : : CRSD Check went fine
09-29-2020 13:03:32 UTC : : CSSD Check went fine
09-29-2020 13:03:32 UTC : : EVMD Check went fine
09-29-2020 13:03:32 UTC : : Generating Responsefile for node addition
09-29-2020 13:03:32 UTC : : Clustered Nodes are set to racnode2:racnode2-vip:HUB
09-29-2020 13:03:32 UTC : : Running Cluster verification utility for new node racnode2 on racnode1
09-29-2020 13:03:32 UTC : : Nodes in the cluster racnode2
09-29-2020 13:03:32 UTC : : ssh to the node racnode1 and executing cvu checks on racnode2
09-29-2020 13:04:29 UTC : : Checking /tmp/cluvfy_check.txt if there is any failed check.
Verifying Physical Memory ...PASSED
Verifying Available Physical Memory ...PASSED
Verifying Swap Size ...PASSED
Verifying Free Space: racnode2:/usr,racnode2:/var,racnode2:/etc,racnode2:/u01/app/19.3.0/grid,racnode2:/sbin,racnode2:/tmp ...PASSED
Verifying Free Space: racnode1:/usr,racnode1:/var,racnode1:/etc,racnode1:/u01/app/19.3.0/grid,racnode1:/sbin,racnode1:/tmp ...PASSED
Verifying User Existence: oracle ...
Verifying Users With Same UID: 54321 ...PASSED
Verifying User Existence: oracle ...PASSED
Verifying User Existence: grid ...
Verifying Users With Same UID: 54332 ...PASSED
Verifying User Existence: grid ...PASSED
Verifying User Existence: root ...
Verifying Users With Same UID: 0 ...PASSED
Verifying User Existence: root ...PASSED
Verifying Group Existence: asmadmin ...PASSED
Verifying Group Existence: asmoper ...PASSED
Verifying Group Existence: asmdba ...PASSED
Verifying Group Existence: oinstall ...PASSED
Verifying Group Membership: oinstall ...PASSED
Verifying Group Membership: asmdba ...PASSED
Verifying Group Membership: asmadmin ...PASSED
Verifying Group Membership: asmoper ...PASSED
Verifying Run Level ...PASSED
Verifying Hard Limit: maximum open file descriptors ...PASSED
Verifying Soft Limit: maximum open file descriptors ...PASSED
Verifying Hard Limit: maximum user processes ...PASSED
Verifying Soft Limit: maximum user processes ...PASSED
Verifying Soft Limit: maximum stack size ...PASSED
Verifying Architecture ...PASSED
Verifying OS Kernel Version ...PASSED
Verifying OS Kernel Parameter: semmsl ...PASSED
Verifying OS Kernel Parameter: semmns ...PASSED
Verifying OS Kernel Parameter: semopm ...PASSED
Verifying OS Kernel Parameter: semmni ...PASSED
Verifying OS Kernel Parameter: shmmax ...PASSED
Verifying OS Kernel Parameter: shmmni ...PASSED
Verifying OS Kernel Parameter: shmall ...FAILED (PRVG-1201)
Verifying OS Kernel Parameter: file-max ...PASSED
Verifying OS Kernel Parameter: aio-max-nr ...PASSED
Verifying OS Kernel Parameter: panic_on_oops ...PASSED
Verifying Package: kmod-20-21 (x86_64) ...PASSED
Verifying Package: kmod-libs-20-21 (x86_64) ...PASSED
Verifying Package: binutils-2.23.52.0.1 ...PASSED
Verifying Package: compat-libcap1-1.10 ...PASSED
Verifying Package: libgcc-4.8.2 (x86_64) ...PASSED
Verifying Package: libstdc++-4.8.2 (x86_64) ...PASSED
Verifying Package: libstdc++-devel-4.8.2 (x86_64) ...PASSED
Verifying Package: sysstat-10.1.5 ...PASSED
Verifying Package: ksh ...PASSED
Verifying Package: make-3.82 ...PASSED
Verifying Package: glibc-2.17 (x86_64) ...PASSED
Verifying Package: glibc-devel-2.17 (x86_64) ...PASSED
Verifying Package: libaio-0.3.109 (x86_64) ...PASSED
Verifying Package: libaio-devel-0.3.109 (x86_64) ...PASSED
Verifying Package: nfs-utils-1.2.3-15 ...PASSED
Verifying Package: smartmontools-6.2-4 ...PASSED
Verifying Package: net-tools-2.0-0.17 ...PASSED
Verifying Users With Same UID: 0 ...PASSED
Verifying Current Group ID ...PASSED
Verifying Root user consistency ...PASSED
Verifying Node Addition ...
Verifying CRS Integrity ...PASSED
Verifying Clusterware Version Consistency ...PASSED
Verifying '/u01/app/19.3.0/grid' ...PASSED
Verifying Node Addition ...PASSED
Verifying Host name ...PASSED
Verifying Node Connectivity ...
Verifying Hosts File ...PASSED
Verifying Check that maximum (MTU) size packet goes through subnet ...PASSED
Verifying subnet mask consistency for subnet "172.16.1.0" ...PASSED
Verifying subnet mask consistency for subnet "192.168.17.0" ...PASSED
Verifying Node Connectivity ...PASSED
Verifying Multicast or broadcast check ...PASSED
Verifying ASM Integrity ...PASSED
Verifying Device Checks for ASM ...
Verifying Package: cvuqdisk-1.0.10-1 ...PASSED
Verifying ASM device sharedness check ...
Verifying Shared Storage Accessibility:/oradata/asm_disk01.img,/oradata/asm_disk02.img,/oradata/asm_disk03.img,/oradata/asm_disk04.img,/oradata/asm_disk05.img ...PASSED
Verifying ASM device sharedness check ...PASSED
Verifying Access Control List check ...PASSED
Verifying Device Checks for ASM ...PASSED
Verifying Database home availability ...PASSED
Verifying OCR Integrity ...PASSED
Verifying Time zone consistency ...PASSED
Verifying Network Time Protocol (NTP) ...
Verifying '/etc/ntp.conf' ...PASSED
Verifying '/var/run/ntpd.pid' ...PASSED
Verifying '/var/run/chronyd.pid' ...PASSED
Verifying Network Time Protocol (NTP) ...FAILED (PRVG-1017)
Verifying User Not In Group "root": grid ...PASSED
Verifying Time offset between nodes ...PASSED
Verifying resolv.conf Integrity ...FAILED (PRVG-10048)
Verifying DNS/NIS name service ...PASSED
Verifying User Equivalence ...PASSED
Verifying /dev/shm mounted as temporary file system ...PASSED
Verifying /boot mount ...PASSED
Verifying zeroconf check ...PASSED
Pre-check for node addition was unsuccessful on all the nodes.
Failures were encountered during execution of CVU verification request "stage -pre nodeadd".
Verifying OS Kernel Parameter: shmall ...FAILED
racnode2: PRVG-1201 : OS kernel parameter "shmall" does not have expected
configured value on node "racnode2" [Expected = "2251799813685247" ;
Current = "18446744073692774000"; Configured = "1073741824"].
racnode1: PRVG-1201 : OS kernel parameter "shmall" does not have expected
configured value on node "racnode1" [Expected = "2251799813685247" ;
Current = "18446744073692774000"; Configured = "1073741824"].
Verifying Network Time Protocol (NTP) ...FAILED
racnode2: PRVG-1017 : NTP configuration file "/etc/ntp.conf" is present on
nodes "racnode2,racnode1" on which NTP daemon or service was not
running
racnode1: PRVG-1017 : NTP configuration file "/etc/ntp.conf" is present on
nodes "racnode2,racnode1" on which NTP daemon or service was not
running
Verifying resolv.conf Integrity ...FAILED
racnode2: PRVG-10048 : Name "racnode2" was not resolved to an address of the
specified type by name servers "127.0.0.11".
racnode1: PRVG-10048 : Name "racnode1" was not resolved to an address of the
specified type by name servers "127.0.0.11".
CVU operation performed: stage -pre nodeadd
Date: Sep 29, 2020 1:03:34 PM
CVU home: /u01/app/19.3.0/grid/
User: grid
09-29-2020 13:04:29 UTC : : CVU Checks are ignored as IGNORE_CVU_CHECKS set to true. It is recommended to set IGNORE_CVU_CHECKS to false and meet all the cvu checks requirement. RAC installation might fail, if there are failed cvu checks.
09-29-2020 13:04:29 UTC : : Running Node Addition and cluvfy test for node racnode2
09-29-2020 13:04:29 UTC : : Copying /tmp/grid_addnode.rsp on remote node racnode1
09-29-2020 13:04:29 UTC : : Running GridSetup.sh on racnode1 to add the node to existing cluster
09-29-2020 13:05:21 UTC : : Node Addition performed. removing Responsefile
09-29-2020 13:05:21 UTC : : Running root.sh on node racnode2
09-29-2020 13:05:21 UTC : : Nodes in the cluster racnode2
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
09-29-2020 13:17:47 UTC : : Checking Cluster
09-29-2020 13:17:47 UTC : : Cluster Check passed
09-29-2020 13:17:47 UTC : : Cluster Check went fine
09-29-2020 13:17:47 UTC : : CRSD Check failed!
09-29-2020 13:17:47 UTC : : Error has occurred in Grid Setup, Please verify!
On the same host machine I am able to create and use a two-node Oracle 12c RAC cluster. However, I am still unable to get a 2-node Oracle 19c RAC working.
Please let me know any further steps to check. I will be focused on this over the next few days and hope to solve it with your help soon.
Thanks!
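To narrow down the "CRSD Check failed!" above, the per-daemon state and the CRSD trace on racnode2 are usually the next things to check; a sketch, assuming the same Grid paths used throughout this thread:
# inside the racnode2 container, as the grid user
/u01/app/19.3.0/grid/bin/crsctl check crs
/u01/app/19.3.0/grid/bin/crsctl stat res -t -init
# CRSD trace (path based on GRID_BASE=/u01/app/grid shown above; may differ)
tail -200 /u01/app/grid/diag/crs/racnode2/crs/trace/crsd.trc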
@hprop
Are you trying RAC on Docker in the AWS cloud?
@psaini79 Yes, the Oracle Linux host is an EC2 machine. Is there any limitation on running RAC 19c in that regard? I was able to run RAC 12c following the same steps on this host.
Thanks for your help.
@hprop
What version of RAC did you test on AWS VMs? Was it an Oracle Linux VM deployed in AWS?
@psaini79
The host in both cases is an AWS EC2 machine, with Oracle Linux 7.7.
@onkarnigam14
Sorry for the delayed reply. For further assistance, I recommend running Oracle RAC on Docker on-prem on KVM or VirtualBox with OEL 7.x and UEK5, because as far as cloud deployments go, Oracle RAC is only supported in the Oracle Cloud: https://www.oracle.com/technetwork/database/options/clustering/overview/rac-cloud-support-2843861.pdf
I am closing this thread. Please reopen it if you see any issue with RAC on Docker running on on-prem KVM or VirtualBox.
Hi! I am trying to spin up an Oracle 19c RAC environment with two nodes. I followed the steps described in the README files and successfully ran the first node, racnode1. However, when trying to add an additional node, I hit the following error:
The complete docker logs for both nodes are attached below:
Further info from the docker host:
Any help would be much appreciated. Also, please let me know if further information is required. Thanks!