oracle / docker-images

Official source of container configurations, images, and examples for Oracle products and projects
https://developer.oracle.com/use-cases/#containers
Universal Permissive License v1.0

Deploying with the RAC Storage Container: failed to start containers: racnode1 #955

Closed smbptnk closed 6 years ago

smbptnk commented 6 years ago

Failed to start racnode1 with the error below:

[root@ip-xxx-xxx-xxx-xxx ~]# docker start racnode1
Error response from daemon: error while mounting volume '/var/lib/docker/volumes/racstorage/_data': error while mounting volume with options: type='nfs' device=':/oradata' o='addr=192.168.17.25,rw,bg,hard,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0': connection refused
Error: failed to start containers: racnode1

Thanks in advance.

psaini79 commented 6 years ago

Did you create the RAC storage container? Please share the output of the following:

docker ps -a
docker volume ls
docker logs -f racnode-storage
docker network ls
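For context, the racstorage volume that fails to mount is normally created with an NFS driver spec along these lines; this is only a sketch whose options mirror the ones in the error message above (192.168.17.25 being the storage container's private-network IP in this setup):

```
docker volume create \
  --driver local \
  --name racstorage \
  --opt type=nfs \
  --opt o=addr=192.168.17.25,rw,bg,hard,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0 \
  --opt device=:/oradata
```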

smbptnk commented 6 years ago

I have created the RAC storage container, as you can see below.

1. docker logs -f racnode-storage

ASM_STORAGE_SIZE env variable is not defined! Assining 25GB instead of 50GB default
Oracle user will be the owner for /oradata
Checking Space
Space check passed : /oradata has avilable space 44 and ASM storage set to 25
Creating ASM Disks /oradata/asm_disk01.img of size 5
/oradata/asm_disk01.img file already exist! Skipping file creation
Creating ASM Disks /oradata/asm_disk02.img of size 5
/oradata/asm_disk02.img file already exist! Skipping file creation
Creating ASM Disks /oradata/asm_disk03.img of size 5
/oradata/asm_disk03.img file already exist! Skipping file creation
Creating ASM Disks /oradata/asm_disk04.img of size 5
/oradata/asm_disk04.img file already exist! Skipping file creation
Creating ASM Disks /oradata/asm_disk05.img of size 5
/oradata/asm_disk05.img file already exist! Skipping file creation
#################################################
Starting NFS Server Setup
#################################################
Setting up /etc/exports
/oradata *(rw,sync,no_wdelay,no_root_squash)
Starting RPC Bind
Exporting File System
Starting RPC NFSD
Starting RPC Mountd
Checking NFS server
####################################################
NFS Server is up and running
Create NFS volume for /oradata/
####################################################
0

2. docker ps -a

CONTAINER ID   IMAGE                                       COMMAND                  CREATED          STATUS                      PORTS   NAMES
3c775b01ba84   oracle/database-rac:12.2.0.1                "/usr/sbin/oracleinit"   7 minutes ago    Created                             racnode1
47f2ea2124ea   oracle/rac-storage-server:12.2.0.1          "/bin/sh -c 'exec ..."   19 minutes ago   Up 19 minutes                       racnode-storage
0fa6bb1b9355   store/oracle/database-enterprise:12.2.0.1   "/bin/sh -c '/bin/..."   2 days ago       Exited (137) 41 hours ago           catcdb
bc6a3587e503   store/oracle/database-enterprise:12.2.0.1   "/bin/sh -c '/bin/..."   2 days ago       Exited (137) 41 hours ago           mxpdb

3. docker volume ls

DRIVER   VOLUME NAME
local    04cdc8b6424b293b404dc8835c24eb20ec5ec387852071210dbeedf5f1e4a34b
local    0c6cd1bbdd263303a78db4dc3f20985170ac413c9c9ba024ccd52353d54d7134
local    1e0dc3827019b455e4709f48424a815e561461a5106d96452ee739ef9ab9643b
local    228e37af4c31af2481c91eb521094937c550f6fb2443f2531b46e6386cf5b73d
local    472520202fd81906e9851cd27249df1976c756dceeb8d6612464281edf25ba21
local    4ef7d45140e4ae04b27697651c0f35250cbd75155c5592441fc5f36d72646826
local    6dc993c7612bf16c23bbce9fbcc6fc4de34c51c6ae301621d652ce4ea3e72c11
local    721c75927058c747465be1dccd6bd7437b397f7ace5d255df946792ddfb4c442
local    7b1dbbcd19092145069d06d86c380231bbfe9e75597deb048e2f337325a03951
local    b8f57998aaf80b1775473b7f02395119fdb4c6c958434f53a66dcde462aa926d
local    c4adcbceeab80a52724ab78a958338346437df5777d16d3378f400a0597f2521
local    d43e465f94026d5e0e524036dea2e27d62367ccb30fbe85c68f09c305131d803
local    racstorage

4. docker network ls

NETWORK ID     NAME           DRIVER   SCOPE
47daa93faab0   bridge         bridge   local
7f11e556dc51   host           host     local
2a70667460e9   none           null     local
b3d495bff13b   rac_priv1_nw   bridge   local
5f6e430f9fc0   rac_pub1_nw    bridge   local

On docker start racnode1 I got:

Error response from daemon: error while mounting volume '/var/lib/docker/volumes/racstorage/_data': error while mounting volume with options: type='nfs' device=':/oradata' o='addr=192.168.17.25,rw,bg,hard,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0': connection refused
Error: failed to start containers: racnode1

FYI:

[root@ip-1xx-xx-xx-xxx asm_vol]# ls -l /docker_volumes/asm_vol
total 26214400
-rw-r--r--. 1 54321 54321 5368709120 Aug 2 10:37 asm_disk01.img
-rw-r--r--. 1 54321 54321 5368709120 Aug 2 10:38 asm_disk02.img
-rw-r--r--. 1 54321 54321 5368709120 Aug 2 10:38 asm_disk03.img
-rw-r--r--. 1 54321 54321 5368709120 Aug 2 10:39 asm_disk04.img
-rw-r--r--. 1 54321 54321 5368709120 Aug 2 10:40 asm_disk05.img

psaini79 commented 6 years ago

Though the setup looks correct, I have a question: why are the ASM disks 25GB in total? They should be 50GB by default. Did you change the size? If yes, please create ASM disks totaling 50GB.

Also, can you paste the output of the following from the Docker host: showmount -e

Execute the following on the Docker host: rpm -qa | grep nfs-utils

Did you install the NFS server packages on the Docker host? Check with: rpm -qa
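As a sketch, these checks can be run from the Docker host like this (the /oradata export and the 192.168.17.25 address come from the mount options in the error above):

```
# Is the NFS client tooling installed on the Docker host?
rpm -qa | grep nfs-utils

# Does the storage container export /oradata and answer on its private IP?
showmount -e 192.168.17.25

# Are rpcbind/mountd/nfs registered on that address?
rpcinfo -p 192.168.17.25
```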

smbptnk commented 6 years ago

I allowed a custom allocation of 25GB instead of the default 50GB; I think that should be fine for my requirement, and evidently we do not see any error due to the ASM disk allocation. As you suspected, nfs-utils was missing, so I installed the RPM and tried restarting racnode1. This time it got past the earlier error; now there is a different one:

$ docker start racnode1
Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"failed to write 95000 to cpu.rt_runtime_us: write /sys/fs/cgroup/cpu,cpuacct/docker/f5b353f1026e525258705d503b16e413889a70aa5113f55a8038c9602566896d/cpu.rt_runtime_us: invalid argument\""
Error: failed to start containers: racnode1

my env: AMI ID RHEL-7.5_HVM_GA-20180322-x86_64-1-Hourly2-GP2 (ami-6871xxxx)

psaini79 commented 6 years ago

Hi,

I would request you to look into the following threads:

https://github.com/oracle/docker-images/issues/837
https://github.com/oracle/docker-images/issues/838

I also request you to check the following thread if you are using overlay FS: https://github.com/oracle/docker-images/issues/839
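For reference, the remedy discussed in those threads amounts to giving the Docker cgroup hierarchy on the host a realtime runtime budget, since the container asks for cpu.rt_runtime_us (the 95000 in your error). A minimal sketch, assuming the cgroup-v1 layout shown in the error; the 950000 value is illustrative:

```
# On the Docker host: the parent "docker" cpu cgroup has no realtime budget by default,
# so the container's write of 95000 to cpu.rt_runtime_us is rejected.
sudo mkdir -p /sys/fs/cgroup/cpu,cpuacct/docker
echo 950000 | sudo tee /sys/fs/cgroup/cpu,cpuacct/docker/cpu.rt_runtime_us
```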

Param

psaini79 commented 6 years ago

Keep me posted if you see any issue.

smbptnk commented 6 years ago

Thread #838 helped. I can now connect to the racnode1 container.

docker logs -f racnode1

Failures were encountered during execution of CVU verification request "stage -pre crsinst".
Welcome to Oracle Linux Server 7.5!
Set hostname to .
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
Failed to parse kernel command line, ignoring: No such file or directory
/usr/lib/systemd/system-generators/systemd-fstab-generator failed with error code 1.
Cannot add dependency job for unit display-manager.service, ignoring: Unit not found.
Couldn't determine result for ConditionKernelCommandLine=|rd.modules-load for systemd-modules-load.service, assuming failed: No such file or directory
Couldn't determine result for ConditionKernelCommandLine=|modules-load for systemd-modules-load.service, assuming failed: No such file or directory

Verifying OS Kernel Parameter: semopm ...FAILED
racnode1: PRVG-1205 : OS kernel parameter "semopm" does not have expected current value on node "racnode1" [Expected = "100" ; Current = "32"; Configured = "100"].

Verifying OS Kernel Parameter: rmem_default ...FAILED racnode1: PRVG-2044 : Command "/sbin/sysctl -a | grep rmem_default[[:space:]]*=" failed on node "racnode1" and produced no output. racnode1: PRVF-7544 : Check cannot be performed for kernel parameter "rmem_default" on node "racnode1" racnode1: PRVG-1205 : OS kernel parameter "rmem_default" does not have expected current value on node "racnode1" [Expected = "262144" ; Current = "unknown"; Configured = "262144"].

Verifying OS Kernel Parameter: rmem_max ...FAILED racnode1: PRVG-2044 : Command "/sbin/sysctl -a | grep rmem_max[[:space:]]*=" failed on node "racnode1" and produced no output. racnode1: PRVF-7544 : Check cannot be performed for kernel parameter "rmem_max" on node "racnode1" racnode1: PRVG-1205 : OS kernel parameter "rmem_max" does not have expected current value on node "racnode1" [Expected = "4194304" ; Current = "unknown"; Configured = "4194304"].

Verifying OS Kernel Parameter: wmem_default ...FAILED racnode1: PRVG-2044 : Command "/sbin/sysctl -a | grep wmem_default[[:space:]]*=" failed on node "racnode1" and produced no output. racnode1: PRVF-7544 : Check cannot be performed for kernel parameter "wmem_default" on node "racnode1" racnode1: PRVG-1205 : OS kernel parameter "wmem_default" does not have expected current value on node "racnode1" [Expected = "262144" ; Current = "unknown"; Configured = "262144"].

Verifying OS Kernel Parameter: wmem_max ...FAILED racnode1: PRVG-2044 : Command "/sbin/sysctl -a | grep wmem_max[[:space:]]*=" failed on node "racnode1" and produced no output. racnode1: PRVF-7544 : Check cannot be performed for kernel parameter "wmem_max" on node "racnode1" racnode1: PRVG-1205 : OS kernel parameter "wmem_max" does not have expected current value on node "racnode1" [Expected = "1048576" ; Current = "unknown"; Configured = "1048576"].

Verifying OS Kernel Parameter: aio-max-nr ...FAILED racnode1: PRVG-1205 : OS kernel parameter "aio-max-nr" does not have expected current value on node "racnode1" [Expected = "1048576" ; Current = "65536"; Configured = "1048576"].

racnode1: PRVG-10048 : Name "racnode1" was not resolved to an address of the specified type by name servers o"127.0.0.11".

racnode1: PRVG-10048 : Name "racnode1" was not resolved to an address of the specified type by name servers o"127.0.0.11".

PRVG-1101 : SCAN name "racnode-scan" failed to resolve

cluvfy.tar.gz

psaini79 commented 6 years ago

OK. It seems some kernel parameters are not set in your env. Did you make sure, as per the README.md, that you set the following at the Docker host level?

Please set up the kernel parameters at the host level.

psaini79 commented 6 years ago

OK. It seems some kernel parameters are not set in your env. Did you make sure, as per the README.md, that you set the following at the Docker host level?

fs.file-max = 6815744
net.core.rmem_max = 4194304
net.core.rmem_default = 262144
net.core.wmem_max = 1048576
net.core.wmem_default = 262144

sysctl -p
sysctl -a
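A minimal sketch of applying these on the Docker host, assuming the values go into /etc/sysctl.conf:

```
# Run as root on the Docker host: append the required values, then reload and verify
cat >> /etc/sysctl.conf <<'EOF'
fs.file-max = 6815744
net.core.rmem_max = 4194304
net.core.rmem_default = 262144
net.core.wmem_max = 1048576
net.core.wmem_default = 262144
EOF
sysctl -p
sysctl -a | grep -E 'rmem|wmem|file-max'
```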

However, I am wondering why you are getting kernel.sem CVU messages as well. If you still see the issue, please set up the following at the host level:

# oracle-database-server-12cR2-preinstall setting for fs.file-max is 6815744
fs.file-max = 6815744

# oracle-database-server-12cR2-preinstall setting for kernel.sem is '250 32000 100 128'
kernel.sem = 250 32000 100 128

# oracle-database-server-12cR2-preinstall setting for kernel.shmmni is 4096
kernel.shmmni = 4096

# oracle-database-server-12cR2-preinstall setting for kernel.shmall is 1073741824 on x86_64
kernel.shmall = 1073741824

# oracle-database-server-12cR2-preinstall setting for kernel.shmmax is 4398046511104 on x86_64
kernel.shmmax = 4398046511104

# oracle-database-server-12cR2-preinstall setting for kernel.panic_on_oops is 1 per Orabug 19212317
kernel.panic_on_oops = 1

# oracle-database-server-12cR2-preinstall setting for net.core.rmem_default is 262144
net.core.rmem_default = 262144

# oracle-database-server-12cR2-preinstall setting for net.core.rmem_max is 4194304
net.core.rmem_max = 4194304

# oracle-database-server-12cR2-preinstall setting for net.core.wmem_default is 262144
net.core.wmem_default = 262144

# oracle-database-server-12cR2-preinstall setting for net.core.wmem_max is 1048576
net.core.wmem_max = 1048576

# oracle-database-server-12cR2-preinstall setting for net.ipv4.conf.all.rp_filter is 2
net.ipv4.conf.all.rp_filter = 2

# oracle-database-server-12cR2-preinstall setting for net.ipv4.conf.default.rp_filter is 2
net.ipv4.conf.default.rp_filter = 2

# oracle-database-server-12cR2-preinstall setting for fs.aio-max-nr is 1048576
fs.aio-max-nr = 1048576

# oracle-database-server-12cR2-preinstall setting for net.ipv4.ip_local_port_range is 9000 65500
net.ipv4.ip_local_port_range = 9000 65500

sysctl -p
sysctl -a

It is strange that you are getting kernel parameter failures, as we modify some kernel parameters at the container level. I will look into it.

smbptnk commented 6 years ago

I double-checked the kernel parameters at the host; all seem to be set properly.

Failures were encountered during execution of CVU verification request "stage -pre crsinst".

Verifying OS Kernel Parameter: semopm ...FAILED racnode1: PRVG-1205 : OS kernel parameter "semopm" does not have expected current value on node "racnode1" [Expected = "100" ; Current = "32"; Configured = "100"].

Verifying OS Kernel Parameter: rmem_default ...FAILED racnode1: PRVG-2044 : Command "/sbin/sysctl -a | grep rmem_default[[:space:]]*=" failed on node "racnode1" and produced no output. racnode1: PRVF-7544 : Check cannot be performed for kernel parameter "rmem_default" on node "racnode1" racnode1: PRVG-1205 : OS kernel parameter "rmem_default" does not have expected current value on node "racnode1" [Expected = "262144" ; Current = "unknown"; Configured = "262144"].

Verifying OS Kernel Parameter: rmem_max ...FAILED racnode1: PRVG-2044 : Command "/sbin/sysctl -a | grep rmem_max[[:space:]]*=" failed on node "racnode1" and produced no output. racnode1: PRVF-7544 : Check cannot be performed for kernel parameter "rmem_max" on node "racnode1" racnode1: PRVG-1205 : OS kernel parameter "rmem_max" does not have expected current value on node "racnode1" [Expected = "4194304" ; Current = "unknown"; Configured = "4194304"].

Verifying OS Kernel Parameter: wmem_default ...FAILED racnode1: PRVG-2044 : Command "/sbin/sysctl -a | grep wmem_default[[:space:]]*=" failed on node "racnode1" and produced no output. racnode1: PRVF-7544 : Check cannot be performed for kernel parameter "wmem_default" on node "racnode1" racnode1: PRVG-1205 : OS kernel parameter "wmem_default" does not have expected current value on node "racnode1" [Expected = "262144" ; Current = "unknown"; Configured = "262144"].

Verifying OS Kernel Parameter: wmem_max ...FAILED racnode1: PRVG-2044 : Command "/sbin/sysctl -a | grep wmem_max[[:space:]]*=" failed on node "racnode1" and produced no output. racnode1: PRVF-7544 : Check cannot be performed for kernel parameter "wmem_max" on node "racnode1" racnode1: PRVG-1205 : OS kernel parameter "wmem_max" does not have expected current value on node "racnode1" [Expected = "1048576" ; Current = "unknown"; Configured = "1048576"].

racnode1: PRVG-10048 : Name "racnode1" was not resolved to an address of the specified type by name servers o"127.0.0.11".

racnode1: PRVG-10048 : Name "racnode1" was not resolved to an address of the specified type by name servers o"127.0.0.11".

Attached racnode1 : sysctl.conf.gz

psaini79 commented 6 years ago

1) The DNS/NTP errors are expected because DNS is not available; they can be ignored, and this is handled in the installation scripts of the RAC Docker image. For testing purposes you can use a single SCAN and ignore DNS (see the host-file sketch below), but this is not recommended or supported for application or production workloads.
2) The kernel parameter errors are strange to me, as I have never faced them.
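In a DNS-less test setup, name resolution comes from the common host file that is bind-mounted into each container as /etc/hosts (/opt/containers/rac_host_file in this deployment). An illustrative sketch, using the addresses that appear in this node's docker inspect output further down the thread:

```
# /opt/containers/rac_host_file (bind-mounted as /etc/hosts in every RAC container)
172.31.60.150   racnode1.example.com        racnode1
172.31.60.160   racnode1-vip.example.com    racnode1-vip
192.168.17.150  racnode1-priv.example.com   racnode1-priv
172.31.60.70    racnode-scan.example.com    racnode-scan
```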

However, I have a question:

Are you running cvu manually, or are these errors thrown by our scripts shipped with the Dockerfiles?

Also, I tested it again in my env and it looks like the following:

===== Verifying Physical Memory ...PASSED Verifying Available Physical Memory ...PASSED Verifying Swap Size ...FAILED (PRVF-7573) Verifying Free Space: racnode1:/usr,racnode1:/var,racnode1:/etc,racnode1:/sbin,racnode1:/tmp,racnode1:/u01/app/grid ...PASSED Verifying User Existence: grid ... Verifying Users With Same UID: 54332 ...PASSED Verifying User Existence: grid ...PASSED Verifying Group Existence: asmadmin ...PASSED Verifying Group Existence: dba ...PASSED Verifying Group Membership: dba ...PASSED Verifying Group Membership: asmadmin ...PASSED Verifying Run Level ...PASSED Verifying Hard Limit: maximum open file descriptors ...PASSED Verifying Soft Limit: maximum open file descriptors ...PASSED Verifying Hard Limit: maximum user processes ...PASSED Verifying Soft Limit: maximum user processes ...PASSED Verifying Soft Limit: maximum stack size ...PASSED Verifying Architecture ...PASSED Verifying OS Kernel Version ...PASSED Verifying OS Kernel Parameter: semmsl ...PASSED Verifying OS Kernel Parameter: semmns ...PASSED Verifying OS Kernel Parameter: semopm ...PASSED Verifying OS Kernel Parameter: semmni ...PASSED Verifying OS Kernel Parameter: shmmax ...PASSED Verifying OS Kernel Parameter: shmmni ...PASSED Verifying OS Kernel Parameter: shmall ...PASSED Verifying OS Kernel Parameter: file-max ...PASSED Verifying OS Kernel Parameter: aio-max-nr ...PASSED Verifying OS Kernel Parameter: panic_on_oops ...PASSED Verifying Package: binutils-2.23.52.0.1 ...PASSED Verifying Package: compat-libcap1-1.10 ...PASSED Verifying Package: libgcc-4.8.2 (x86_64) ...PASSED Verifying Package: libstdc++-4.8.2 (x86_64) ...PASSED Verifying Package: libstdc++-devel-4.8.2 (x86_64) ...PASSED Verifying Package: sysstat-10.1.5 ...PASSED Verifying Package: ksh ...PASSED Verifying Package: make-3.82 ...PASSED Verifying Package: glibc-2.17 (x86_64) ...PASSED Verifying Package: glibc-devel-2.17 (x86_64) ...PASSED Verifying Package: libaio-0.3.109 (x86_64) ...PASSED Verifying Package: libaio-devel-0.3.109 (x86_64) ...PASSED Verifying Package: nfs-utils-1.2.3-15 ...PASSED Verifying Package: smartmontools-6.2-4 ...PASSED Verifying Package: net-tools-2.0-0.17 ...PASSED Verifying Port Availability for component "Oracle Remote Method Invocation (ORMI)" ...PASSED Verifying Port Availability for component "Oracle Notification Service (ONS)" ...PASSED Verifying Port Availability for component "Oracle Cluster Synchronization Services (CSSD)" ...PASSED Verifying Port Availability for component "Oracle Notification Service (ONS) Enterprise Manager support" ...PASSED Verifying Port Availability for component "Oracle Database Listener" ...PASSED Verifying Users With Same UID: 0 ...PASSED Verifying Current Group ID ...PASSED Verifying Root user consistency ...PASSED Verifying Node Connectivity ... Verifying Hosts File ...PASSED Verifying Check that maximum (MTU) size packet goes through subnet ...PASSED Verifying Node Connectivity ...PASSED Verifying Multicast check ...PASSED Verifying Device Checks for ASM ... Verifying ASM device sharedness check ... Verifying Package: cvuqdisk-1.0.10-1 ...PASSED Verifying Shared Storage Accessibility:/asmdisks/asm_disk04.img,/asmdisks/asm_disk01.img,/asmdisks/asm_disk06.img,/asmdisks/asm_disk05.img,/asmdisks/asm_disk02.img,/asmdisks/asm_disk03.img ...PASSED Verifying ASM device sharedness check ...PASSED Verifying Device Checks for ASM ...PASSED Verifying I/O scheduler ... Verifying Package: cvuqdisk-1.0.10-1 ...PASSED Verifying I/O scheduler ...PASSED Verifying Network Time Protocol (NTP) ... 
Verifying '/etc/ntp.conf' ...PASSED Verifying '/var/run/ntpd.pid' ...PASSED Verifying '/var/run/chronyd.pid' ...PASSED Verifying Network Time Protocol (NTP) ...FAILED Verifying Same core file name pattern ...PASSED Verifying User Mask ...PASSED Verifying User Not In Group "root": grid ...PASSED Verifying Time zone consistency ...PASSED Verifying VIP Subnet configuration check ...PASSED Verifying resolv.conf Integrity ... Verifying (Linux) resolv.conf Integrity ...FAILED (PRVG-10048) Verifying resolv.conf Integrity ...FAILED (PRVG-10048) Verifying DNS/NIS name service ... Verifying Name Service Switch Configuration File Integrity ...PASSED Verifying DNS/NIS name service ...PASSED Verifying Single Client Access Name (SCAN) ...WARNING (PRVG-11368) Verifying Domain Sockets ...PASSED Verifying /boot mount ...PASSED Verifying Daemon "avahi-daemon" not configured and running ...PASSED Verifying Daemon "proxyt" not configured and running ...PASSED Verifying loopback network interface address ...PASSED Verifying Oracle base: /u01/app/grid ... Verifying '/u01/app/grid' ...PASSED Verifying Oracle base: /u01/app/grid ...PASSED Verifying User Equivalence ...PASSED Verifying File system mount options for path /var ...PASSED Verifying zeroconf check ...PASSED Verifying ASM Filter Driver configuration ...PASSED

Pre-check for cluster services setup was unsuccessful on all the nodes.

Failures were encountered during execution of CVU verification request "stage -pre crsinst".

Verifying Swap Size ...FAILED racnode1: PRVF-7573 : Sufficient swap size is not available on node "racnode1" [Required = 16GB (1.6777216E7KB) ; Found = 8GB (8388604.0KB)]

Verifying Network Time Protocol (NTP) ...FAILED Verifying resolv.conf Integrity ...FAILED racnode1: PRVG-10048 : Name "racnode1" was not resolved to an address of the specified type by name servers o"162.88.18.6". racnode1: PRVG-10048 : Name "racnode1" was not resolved to an address of the specified type by name servers o"162.88.2.6". racnode1: Check for integrity of file "/etc/resolv.conf" failed

Verifying (Linux) resolv.conf Integrity ...FAILED racnode1: PRVG-10048 : Name "racnode1" was not resolved to an address of the specified type by name servers o"162.88.18.6". racnode1: PRVG-10048 : Name "racnode1" was not resolved to an address of the specified type by name servers o"162.88.2.6".

Verifying Single Client Access Name (SCAN) ...WARNING racnode1: PRVG-11368 : A SCAN is recommended to resolve to "3" or more IP addresses, but SCAN "racnode-scan.k8local.com" resolves to only "192.168.16.53,192.168.16.54"

psaini79 commented 6 years ago

I would request that you paste the steps so that I can try to reproduce this internally and look into it.

smbptnk commented 6 years ago

I followed the steps as per the README.md. The cvu errors are thrown by the scripts shipped with the Dockerfiles; I'm not running them manually. My env is on AWS, AMI ID RHEL-7.5_HVM_GA-20180322-x86_64-1-Hourly2-GP2 (ami-6871a115). Not sure if it has been tested on this environment.

psaini79 commented 6 years ago

This is strange, as no one has reported this kind of error yet. Technically, it should work on RHEL. However, I cannot tell whether some policy or setting on AWS is causing the issue.

psaini79 commented 6 years ago

Also, I looked at your sysctl conf file. Where did you update this: on the Docker host, i.e. your AWS instance, or in the container?

After looking at the file, it seems it is from the racnode container level. What is the content of your sysctl.conf at the host level?

Can you set the following parameters at the Docker host level where your Docker engine is running?

Edit /etc/sysctl.conf and add the following:

fs.file-max = 6815744
kernel.sem = 250 32000 100 128
kernel.shmmni = 4096
kernel.shmall = 1073741824
kernel.shmmax = 4398046511104
kernel.panic_on_oops = 1
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500

sysctl -p
sysctl -a
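Once the host has been reloaded, a quick sanity check is to read the same values from inside the container the way cluvfy does; a sketch:

```
# Run from the Docker host: read the checked parameters from inside racnode1
docker exec racnode1 /sbin/sysctl -a 2>/dev/null | grep -E 'rmem|wmem|kernel.sem'
```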

smbptnk commented 6 years ago

Please find my comments inline

> Also, I looked at your sysctl conf file, where did you update this? on Docker Host i.e. your AWS instance or at container?

Docker host level.

> After looking at the file, it seems it is at racnode container level. What is the conf of your sysctl.conf at host level?

fs.file-max = 6815744
net.core.rmem_max = 4194304
net.core.rmem_default = 262144
net.core.wmem_max = 1048576
net.core.wmem_default = 262144
kernel.sem = 250 32000 100 128
kernel.shmmni = 4096
kernel.shmall = 1073741824
kernel.shmmax = 4398046511104
kernel.panic_on_oops = 1
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500

It seems some parameters are pulled from the host during racnode1 setup, except for the kernel params. Could you please check from your end?

FYI:

systemctl status systemd-modules-load
● systemd-modules-load.service - Load Kernel Modules
   Loaded: loaded (/usr/lib/systemd/system/systemd-modules-load.service; static; vendor preset: disabled)
   Active: inactive (dead)
Condition: start condition failed at Fri 2018-08-10 20:38:07 UTC; 1h 22min ago
           none of the trigger conditions were met
     Docs: man:systemd-modules-load.service(8)
           man:modules-load.d(5)

smbptnk commented 6 years ago

Would it be possible for you to take control of my instance (AMI) for a first-hand look? If so, I will need your RSA public key.

Djelibeybi commented 6 years ago

> Would it be possible for you to take control of my instance (AMI) for a first-hand look? If so, I will need your RSA public key.

Why not the other way around? Deploy an Oracle Linux 7 AMI and see if that works.

smbptnk commented 6 years ago

The Clckwrk OEL version adds a hefty surcharge of $0.06 per hour :(

psaini79 commented 6 years ago

systemd-modules-load.service being in a failed state is expected, as we are running in non-privileged mode. We do not start any service that is not required.

psaini79 commented 6 years ago

Please try the following:

1) Stop the racnode1 container.
2) Delete the racnode1 container.
3) Delete the RAC image, i.e. oracle/database-rac:12.2.0.1.
4) cd $DOCKER_RAC_IMAGE/dockerfiles/12.2.0.1
Note: You need to change this in the RAC Dockerfile only. Do not change anything in the connection manager and NFS storage images.
5) Modify setupLinuxEnv.sh and change its content to the following:

mkdir /asmdisks && \
mkdir /responsefiles && \
chmod ug+x /opt/scripts/startup/*.sh && \
yum -y install systemd oracle-database-server-12cR2-preinstall net-tools which zip unzip tar openssl expect e2fsprogs openssh-server openssh-client vim-minimal passwd which sudo && \
yum clean all

Build the image again, create the racnode1 container, and let me know how it goes.
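A sketch of the corresponding commands; the final build step assumes the buildDockerImage.sh helper under the RAC dockerfiles directory, so adjust it to however the image was built originally:

```
docker stop racnode1
docker rm racnode1
docker rmi oracle/database-rac:12.2.0.1

cd $DOCKER_RAC_IMAGE/dockerfiles
# Assumption: rebuild via the repo's helper script for version 12.2.0.1
./buildDockerImage.sh -v 12.2.0.1
```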

smbptnk commented 6 years ago

No change. I got the same error again.

Verifying OS Kernel Parameter: semopm ...FAILED (PRVG-1205)
Verifying OS Kernel Parameter: rmem_default ...FAILED (PRVG-2044, PRVF-7544, PRVG-1205)
Verifying OS Kernel Parameter: rmem_max ...FAILED (PRVG-2044, PRVF-7544, PRVG-1205)
Verifying OS Kernel Parameter: wmem_default ...FAILED (PRVG-2044, PRVF-7544, PRVG-1205)
Verifying OS Kernel Parameter: wmem_max ...FAILED (PRVG-2044, PRVF-7544, PRVG-1205)
Verifying OS Kernel Parameter: semopm ...FAILED
Verifying OS Kernel Parameter: rmem_default ...FAILED
Verifying OS Kernel Parameter: rmem_max ...FAILED
Verifying OS Kernel Parameter: wmem_default ...FAILED
Verifying OS Kernel Parameter: wmem_max ...FAILED

[grid@racnode1 ~]$ rpm -qa | grep systemd
systemd-libs-219-57.0.1.el7.x86_64
systemd-219-57.0.1.el7.x86_64
systemd-sysv-219-57.0.1.el7.x86_64

psaini79 commented 6 years ago

The systemd version seems to be fine. However, I am wondering why these kernel parameters are not being picked up in the container. kernel.sem can be namespaced, and its value maps as follows:

kernel.sem = SEMMSL SEMMNS SEMOPM SEMMNI

Now I am wondering why SEMOPM is not updated in the container while the rest of them are being picked up. Please check the following Docker documentation: https://docs.docker.com/v17.06/edge/engine/reference/commandline/run/#configure-namespaced-kernel-parameters-sysctls-at-runtime

Also, the rmem/wmem parameters must be picked up from the host. In my envs, which are Oracle Linux 7.5 and 7.4, I am not facing this error.
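For namespaced parameters such as kernel.sem, the value can also be passed explicitly at container start per that documentation; a sketch of the relevant fragment only (net.core.* is not namespaced on this kernel, so those values still have to come from the host):

```
# Fragment only: all other docker run options (networks, volumes, capabilities) omitted
docker run --sysctl kernel.sem="250 32000 100 128" ... oracle/database-rac:12.2.0.1
```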

Can you please share the output of the following commands with me (from the Docker host):

docker version
docker inspect racnode1
docker info
uname -a

smbptnk commented 6 years ago

Issue in setting --sysctl net.core parameter with docker container. #30778 https://github.com/moby/moby/issues/30778

smbptnk commented 6 years ago

docker version

Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May 4 22:06:25 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May 4 22:06:25 2017
 OS/Arch:      linux/amd64
 Experimental: false

docker inspect racnode1

[ { "Id": "e6bb03d9bac6824939542fea51722d34821350c584142c16036db69de392384d", "Created": "2018-08-11T15:39:53.138724805Z", "Path": "/usr/sbin/oracleinit", "Args": [], "State": { "Status": "running", "Running": true, "Paused": false, "Restarting": false, "OOMKilled": false, "Dead": false, "Pid": 1876, "ExitCode": 0, "Error": "", "StartedAt": "2018-08-12T11:58:40.450785878Z", "FinishedAt": "2018-08-11T18:32:38.340511066Z" }, "Image": "sha256:bf30ea53b7eebc26366983d88a6b65819c1cb6b2584be8dfb4e2c133a9ce5db3", "ResolvConfPath": "/var/lib/docker/containers/e6bb03d9bac6824939542fea51722d34821350c584142c16036db69de392384d/resolv.conf", "HostnamePath": "/var/lib/docker/containers/e6bb03d9bac6824939542fea51722d34821350c584142c16036db69de392384d/hostname", "HostsPath": "/opt/containers/rac_host_file", "LogPath": "/var/lib/docker/containers/e6bb03d9bac6824939542fea51722d34821350c584142c16036db69de392384d/e6bb03d9bac6824939542fea51722d34821350c584142c16036db69de392384d-json.log", "Name": "/racnode1", "RestartCount": 0, "Driver": "overlay", "MountLabel": "", "ProcessLabel": "", "AppArmorProfile": "", "ExecIDs": null, "HostConfig": { "Binds": [ "/boot:/boot:ro", "/opt/containers/rac_host_file:/etc/hosts", "racstorage:/oradata", "/sys/fs/cgroup:/sys/fs/cgroup:ro" ], "ContainerIDFile": "", "LogConfig": { "Type": "json-file", "Config": {} }, "NetworkMode": "default", "PortBindings": {}, "RestartPolicy": { "Name": "always", "MaximumRetryCount": 0 }, "AutoRemove": false, "VolumeDriver": "", "VolumesFrom": null, "CapAdd": [ "SYS_ADMIN", "SYS_NICE", "SYS_RESOURCE", "NET_ADMIN" ], "CapDrop": null, "Dns": [], "DnsOptions": [], "DnsSearch": [ "example.com" ], "ExtraHosts": null, "GroupAdd": null, "IpcMode": "", "Cgroup": "", "Links": null, "OomScoreAdj": 0, "PidMode": "", "Privileged": false, "PublishAllPorts": false, "ReadonlyRootfs": false, "SecurityOpt": null, "Tmpfs": { "/dev/shm": "rw,exec,size=4G", "/run": "" }, "UTSMode": "", "UsernsMode": "", "ShmSize": 67108864, "Runtime": "runc", "ConsoleSize": [ 0, 0 ], "Isolation": "", "CpuShares": 0, "Memory": 0, "NanoCpus": 0, "CgroupParent": "", "BlkioWeight": 0, "BlkioWeightDevice": null, "BlkioDeviceReadBps": null, "BlkioDeviceWriteBps": null, "BlkioDeviceReadIOps": null, "BlkioDeviceWriteIOps": null, "CpuPeriod": 0, "CpuQuota": 0, "CpuRealtimePeriod": 0, "CpuRealtimeRuntime": 95000, "CpusetCpus": "", "CpusetMems": "", "Devices": [], "DeviceCgroupRules": null, "DiskQuota": 0, "KernelMemory": 0, "MemoryReservation": 0, "MemorySwap": 0, "MemorySwappiness": -1, "OomKillDisable": false, "PidsLimit": 0, "Ulimits": [ { "Name": "rtprio", "Hard": 99, "Soft": 99 } ], "CpuCount": 0, "CpuPercent": 0, "IOMaximumIOps": 0, "IOMaximumBandwidth": 0 }, "GraphDriver": { "Data": { "LowerDir": "/var/lib/docker/overlay/5ca555f4a74a302c15e64957cc99b8723b29a6ffaf3c40d254974c8e5724efe1/root", "MergedDir": "/var/lib/docker/overlay/8f4e799f111e44af49eaec5ed4edfec52b8e3c5f465611dafcd36778ec24fa2d/merged", "UpperDir": "/var/lib/docker/overlay/8f4e799f111e44af49eaec5ed4edfec52b8e3c5f465611dafcd36778ec24fa2d/upper", "WorkDir": "/var/lib/docker/overlay/8f4e799f111e44af49eaec5ed4edfec52b8e3c5f465611dafcd36778ec24fa2d/work" }, "Name": "overlay" }, "Mounts": [ { "Type": "volume", "Name": "4f5488bcdec2552d706441860d824937ec83e035fc0057292d19e3cf0cf16c13", "Source": "/var/lib/docker/volumes/4f5488bcdec2552d706441860d824937ec83e035fc0057292d19e3cf0cf16c13/_data", "Destination": "/common_scripts", "Driver": "local", "Mode": "", "RW": true, "Propagation": "" }, { "Type": "volume", "Name": 
"75c1637b50fdc95fa311020f4bf8ac6fb1285bf53bc4b010aa564cc2440a0212", "Source": "", "Destination": "/dev/shm", "Driver": "local", "Mode": "", "RW": true, "Propagation": "" }, { "Type": "bind", "Source": "/opt/containers/rac_host_file", "Destination": "/etc/hosts", "Mode": "", "RW": true, "Propagation": "" }, { "Type": "volume", "Name": "racstorage", "Source": "/var/lib/docker/volumes/racstorage/_data", "Destination": "/oradata", "Driver": "local", "Mode": "z", "RW": true, "Propagation": "" }, { "Type": "bind", "Source": "/sys/fs/cgroup", "Destination": "/sys/fs/cgroup", "Mode": "ro", "RW": false, "Propagation": "" }, { "Type": "bind", "Source": "/boot", "Destination": "/boot", "Mode": "ro", "RW": false, "Propagation": "" } ], "Config": { "Hostname": "racnode1", "Domainname": "", "User": "grid", "AttachStdin": true, "AttachStdout": true, "AttachStderr": true, "Tty": true, "OpenStdin": true, "StdinOnce": true, "Env": [ "NODE_VIP=172.31.60.160", "VIP_HOSTNAME=racnode1-vip", "PRIV_IP=192.168.17.150", "PRIV_HOSTNAME=racnode1-priv", "PUBLIC_IP=172.31.60.150", "PUBLIC_HOSTNAME=racnode1", "SCAN_NAME=racnode-scan", "SCAN_IP=172.31.60.70", "OP_TYPE=INSTALL", "DOMAIN=example.com", "ASM_DISCOVERY_DIR=/oradata", "ORACLE_PWD=Oracle_12c", "ASM_DEVICE_LIST=/oradata/asm_disk01.img,/oradata/asm_disk02.img,/oradata/asm_disk03.img,/oradata/asm_disk04.img,/oradata/asm_disk05.img", "CMAN_HOSTNAME=racnode-cman1", "CMAN_IP=172.31.60.15", "OS_PASSWORD=Oracle_12c", "PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "SETUP_LINUX_FILE=setupLinuxEnv.sh", "INSTALL_DIR=/opt/scripts", "GRID_BASE=/u01/app/grid", "GRID_HOME=/u01/app/12.2.0/grid", "INSTALL_FILE_1=linuxx64_12201_grid_home.zip", "GRID_INSTALL_RSP=grid.rsp", "GRID_SETUP_FILE=setupGrid.sh", "FIXUP_PREQ_FILE=fixupPreq.sh", "INSTALL_GRID_BINARIES_FILE=installGridBinaries.sh", "INSTALL_GRID_PATCH=applyGridPatch.sh", "INVENTORY=/u01/app/oraInventory", "CONFIGGRID=configGrid.sh", "ADDNODE=AddNode.sh", "ADDNODE_RSP=grid_addnode.rsp", "SETUPSSH=setupSSH.expect", "GRID_PATCH=p27383741_122010_Linux-x86-64.zip", "PATCH_NUMBER=27383741", "SETUPDOCKERORACLEINIT=setupdockeroracleinit.sh", "DOCKERORACLEINIT=dockeroracleinit", "GRID_USER_HOME=/home/grid", "SETUPGRIDENV=setupGridEnv.sh", "DB_BASE=/u01/app/oracle", "DB_HOME=/u01/app/oracle/product/12.2.0/dbhome_1", "INSTALL_FILE_2=linuxx64_12201_database.zip", "DB_INSTALL_RSP=db_inst.rsp", "DBCA_RSP=dbca.rsp", "DB_SETUP_FILE=setupDB.sh", "PWD_FILE=setPassword.sh", "RUN_FILE=runOracle.sh", "STOP_FILE=stopOracle.sh", "ENABLE_RAC_FILE=enableRAC.sh", "CHECK_DB_FILE=checkDBStatus.sh", "USER_SCRIPTS_FILE=runUserScripts.sh", "REMOTE_LISTENER_FILE=remoteListener.sh", "INSTALL_DB_BINARIES_FILE=installDBBinaries.sh", "FUNCTIONS=functions.sh", "COMMON_SCRIPTS=/common_scripts", "CHECK_SPACE_FILE=checkSpace.sh", "EXPECT=/usr/bin/expect", "BIN=/usr/sbin", "container=true", "INSTALL_SCRIPTS=/opt/scripts/install", "SCRIPT_DIR=/opt/scripts/startup", "GRID_PATH=/u01/app/12.2.0/grid/bin:/u01/app/12.2.0/grid/OPatch/:/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "DB_PATH=/u01/app/oracle/product/12.2.0/dbhome_1/bin:/u01/app/oracle/product/12.2.0/dbhome_1/OPatch/:/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "GRID_LD_LIBRARY_PATH=/u01/app/12.2.0/grid/lib:/usr/lib:/lib", "DB_LD_LIBRARY_PATH=/u01/app/oracle/product/12.2.0/dbhome_1/lib:/usr/lib:/lib" ], "Cmd": [ "/usr/sbin/oracleinit" ], "ArgsEscaped": true, "Image": "oracle/database-rac:12.2.0.1", 
"Volumes": { "/common_scripts": {}, "/dev/shm": {} }, "WorkingDir": "/home/grid", "Entrypoint": null, "OnBuild": null, "Labels": {} }, "NetworkSettings": { "Bridge": "", "SandboxID": "a34e894c5b5e7f9583e82209ca4451ac66de7435f41510e3efaf1a1a1bdbdd54", "HairpinMode": false, "LinkLocalIPv6Address": "", "LinkLocalIPv6PrefixLen": 0, "Ports": {}, "SandboxKey": "/var/run/docker/netns/a34e894c5b5e", "SecondaryIPAddresses": null, "SecondaryIPv6Addresses": null, "EndpointID": "", "Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "IPAddress": "", "IPPrefixLen": 0, "IPv6Gateway": "", "MacAddress": "", "Networks": { "rac_priv1_nw": { "IPAMConfig": { "IPv4Address": "192.168.17.150" }, "Links": null, "Aliases": [ "e6bb03d9bac6" ], "NetworkID": "b3d495bff13b20f4539494bc84e11cf0b958c6b2daff201ea112426b467f2507", "EndpointID": "cb79862d02cfa30fc87371575aa2350ab96133f63d3b0d5def8db70dff3362ac", "Gateway": "192.168.17.1", "IPAddress": "192.168.17.150", "IPPrefixLen": 24, "IPv6Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "MacAddress": "02:42:c0:a8:11:96" }, "rac_pub1_nw": { "IPAMConfig": { "IPv4Address": "172.31.60.150" }, "Links": null, "Aliases": [ "e6bb03d9bac6" ], "NetworkID": "5f6e430f9fc062f485a6a0bb76f6281862ee7169f3c69979713ec34fa73f34d1", "EndpointID": "1e4188a4ce4b122508a1e20895a58b5444ac9215b73ea9aad81d549c56bbade9", "Gateway": "172.31.60.1", "IPAddress": "172.31.60.150", "IPPrefixLen": 24, "IPv6Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "MacAddress": "02:42:ac:1f:3c:96" } } } } ]

docker info

Containers: 7
 Running: 2
 Paused: 0
 Stopped: 5
Images: 50
Server Version: 17.05.0-ce
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-862.9.1.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.637GiB
Name: ip-172-31-60-254.ec2.internal
ID: XEO5:E5RT:FS7T:K6QO:IATG:R7JA:PVOS:WJXV:KZ3P:PDID:P7IL:XNUW
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: smbptnk
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

uname -a

Linux ip-172-31-60-254.ec2.internal 3.10.0-862.9.1.el7.x86_64 #1 SMP Wed Jun 27 04:30:39 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

psaini79 commented 6 years ago

OK. Here is the issue: Docker must be running on UEK 4 on the Docker host, and your Docker engine is running on UEK3, which is not supported for RAC on Docker. I will also update this in the Docker README.md, but please update the Docker host kernel to UEK 4 and it should work.

For details, please check the following: https://github.com/oracle/docker-images/issues/955#issuecomment-412339558

Docker version 1.9 and later require that you configure the system to use the Unbreakable Enterprise Kernel Release 4 (UEK R4) and boot the system with this kernel. If you are using either UEK R3 or the Red Hat Compatible Kernel (RHCK), you must configure Yum to allow you to install UEK R4.

psaini79 commented 6 years ago

https://docs.oracle.com/cd/E52668_01/E87205/html/docker_install_upgrade_yum_uek.html
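As a sketch of what that document describes for an Oracle Linux 7 Docker host (the ol7_UEKR4 repo id is an assumption taken from public-yum-ol7.repo; verify it against the linked doc):

```
# Check which kernel the Docker host is currently booted with
uname -r

# On Oracle Linux 7: enable the UEK R4 channel, install the UEK kernel, then reboot
sudo yum-config-manager --enable ol7_UEKR4
sudo yum install -y kernel-uek
sudo reboot
```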

Djelibeybi commented 6 years ago

Actually @psaini79, that's not Oracle Linux at all. It's either RHEL or CentOS running the stock Red Hat 3.10 kernel with Docker 17.05-ce installed.

psaini79 commented 6 years ago

Hi @Djelibeybi, you are right. It is not OEL. I would recommend that @smbptnk test this on OEL 7.5 with UEK 4.