oravirt / vagrant-vbox-rac

Vagrant config to automatically build one or more Oracle RAC clusters
14 stars 8 forks source link

18c build failing at TASK [oradb-manage-db : manage-db | create/manage database] -[FATAL] ORA-03113: end-of-file on communication channel #5

Open brokedba opened 5 years ago

brokedba commented 5 years ago

Hi Mikael, I have been trying your build for several times now and it still fails at the same stage . I took the liberty to change few attributes ( ips ,hostnames,scan name,dbnames, db_home path ) in the corresponding VagrantFile and host.yml.

It seems that the failure kicks in just before or during PDB creation. As all the prior steps are processed until this task : > [oradb-manage-db : manage-db | create/manage database] dbca creates the db but couldn't bounce the instances .(previous run had successfully installed dbs but stopped at the PDB creation) here is the output I got at the last try :

TASK [oradb-manage-db : manage-db | create/manage database] **** fatal: [london1]: FAILED! => {"changed": false, "msg": "Error - STDOUT: [WARNING] [DBT-09102] Target environment does not meet some optional requirements.\n CAUSE: Some of the optional prerequisites are not met. See logs for details.\n ACTION: Find the appropriate configuration from the log file or from the installation guide to meet the prerequisites and fix this manually.\nPrepare for db operation\n8% complete\nCopying database files\n33% complete\nCreating and st arting Oracle instance\n34% complete\n35% complete\n39% complete\n[FATAL] ORA-03113: end-of-file on communication channel\n\n50% complete\n100% complete\n[FATAL] ORA-03113: end-of-file on communication channel\n\n33% complete\n8% complet e\n0% complete\nLook at the log file \"/u01/app/oracle/cfgtoollogs/dbca/racdb/racdb.log\" for further details.\n, STDERR: , COMMAND: /u01/app/oracle/product/18.3.0.0/db1/bin/dbca -createDatabase -silent -responseFile /u01/stage/rsp/dbca _racdb.rsp -initParams db_create_file_dest=+DATA,db_create_online_log_dest_1=+FRA,db_recovery_file_dest=+FRA,db_recovery_file_dest_size=20G"}

NO MORE HOSTS LEFT * to retry, use: --limit @/vagrant/extra-provision/ansible-oracle/vbox-rac-dc1.retry PLAY RECAP ***** london1 : ok=116 changed=64 unreachable=0 failed=1 london2 : ok=95 changed=53 unreachable=0 failed=0

Ansible failed to complete successfully. Any error output should be visible above. Please fix these errors and try again.

when I check the crs status the instances are there but just not up and runing.

Instance racdb1 is not running on node london1
Instance racdb2 is not running on node london2

any idea what could be the problem here ? . I can show the host.yml and vagrant if you want . Thanks again

brokedba commented 5 years ago

I tried another time and now it stops at the manage tablespace Task

TASK [oradb-manage-pdb : Manage pdb(s)] ****

TASK [oradb-manage-tablespace : Manage tablespaces (db/cdb)] ***

TASK [oradb-manage-tablespace : Manage tablespaces (pdb)] ** failed: [london1] (item=port: 1521 service: PDB tablespace: users content: permanent state: present) => {"changed": false, "item": [{"cdb": "rac_db", "home": "18300-base", "init_parameters": [{"name": "db_create_file_dest", "scope": "both", "state": "present", "value": "+DATA"}], "pdb_name": "PDB", "roles": [{"grants": ["create session", "create table", "select any table", "select any dictionary"], "name": "approle1", "state": "present"}], "services": [{"ai": "racdb2", "name": "app1_service", "pi": "racdb1", "state": "started"}], "state": "present", "users": [{"default_tablespace": "appuser1_data", "grants": ["approle1"], "schema": "appuser1", "state": "present"}]}, {"autoextend": false, "bigfile": true, "content": "permanent", "maxsize": "500M", "name": "users", "next": "5M", "size": "10M", "state": "present"}], "msg": "Could not connect to database - ORA-12514: TNS:listener does not currently know of service requested in connect descriptor, connect descriptor: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=london1)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=PDB)))"} failed: [london1] (item=port: 1521 service: PDB tablespace: appuser1_data content: permanent state: present) => {"changed": false, "item": [{"cdb": "rac_db", "home": "18300-base", "init_parameters": [{"name": "db_create_file_dest", "scope": "both", "state": "present", "value": "+DATA"}], "pdb_name": "PDB", "roles": [{"grants": ["create session", "create table", "select any table", "select any dictionary"], "name": "approle1", "state": "present"}], "services": [{"ai": "racdb2", "name": "app1_service", "pi": "racdb1", "state": "started"}], "state": "present", "users": [{"default_tablespace": "appuser1_data", "grants": ["approle1"], "schema": "appuser1", "state": "present"}]}, {"autoextend": false, "bigfile": true, "content": "permanent", "maxsize": "500M", "name": "appuser1_data", "next": "5M", "size": "10M", "state": "present"}], "msg": "Could not connect to database - ORA-12514: TNS:listener does not currently know of service requested in connect descriptor, connect descriptor: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=london1)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=PDB)))"}

NO MORE HOSTS LEFT *****

brokedba commented 5 years ago

[oracle@racdb1 admin]$tnsping racdb is successful on both nodes though

Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = london-cluster-scan.evilcorp.com)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb)))
OK (40 msec) 

[oracle@racdb2 oracle]$srvctl status database -db racdb

Instance racdb1 is running on node london1
Instance racdb2 is running on node london2

[oracle@racdb2 oracle]$sqlplus / as sysdba Connected to:Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production SYS @ RACDB2:CDB$ROOT:>SELECT NAME, CON_ID, DBID, CON_UID, GUID FROM V$CONTAINERS ORDER BY CON_ID;

NAME                CON_ID       DBID    CON_UID GUID
--------------- ---------- ---------- ---------- --------------------------------
CDB$ROOT                 1 1016529875          1 64A52F53A7683286E053CDA9E80AED76
PDB$SEED                 2 1898853373 1898853373 9276EB713D994B9FE053334EA8C0EAF4

I don't get why ansible play stopped

oravirt commented 5 years ago

Hi, sorry about the late answer. Yes, please show me your hosts.yml, and then I'll see if I can re-produce this

brokedba commented 5 years ago

Hi, No problem Mikael. here is the hosts.yml file

 - basename_vm: london
  num_vm: 2
  hostgroup: vbox-rac-dc1
  domain: evilcorp.com
  box: oravirt/rhel75
  vagrant_user: vagrant
  vagrant_pass: vagrant
  #vagrant_private_key: /vagrant/base-provision/insecure_private_key
  ram: 6196
  cpu: 1
  base_pub_ip: 192.168.78.51
  base_pub_ip_vip: 192.168.78.61
  scan_addresses: 192.168.78.251,192.168.78.252,192.168.78.253
  base_priv_ip: 172.16.100.51
  scan_name: "london-cluster-scan"
  synced_folders:
     #- {src: swrepo, dest: /media/swrepo}
     #- {src: /Users/miksan/Downloads/oracle, dest: /media/swrepo}
     - {src: "D:\\VM\\vagrant\\software", dest: /media/swrepo}
  base_disk_path:
  create_local_disk: true
  local_disks:
     - {name: u01, size: 75, count: 1}
  create_shared_disk: true
  shared_disks:
     - {name: crs, size: 40, count: 1}
     - {name: data, size: 8, count: 1}
     - {name: fra, size: 12, count: 2}
  provisioning: extra-provision/ansible-oracle/vbox-rac-dc1.yml
  provisioning_env_override: true

and here is the Vagrantfile (added scan_name and oracle_home vars) https://bit.ly/2kCXGna

I think I know where the problem comes from . here is the content of oravirt\extra-provision\ansible-oracle\group_vars\vbox-rac-dc1\databases.yml

 oracle_databases:
      - home: 18300-base
        oracle_db_name: rac_db
        oracle_db_type: RAC
        is_container: True
        storage_type: ASM
        oracle_db_mem_totalmb: 1024
        oracle_database_type: MULTIPURPOSE
        redolog_size: 75M
        redolog_groups: 3
        datafile_dest: '+DATA'
        recoveryfile_dest: '+FRA'
        archivelog: True
        flashback: False
        force_logging: False
        state: present
        # tablespaces:
             # - { name: users, size: 10M, bigfile: True, autoextend: false , next: 5M, maxsize: 500M, content: permanent, state: present }
        init_parameters:
             - {name: db_create_file_dest, value: '+DATA', scope: both, state: present}
             - {name: db_create_online_log_dest_1, value: '+FRA', scope: both, state: present}
             - {name: db_recovery_file_dest, value: '+FRA', scope: both, state: present}
             - {name: db_recovery_file_dest_size, value: 20G, scope: both, state: present}

oracle database dbca creation uses the variable oracle_db_name in my case rac_db but it seems to ignore the underscore durring creation (name is now racdb) . So I guess the dbca create pdb may look for a db that doesn't exists (rac_db ) . I am going to retry that with racdb and let you know " btw i dont understand why you split pdb and db creation in 2 tasks. Thx

brokedba commented 5 years ago

no it didn't fix it.

oravirt commented 5 years ago

Thanks! I haven't had time to look at this yet, sorry about that.

btw i dont understand why you split pdb and db creation in 2 tasks.

I may already have a CDB and only want to create a PDB and then it doesn't make sense to run the DB provisioning step. The PDB lifecycle may be totally different to the CDB, so breaking up the management of them makes sense.

brokedba commented 5 years ago

Hi Mikaël, I finally found the root cause of the issue. it was due to memory starvation . I don't know if you have already run this build with that low amount of ram but I had to increase the ram in host.yml to

ram: 7168

the database creation was hanging because there was too much swapping in node 1.

               Cluster vbox-rac-dc1

      Listener   |      Port      |      london1      |      london2      |     Type     |
  ---------------------------------------------------------------------------------------
   LISTENER      | TCP:1521       |       Online      |       Online      |   Listener   |
   LISTENER_SCAN1| TCP:1521       |         -         |       Online      |     SCAN     |
   LISTENER_SCAN2| TCP:1521       |       Online      |         -         |     SCAN     |
   LISTENER_SCAN3| TCP:1521       |       Online      |         -         |     SCAN     |
  ---------------------------------------------------------------------------------------

         DB      |     Version    |      london1      |      london2      |    DB Type   |
  ---------------------------------------------------------------------------------------
   racdb         | 18.3.0.0   (1) |        Open       |        Open       |    RAC (P)   |
  ---------------------------------------------------------------------------------------
  ORACLE_HOME references listed in the Version column

         1 : /u01/app/oracle/product/18.3.0.0/db1       oracle oinstall

       : Has been restarted less than 24 hours ago
brokedba commented 5 years ago

you said ,

I may already have a CDB and only want to create a PDB. and then it doesn't make sense to run the DB provisioning step.

how can that be if you are creating the build from scratch . if I read your yaml well I see manage_db role called first then quite later manage-pdb . so at worse we can introduce a boulean var that decides on creating a db with pdb or not and call dbca once with one template responsefile.

extra-provision/ansible-oracle/vbox-rac-dc1.yml

  • name: Database Server Installation & Database Creation hosts: vbox-rac-dc1 user: vagrant become: yes roles:
    • oraswdb-install
    • {role: oraswdb-manage-patches, when: apply_patches_db} - oradb-manage-db
  • name: Configure Logrotate hosts: vbox-rac-dc1 user: vagrant become: yes roles:

    • orahost-logrotate
  • name: Customize database hosts: vbox-rac-dc1 user: vagrant sudo: yes roles: - oradb-manage-pdb

    • oradb-manage-tablespace
    • oradb-manage-parameters
    • oradb-manage-redo
    • oradb-manage-roles
    • oradb-manage-users
    • oradb-manage-grants
    • oradb-manage-services
oravirt commented 5 years ago

I finally found the root cause of the issue. it was due to memory starvation

Great!

I don't know if you have already run this build with that low amount of ram

Yes, I have but it was slow

how can that be if you are creating the build from scratch

Yes, for this specific Vagrant project it builds everything from scratch, but the toolkit (ansible-oracle) is it's own project which can be run completely stand-alone and is just used as the provisioning step for this project So, your're welcome to fork the project and customize it in whatever way you want (or start your own project) but this is the way it is treated for this project. I don't really understand why you think it is a problem to have this in 2 separate tasks?

I try to write my Ansible roles such that they do 1 thing. So I think that bundling the pdb creation with the create (C)DB step is bad idea.

brokedba commented 5 years ago

Hi Mikael, Sorry I didn't mean to criticize your work . As a matter of fact I find the build amazing . I was just curious about the rac provisioning tasks and rules used in your build . I reckon now that ansible-oracle toolkit has a larger scope than just the rac build. I didn't say it was a bad idea or a problem I just wanted to know more about the context of your ansible provisioning . Being new in ansible I had to read all the extra provisioning /group_vars roles involved in the playbook one by one to know what was actually happening. I even sent you emails before so I could understand it better . I am interested to add rhel8 to the lists of boxes for this build (in a fork for example). By the way how can we change the cluster name without messing up the hostgroup variable ? I also wanted to know where does the oracle user password is defined. Thank you again and sorry if my message seemed demanding .

yevp commented 4 years ago

Hi @KoussHD , Hash from default password is defined there: https://github.com/oravirt/ansible-oracle/blob/master/roles/orahost/defaults/main.yml relative path inside vagrant-vbox-rac ./extra-provision/ansible-oracle/roles/orahost/defaults/main.yml

  oracle_users:         # Passwd :Oracle123
   - { username: oracle, uid: 54321, primgroup: oinstall, othergroups: "dba,asmadmin,asmdba,backupdba,dgdba,kmdba,oper", passwd: "$6$0xHoAXXF$K75HKb64Hcb/CEcr3YEj2LGERi/U2moJgsCK.ztGxLsKoaXc4UBiNZPL0hlxB5ng6GL.gyipfQOOXplzcdgvD0" }

you could generate hash and override default with variable-precedence

$python -c 'import crypt; print crypt.crypt("Oracle123", "$6$0xHoAXXF$")'
$6$0xHoAXXF$K75HKb64Hcb/CEcr3YEj2LGERi/U2moJgsCK.ztGxLsKoaXc4UBiNZPL0hlxB5ng6GL.gyipfQOOXplzcdgvD0
brokedba commented 4 years ago

thank you @yevp . It has been a while since I ran the build and deleted it few days later . But I'll give a try this weekend .