jtgammon opened 8 years ago
I built out a lab with postgres and ran the backup restore. I did not have any pg_restore errors in the log, but am seeing the cloud controller fail to start.
```
+----------------------------------------------------------------+----------+--------------------------------------------------------------+--------------+
| Job/index                                                      | State    | Resource Pool                                                | IPs          |
+----------------------------------------------------------------+----------+--------------------------------------------------------------+--------------+
| ccdb-partition-3f2d3e1323bb74aa36a1/0                          | running  | ccdb-partition-3f2d3e1323bb74aa36a1                          | 10.65.187.82 |
| clock_global-partition-3f2d3e1323bb74aa36a1/0                  | running  | clock_global-partition-3f2d3e1323bb74aa36a1                  | 10.65.187.42 |
| cloud_controller-partition-3f2d3e1323bb74aa36a1/0              | starting | cloud_controller-partition-3f2d3e1323bb74aa36a1              | 10.65.187.41 |
| cloud_controller_worker-partition-3f2d3e1323bb74aa36a1/0       | failing  | cloud_controller_worker-partition-3f2d3e1323bb74aa36a1       | 10.65.187.43 |
| consoledb-partition-3f2d3e1323bb74aa36a1/0                     | running  | consoledb-partition-3f2d3e1323bb74aa36a1                     | 10.65.187.84 |
| consul_server-partition-3f2d3e1323bb74aa36a1/0                 | running  | consul_server-partition-3f2d3e1323bb74aa36a1                 | 10.65.187.33 |
| diego_brain-partition-3f2d3e1323bb74aa36a1/0                   | running  | diego_brain-partition-3f2d3e1323bb74aa36a1                   | 10.65.187.45 |
| diego_cell-partition-3f2d3e1323bb74aa36a1/0                    | running  | diego_cell-partition-3f2d3e1323bb74aa36a1                    | 10.65.187.46 |
| diego_cell-partition-3f2d3e1323bb74aa36a1/1                    | running  | diego_cell-partition-3f2d3e1323bb74aa36a1                    | 10.65.187.47 |
| diego_database-partition-3f2d3e1323bb74aa36a1/0                | running  | diego_database-partition-3f2d3e1323bb74aa36a1                | 10.65.187.36 |
| doppler-partition-3f2d3e1323bb74aa36a1/0                       | running  | doppler-partition-3f2d3e1323bb74aa36a1                       | 10.65.187.48 |
| etcd_server-partition-3f2d3e1323bb74aa36a1/0                   | running  | etcd_server-partition-3f2d3e1323bb74aa36a1                   | 10.65.187.35 |
| ha_proxy-partition-3f2d3e1323bb74aa36a1/0                      | running  | ha_proxy-partition-3f2d3e1323bb74aa36a1                      | 10.65.187.32 |
| loggregator_trafficcontroller-partition-3f2d3e1323bb74aa36a1/0 | running  | loggregator_trafficcontroller-partition-3f2d3e1323bb74aa36a1 | 10.65.187.49 |
| mysql-partition-3f2d3e1323bb74aa36a1/0                         | running  | mysql-partition-3f2d3e1323bb74aa36a1                         | 10.65.187.40 |
| mysql_proxy-partition-3f2d3e1323bb74aa36a1/0                   | running  | mysql_proxy-partition-3f2d3e1323bb74aa36a1                   | 10.65.187.39 |
| nats-partition-3f2d3e1323bb74aa36a1/0                          | running  | nats-partition-3f2d3e1323bb74aa36a1                          | 10.65.187.34 |
| nfs_server-partition-3f2d3e1323bb74aa36a1/0                    | running  | nfs_server-partition-3f2d3e1323bb74aa36a1                    | 10.65.187.37 |
| router-partition-3f2d3e1323bb74aa36a1/0                        | running  | router-partition-3f2d3e1323bb74aa36a1                        | 10.65.187.38 |
| uaa-partition-3f2d3e1323bb74aa36a1/0                           | running  | uaa-partition-3f2d3e1323bb74aa36a1                           | 10.65.187.44 |
| uaadb-partition-3f2d3e1323bb74aa36a1/0                         | running  | uaadb-partition-3f2d3e1323bb74aa36a1                         | 10.65.187.83 |
+----------------------------------------------------------------+----------+--------------------------------------------------------------+--------------+
```
cloud_controller-partition-3f2d3e1323bb74aa36a1-0-3430133b0907.zip
cloud_controller_worker-partition-3f2d3e1323bb74aa36a1-0-589f5057db19.zip
Saw this in logs:
```
cloud_controller_worker_ctl.err.log:[2016-05-13 15:31:40+0000] Sequel::DatabaseError: PG::UndefinedTable: ERROR: relation delayed_jobs does not exist
cloud_controller_worker_ctl.err.log:[2016-05-13 15:31:40+0000] PG::UndefinedTable: ERROR: relation delayed_jobs does not exist
cloud_controller_worker_ctl.err.log:[2016-05-13 15:31:40+0000] Delayed::FatalBackendError: Delayed::FatalBackendError
cloud_controller_worker_ctl.err.log:[2016-05-13 15:31:40+0000] Sequel::DatabaseError: PG::UndefinedTable: ERROR: relation delayed_jobs does not exist
cloud_controller_worker_ctl.err.log:[2016-05-13 15:31:40+0000] PG::UndefinedTable: ERROR: relation delayed_jobs does not exist
```
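The missing relation can be confirmed directly on the ccdb node. A minimal sketch, assuming the same `vcap` user and port 2544 seen elsewhere in this thread (the password value and database name `ccdb` are placeholders for this deployment's values):

```sh
# List the delayed_jobs table in the restored database; no matching
# relation in the output means the restore never created it.
PGPASSWORD=<ccdb-password> /var/vcap/packages/postgres-9.4.2/bin/psql \
  -h localhost -U vcap -p 2544 -d ccdb -c '\dt delayed_jobs'
```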
Wondering if we can try pg_restore without the -x option in cfops? That is the only difference between the working manual method and cfops.
Not sure what all is included in a typical pg_dump, but it looks like -x (--no-privileges) is there to skip restoring access privileges (GRANT/REVOKE), which seem like something we would want restored when starting from scratch. Let me know if you have a place to test this and can add a draft release without this option.
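For reference, a sketch of the two invocations side by side, based on the command visible in the cfops error log below (the password and archive path are that lab's values and would differ per deployment):

```sh
# What cfops currently runs: -x (--no-privileges) skips the
# GRANT/REVOKE statements contained in the dump
PGPASSWORD=<db-password> /var/vcap/packages/postgres-9.4.2/bin/pg_restore \
  -h localhost -U vcap -x -p 2544 -c -d uaa /tmp/archive.backup

# Proposed draft: identical, but without -x, so privileges are replayed too
PGPASSWORD=<db-password> /var/vcap/packages/postgres-9.4.2/bin/pg_restore \
  -h localhost -U vcap -p 2544 -c -d uaa /tmp/archive.backup
```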
I have my lab until Monday, so I can test over the weekend. If that doesn't work I can ask for a 1-day extension.
CFOPS ERT restore is failing when the backend is postgres. We did a manual restore using the cfops backup data and it worked, so the issue appears to be with the cfops restore; we believe it is the lack of the "--clean" option in the pg_restore command. The cfops restore left the CC unable to start: the 1st instance of CC showed starting, the 2nd instance showed stopped, and the CC worker showed failing.
We later tried a manual restore and it worked. Below are the working steps that we used:
ERT Restore log: ert.log.gz
Error:

```
2016/04/27 10:50:21 E0427 10:50:21.097533 6309 execute_list.go:15] Process exited with: 1. Reason was: ()
2016/04/27 10:50:19 D0427 10:50:19.678913 6309 execute_list.go:12] PGPASSWORD=270a0e10f2739c8f /var/vcap/packages/postgres-9.4.2/bin/pg_restore -h localhost -U vcap -x -p 2544 -c -d uaa /tmp/archive.backup
```

Note: No "--clean" option
Working manual steps:

```
bosh stop cloud_controller-partition-d46fdf5eca88a5c00fc7 0
bosh stop cloud_controller-partition-d46fdf5eca88a5c00fc7 1
```
ccdb (vcap@10.47.104.7, password 030df4fe78359255):

```
./pg_restore -U vcap -p 2544 -d ccdb --clean /tmp/ccdb.sql
```
```
bosh stop uaa-partition-d46fdf5eca88a5c00fc7 0
bosh stop uaa-partition-d46fdf5eca88a5c00fc7 1
```
uaadb (vcap@10.47.104.8, password 270a0e10f2739c8f):

```
./pg_restore -U vcap -p 2544 -d uaa --clean /tmp/uaadb.backup
./pg_restore -U vcap -p 2544 -v -d console --clean /tmp/consoledb.backup
```

Note: Includes "--clean" option
NFS remote restore:

```
cat nfs_server.backup | ssh vcap@host "tar xvzf - -C /var/vcap/store"
```

mysql/0 (10.47.104.18):

```
mysql -h localhost -u root -p < /tmp/mysql.backup
mysql -u root -p -h localhost
```
```
bosh start uaa-partition-d46fdf5eca88a5c00fc7 0
bosh start uaa-partition-d46fdf5eca88a5c00fc7 1
bosh start cloud_controller-partition-d46fdf5eca88a5c00fc7 0
bosh start cloud_controller-partition-d46fdf5eca88a5c00fc7 1
```
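The manual procedure above can be sketched as one sequence (job names, ports, and backup paths are this lab's values; the pg_restore steps actually run on their respective DB VMs, not from one shell):

```sh
#!/bin/sh
# Sketch of the working manual restore, assuming the same lab deployment
set -e
DEP=d46fdf5eca88a5c00fc7   # partition suffix from this deployment

# 1. Stop both Cloud Controller and UAA instances before touching the DBs
for i in 0 1; do
  bosh stop cloud_controller-partition-$DEP $i
  bosh stop uaa-partition-$DEP $i
done

# 2. On each DB node, replay the dump with --clean so existing objects
#    are dropped and recreated before the data is loaded
./pg_restore -U vcap -p 2544 -d ccdb --clean /tmp/ccdb.sql          # on ccdb node
./pg_restore -U vcap -p 2544 -d uaa --clean /tmp/uaadb.backup       # on uaadb node
./pg_restore -U vcap -p 2544 -v -d console --clean /tmp/consoledb.backup

# 3. Bring the jobs back up
for i in 0 1; do
  bosh start uaa-partition-$DEP $i
  bosh start cloud_controller-partition-$DEP $i
done
```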