vmware-archive / cfops

This is simply an automation that is based on the supported way to back up Pivotal Cloud Foundry
http://www.cfops.io
Apache License 2.0

Unable to start cloud controller workers after restore #53

Closed: aboik closed this issue 8 years ago

aboik commented 8 years ago

Using v2.0.53. From /var/vcap/data/sys/log/cloud_controller_worker_ctl.err.log on the cloud controller worker partitions:

[2016-01-30 02:36:54+0000] ------------ STARTING cloud_controller_worker_ctl at Sat Jan 30 02:36:54 UTC 2016 --------------
[2016-01-30 02:36:54+0000] chown: changing ownership of ‘/var/vcap/nfs/shared’: Operation not permitted
[2016-01-30 02:36:54+0000] chown: changing ownership of ‘/var/vcap/nfs/shared’: Operation not permitted
[2016-01-30 02:36:54+0000] chown: changing ownership of ‘/var/vcap/nfs/shared’: Operation not permitted
[2016-01-30 02:36:56+0000] rake aborted!
[2016-01-30 02:36:56+0000] Sequel::DatabaseError: PG::InvalidSchemaName: ERROR: no schema has been selected to create in
[2016-01-30 02:36:56+0000] PG::InvalidSchemaName: ERROR: no schema has been selected to create in
[2016-01-30 02:36:56+0000] Tasks: TOP => jobs:generic

I am seeing the following error repeated in the postgres log for ccdb, and a similar error for uaadb:

2016-01-30 02:05:02.413 GMT: STATEMENT: CREATE TABLE "schema_migrations" ("filename" text PRIMARY KEY)
2016-01-30 02:05:02.559 GMT: ERROR: relation "schema_migrations" does not exist at character 27
2016-01-30 02:05:02.559 GMT: STATEMENT: SELECT NULL AS "nil" FROM "schema_migrations" LIMIT 1
2016-01-30 02:05:02.560 GMT: ERROR: no schema has been selected to create in

xchapter7x commented 8 years ago

Can you send over some information so we can help dig a bit on this issue:

thanks

aboik commented 8 years ago

The above errors appeared after I tried to start the cloud controller and cc workers. The bosh start <cc_job> and bosh start <cc_worker_job> commands I ran failed after a timeout period. I investigated by ssh'ing to the cc worker VMs and saw the first error repeated in the error log: it kept trying to start the cloud controller worker and failed each time. I then ssh'ed to the ccdb VM and found the second error in /var/vcap/sys/log/postgres/postgresql.log; a similar error appeared in the postgresql log on the uaadb VM. The consoledb had no such error in its postgres logs.
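
As a rough illustration of the log check described above, here is a minimal Python sketch that counts how often each ERROR message repeats in a PostgreSQL log excerpt. The function name `count_pg_errors` is hypothetical, and the entry format is assumed from the postgres log lines quoted earlier in this thread; it is not part of cfops or BOSH.

```python
import re
from collections import Counter

# Assumed entry format, taken from the excerpt quoted in this thread:
#   "2016-01-30 02:05:02.559 GMT: ERROR: <message>"
TIMESTAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ GMT:"

# Capture the message after "ERROR:" up to the next timestamp or end of text,
# so entries are found whether they sit on separate lines or on one line.
ERROR_RE = re.compile(
    TIMESTAMP + r" ERROR: (.*?)(?=\s*" + TIMESTAMP + r"|\s*$)",
    re.DOTALL,
)


def count_pg_errors(log_text):
    """Return a Counter mapping each ERROR message to its number of occurrences."""
    return Counter(message.strip() for message in ERROR_RE.findall(log_text))


if __name__ == "__main__":
    sample = (
        '2016-01-30 02:05:02.413 GMT: STATEMENT: CREATE TABLE "schema_migrations" ("filename" text PRIMARY KEY)\n'
        '2016-01-30 02:05:02.559 GMT: ERROR: relation "schema_migrations" does not exist at character 27\n'
        '2016-01-30 02:05:02.559 GMT: STATEMENT: SELECT NULL AS "nil" FROM "schema_migrations" LIMIT 1\n'
        '2016-01-30 02:05:02.560 GMT: ERROR: no schema has been selected to create in\n'
    )
    for message, count in count_pg_errors(sample).most_common():
        print(count, message)
```

Running this kind of count against the ccdb, uaadb, and consoledb logs would be one way to compare which databases show the repeated error; the real log format on a given deployment may differ from the one assumed here.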

xchapter7x commented 8 years ago

the cc jobs should be stopped/started by cfops. what is the context in which we need to run bosh start or interact directly with bosh?

just trying to connect all the dots so i can more reliably reproduce your environment. let me know, thanks.

aboik commented 8 years ago

Well, I noticed after running cfops and waiting a while the cc jobs were still in a failing/starting state. I tried to start them manually to see what was preventing them from starting.

aboik commented 8 years ago

Closing this issue; see https://github.com/pivotalservices/cfops/issues/55 for the root cause.