puppetlabs-toy-chest / puppetlabs-havana

Multi-node deployment for OpenStack Havana
Apache License 2.0

ceilometer-dbsync fails on first run of controller role #9

Open tangestani opened 10 years ago

tangestani commented 10 years ago

ceilometer-dbsync exits with a failure when applying the controller role for the first time on a clean system.

Notice: /Stage[main]/Ceilometer::Db/Exec[ceilometer-dbsync]/returns: 2014-01-30 12:12:31.182 26917 TRACE ceilometer ConnectionFailure: could not connect to 172.16.33.4:27017: [Errno 111] ECONNREFUSED
Notice: /Stage[main]/Ceilometer::Db/Exec[ceilometer-dbsync]/returns: 2014-01-30 12:12:31.182 26917 TRACE ceilometer
Error: /Stage[main]/Ceilometer::Db/Exec[ceilometer-dbsync]: Failed to call refresh: ceilometer-dbsync --config-file=/etc/ceilometer/ceilometer.conf returned 1 instead of one of [0]
Error: /Stage[main]/Ceilometer::Db/Exec[ceilometer-dbsync]: ceilometer-dbsync --config-file=/etc/ceilometer/ceilometer.conf returned 1 instead of one of [0]

The problem seems to be that Puppet executes ceilometer-dbsync immediately after starting the mongod service. That does not always work, because mongod takes some time to allocate a journal before it will accept incoming connections on port 27017. On my VM this takes about 15 seconds.

hogepodge commented 10 years ago

Yes. While I would consider this to be a bug in the MongoDB startup scripts (my opinion is that they should not return until the database is initialized, precisely because of problems like this), it's something that needs to be reliably addressed. I'm thinking a script that tries n times with m seconds between each try.
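A minimal sketch of the "try n times with m seconds between each try" idea, written in Python for illustration only. The function name `run_with_retries` and its parameters are hypothetical and not part of this module or of ceilometer; the `runner` and `sleep` arguments exist only so the loop can be exercised without running a real command.

```python
import subprocess
import time


def run_with_retries(cmd, tries=5, delay=3.0,
                     runner=subprocess.call, sleep=time.sleep):
    """Run cmd up to `tries` times, pausing `delay` seconds between attempts.

    Returns 0 as soon as one attempt succeeds, otherwise the exit code
    of the last failed attempt.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    code = 1
    for attempt in range(1, tries + 1):
        code = runner(cmd)
        if code == 0:
            return 0
        # Only wait if another attempt is still coming.
        if attempt < tries:
            sleep(delay)
    return code
```

Assuming the command line from the error output above, a call might look like `run_with_retries(["ceilometer-dbsync", "--config-file=/etc/ceilometer/ceilometer.conf"], tries=10, delay=3)`, which would cover the roughly 15 seconds reported for journal allocation with some margin.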

benh57 commented 10 years ago

Mine's actually running the ceilometer-dbsync before it installs mongo. So there is some dependency ordering issue here.

benh57 commented 10 years ago

(Which is odd considering the explicit arrows in the controller role: mongo comes before ceilometer-api.)

hogepodge commented 10 years ago

Oh, the role ordering will do almost nothing to ensure the dependency ordering. Contained classes will float away and become unordered. There are workarounds that I find offensive. I'll take a look at making stronger dependency ordering within the profile. It should be possible. Sorry for taking so long to close this.

hogepodge commented 10 years ago

(I'm actually pulling out that ordering in future versions since "the goggles do nothing").

beddari commented 10 years ago

Ordering is the main reason I can't use the stackforge modules for anything other than demo envs :-(

I'm hoping 'contains' will make the situation better in the future.

hunner commented 10 years ago

I'm not sure this is the right way to fix it, but I submitted a patch at https://review.openstack.org/#/c/81950/ so that ceilometer-dbsync retries on a failed connection.

It's really MongoDB's fault, but we can't do much about that.

hunner commented 10 years ago

A better way to solve this would be to make the mongodb::server::service class block on the service using a "validate connection" resource similar to the one in the puppetdb module.
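The blocking idea can be sketched, outside of Puppet, as a loop that polls the TCP port until it accepts a connection. This is a rough Python illustration, not the puppetdb module's implementation; `wait_for_port` and its parameters are hypothetical names, and a successful TCP connect is used here only as a stand-in for "the service is ready".

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Block until host:port accepts a TCP connection.

    Returns True once a connection succeeds, or False if `timeout`
    seconds pass without one.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A refused or timed-out connection raises OSError.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the address from the log above, `wait_for_port("172.16.33.4", 27017, timeout=60)` would return only after mongod starts listening, so anything ordered after that check would no longer race the journal allocation.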

beddari commented 10 years ago

As a note, I'm currently using this solution: https://github.com/Katello/puppet-service_wait

ltartarini90 commented 10 years ago

Workaround: http://openstack.redhat.com/Workarounds_2014_01#Failed_to_start_mongodb