paz-sh / paz

An open-source, in-house service platform with a PaaS-like workflow, built on Docker, CoreOS, Etcd and Fleet. This repository houses the documentation and installation scripts.
http://paz.sh

Re-running Paz cluster #9

Closed foliveira closed 9 years ago

foliveira commented 9 years ago

As far as I'm aware, the only way to run Paz right now is to execute the install-vagrant.sh script, which destroys the current cluster and creates a new one (which takes its time).

Running vagrant up in the coreos-vagrant folder gives me random Connection timeout messages, and even when I can start the cluster without any apparent problems, ssh'ing into each machine shows:

Failed Units: 2
  cadvisor.service
  paz-dnsmasq.service

lukebond commented 9 years ago

@foliveira can you grab some logs via journalctl (see tips in #2) and paste here?

foliveira commented 9 years ago

Running journalctl outputs the following error when I try to access the Services tab in the web interface:

Feb 17 15:51:29 core-02 bash[1689]: {"name":"paz-orchestrator_log","hostname":"a8b6f4aed31c","pid":9,"level":30,"req":{"method":"GET","url":"/services?noEmit=true"},"res":{"statusCode":200,"header":""},"uuid":"6di14q","msg":"","time":"2015-02-17T15:51:29.378Z","src":{"file":"/usr/src/app/middleware/logger.js","line":4,"func":"module.exports"},"v":0}
Feb 17 15:51:29 core-02 bash[1689]: {"name":"paz-orchestrator_log","hostname":"a8b6f4aed31c","pid":9,"level":30,"service.get":"*","uuid":"6di14q","msg":"","time":"2015-02-17T15:51:29.383Z","src":{"file":"/usr/src/app/resources/service/controller.js","line":27,"func":"controller.list"},"v":0}
Feb 17 15:51:49 core-02 bash[1689]: {"name":"paz-orchestrator_log","hostname":"a8b6f4aed31c","pid":9,"level":50,"err":{"message":"getaddrinfo ENOTFOUND","name":"Error","stack":"Error: getaddrinfo ENOTFOUND\n    at errnoException (dns.js:37:11)\n    at Object.onanswer [as oncomplete] (dns.js:124:16)","code":"ENOTFOUND"},"msg":"getaddrinfo ENOTFOUND","time":"2015-02-17T15:51:49.796Z","src":{"file":"/usr/src/app/resources/service/controller.js","line":30},"v":0}

lukebond commented 9 years ago

Thanks.

Looks like the orchestrator cannot find the service directory; it's probably down.

Run list-units (see #2) to see if any services are down, and if so, check the logs to see why and start them again with Fleet.
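When checking the logs, it can help to filter out everything except the errors. The lines pasted above look like Bunyan-style JSON records prefixed by the journal header, where level 30 is info and level 50 is error. The following is only a rough sketch under that assumption: the helper names `parse_journal_line` and `error_records` are made up for illustration, and the two sample lines are abbreviated copies of the ones in this thread.

```python
import json

ERROR_LEVEL = 50  # Bunyan's numeric level for "error"; 30 is "info"


def parse_journal_line(line):
    """Return the JSON record embedded in a journal line, or None if there is none."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


def error_records(lines, min_level=ERROR_LEVEL):
    """Yield (time, msg, source file) for each record at or above min_level."""
    for line in lines:
        record = parse_journal_line(line)
        if record is None or record.get("level", 0) < min_level:
            continue
        src = record.get("src", {})
        yield record.get("time"), record.get("msg"), src.get("file")


# Abbreviated versions of two log lines from this thread (fields trimmed).
sample = [
    'Feb 17 15:51:29 core-02 bash[1689]: {"name":"paz-orchestrator_log","level":30,"msg":"","time":"2015-02-17T15:51:29.378Z"}',
    'Feb 17 15:51:49 core-02 bash[1689]: {"name":"paz-orchestrator_log","level":50,"msg":"getaddrinfo ENOTFOUND","time":"2015-02-17T15:51:49.796Z","src":{"file":"/usr/src/app/resources/service/controller.js","line":30}}',
]

for time, msg, src_file in error_records(sample):
    print(time, msg, src_file)
```

With the sample above, only the ENOTFOUND record is printed. To use it on a real machine you would feed it the lines from journalctl instead of `sample`.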

Services stopping or dying is an unfortunately common occurrence at the moment. I plan to fix this; see #11.

lukebond commented 9 years ago

Further to your original question, I've created #12 to address this.