puntonim / ansible-biostar

An Ansible playbook to automatize the deployment of a Biostar-based project using Docker containers
http://painl.es/ansible-biostar/
GNU General Public License v3.0

Errors with PostgreSQL Docker image #30

Closed: gawbul closed 9 years ago

gawbul commented 9 years ago

While trying to get a DigitalOcean droplet working with this code, I was getting timeouts while waiting for the PostgreSQL container to start:

TASK: [docker_postgresql | Wait for the new container to listen on SSH port] *** 
failed: [178.62.34.112] => {"elapsed": 300, "failed": true}
msg: Timeout when waiting for 127.0.0.1:2223

FATAL: all hosts have already failed -- aborting

I tried running the following manually:

docker pull nimiq/postgresql93
docker run -d -p 2223:22 -p 5432:5432 --name postgresql --volume=/srv/docker-volumes/pgdata:/srv/pgdata -e PG_USERNAME=user -e PG_PASSWORD=pw -e SSH_PUBLIC_KEY="ssh-rsa ABC123... ubuntu@biostar" nimiq/postgresql93

The container wasn't showing up in docker ps. But using docker logs with the container ID gives:

*** Running /etc/my_init.d/00_regen_ssh_host_keys.sh...
*** Running /etc/my_init.d/01_start_postgresql.sh...
 * Starting PostgreSQL 9.3 database server
 * The PostgreSQL server failed to start. Please check the log output:
2014-11-29 23:30:36 UTC FATAL:  could not access private key file "/etc/ssl/private/ssl-cert-snakeoil.key": Permission denied
   ...fail!
*** /etc/my_init.d/01_start_postgresql.sh failed with status 1

*** Killing all processes...

Some issue with permissions for the ssh host keys in the container?
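For anyone reproducing this: a container that has already exited is hidden from plain docker ps, so its ID has to be looked up with the -a flag before docker logs can be used. A minimal sketch, assuming the container was started with --name postgresql as above (the function name show_stopped_logs is made up for illustration):

```shell
# Hypothetical helper: print the logs of a container by name,
# even if the container has already exited.
show_stopped_logs() {
  name="$1"
  # -a includes exited containers, -q prints only the container IDs
  id="$(docker ps -a -q --filter "name=${name}" | head -n 1)"
  if [ -z "$id" ]; then
    echo "no container matching '${name}'" >&2
    return 1
  fi
  docker logs "$id"
}
```

Calling show_stopped_logs postgresql should then print the same startup log as shown above.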

gawbul commented 9 years ago

Added pull request to fix https://github.com/nimiq/docker-postgresql93/pull/1

gawbul commented 9 years ago

Getting failures while waiting to SSH into the container? Does this need openssh-server installing too? I am currently testing on my own fork.

puntonim commented 9 years ago

Sorry for my late reaction Steve, I am very busy these days. Expect my comment in 30 min!

gawbul commented 9 years ago

No problem.

puntonim commented 9 years ago

First of all, thanks a lot for your effort in porting Ansible-Biostar to DigitalOcean!

We chose to base this PostgreSQL container on phusion/baseimage instead of the original Ubuntu image because it seems more reliable and provides cool features. Check out here what the problems related to the original Ubuntu image are.

We used phusion/baseimage in Ansible-Biostar too and it works well. In Ansible-Biostar we make use of the cool phusion/baseimage-specific features like the Runit system and the integrated SSH server; in this PostgreSQL container instead we only use the integrated SSH server.

So I'd prefer not to switch from phusion/baseimage to an original Ubuntu image at the moment, unless there is a strong reason. I believe the error you are experiencing is specific to DigitalOcean, because I tested this Docker container on a bare-metal Ubuntu machine with Docker installed, on an Amazon EC2 instance running Ubuntu and Docker, and on a Vagrant + VirtualBox machine with Docker. If you need to know more (like what Docker version I used) I can test it again and take better notes.

Let's try to fix this DigitalOcean-specific issue. My first guess: the PostgreSQL 9.3 database server is probably run by the postgres user. Does this user have read access to /etc/ssl/private/ssl-cert-snakeoil.key?
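One way to answer that question is to attempt an actual read as the user in question, since that tests effective access rather than only the listed mode bits and group membership. A rough sketch, where probe_read is a made-up helper name and the check applies to whichever user runs it:

```shell
# Hypothetical helper: report whether the *current* user can really
# read a file, by trying to read it instead of inspecting its mode.
probe_read() {
  if cat -- "$1" > /dev/null 2>&1; then
    echo "readable: $1"
  else
    echo "NOT readable: $1"
    return 1
  fi
}
```

Inside the container this would have to be run as the postgres user (for example from a shell opened with su postgres, assuming that user has a usable shell), passing /etc/ssl/private/ssl-cert-snakeoil.key as the argument.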

gawbul commented 9 years ago

These are the permissions on the key:

# ls -alh /etc/ssl/private/ssl-cert-snakeoil.key 
-rw-r----- 1 root ssl-cert 1.7K Dec  1 17:16 /etc/ssl/private/ssl-cert-snakeoil.key

postgres appears to be in the ssl-cert group:

# groups postgres
postgres : postgres ssl-cert

gawbul commented 9 years ago

Found this https://github.com/Painted-Fox/docker-postgresql/issues/30. Seems to be the cause of the problem?

Testing a rollback to phusion/baseimage:0.9.13, though perhaps the AUFS workaround would be better? Will test that afterwards.

gawbul commented 9 years ago

Got this with roll back to 0.9.13:

$ docker run -ti --rm -p 2223:22 -p 5432:5432 --name postgresql --volume=/mylocaldir:/srv/pgdata -e "PG_USERNAME=myuser" -e "PG_PASSWORD=mypass" -e "SSH_PUBLIC_KEY=$MY_SSH_KEY" postgresql93-fix /sbin/my_init -- bash
*** Running /etc/my_init.d/00_regen_ssh_host_keys.sh...
*** Running /etc/my_init.d/01_start_postgresql.sh...
 * Starting PostgreSQL 9.3 database server                                                                                                               [ OK ] 
*** Running /etc/my_init.d/02_init_container.sh...
 * Adding public SSH key...
grep: /root/.ssh/authorized_keys: No such file or directory
 * Creating PostgreSQL user myuser...
CREATE ROLE
CREATE DATABASE
 * Moving PostgreSQL data...
 * Stopping PostgreSQL 9.3 database server                                                                                                               [ OK ] 
9.3/main (port 5432): down
/srv/pgdata/data or /srv/pgdata/logs are not empty dirs. Operation aborted.
 * Starting PostgreSQL 9.3 database server                                                                                                               [ OK ] 
*** Running /etc/rc.local...
*** Booting runit daemon...
*** Runit started as PID 157
*** Running bash...

All good! I was going to test the AUFS workaround with the lineinfile module in the Ansible playbook, but thinking about it, that wouldn't solve the problem for users who just use the Docker image.

puntonim commented 9 years ago

Ok, this is a good catch!

So it seems that the problem is not related to DigitalOcean at all; instead it seems to be related to the latest phusion/baseimage version. The best solution seems to be changing this line to:

FROM phusion/baseimage:0.9.13

Am I correct?

gawbul commented 9 years ago

Created new issue here https://github.com/nimiq/docker-postgresql93/issues/2.

gawbul commented 9 years ago

This is a Docker AUFS bug (https://github.com/docker/docker/issues/783) - the workaround is to roll back to phusion/baseimage:0.9.13 rather than latest. This is recommended by phusion anyway - https://github.com/phusion/baseimage-docker#getting_started. Closing this issue.
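
The rollback can be sketched as a small shell helper that pins the base image tag in a local copy of the Dockerfile. This is only an illustration: pin_base_image is a made-up name, and it assumes the Dockerfile has a FROM line starting with phusion/baseimage.

```shell
# Hypothetical helper: rewrite the FROM line of a Dockerfile so that
# phusion/baseimage is pinned to a fixed tag instead of latest.
pin_base_image() {
  dockerfile="$1"
  tag="$2"
  # Write to a temporary file first, so this works with any POSIX sed
  sed "s|^FROM phusion/baseimage.*\$|FROM phusion/baseimage:${tag}|" \
    "$dockerfile" > "${dockerfile}.tmp" &&
    mv "${dockerfile}.tmp" "$dockerfile"
}
```

After running pin_base_image Dockerfile 0.9.13, the image can be rebuilt locally, for example with docker build -t postgresql93-fix . as in the test run above.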