google / trillian

A transparent, highly scalable and cryptographically verifiable data store.

[docker-compose] containers unable to connect to MariaDB #1164

Closed DazWilkin closed 6 years ago

DazWilkin commented 6 years ago

docker-compose up --build yields:

trillian-log-server_1  | W0529 21:19:53.108605       7 tree_storage.go:81] Failed to set strict mode on mysql db: dial tcp 172.24.190.65:3306: getsockopt: connection timed out
trillian-log-server_1  | F0529 21:19:53.109744       7 main.go:99] Failed to get storage provider: dial tcp 172.24.190.65:3306: getsockopt: connection timed out
trillian-log-server_1  | W0529 21:22:04.180614       7 tree_storage.go:81] Failed to set strict mode on mysql db: dial tcp 172.24.190.65:3306: getsockopt: connection timed out
trillian-log-server_1  | F0529 21:22:04.181499       7 main.go:99] Failed to get storage provider: dial tcp 172.24.190.65:3306: getsockopt: connection timed out
deployment_trillian-log-signer_1 exited with code 1
trillian-log-signer_1  | I0529 21:17:42.815491       7 main.go:82] **** Log Signer Starting ****
trillian-log-signer_1  | W0529 21:19:53.108579       7 tree_storage.go:81] Failed to set strict mode on mysql db: dial tcp 172.24.190.65:3306: getsockopt: connection timed out
trillian-log-signer_1  | F0529 21:19:53.109188       7 main.go:88] Failed to get storage provider: dial tcp 172.24.190.65:3306: getsockopt: connection timed out
trillian-log-signer_1  | I0529 21:19:55.425295       7 main.go:82] **** Log Signer Starting ****
trillian-log-signer_1  | W0529 21:22:06.228573       7 tree_storage.go:81] Failed to set strict mode on mysql db: dial tcp 172.24

and trillian-db-seed appears never to get a connection (through wait-for-it) to the database. I don't think wait-for-it is at fault, though: if I replace it with a long sleep (15s) and try that way, I get errors too:

trillian-db-seed_1     | - Using MySQL Flags: -h mysql -u root --host localhost --port 3306
trillian-db-seed_1     | Warning: about to destroy and reset database 'test'
trillian-db-seed_1     | 
trillian-db-seed_1     | Resetting DB...
trillian-db-seed_1     | ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2 "No such file or directory")

IIUC it should be connecting through the TCP port, not through that socket. The output suggests that it is using localhost:3306, but this doesn't work even if I change the host to mysql using $DB_HOST:

trillian-db-seed_1     | ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql' (110 "Connection timed out")

The database is ready:

2018-05-29 21:17:42 140010780272576 [Note] mysqld: ready for connections.
Version: '10.1.33-MariaDB-1~jessie'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution

If I add adminer into the compose file, I'm able to reach the database using it:

  adminer:
    image: adminer
    restart: always
    ports:
      - 8080:8080
    links:
    - mysql:db

and then I can browse to localhost:8080 and log in with ${DB_PASSWORD}. In this case, it appears to have gotten as far as creating (a) the test DB, although it contains no tables, and (b) the test user.

There's something in the bowels of docker-compose and|or MySQL(MariaDB) and|or Trillian that I'm not seeing because I'm entirely unable to understand what's broken and how to fix it and would value some inspiration.

docker --version
Docker version 18.03.0-ce, build 0520e24

and

docker-compose --version
docker-compose version 1.21.0, build 5920eb0

DazWilkin commented 6 years ago

Ugh! I've got it mostly working, but there's still a race condition with trillian-db-seed, so this does not always work initially on a clean (docker system prune) system :-(

https://github.com/google/trillian/blob/master/examples/deployment/docker-compose.yml

I have docker-compose working but I'd value some insights into which of the several changes is required. I'm confident it's some combination of several.

host=mysql does not work

The docker-compose.yml uses references to mysql (I believe in reference to service: mysql) in an attempt to reference the database. For me this does not work. After much consternation, I realized that my debugging included a possible solution. This worked:

  adminer:
    image: adminer
    restart: always
    ports:
      - 8080:8080
    depends_on:
      - mysql
    links:
      - mysql:db

The other containers don't assume (unlike adminer) that the database container is called db, but it struck me that perhaps links is necessary, and I think (!?) it provides a way to reference other containers unambiguously. When I run docker-compose without links, the database container is not actually called mysql but e.g. mysql_1. Using links for all the containers appears to solve host resolution.

So each service needs:

    links:
    - mysql:db

And then references to the database become e.g. test:zaphod@tcp(db:3306)/test
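
The connection string above has the shape user:password@tcp(host:port)/dbname. As a minimal sketch, it could be assembled from environment variables instead of being hard-coded. The function name build_mysql_uri and the TRILLIAN_DB_* variable names are hypothetical, chosen only for illustration; the defaults mirror the values used in this thread.

```shell
#!/bin/bash
# Sketch: build a --mysql_uri value from environment variables.
# build_mysql_uri and the TRILLIAN_DB_* names are hypothetical, for illustration only.
build_mysql_uri() {
  local user="${TRILLIAN_DB_USER:-test}"
  local pass="${TRILLIAN_DB_PASSWORD:-zaphod}"
  local host="${TRILLIAN_DB_HOST:-db}"
  local port="${TRILLIAN_DB_PORT:-3306}"
  local name="${TRILLIAN_DB_NAME:-test}"
  # Emit user:password@tcp(host:port)/dbname on a single line.
  printf '%s:%s@tcp(%s:%s)/%s\n' "$user" "$pass" "$host" "$port" "$name"
}

build_mysql_uri
```

With none of the variables set, this prints the same string as the example above; setting TRILLIAN_DB_HOST changes only the host part.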

protocol=TCP appears to be required

I tried running MySQL and MariaDB containers standalone and then connecting to them from a second container as a client. The only way I could get this to work was by including --protocol=TCP to force comms over the TCP port. Unfortunately, resetdb.sh does not include this flag (see next).

docker run \
--env=MYSQL_ROOT_PASSWORD=$DB_PASSWORD \
-it \
--publish=3306:3306 \
mysql:8

Or:

docker run \
--env=MYSQL_ROOT_PASSWORD=$DB_PASSWORD \
-it \
--publish=3306:3306 \
mariadb:10.3.7

And:

docker run \
--interactive \
--tty  \
--net=host mysql:8 \
mysql \
  --user=root \
  --password=${DB_PASSWORD} \
  --host=localhost \
  --port=3306 \
  --protocol=TCP \
  --execute="show databases;"

Works reliably.

resetdb.sh

I'm loath to propose changes to resetdb.sh, knowing that it is used elsewhere, but it appears that it must include --protocol=TCP as a flag to address the above issue.

While I was in here I made the password property consistent with the other long-names so -p --> --password=

The result:

#!/bin/bash

set -e

usage() {
  echo "$0 [--force] [--verbose] ..."
  echo "accepts environment variables:"
  echo " - DB_NAME"
  echo " - DB_USER"
  echo " - DB_PASSWORD"
  echo " - DB_HOST"
  echo " - DB_PORT"
  echo " - DB_PROT"
}

collect_vars() {
  # set unset environment variables to defaults
  [ -z ${DB_USER+x} ] && DB_USER="root"
  [ -z ${DB_NAME+x} ] && DB_NAME="test"
  [ -z ${DB_HOST+x} ] && DB_HOST="localhost"
  [ -z ${DB_PORT+x} ] && DB_PORT="3306"
  [ -z ${DB_PROT+x} ] && DB_PROT="TCP"
  FLAGS=()

  # handle flags
  FORCE=false
  VERBOSE=false
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --force) FORCE=true ;;
      --verbose) VERBOSE=true ;;
      *) FLAGS+=("$1")
    esac
    shift 1
  done

  FLAGS+=(--user="${DB_USER}")
  FLAGS+=(--host="${DB_HOST}")
  FLAGS+=(--port="${DB_PORT}")
  FLAGS+=(--protocol="${DB_PROT}")

  # Optionally print flags (before appending password)
  [[ ${VERBOSE} = 'true' ]] && echo "- Using MySQL Flags: ${FLAGS[@]}"

  # append password if supplied
  [ -z ${DB_PASSWORD+x} ] || FLAGS+=(--password="${DB_PASSWORD}")
}

main() {
  collect_vars "$@"

  readonly TRILLIAN_PATH=$(go list -f '{{.Dir}}' github.com/google/trillian)

  # what we're about to do
  echo "Warning: about to destroy and reset database '${DB_NAME}'"

  [[ ${FORCE} = true ]] || read -p "Are you sure? [Y/N]: " -n 1 -r
  echo # Print newline following the above prompt

  if [ -z ${REPLY+x} ] || [[ $REPLY =~ ^[Yy]$ ]]
  then
      echo "Resetting DB..."
      echo "Flags: ${FLAGS[@]}"
      mysql "${FLAGS[@]}" -e "DROP DATABASE IF EXISTS ${DB_NAME};"
      mysql "${FLAGS[@]}" -e "CREATE DATABASE ${DB_NAME};"
      mysql "${FLAGS[@]}" -e "GRANT ALL ON ${DB_NAME}.* TO '${DB_NAME}' IDENTIFIED BY 'zaphod';"
      mysql "${FLAGS[@]}" -D "${DB_NAME}" < "${TRILLIAN_PATH}/storage/mysql/storage.sql"
      echo "Reset Complete"
  fi
}

main "$@"

docker-compose.yml

Because of the minimal Dockerfiles and a desire to make Docker Compose and Kubernetes consistent, I've pulled much of the formatting into the docker-compose.yml file.

Unfortunately, whereas Kubernetes permits environment variables (values) to be incorporated into the runtime command-line, Docker Compose does not, so the Docker Compose variant of the file is more static than I'd prefer. These values could possibly (!?) be provided by the environment (e.g. ${DB_FLAG} and ${DB_PROVIDER}, as is done with ${DB_PASSWORD}), but for now they are hard-coded.

Also dropped -h mysql from the trillian-db-seed command, since it is redundant if DB_HOST=db is provided too.

version: "3.2"
services:

  mysql:
    image: mariadb:10.3.7
    restart: always
    environment:
      - MYSQL_ROOT_PASSWORD=${DB_PASSWORD}

  trillian-db-seed:
    build:
      context: ../..
      dockerfile: ./examples/deployment/docker/db_client/Dockerfile
    depends_on:
      - mysql
    links:
      - mysql:db
    environment:
      - DB_USER=root
      - DB_PASSWORD=${DB_PASSWORD}
      - DB_HOST=db
      - DB_PORT=3306
      - DB_PROT=TCP
    command: ./scripts/resetdb.sh --verbose --force

  trillian-log-server:
    build:
      context: ../..
      dockerfile: examples/deployment/docker/log_server/Dockerfile.new
    restart: always
    ports:
      - "8090:8090"
      - "8091:8091"
    depends_on:
      - mysql
    links:
      - mysql:db
    environment: 
      - DB_USER=root
      - DB_PASSWORD=$DB_PASSWORD
    command: "--mysql_uri=test:zaphod@tcp(db:3306)/test --storage_system=mysql --rpc_endpoint=0.0.0.0:8090 --http_endpoint=0.0.0.0:8091 --alsologtostderr"

  trillian-log-signer:
    build:
      context: ../..
      dockerfile: examples/deployment/docker/log_signer/Dockerfile.new
    restart: always
    ports:
      - "8092:8091"
    depends_on:
      - mysql
    links:
      - mysql:db
    environment:
      - DB_USER=root
      - DB_PASSWORD=$DB_PASSWORD
    command: "--mysql_uri=test:zaphod@tcp(db:3306)/test --storage_system=mysql --http_endpoint=0.0.0.0:8091 --sequencer_guard_window=0s --sequencer_interval=300ms --num_sequencers=10 --batch_size=2000 --force_master=true --alsologtostderr"

Feedback and guidance welcome.

jsha commented 6 years ago

I don't know if I can answer all your questions, but in Boulder, we use docker-compose routinely for all our testing, along with a mysql container and an initialization script. See these files:

https://github.com/letsencrypt/boulder/blob/master/docker-compose.yml
https://github.com/letsencrypt/boulder/blob/master/test/create_db.sh
https://github.com/letsencrypt/boulder/blob/master/test/entrypoint.sh

It's worth noting that links is deprecated: https://docs.docker.com/compose/compose-file/#links. We recently switched to using aliases with a lot of success. It doesn't matter what your database container is called. All you really care about is that it be resolvable under a predictable name from the other containers. aliases accomplishes that.
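
A minimal sketch of the aliases approach described above, written as a shell heredoc so it can be dropped into a scratch directory and inspected. The network name trillian-net, the alias db, and the client service are hypothetical names chosen for illustration; this is not Boulder's or Trillian's actual configuration.

```shell
#!/bin/bash
# Sketch: write a scratch Compose file that gives the database a stable alias.
# The network name "trillian-net", the alias "db", and the "client" service
# are hypothetical, chosen for illustration.
set -e
workdir="$(mktemp -d)"
# The quoted 'EOF' keeps ${DB_PASSWORD} literal so Compose substitutes it later.
cat > "${workdir}/docker-compose.yml" <<'EOF'
version: "3.2"
services:
  mysql:
    image: mariadb:10.3.7
    environment:
      - MYSQL_ROOT_PASSWORD=${DB_PASSWORD}
    networks:
      trillian-net:
        aliases:
          - db
  client:
    image: mariadb:10.3.7
    depends_on:
      - mysql
    networks:
      - trillian-net
    command: mysql --host=db --user=root --password=${DB_PASSWORD} --execute="show databases;"
networks:
  trillian-net:
    driver: bridge
EOF
echo "wrote ${workdir}/docker-compose.yml"
```

Here the client reaches the database as db regardless of what the service or container is called, which is the point of the alias.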

I think the reason you find yourself needing to add --protocol=TCP to your mysql command line is that mysql special-cases the hostname "localhost" and tries to use a Unix domain socket to connect to localhost. You can override that with --protocol=TCP. However, it seems like you are actually running your mysql client commands on a different container, which will reference the mysql container as mysql (or db if you like). If you change your mysql commands to have -h mysql, you probably won't need --protocol=TCP anymore. As an added benefit, that should allow you to get rid of --publish=3306:3306, which shouldn't be necessary.
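
The localhost special case can be sketched as a small helper that adds --protocol=TCP only when the host name would otherwise send the client down the socket path. The function name mysql_conn_flags is hypothetical; --host, --port, and --protocol are the client options already used in this thread.

```shell
#!/bin/bash
# Sketch: print mysql client connection flags, one per line.
# mysql_conn_flags is a hypothetical helper, not part of resetdb.sh.
mysql_conn_flags() {
  local host="${1:-localhost}"
  local port="${2:-3306}"
  printf '%s\n' "--host=${host}" "--port=${port}"
  # The client treats the literal name "localhost" as "use the Unix socket",
  # so force TCP only in that case; other host names already imply TCP.
  if [ "${host}" = "localhost" ]; then
    printf '%s\n' "--protocol=TCP"
  fi
}

mysql_conn_flags localhost 3306
mysql_conn_flags db 3306
```

The first call includes --protocol=TCP and the second does not, which matches the observation that the flag is unnecessary once the client addresses the database by its container name.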

One tip that I ran into recently: When we run docker-compose up, everything is hunky-dory. When we run things like docker-compose run boulder ping boulder-mysql, it doesn't work. We realized we needed to run with --use-aliases. Otherwise docker-compose assumes you don't want the aliases when using run.

DazWilkin commented 6 years ago

Very helpful. Thank you.

I'm more familiar with Kubernetes and inadvertently broke Trillian's Docker Compose while trying to refine the Kubernetes deployment.

I'll review this tomorrow. Your suggestions return me to a place closer to where the Docker Compose files were but, with those, I could not get the containers talking to the MySQL instance.

Thanks for the guidance!

DazWilkin commented 6 years ago

@jsha Thanks! I have it working and your comment was very helpful.

The Docker Compose works when I define a network and use aliases (at your suggestion).

The database (container) doesn't need the --port and --protocol flags, as you suggested.

However, I'm also able to get it working if I don't use mysql as the service name ...

This, I don't understand :-(

That's closer to how the file was originally (albeit broken), so I'm going to stick closer to it.

Thanks very much for the help!

jsha commented 6 years ago

> However, I'm also able to get it working if I don't use mysql as the service name ...

When you say "as the service name," you mean the heading in the yaml file? E.g.

services:
  mysql:

That's expected if you're using aliases. In other words, if you configure component X to have an alias "mysql" that points to the MySQL container, then things running in that component can get the IP address of the MySQL container by looking up the name "mysql".

If you want to post your current config I can take a look.

DazWilkin commented 6 years ago

Right! But that's not what wasn't working ;-)

The following (hypothetical) Docker Compose file does not work for me (but it should). test times out, unable to connect to mysql:

version: '3'
services:
  mysql:
    image: mariadb:10.1
    environment:
      - MYSQL_ROOT_PASSWORD=${DB_PASSWORD}
  test:
    image: mysql:8
    restart: always
    depends_on:
    - mysql
    environment:
      - MYSQL_USER="root"
      - MYSQL_PWD=${DB_PASSWORD}
    command: mysql --host=mysql --execute="show databases;"

yields:

test_1   | ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql' (110)

If I rename the service to something (anything?) other than mysql, e.g. db or xxx, it works:

version: '3'
services:
  xxx:
    image: mariadb:10.1
    environment:
      - MYSQL_ROOT_PASSWORD=${DB_PASSWORD}
  test:
    image: mysql:8
    restart: always
    depends_on:
    - xxx
    environment:
      - MYSQL_USER="root"
      - MYSQL_PWD=${DB_PASSWORD}
    command: mysql --host=xxx --execute="show databases;"

yields:

test_1              | Database
test_1              | information_schema
test_1              | mysql
test_1              | performance_schema
test_1              | Database
test_1              | information_schema
test_1              | mysql
test_1              | performance_schema

jsha commented 6 years ago

What command are you running? The above roughly works for me with docker-compose up or docker-compose run --use-aliases test. If you use docker-compose run without --use-aliases it definitely won't work.

Also note that there are some timing issues around waiting for mysql to come up before you try to connect to it. Your simplified example here doesn't include a wait-for-it command but I gather from the previous comments that there's a thing called wait-for-it used by the "real" containers, which waits for a given TCP port to become available.
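
A rough sketch of the waiting idea, not the actual wait-for-it script: retry a probe command until it succeeds or a maximum number of attempts is reached. retry_until and tcp_open are hypothetical names. tcp_open relies on bash's /dev/tcp redirection, which only checks that the port accepts connections, not that the server is ready to authenticate clients.

```shell
#!/bin/bash
# Sketch: retry a command, pausing one second between attempts.
# retry_until and tcp_open are hypothetical helpers, not part of wait-for-it.
# Usage: retry_until MAX_ATTEMPTS COMMAND [ARGS...]
retry_until() {
  local attempts="$1"; shift
  local i=0
  until "$@"; do
    i=$((i + 1))
    # Give up once the command has failed MAX_ATTEMPTS times.
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    sleep 1
  done
}

# Succeeds if a TCP connection to host:port can be opened (bash-only /dev/tcp).
tcp_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Example: try the database port up to 60 times, then run the seed script.
# retry_until 60 tcp_open mysql 3306 && ./scripts/resetdb.sh --verbose --force
```

A loop like this only helps when failures are fast (connection refused). If each attempt itself hangs until a connect timeout, the total wait grows accordingly.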

DazWilkin commented 6 years ago

I've been using:

docker-compose --file=$PWD/docker-compose.yml up --remove-orphans

When I want to be more emphatic about cleaning the slate:

docker system prune --force && \
docker-compose --file=$PWD/docker-compose.yml up --remove-orphans --build

Yes, in the case above, service: mysql takes time to fail but, once it times out, it's done for good. With service: something-else, the connection attempts fail at first while the database boots, and then it's golden.

Yes, the Trillian folks use wait-for-it to help:

./wait-for-it.sh -t 0 mysql:3306 -- ./scripts/resetdb.sh --verbose --force

DazWilkin commented 6 years ago

Here are the logs from mysql; even after the database container comes ready, the client remains unable to connect to it:

mysql_1  | Version: '10.1.33-MariaDB-1~jessie'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
test_1   | ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql' (110)
deployment_test_1 exited with code 1
test_1   | ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql' (110)
test_1   | ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql' (110)

Whereas, with anything-other-than-mysql (henryhoops), it fails quickly until the database comes ready:

henryhoops_1  | Initializing database
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
deployment_test_1 exited with code 1
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
deployment_test_1 exited with code 1
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
deployment_test_1 exited with code 1
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
henryhoops_1  | Version: '10.1.33-MariaDB-1~jessie'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
deployment_test_1 exited with code 0
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)
test_1        | Database
test_1        | information_schema
test_1        | mysql
test_1        | performance_schema
test_1        | Database
test_1        | information_schema
test_1        | mysql
test_1        | performance_schema
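
The two logs above differ in the number in parentheses, which is the operating system's errno for the failed connect. On Linux, 110 is ETIMEDOUT and 111 is ECONNREFUSED. A refusal means the host was reached but nothing was listening on the port yet, while a timeout means no reply came back at all. That would be consistent with the name mysql resolving to some address other than the database container, though nothing in this thread confirms it. A minimal sketch for reading the code out of such a line follows; explain_connect_error is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: map the OS errno in a MySQL client "ERROR 2003" line to a hint.
# explain_connect_error is a hypothetical helper; 110 and 111 are the Linux
# values of ETIMEDOUT and ECONNREFUSED.
explain_connect_error() {
  local line="$1"
  local errno
  # Extract the digits that follow the last "(" when they are followed by ")" or a space.
  errno="$(printf '%s\n' "$line" | sed -n 's/.*(\([0-9][0-9]*\)[) ].*/\1/p')"
  case "$errno" in
    110) echo "ETIMEDOUT: no reply from that address; check what the host name resolves to" ;;
    111) echo "ECONNREFUSED: host reached but nothing is listening yet; retrying may help" ;;
    *)   echo "unrecognized: ${errno:-none}" ;;
  esac
}

explain_connect_error "ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql' (110)"
explain_connect_error "ERROR 2003 (HY000): Can't connect to MySQL server on 'henryhoops' (111)"
```

Running something like getent hosts mysql inside the client container would show which address the name actually resolves to.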

daviddrysdale commented 6 years ago

What's the status on this issue? Still a problem?

DazWilkin commented 6 years ago

I've not used it since.

I recall that we were unable to explain why renaming the services addressed the issue.

I was pursuing the docker-compose route to unbreak some of the changes I'd introduced to the Dockerfiles and bash scripts before I realized Trillian had a docker-compose way too.

jxsl13 commented 4 years ago

Thanks for the help, using links: mariadb as well as the network bridge worked fine.