iov-one / iov-core

Client library for secure key management and multi-blockchain communication
https://iov-one.github.io/iov-core-docs/
Apache License 2.0
198 stars 18 forks source link

Lisk node does not start reliably #1002

Open webmaster128 opened 5 years ago

webmaster128 commented 5 years ago

Sometimes the Lisk node is started properly. Then even after all the installation ab build, port 4000 is not responsing.

See e.g. https://travis-ci.com/iov-one/iov-core/builds/111913750 but that also happy locally in rare cases.

webmaster128 commented 5 years ago

I suspect a race condition betweenthe Lisk node and the Lisk Postgres, as there are a lot of errors during the DB boot:

$ cd $TMPDIR/@iov/lisk/docker
$ docker-compose logs db
Attaching to docker_db_1
db_1    | The files belonging to this database system will be owned by user "postgres".
db_1    | This user must also own the server process.
db_1    | 
db_1    | The database cluster will be initialized with locale "en_US.utf8".
db_1    | The default database encoding has accordingly been set to "UTF8".
db_1    | The default text search configuration will be set to "english".
db_1    | 
db_1    | Data page checksums are disabled.
db_1    | 
db_1    | fixing permissions on existing directory /var/lib/postgresql/data ... ok
db_1    | creating subdirectories ... ok
db_1    | selecting default max_connections ... 100
db_1    | selecting default shared_buffers ... 128MB
db_1    | selecting dynamic shared memory implementation ... posix
db_1    | creating configuration files ... ok
db_1    | running bootstrap script ... ok
db_1    | sh: locale: not found
db_1    | 2019-05-15 15:01:44.351 UTC [28] WARNING:  no usable system locales were found
db_1    | performing post-bootstrap initialization ... ok
db_1    | syncing data to disk ... ok
db_1    | 
db_1    | WARNING: enabling "trust" authentication for local connections
db_1    | You can change this by editing pg_hba.conf or using the option -A, or
db_1    | --auth-local and --auth-host, the next time you run initdb.
db_1    | 
db_1    | Success. You can now start the database server using:
db_1    | 
db_1    |     pg_ctl -D /var/lib/postgresql/data -l logfile start
db_1    | 
db_1    | waiting for server to start....2019-05-15 15:01:45.827 UTC [32] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db_1    | 2019-05-15 15:01:45.843 UTC [33] LOG:  database system was shut down at 2019-05-15 15:01:44 UTC
db_1    | 2019-05-15 15:01:45.847 UTC [32] LOG:  database system is ready to accept connections
db_1    |  done
db_1    | server started
db_1    | CREATE DATABASE
db_1    | 
db_1    | 
db_1    | /usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
db_1    | 
db_1    | waiting for server to shut down....2019-05-15 15:01:46.339 UTC [32] LOG:  received fast shutdown request
db_1    | 2019-05-15 15:01:46.341 UTC [32] LOG:  aborting any active transactions
db_1    | 2019-05-15 15:01:46.344 UTC [32] LOG:  worker process: logical replication launcher (PID 39) exited with exit code 1
db_1    | 2019-05-15 15:01:46.344 UTC [34] LOG:  shutting down
db_1    | 2019-05-15 15:01:46.362 UTC [32] LOG:  database system is shut down
db_1    |  done
db_1    | server stopped
db_1    | 
db_1    | PostgreSQL init process complete; ready for start up.
db_1    | 
db_1    | 2019-05-15 15:01:46.448 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
db_1    | 2019-05-15 15:01:46.448 UTC [1] LOG:  listening on IPv6 address "::", port 5432
db_1    | 2019-05-15 15:01:46.452 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db_1    | 2019-05-15 15:01:46.464 UTC [43] LOG:  database system was shut down at 2019-05-15 15:01:46 UTC
db_1    | 2019-05-15 15:01:46.470 UTC [1] LOG:  database system is ready to accept connections
db_1    | 2019-05-15 15:01:47.120 UTC [50] LOG:  incomplete startup packet
db_1    | 2019-05-15 15:02:10.778 UTC [56] LOG:  incomplete startup packet
db_1    | 2019-05-15 15:03:46.693 UTC [48] LOG:  stats_timestamp 2019-05-15 15:03:46.69966+00 is later than collector's time 2019-05-15 15:03:46.689726+00 for database 16829

Let's try to reduce the load while Lisk node is booting