openethereum / parity-deploy

Parity deployment script.

Client fails to setup due to race during docker node startup #36

Closed 0x7CFE closed 6 years ago

0x7CFE commented 7 years ago

When the deploy script is invoked with a command like ./parity-deploy.sh --config aura -n 1 -r nightly --min-gas-price 0 --enable-client --reserved-only, a client node is created.

Unfortunately the following scenario is possible: https://gist.github.com/0x7CFE/df64afad8244e74156a772a6b2df28f0

tl;dr: The client fails to start because name resolution fails (related to https://github.com/paritytech/parity/issues/6907). This looks like a race between nodes. If host1 starts before the client, everything works as expected. However, the client may try to resolve host1 before the latter obtains an IP address.

To fix that, we need to either fix the order in which nodes start or assign static addresses to the nodes.

0x7CFE commented 7 years ago

The best solution I have found so far is Docker Compose startup ordering: https://docs.docker.com/compose/startup-order/ The client node should simply depend on all hosts. That would fix the race.

0x7CFE commented 7 years ago

Unfortunately that wasn't enough. In any setup more complex than 1 client and 1 authority, the problem remains. That's because each authority node is configured with the other authorities as reserved nodes, so host1 depends on host2 and vice versa.

This circular dependency results in the following failure, where every node exits:

$ sudo docker-compose up
Creating network "paritydeploy_default" with the default driver
Creating host2
Creating host1
Creating client
Attaching to host1, host2, client
host1     | Loading config file from /parity/authority.toml
host2     | Loading config file from /parity/authority.toml
host2     | Invalid node address format given for a boot node: enode://be0df3e855e5995a57ecf1e863aba96f0e29f075a9cbb359456b045b67d895c87c4935599306b797ef2b172d2b6556d0c57f60dcc1b5f0b4080bb74ebb2c32e2@host1:30303
host1     | Invalid node address format given for a boot node: enode://ba1d0e6d920b7e219f13ef540319c361abbd9c26e5487ff35f6a019cd8dd2d6d62642b55a06472a766f12705de971a1bb40d5f643d0f38e58f7960f79fe13ef4@host2:30303
client    | Loading config file from /parity/client.toml
host1 exited with code 1
host2 exited with code 1
client    | Invalid node address format given for a boot node: enode://be0df3e855e5995a57ecf1e863aba96f0e29f075a9cbb359456b045b67d895c87c4935599306b797ef2b172d2b6556d0c57f60dcc1b5f0b4080bb74ebb2c32e2@host1:30303
client exited with code 1

We need to either solve the nameserver issues, retry resolution after some time, or bring all hosts up first and only then start the business logic.
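
The second option, retrying resolution after a delay, could be sketched roughly as below. This is a hypothetical illustration in Python and not part of parity-deploy: the function name wait_for_hosts, the timeout and interval defaults, and the injectable resolve/sleep/clock parameters are all assumptions made for the example. The idea is only that a wrapper waits until every peer hostname resolves before the node process is launched.

```python
import socket
import time


def wait_for_hosts(hosts, timeout=30.0, interval=1.0,
                   resolve=socket.gethostbyname,
                   sleep=time.sleep, clock=time.monotonic):
    """Block until every name in `hosts` resolves, or raise TimeoutError.

    Hypothetical helper: returns a dict mapping hostname -> resolved address.
    """
    deadline = clock() + timeout
    pending = list(hosts)
    resolved = {}
    while pending:
        still_pending = []
        for host in pending:
            try:
                resolved[host] = resolve(host)
            except OSError:
                # socket.gaierror subclasses OSError: the name is not known yet.
                still_pending.append(host)
        pending = still_pending
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError("unresolved hosts: " + ", ".join(pending))
        sleep(interval)
    return resolved
```

A container entrypoint could call something like wait_for_hosts(["host1", "host2"]) and start the node only after it returns. Because every node waits in the same way, this would sidestep the mutual dependency between authorities, assuming the containers stay up while waiting instead of exiting.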

ddorgan commented 6 years ago

I believe a fix was introduced for this. Can you please retest?

ddorgan commented 6 years ago

@0x7CFE is this still an issue?

ddorgan commented 6 years ago

@0x7CFE any update on this?

0x7CFE commented 6 years ago

@ddorgan, sorry for the late response. I haven't used docker recently. It would probably be better to ask someone who is working on it now.

ddorgan commented 6 years ago

@0x7CFE Thanks! Closing issue.