hyperledger-labs / mirbft

MirBFT is a consensus library implementing the Mir consensus protocol.
Apache License 2.0

Cannot run local example on branch research-iss #109

Closed: JeffXiesk closed this issue 2 years ago

JeffXiesk commented 2 years ago

Hi, I am working on branch research-iss on Ubuntu 20.04. After completing the installation and deployment steps described in the README, I tried to run example 3 (the local example) from the deployment instructions.

However, no matter how many times I run it, it always ends with the result below. Terminal output:

seok@ubuntu:/opt/gopath/src/github.com/hyperledger-labs/mirbft/deployment$ ./deploy.sh local new scripts/experiment-configuration/generate-local-config.sh
Using experiment data directory: deployment-data/local-0018
Generated 2 experiments.
Using deployment file: deployment-data/local-0018/deployment.dpl
Killing all application processes on the local machine.
orderingclient: no process found
Compiling.
Copy TLS keys and certificates to deployment-data/local-0018
Changing directory to deployment-data/local-0018
Starting master server.
Changing directory back to /opt/gopath/src/github.com/hyperledger-labs/mirbft/deployment
Copying code and binaries to experiment data directory for later analysis.
Waiting for local experiment to finish.
Deploy params: -1, 1, 1client, cloud-machine-templates/small-machine-fra05.cmt
Changing directory to deployment-data/local-0018
Starting local slaves: 1 1client
discoveryslave 1client 127.0.0.1:9999 127.0.0.1 127.0.0.1
Process is 204611
Changing directory back to /opt/gopath/src/github.com/hyperledger-labs/mirbft/deployment
Deploy params: -1, 4, peers, cloud-machine-templates/small-machine-fra05.cmt
Changing directory to deployment-data/local-0018
Starting local slaves: 4 peers
discoveryslave peers 127.0.0.1:9999 127.0.0.1 127.0.0.1
discoveryslave peers 127.0.0.1:9999 127.0.0.1 127.0.0.1
discoveryslave peers 127.0.0.1:9999 127.0.0.1 127.0.0.1
discoveryslave peers 127.0.0.1:9999 127.0.0.1 127.0.0.1
Changing directory back to /opt/gopath/src/github.com/hyperledger-labs/mirbft/deployment

master-log.log

09:06:54.389 INF Processing command. cmdName=write-file content=READY fileName=master-ready
09:06:54.402 INF Processing command. cmdName="wait for slaves" n=4 tag=peers
09:06:54.402 INF Starting server. port=9999
09:06:54.634 INF New slave. addrPort=127.0.0.1:43970 slaveID=0 tag=1client
09:06:54.688 INF New slave. addrPort=127.0.0.1:43972 slaveID=1 tag=peers
09:06:54.688 INF OK addrPort=127.0.0.1:43970 cmdID=-1 slaveID=0 status=0
09:06:54.697 INF OK addrPort=127.0.0.1:43972 cmdID=-1 slaveID=1 status=0
09:06:54.703 INF New slave. addrPort=127.0.0.1:43974 slaveID=2 tag=peers
09:06:54.712 INF OK addrPort=127.0.0.1:43974 cmdID=-1 slaveID=2 status=0
09:06:54.775 INF New slave. addrPort=127.0.0.1:43978 slaveID=3 tag=peers
09:06:54.779 INF New slave. addrPort=127.0.0.1:43976 slaveID=4 tag=peers
09:06:54.810 INF OK addrPort=127.0.0.1:43978 cmdID=-1 slaveID=3 status=0
09:06:54.815 INF OK addrPort=127.0.0.1:43976 cmdID=-1 slaveID=4 status=0
09:06:55.404 INF Finished waiting for slaves. numSlaves=4 tag=peers
09:06:55.405 INF Processing command. cmdName="wait for slaves" n=1 tag=1client
09:06:55.406 INF Finished waiting for slaves. numSlaves=1 tag=1client
09:06:55.406 INF Processing command. cmd=" mkdir -p experiment-output/0000/slave-__id__/config" cmdName=exec-start tag=__all__
09:06:55.408 INF Processing command. cmdName=exec-wait tag=__all__ timeoutCommands=
09:06:55.417 INF OK addrPort=127.0.0.1:43972 cmdID=3 slaveID=1 status=0
09:06:55.419 INF OK addrPort=127.0.0.1:43976 cmdID=3 slaveID=4 status=0
09:06:55.421 INF OK addrPort=127.0.0.1:43978 cmdID=3 slaveID=3 status=0
09:06:55.420 INF OK addrPort=127.0.0.1:43974 cmdID=3 slaveID=2 status=0
09:06:55.429 INF OK addrPort=127.0.0.1:43970 cmdID=3 slaveID=0 status=0
09:06:55.433 INF OK addrPort=127.0.0.1:43978 cmdID=4 slaveID=3 status=0
09:06:55.436 INF OK addrPort=127.0.0.1:43974 cmdID=4 slaveID=2 status=0
09:06:55.438 INF OK addrPort=127.0.0.1:43972 cmdID=4 slaveID=1 status=0
09:06:55.439 INF OK addrPort=127.0.0.1:43970 cmdID=4 slaveID=0 status=0
09:06:55.443 INF OK addrPort=127.0.0.1:43976 cmdID=4 slaveID=4 status=0
09:06:55.444 INF Processing command. cmd=" cp config/config-0000.yml experiment-output/0000/slave-__id__/config/config.yml" cmdName=exec-start tag=peers
09:06:55.446 INF Processing command. cmdName=exec-wait tag=peers timeoutCommands=
09:06:55.451 INF OK addrPort=127.0.0.1:43978 cmdID=5 slaveID=3 status=0
09:06:55.455 INF OK addrPort=127.0.0.1:43974 cmdID=5 slaveID=2 status=0
09:06:55.455 INF OK addrPort=127.0.0.1:43972 cmdID=5 slaveID=1 status=0
09:06:55.461 INF OK addrPort=127.0.0.1:43976 cmdID=5 slaveID=4 status=0
09:06:55.489 INF OK addrPort=127.0.0.1:43978 cmdID=6 slaveID=3 status=0
09:06:55.493 INF OK addrPort=127.0.0.1:43976 cmdID=6 slaveID=4 status=0
09:06:55.494 INF OK addrPort=127.0.0.1:43972 cmdID=6 slaveID=1 status=0
09:06:55.496 INF OK addrPort=127.0.0.1:43974 cmdID=6 slaveID=2 status=0
09:06:55.496 INF Processing command. cmd=" cp config/config-0000.yml experiment-output/0000/slave-__id__/config/config.yml" cmdName=exec-start tag=1client
09:06:55.497 INF Processing command. cmdName=exec-wait tag=1client timeoutCommands=
09:06:55.503 INF OK addrPort=127.0.0.1:43970 cmdID=7 slaveID=0 status=0
09:06:55.512 INF OK addrPort=127.0.0.1:43970 cmdID=8 slaveID=0 status=0
09:06:55.514 INF Processing command. cmdName=sync tag=peers
09:06:55.518 INF Noop. Doing nothing. addrPort=127.0.0.1:43974 cmdID=9 slaveID=2 status=0
09:06:55.518 INF Noop. Doing nothing. addrPort=127.0.0.1:43972 cmdID=9 slaveID=1 status=0
09:06:55.522 INF Noop. Doing nothing. addrPort=127.0.0.1:43978 cmdID=9 slaveID=3 status=0
09:06:55.523 INF Noop. Doing nothing. addrPort=127.0.0.1:43976 cmdID=9 slaveID=4 status=0
09:06:55.524 INF Processing command. cmdName=sync tag=1client
09:06:55.528 INF Noop. Doing nothing. addrPort=127.0.0.1:43970 cmdID=10 slaveID=0 status=0
09:06:55.529 INF Processing command. cmdName=discover-reset nPeers=4
09:06:55.530 INF Processing command. cmd=" orderingpeer experiment-output/0000/slave-__id__/config/config.yml 127.0.0.1:9999 127.0.0.1 127.0.0.1 experiment-output/0000/slave-__id__/peer.trc experiment-output/0000/slave-__id__/prof" cmdName=exec-start tag=peers
09:06:55.533 INF Processing command. cmdName=discover-wait
09:06:55.539 INF OK addrPort=127.0.0.1:43972 cmdID=12 slaveID=1 status=0
09:06:55.539 INF OK addrPort=127.0.0.1:43974 cmdID=12 slaveID=2 status=0
09:06:55.544 INF OK addrPort=127.0.0.1:43978 cmdID=12 slaveID=3 status=0
09:06:55.549 INF OK addrPort=127.0.0.1:43976 cmdID=12 slaveID=4 status=0
09:06:55.784 INF Discovered node. addrPort=127.0.0.1:43980 id=0 privateAddr=127.0.0.1 publicAddr=127.0.0.1
09:06:55.809 INF Discovered node. addrPort=127.0.0.1:43982 id=1 privateAddr=127.0.0.1 publicAddr=127.0.0.1
09:06:55.814 INF Discovered node. addrPort=127.0.0.1:43988 id=2 privateAddr=127.0.0.1 publicAddr=127.0.0.1
09:06:55.819 INF Discovered node. addrPort=127.0.0.1:43986 id=3 privateAddr=127.0.0.1 publicAddr=127.0.0.1
09:06:55.821 INF All peer processes started. Waiting until they connect to each other (discover-wait).
09:06:55.821 INF Generating membership list.

Below is the process status, obtained by running ps -aux. [screenshot: ps -aux output] Most of the processes are in state Sl+ or S+, i.e., interruptible sleep (waiting for an event to complete). But what are they waiting for?

The master log says it is waiting until the peers connect to each other. I am confused about why this happens. Am I missing a step, or is something else wrong? And how can I fix it?

Thank you soooo much!

matejpavlovic commented 2 years ago

Hi Jeff, sorry for the late response! I tried to run the local deployment too, and indeed the nodes wouldn't connect to each other. But I found the cause and a fix.

The problem is that TLS is enabled by default, and the (dummy) TLS certificates that are part of the repository have expired. This can be resolved in 2 ways:

  1. Re-generate the TLS certificates:
    cd tls-data
    ./generate.sh -f # The -f option forces re-generation even of certificates that are already present
  2. Disable TLS: set UseTLS to false in config-file-templates/mir-modular.yml. (From experience so far, using TLS has no observable impact on performance anyway.)

For remote deployments, I don't expect this issue to occur, since certificates are always freshly generated there.

JeffXiesk commented 2 years ago

Oh, that fixed the problem perfectly. Thanks for your kind and detailed response!