nspcc-dev / neofs-aio

NeoFS All-in-One single node deployment helper

Docker container start is not stable #59

Closed smallhive closed 1 year ago

smallhive commented 1 year ago

In #54 we fixed some problems, but some of them remain.

We print a special message, "aio container started", to signal a successful container start. I think we print it too early. See the docker logs below:

2023-07-26T05:56:58.197Z    debug   controller/calls.go:94  iterator over locally collected metrics aborted {"epoch": 1, "error": "could not build placement 6dNgUzSJa27mTig4BfAomZ28qP6rsSScaawrYCPA3kBo: status: code = 3072 message = container not found"}
2023-07-26T05:56:58.197Z    debug   controller/calls.go:166 announcement successfully interrupted   {"epoch": 1}
2023-07-26T05:56:58.205Z    INFO    persisted to disk   {"blocks": 1, "keys": 56, "headerHeight": 221, "blockHeight": 221, "took": "4.687109ms"}
aio container started
/usr/bin/neofs-rest-gw
2023-07-26T05:56:58.730Z    info    neofs-http-gw/app.go:279    no wallet path specified, creating ephemeral key automatically for this run
2023-07-26T05:56:58.731Z    info    neofs-http-gw/app.go:162    add connection  {"address": "localhost:8080", "weight": 1, "priority": 1}
2023-07-26T05:56:58.740Z    debug   session/executor.go:31  serving request...  {"component": "SessionService", "request": "Create"}
2023-07-26T05:56:58.745Z    warn    neofs-http-gw/app.go:227    metrics are disabled
2023-07-26T05:56:58.745Z    info    neofs-http-gw/app.go:336    starting application    {"app_name": "neofs-http-gw", "version": "v0.27.1"}
2023-07-26T05:56:58.745Z    info    neofs-http-gw/app.go:458    added path /upload/{cid}
2023-07-26T05:56:58.745Z    info    neofs-http-gw/app.go:461    added path /get/{cid}/{oid}
2023-07-26T05:56:58.745Z    info    neofs-http-gw/app.go:464    added path /get_by_attribute/{cid}/{attr_key}/{attr_val:*}
2023-07-26T05:56:58.745Z    info    neofs-http-gw/app.go:466    added path /zip/{cid}/{prefix}
2023-07-26T05:56:58.746Z    info    metrics/service.go:33   service hasn't started since it's disabled  {"service": "Pprof"}
2023-07-26T05:56:58.746Z    info    metrics/service.go:33   service hasn't started since it's disabled  {"service": "Prometheus"}
2023-07-26T05:56:58.746Z    info    neofs-http-gw/app.go:496    added server    {"address": "0.0.0.0:8081", "tls enabled": false, "tls cert": "", "tls key": ""}
2023-07-26T05:56:58.747Z    info    neofs-http-gw/app.go:359    starting server {"address": "0.0.0.0:8081"}
2023-07-26T05:56:59.186Z    INFO    sending PrepareRequest  {"height": 222, "view": 0}
2023-07-26T05:56:59.186Z    INFO    sending Commit  {"height": 222, "view": 0}
2023-07-26T05:56:59.186Z    INFO    approving block {"height": 222, "hash": "71365d9dda0ca0e600f92ee6890946a44bfb675d4831edab1a393b5aee11b43b", "tx_count": 0, "merkle": "0000000000000000000000000000000000000000000000000000000000000000", "prev": "2579d83328b56c8157c96c49ad2bd60810d940e43c50e5e63fcc3e26a0005b95"}
2023-07-26T05:56:59.187Z    INFO    initializing dbft   {"height": 223, "view": 0, "index": 0, "role": "Primary"}
2023-07-26T05:56:59.187Z    debug   neofs-node/morph.go:208 new block   {"index": 222}
2023-07-26T05:56:59.187Z    debug   innerring/innerring.go:236  new block   {"index": 222}
2023-07-26T05:56:59.204Z    INFO    persisted to disk   {"blocks": 1, "keys": 27, "headerHeight": 222, "blockHeight": 222, "took": "2.486695ms"}
2023-07-26T05:56:59.231Z    info    neofs-rest-gw/config.go:379 added connection peer   {"address": "localhost:8080", "priority": 1, "weight": 1}
2023-07-26T05:56:59.238Z    debug   session/executor.go:31  serving request...  {"component": "SessionService", "request": "Create"}
2023-07-26T05:56:59.296Z    info    metrics/service.go:33   service hasn't started since it's disabled  {"service": "Prometheus"}
2023-07-26T05:56:59.295Z    info    metrics/service.go:33   service hasn't started since it's disabled  {"service": "Pprof"}
2023/07/26 05:56:59 Serving neofs rest gw at http://[::]:8090
2023-07-26T05:57:00.188Z    INFO    sending PrepareRequest  {"height": 223, "view": 0}
2023-07-26T05:57:00.188Z    INFO    sending Commit  {"height": 223, "view": 0}
2023-07-26T05:57:00.189Z    INFO    approving block {"height": 223, "hash": "f6023244bfadee322653b971ac3042c4a17984846e7f1b1a5b6a1b013f3eb6f0", "tx_count": 0, "merkle": "0000000000000000000000000000000000000000000000000000000000000000", "prev": "71365d9dda0ca0e600f92ee6890946a44bfb675d4831edab1a393b5aee11

We print "aio container started" at about this point:

2023-07-26T05:56:58.205Z    INFO    persisted to disk   {"blocks": 1, "keys": 56, "headerHeight": 221, "blockHeight": 221, "took": "4.687109ms"}
aio container started

but the container is only ready to serve clients about one second later:

2023/07/26 05:56:59 Serving neofs rest gw at http://[::]:8090

Even if we wait for a second inside the app, we can only create containers. Uploading objects raises an error about not having enough nodes to SELECT from.

The run script already has a sleep with forced ticking. We could add one more while loop that checks for a correct response from the gate (or just waits a couple of seconds) and keeps ticking during this wait.
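A minimal sketch of such a wait loop, in Python for illustration only (the actual run script may be written differently). The names wait_until_ready, http_ok, check and tick are hypothetical. check stands for whatever probe confirms the gate answers correctly, and tick stands for whatever the run script already does to force a tick. Neither is taken from the repository.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, non-2xx status: not ready yet.
        return False


def wait_until_ready(
    check: Callable[[], bool],
    tick: Callable[[], None],
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it returns True, calling `tick` between attempts.

    Returns True if `check` succeeded before `timeout` seconds elapsed,
    False otherwise. `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            # Treat any probe failure as "not ready yet".
            pass
        if clock() >= deadline:
            return False
        tick()  # hypothetical hook: keep ticking while we wait
        sleep(interval)
```

With this, the "aio container started" message would be printed only after wait_until_ready returns True. The probe could be something like `lambda: http_ok("http://localhost:8090/")`, where port 8090 comes from the REST gateway log line above and the path is an assumption. A stricter probe (for example, an actual object upload) would be needed to cover the "not enough nodes to SELECT from" case.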