midl-dev / tezos-on-gke

A secure, turn-key public Tezos baking service on Kubernetes
Apache License 2.0

remove sentry node #49

Open nicolasochem opened 3 years ago

nicolasochem commented 3 years ago

I have been getting many missed bakes and endorsements since the Florence activation or the v9 release, even with v9.2, which was supposed to include a fix for it.

I am trying now to run with "naked nodes" not protected by sentries, to see if it fixes it. I will post my findings.

We are an exception: no one else (Kiln or big bakers) is using sentries for their baking operations.

I will monitor for a few days and see if it fixes it. If it does, I will push the code for sentry removal here.

denver-s commented 3 years ago

The main cause of missed endorsements is that the private node does not reliably connect to the public nodes (probably due to bad local DNS).

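If you choose to remove sentry nodes:

  • is security compromised? Can it still be set as a private node, with about 50 public trusted nodes as peers?
  • you could also "lower" the machine type (right now it costs 200 euro/month).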

nicolasochem commented 3 years ago

When the public/private node link disconnects, I get log messages "too few connections (1)". The times I had missed endorsements, I didn't see these messages, so I assume the connectivity was OK. There is a long-standing DNS issue (https://gitlab.com/tezos/tezos/-/issues/1382), but I don't think it makes a difference here, except maybe at first boot. Comparing the timestamps, I did see a 5-10 second lag between the time the public node imports a block and the time it propagates to the private node.

No, it can't be a private node with public trusted nodes. It has to open connections to a large number of peers and advertise its node ID. Note that it does not accept incoming connections, which is good and bad (bad for the network, but good for us in terms of security).

Security-wise it's not as good as sentries, but if sentries cause endorsements to be missed, it's not worth it. I will leave it running for a few more days to see if it makes a difference...

On Wed, Jun 2, 2021 at 11:22 PM denver-s @.***> wrote:

Main problem for missed endorsements is that the private node is not reliably connecting to the public nodes (due to bad local DNS probably).

If you choose to remove sentry nodes:

  • is security compromised? Can it still be set as a private node, with about 50 public trusted nodes as peers?
  • you could also "lower" the machine type (right now it costs 200 euro/month).

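The timestamp comparison between the public and the private node described in the comment above could be automated roughly as follows. This is only a sketch: it assumes both nodes log a `validator.block: block <hash> successfully validated` line in the format shown later in this thread, and the function names, the `year` parameter, the block hash, and the sample timestamps are all hypothetical placeholders, not measurements.

```python
import re
from datetime import datetime

# Assumed log format: "Jun  7 16:13:10.632 - validator.block: block <hash> successfully validated"
VALIDATED_RE = re.compile(
    r"(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d+) - validator\.block: "
    r"block (\S+) successfully validated"
)


def validation_times(log_text, year=2021):
    """Map each block hash to the time the node logged it as validated."""
    times = {}
    for stamp, block_hash in VALIDATED_RE.findall(log_text):
        # The log lines carry no year, so one is supplied by the caller.
        times[block_hash] = datetime.strptime(
            f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f"
        )
    return times


def propagation_lag(public_log, private_log):
    """Seconds between validation on the public and the private node, per block."""
    public = validation_times(public_log)
    private = validation_times(private_log)
    return {
        block_hash: (private[block_hash] - public[block_hash]).total_seconds()
        for block_hash in public.keys() & private.keys()
    }


# Illustrative input only: placeholder hash and made-up timestamps.
public_log = "Jun  2 10:00:01.250 - validator.block: block BExampleHashA successfully validated"
private_log = "Jun  2 10:00:08.750 - validator.block: block BExampleHashA successfully validated"

lags = propagation_lag(public_log, private_log)
for block_hash, seconds in lags.items():
    print(f"{block_hash}: {seconds:.3f}s behind the public node")
```

Only blocks that appear in both logs are compared, so a block that never reached the private node simply does not show up in the result.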

denver-s commented 3 years ago

How's it going?

nicolasochem commented 3 years ago

I still have missed endorsements occasionally despite removing sentry nodes.

nicolasochem commented 3 years ago

Nodes have plenty of CPU and RAM, but it takes 5 to 8 seconds to validate each block (see the "completed in" values on the validator.block lines, e.g. "completed in 7.650s"):

Jun  7 16:12:14.101 - validator.chain: Request pushed on 2021-06-07T16:12:13.846-00:00, treated in 1.351ms, completed in 251ms
Jun  7 16:13:10.632 - validator.block: block BL1DAnEVpT7L1UAzTpBbPQvWmgGGA7RbxPa9oEbBTync4TchpWG successfully validated
Jun  7 16:13:10.632 - validator.block: Request pushed on 2021-06-07T16:13:02.982-00:00, treated in 7.187us, completed in 7.650s
Jun  7 16:13:10.816 - prevalidator.NetXdQprcVkpa.PsFLorenaUUu: switching to new head BL1DAnEVpT7L1UAzTpBbPQvWmgGGA7RbxPa9oEbBTync4TchpWG
Jun  7 16:13:10.816 - prevalidator.NetXdQprcVkpa.PsFLorenaUUu:  Request pushed on 2021-06-07T16:13:10.644-00:00, treated in 376us, completed in 171ms
Jun  7 16:13:10.925 - validator.chain: Update current head to Block Hash BL1DAnEVpT7L1UAzTpBbPQvWmgGGA7RbxPa9oEbBTync4TchpWG (level 1505379, timestamp 2021-06-07T16:12:58-00:00, fitness 01::00000000000cf863), same branch
Jun  7 16:13:10.925 - validator.chain: Request pushed on 2021-06-07T16:13:10.631-00:00, treated in 1.958ms, completed in 289ms
Jun  7 16:14:24.541 - validator.block: block BM5UfHWamYapeUANsVfpkyJNVWGboEGyohNFJEa2kG8tRqEWn8Q successfully validated
Jun  7 16:14:24.541 - validator.block: Request pushed on 2021-06-07T16:14:19.223-00:00, treated in 11.776us, completed in 5.317s
Jun  7 16:14:25.329 - prevalidator.NetXdQprcVkpa.PsFLorenaUUu: switching to new head BM5UfHWamYapeUANsVfpkyJNVWGboEGyohNFJEa2kG8tRqEWn8Q
Jun  7 16:14:25.329 - prevalidator.NetXdQprcVkpa.PsFLorenaUUu:  Request pushed on 2021-06-07T16:14:24.565-00:00, treated in 2.302ms, completed in 761ms
Jun  7 16:14:25.461 - validator.chain: Update current head to Block Hash BM5UfHWamYapeUANsVfpkyJNVWGboEGyohNFJEa2kG8tRqEWn8Q (level 1505380, timestamp 2021-06-07T16:13:58-00:00, fitness 01::00000000000cf864), same branch
Jun  7 16:14:25.461 - validator.chain: Request pushed on 2021-06-07T16:14:24.540-00:00, treated in 5.146ms, completed in 912ms
Jun  7 16:15:12.834 - validator.block: block BM2VtPURmfixu7F6t9Sgb5mDfHamQ18WTryDi28d1RMWpS4VtSa successfully validated
Jun  7 16:15:12.834 - validator.block: Request pushed on 2021-06-07T16:15:06.737-00:00, treated in 4.495us, completed in 6.96s
Jun  7 16:15:13.081 - prevalidator.NetXdQprcVkpa.PsFLorenaUUu: switching to new head BM2VtPURmfixu7F6t9Sgb5mDfHamQ18WTryDi28d1RMWpS4VtSa
Jun  7 16:15:13.081 - prevalidator.NetXdQprcVkpa.PsFLorenaUUu:  Request pushed on 2021-06-07T16:15:12.847-00:00, treated in 282us, completed in 233ms
Jun  7 16:15:13.183 - validator.chain: Update current head to Block Hash BM2VtPURmfixu7F6t9Sgb5mDfHamQ18WTryDi28d1RMWpS4VtSa (level 1505381, timestamp 2021-06-07T16:14:58-00:00, fitness 01::00000000000cf865), same branch
Jun  7 16:15:13.183 - validator.chain: Request pushed on 2021-06-07T16:15:12.834-00:00, treated in 877us, completed in 346ms

On Florencenet it's much faster.
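To track these validation times over a longer period instead of reading them off by eye, the "completed in" values on the validator.block lines can be extracted with a small script. The following is a minimal sketch, assuming the log format shown above stays the same; the function name and the regular expression are illustrative, and the sample lines are copied from the log excerpt in this comment.

```python
import re

# Matches only validator.block request lines and captures the final duration.
COMPLETED_RE = re.compile(
    r"validator\.block: Request pushed on \S+, treated in \S+, "
    r"completed in ([0-9.]+)(us|ms|s)\b"
)

UNIT_TO_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0}


def block_validation_seconds(log_text):
    """Return block validation durations in seconds, in log order."""
    return [
        float(value) * UNIT_TO_SECONDS[unit]
        for value, unit in COMPLETED_RE.findall(log_text)
    ]


# Lines taken from the excerpt above; the validator.chain line is ignored.
sample_log = """\
Jun  7 16:13:10.632 - validator.block: Request pushed on 2021-06-07T16:13:02.982-00:00, treated in 7.187us, completed in 7.650s
Jun  7 16:13:10.925 - validator.chain: Request pushed on 2021-06-07T16:13:10.631-00:00, treated in 1.958ms, completed in 289ms
Jun  7 16:14:24.541 - validator.block: Request pushed on 2021-06-07T16:14:19.223-00:00, treated in 11.776us, completed in 5.317s
Jun  7 16:15:12.834 - validator.block: Request pushed on 2021-06-07T16:15:06.737-00:00, treated in 4.495us, completed in 6.96s
"""

durations = block_validation_seconds(sample_log)
average = sum(durations) / len(durations)
print(f"{len(durations)} blocks, average validation time {average:.2f}s")
```

Running the same function against a Florencenet node's log would give a like-for-like comparison of validation times between the two networks.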

denver-s commented 3 years ago

I noticed an improvement without sentry nodes, so I'm in favor of removing them.

nicolasochem commented 3 years ago

OK, will do.

I opened a ticket on the Tezos repo, where they say it will get better once more bakers upgrade to 9.2:

https://gitlab.com/tezos/tezos/-/issues/1446