spacemeshos / go-spacemesh

Go Implementation of the Spacemesh protocol full node. 💾⏰💪
https://spacemesh.io
MIT License
751 stars 212 forks source link

Make uniform reaction on wrong smeshing opts in node-config #3411

Open brusherru opened 2 years ago

brusherru commented 2 years ago

Description

In case the User set up a custom (and out of range) value to smeshing-opts-numunits field, then the Node just logs an error, and keeps working. (see also #3410).

But in case if User specified something invalid into smeshing-opts-numfiles — it crashes with the panic:

{"L":"ERROR","T":"2022-08-03T15:44:59.504+0300","N":"00000.defaultLogger","M":"Fatal: goroutine panicked. Stacktrace: goroutine 1347 [running]:\nruntime/debug.Stack()\n\t/Users/runner/hostedtoolcache/go/1.18.2/x64/src/runtime/debug/stack.go:24 +0x65\ngithub.com/spacemeshos/go-spacemesh/log.Log.Panic({0xc00038ab40?, {0x0?, 0xc000746000?}}, {0x4ef6c43, 0x1c}, {0xc00063af90, 0x1, 0x1})\n\t/Users/runner/work/go-spacemesh/go-spacemesh/log/zap.go:46 +0xd7\ngithub.com/spacemeshos/go-spacemesh/log.Panic({0x4ef6c43, 0x1c}, {0xc00063af90, 0x1, 0x1})\n\t/Users/runner/work/go-spacemesh/go-spacemesh/log/log.go:168 +0x50\ngithub.com/spacemeshos/go-spacemesh/cmd/node.(*App).startServices.func1()\n\t/Users/runner/work/go-spacemesh/go-spacemesh/cmd/node/node.go:764 +0xd9\ncreated by github.com/spacemeshos/go-spacemesh/cmd/node.(*App).startServices\n\t/Users/runner/work/go-spacemesh/go-spacemesh/cmd/node/node.go:762 +0x21c\n"}
{"L":"PANIC","T":"2022-08-03T15:44:59.505+0300","N":"00000.defaultLogger","M":"failed to start smeshing: failed to start post setup session: new initializer: invalid `cfg.LabelsPerUnit` & `opts.NumUnits`; expected: `cfg.LabelsPerUnit` * `opts.NumUnits` to be evenly divisible by `opts.NumFiles` (10), given: 4096"}
panic: failed to start smeshing: failed to start post setup session: new initializer: invalid `cfg.LabelsPerUnit` & `opts.NumUnits`; expected: `cfg.LabelsPerUnit` * `opts.NumUnits` to be evenly divisible by `opts.NumFiles` (10), given: 4096

goroutine 1347 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0008ff5c0, {0x0, 0x0, 0x0})
    /Users/runner/go/pkg/mod/go.uber.org/zap@v1.19.0/zapcore/entry.go:232 +0x44c
go.uber.org/zap.(*SugaredLogger).log(0xc0001dfdf0, 0x4, {0x4ef6c43?, 0x0?}, {0xc0001dff90?, 0x2?, 0x2?}, {0x0, 0x0, 0x0})
    /Users/runner/go/pkg/mod/go.uber.org/zap@v1.19.0/sugar.go:227 +0xee
go.uber.org/zap.(*SugaredLogger).Panicf(...)
    /Users/runner/go/pkg/mod/go.uber.org/zap@v1.19.0/sugar.go:159
github.com/spacemeshos/go-spacemesh/log.Log.Panic({0xc00038ab40?, {0x0?, 0xc000746000?}}, {0x4ef6c43, 0x1c}, {0xc0001dff90, 0x1, 0x1})
    /Users/runner/work/go-spacemesh/go-spacemesh/log/zap.go:47 +0x1fa
github.com/spacemeshos/go-spacemesh/log.Panic({0x4ef6c43, 0x1c}, {0xc00063af90, 0x1, 0x1})
    /Users/runner/work/go-spacemesh/go-spacemesh/log/log.go:168 +0x50
github.com/spacemeshos/go-spacemesh/cmd/node.(*App).startServices.func1()
    /Users/runner/work/go-spacemesh/go-spacemesh/cmd/node/node.go:764 +0xd9
created by github.com/spacemeshos/go-spacemesh/cmd/node.(*App).startServices
    /Users/runner/work/go-spacemesh/go-spacemesh/cmd/node/node.go:762 +0x21c

So I believe that we should have the same behavior for both options (and probably others as well), and either crash with a user-friendly message or keep syncing, but log an error and probably send it via NodeError stream.

Steps to reproduce

  1. Open node-config
  2. Change smeshing-opts-numfiles to any value greater 1
  3. Start the node

Actual Behavior

It crashes with the error described above.

Expected Behavior

Start the node and start syncing, but log an error and send it via NodeErrorStream.

Previously, I thought that this is OK to panic on startup when smeshing options are not valid. But it will introduce a fork in the logic because it is silly to crash with the panic in the case when the User sends the wrong value to the GRPC endpoint. It is much better to keep Node working, but show an error to the User.

So I think this is the best option. If you have any better ideas — please, write them in the comments below ;)

Environment

moshababo commented 2 years ago

When attempting to start smeshing via RPC using an invalid value, an error is logged and returned. When attempting on node start via config, the node crashes.

The rationale is to reduce the likelihood of not noticing the error (not all users will monitor NodeErrorStream or every log entry). It’s a common practice for a process to crash upon an invalid config value. That's obviously not the case for RPC error handling.

Any other opinions?