onlineconf / onlineconf-go

Go library for reading OnlineConf files
MIT License
2 stars 7 forks source link

new (backward incompatible) interface proposal #3

Open mrsndmn opened 3 years ago

mrsndmn commented 3 years ago

There are some disadvantages of current onlineconf-go interface:

Common

Reloader

Interaface of module reloader would be like:

    reloader := MustReloader("TREE")
    ctx := context.Background()
    ctx, cancel := context.WithCancel(ctx)
    wg := &sync.WaitGroup{}
    wg.Add(1)

    go func() {
        for {
            err := reloader.RunWatcher(ctx)
            if err != nil {
                fmt.Fprintf(os.Stderr, "onlineconf reloader error: %s", err.Error())
                time.Sleep(time.Second)
                continue
            }
            break
        }

        wg.Done()
    }()

    module := reloader.Module()
    // retrieve config params from module
    // ...

    cancel()
    wg.Wait()

For simple cases (short-running scripts) reloader watcher is not needed at all.

Module

NewModuleReloader(&ReloaderOptions{Name: moduleName, Dir: moduleDir}) 

// we also could provide simplified interface for default module dir:

// Reloader returns reloader for specified module in default module dir
func Reloader(moduleName string) (*ModuleReloader, error) {

// MustReloader returns reloader for specified module  in default module dir. Panics on error.
func MustReloader(moduleName string) *ModuleReloader {

module := relaoder.Module()

Module should not be saved in global map.

Getters

val, err := module.Int(path, 0)
if isErrConfigParamNotFound(err) {
    // handle err
}

Path validation

Invalid parameters paths sometimes can lead to hurtful mistakes.

Path validation would be more reasonable in compile time.

intPath := MustParamPath("/path/to/param/int") # panics on invalid parameter path
ocInt, err := module.Int(intPath)

Sometimes it is also useful to specify onleneconf parameters path prefix and further join prefix with sub-config parameters. Considering all this I formulated properties of valid onlineconf paths:

otherPath := MustParamPath("/onlineconf/test")
pathSuffix := MustParamPath("/path/to/subconfig")
suffixedOtherPath := otherPath.Join(pathSuffix)

Problem: nobody cares on handling errors in onlineconf getters and in PathValidations. I guess that the most frequent way to retrieve parameter value would be

module.MustInt(MustParamPath("/onlineconf/test"))

Runtime panics are undesirable. So we can move parameters paths check to compile time with specifying default value right there.

var VeryImportantParam = MustParamPathInt("/onlineconf/test", 0) // path + default

func main() {
    reloader := MustReloader("TREE") // config reloader form default config-dir

    // run reloader watcher if needed
    // go reloader.RunWatcher(context.TODO())

    module := reloader.Module()           // consistent copy of module params
    val, err := module.Int(VeryImportantParam)  // also with default
    if err != nil {
        // nobody would check this error but we have to pass it out because we don't know how to handle it
        // here could be useful errors e.g. can't parse param
    }
    ...
}

further work

Not sure these features are reasonable right now or at all

The most safe way to work with onlineconf I imagine:

With this approach getter never return error because all the parameters are cached.

aleksey-mashanov commented 3 years ago

There are some disadvantages of current onlineconf-go interface:

Common

  • panics I don't see any reason this library to panic. Possibly it would be acceptable to raise panic inside methods like Must{Int,String,Reloader,Module...}. But panics difenetly not appropriate inside newModule and initWatcher. Errors must be returned instead of panics.

Yes, panics are not desirable in most cases. Especially if there is a way to gracefully handle errors returned by a function. Do you see a way to handle errors in all hundreds calls to Get...() spread around the code? Almost always a service is not usable without the configuration.

Reloader

  • I don't think it is a best practice to create goroutines inside any method of util library with no ability to handle errors. Possible solutions:

    • return error chan to handle errors that happened inside initWatcher/module reloader

I don't think a user can do something with these errors except to log them.

  • do not create goroutine. Creating goroutine is a responsibility of onlineconf-go's user. Personally I like this approach more.

The ability to create goroutines in a library through the only standard API is a strong point of the Go language. I don't see a reason to drop this advantage.

This approach let us drop log dependency.

Is it a problem? This is quite a common pattern. It is not a dependency, it is a possibility to redefine the logger.

Module

  • global state/global variables Initialize() is a crutch. Any (almost all) global variable complicates testing and decreases flexibility of library. I guess it would be better to provide interface like this:

A simplified testing must not lead to a complicated usage. Agree what a global state is not a good solution. But passing a reloader through all libraries may be annoying.

NewModuleReloader(&ReloaderOptions{Name: moduleName, Dir: moduleDir}) 

// we also could provide simplified interface for default module dir:

// Reloader returns reloader for specified module in default module dir
func Reloader(moduleName string) (*ModuleReloader, error) {

// MustReloader returns reloader for specified module  in default module dir. Panics on error.
func MustReloader(moduleName string) *ModuleReloader {

module := relaoder.Module()

Almost always MustReloader will be used.

Module should not be saved in global map.

I'd prefer not to mmap one file twice if reloader was created twice in two unlinked libraries.

Getters

  • Getters ( module.Int(..), module.String(..) ) methods must return errors if there was one while reading from mmapped file or parsing parameter string (for ints).

What do you expect a user must do if an error happened? They must get a value. Right now they get a default value if a parameter not exists or an error happened.

  • go native maps are faster and safer than reading even from mmapped cdb file. I guess it would be convenient to cache all parameters. todo reproduce benchmarks (i benched it too long ago and lost the snippet)

Reading from a mmaped file is a memcpy. Why do you think it is unsafe? Read locks are required in the both cases. I do not expect significant advantage of go map if any.

  • each getter method must have default value Methods like *IfExists are redundant. Requiring the second argument to be passed will simplify interface of onlineconf-go.

Agree in most cases. The only case when they can be useful is if a default value must be calculated or fetched from some storage.

val, err := module.Int(path, 0)
if isErrConfigParamNotFound(err) {
    // handle err
}

Do you really want to write such checks every time?

Path validation

Invalid parameters paths sometimes can lead to hurtful mistakes.

Path validation would be more reasonable in compile time.

Golang has no compile time hooks. Of course a validation is always desirable but not by the cost of the readability of the code.

intPath := MustParamPath("/path/to/param/int") # panics on invalid parameter path
ocInt, err := module.Int(intPath)

Sometimes it is also useful to specify onleneconf parameters path prefix and further join prefix with sub-config parameters. Considering all this I formulated properties of valid onlineconf paths:

  • Onlineconf path can't end with /.
  • Onlineconf path must be prefixed with /.
  • Single slash / is a valid onlineconf path.
  • Double slash // is prohibited

From an application point of view OnlineConf parameter path is just a string. There are no ways to get siblings or iterate through children. Moreover depending of a module configuration the path can be in the two forms: unix-like or dot-separated.

Problem: nobody cares on handling errors in onlineconf getters and in PathValidations. I guess that the most frequent way to retrieve parameter value would be

module.MustInt(MustParamPath("/onlineconf/test"))

Yes. Totally agree. So is it reasonable to complicate?

Runtime panics are undesirable. So we can move parameters paths check to compile time with specifying default value right there.

Golang has no compile time checks. Anything will be in the runtime. Sooner or later.

var VeryImportantParam = MustParamPathInt("/onlineconf/test", 0) // path + default

func main() {
    reloader := MustReloader("TREE") // config reloader form default config-dir
    module := reloader.Module()           // consistent copy of module params
    val, err := module.Int(VeryImportantParam)  // also with default
    if err != nil {
        // nobody would check this error but we have to pass it out because we don't know how to handle it
        // here could be useful errors e.g. can't parse param
    }
    ...
}

Too hard to use isn't it?

further work

Not sure these features are reasonable right now or at all

The most safe way to work with onlineconf I imagine:

  • Declarative parameters description

Yes. OnlineConf is schemaless now. I'd prefer to have a schema but probably it will be too complicated to implement and migrate.

  • Reloader reads and caches all declared parameters.

    • if there are any error in config (parameter type is wrong or required parameter is missing) reloader goroutine would be able to handle this error and not to update current copy of module.
    • there is also disadvantages of thus approach: application with broken config won't be able to restart
    • sometimes you don't know the real path of parameters you want to get. There are 2 solutions: use json parameters for dynamic config keys or allow reloader to cache parameters by regexp/template etc

With this approach getter never return error because all the parameters are cached.

You really don't want to get random values from cache by different instances depending of their start time. Right now getters return a default value in case of errors. This is more stable.

aleksey-mashanov commented 3 years ago

Right now there are two scenarios when onlineconf-go panics:

First case can be solved by requiring to call Get...() on already opened cdb. Just removing shortcuts for "TREE" will be enough.

The second one can be solved by moving from Get...(path, ...) to Get...(path, default).

mrsndmn commented 3 years ago

tl dr

verbose


This approach let us drop log dependency.

Is it a problem? This is quite a common pattern. It is not a dependency, it is a possibility to redefine the logger.

And no possibility to change log format for example. Do you know any example of popular go module with error logging inside?


global state/global variables

A simplified testing must not lead to a complicated usage. Agree what a global state is not a good solution. But passing a reloader through all libraries may be annoying.

ModuleReloader would be suitable as external service constructor argument. But to keep consistent Module per request it would be better to pass through Module object. Also you are able to create context with Module.

However, could you provide any alternative solution?


Module should not be saved in global map.

I'd prefer not to mmap one file twice if reloader was created twice in two unlinked libraries.

External libraries must not create their own Reloaders. I suppose either Reloader should be used in external library object constructor or Module passed with context.


Getters ( module.Int(..), module.String(..) ) methods must return errors if there was one while reading from mmapped file or parsing parameter string (for ints).

What do you expect a user must do if an error happened? They must get a value. Right now they get a default value if a parameter not exists or an error happened.

Ok. I'd love not to return errors on getters too.


go native maps are faster and safer than reading even from mmapped cdb file

Reading from a mmaped file is a memcpy.

Well yes, but actually no!

wiki

[mmap] implements demand paging because file contents are not read from disk directly and initially do not use physical RAM at all. The actual reads from disk are performed in a "lazy" manner, after a specific location is accessed

Why do you think it is unsafe? I was stuck on how mmapped file reading errors are returned. The answer is SIGSEGV. Nice! Now I agree there is no need returning error for getters at all.

I do not expect significant advantage of go map if any.

pushed benchmarks.

go test -run=^$ -bench . -benchmem -benchtime=100000x
goos: linux
goarch: amd64
pkg: github.com/onlineconf/onlineconf-go
BenchmarkMmappedCdb-2             100000               252 ns/op              48 B/op          5 allocs/op
BenchmarkGoMap-2                  100000                12.0 ns/op             0 B/op          0 allocs/op

With no mmapping:

BenchmarkOrdinaryFileCdb-2             100000              2274 ns/op              48 B/op          5 allocs/op

You see map is up to 20 time faster due to there is no useless allocations with string <-> byte copying.


Path validation Invalid parameters paths sometimes can lead to hurtful mistakes. Path validation would be more reasonable in compile time.

Golang has no compile time hooks.

Sure! I meant runtime as soon as possible as a global variables.

Of course a validation is always desirable but not by the cost of the readability of the code.

With purposed approach readability doesn't suffer.

All the parameters defined as global variables in one place. Then defined params could be used with no fear of misprint. It would be better than a ton of params scattered around the code.


From an application point of view OnlineConf parameter path is just a string. There are no ways to get siblings or iterate through children. Moreover depending of a module configuration the path can be in the two forms: unix-like or dot-separated.

So what?


You really don't want to get random values from cache by different instances depending of their start time.

What do you mean?

Right now getters return a default value in case of errors.

Me too

aleksey-mashanov commented 3 years ago

And no possibility to change log format for example. Do you know any example of popular go module with error logging inside?

grpc-go?

ModuleReloader would be suitable as external service constructor argument. But to keep consistent Module per request it would be better to pass through Module object. Also you are able to create context with Module.

You must have the only opened file per a module and must close the file once it was replaced. In the other case you will run out of file descriptors sooner or later.

External libraries must not create their own Reloaders. I suppose either Reloader should be used in external library object constructor or Module passed with context.

In this case adding a configurable parameter to a deeply nested library may be painful and will require changing an API without a backward compatibility.

Reading from a mmaped file is a memcpy.

Well yes, but actually no!

wiki

[mmap] implements demand paging because file contents are not read from disk directly and initially do not use physical RAM at all. The actual reads from disk are performed in a "lazy" manner, after a specific location is accessed

Yes. This is why we use mmaped cdb in all our onlineconf client libraries. Page is read from the disk only the first time a configuration parameter located in it was read. After that this is just a memcpy. So it doesn't affect performance. Moreover even the first time with a very high chance page already will be in the memory because onlineconf-updater just wrote it.

Why do you think it is unsafe? I was stuck on how mmapped file reading errors are returned. The answer is SIGSEGV. Nice! Now I agree there is no need returning error for getters at all.

This is absolutely OK. You will never get SIGSEGV from a mmaped file on a healthy system. More than that if you break you drive while your application is in swap you will have the same result because underling logic is the same. Does anyone care about this?

I do not expect significant advantage of go map if any.

pushed benchmarks.

go test -run=^$ -bench . -benchmem -benchtime=100000x
goos: linux
goarch: amd64
pkg: github.com/onlineconf/onlineconf-go
BenchmarkMmappedCdb-2             100000               252 ns/op              48 B/op          5 allocs/op
BenchmarkGoMap-2                  100000                12.0 ns/op             0 B/op          0 allocs/op

With no mmapping:

BenchmarkOrdinaryFileCdb-2             100000              2274 ns/op              48 B/op          5 allocs/op

You see map is up to 20 time faster due to there is no useless allocations with string <-> byte copying.

The benchmarks are too synthetics. They compare two relatively fast operations. There are even no locks. If you benchmark GetString method implementations you will see that the difference is less but both cases are still very fast. After that you can remove ReaderAt from the underling cdb/mmap libraries and use direct access to the mmaped region. And finally you can use string instead of []byte in function input/output arguments. As you can see there are 5 allocs/op but only one is really required. In the end you will see what the timings are comparable.

But before all of this the main question is: does somebody really must read configuration parameters at such high rate or they just must fix their code?

You really don't want to get random values from cache by different instances depending of their start time.

What do you mean?

There are a lot of servers running a lot of instances. This instances are started at different time. They must have the same configuration. If you fallback to cached value in the case of integer parsing for example you can get two different values in two instances if one of them was started before somebody made a mistake (cached value from previous version) and another after (default value).