segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0
341 stars 103 forks source link

Considerations for changing the design of passing configuration options to constructor functions #94

Open achille-roussel opened 2 years ago

achille-roussel commented 2 years ago

This issue is a follow up to discussions that happened on pull requests and deserved taking a closer look. Links for context:

Currently the library uses a model based on Functional Options for Friendly APIs, which has become a popular model in the Go ecosystem, for example:

Another common model found in Go packages is the use of struct types that carry the configuration. The configuration struct is either passed as argument to a constructor function, or is the instantiated object itself. In the former case, configuration is usually applied by the function instantiating the object, while in the latter, configuration is lazily consumed when methods of the instantiated value are called. Here are examples of these models:

Each approach navigates a multi-dimensional spectrum, optimizing for different properties, including:

Functional Options

The functional option model is based on passing function values as parameters to the variable argument list of a constructor function. The constructor also uses default configuration internally for option values that were not defined.

This model arguably optimizes for readability, especially in the case of instantiating an object with the default options, where the program simply calls to constructor without arguments (or only with the required arguments), leaving the variable option list empty:

obj := pkg.New()

Not being exposed to the often complex configuration helps smooth the learning curve, as concepts can be introduced more gradually. This syntax is also useful to keep tests short and expressive.

This approach is also effective at improving readability; options are usually defined through constructor functions of their own with descriptive names:

obj := pkg.New(
  pkg.FirstName(firstName),
  pkg.LastName(lastName),
)

Often times, the model is extended to use single-method interface instead of functions, which helps allow for more complex options to be passed as argument to the function. This helps leverage named fields in the instantiation of the option value, for example:

// Identity would implement an Option interface required by values of the New argument list
obj := pkg.New(
  &pkg.Identity{
    FirstName: firstName,
    LastName:  lastName,
  },
)

Configuration Structures

The configuration struct carries the configuration in the exported fields, and the application assigns the fields to set the values of options. In this model, the zero-value is used to indicate that an option has the default value.

When a constructor is employed, the program must always pass a configuration object, even if it has the zero value. If the configuration struct is passed by pointer, the program can pass nil, which tends to negatively impact readability since the nil value does not give any indication at the call site of what it means:

obj := pkg.New(nil) // the reader must refer to the documentation to understand what nil is

Some packages have used variable argument lists to work around this constraint, but must then define how to handle how to handle receiving multiple configuration values: do the values stack up or is it invalid to have more than one value? The former can be confusing to the reader, while the latter delays failure to runtime since the compiler does not know what only zero or one arguments are valid.

The zero-value is often a useful default but it can be complex to express the absence of a value in some cases (e.g. when zero is a valid value but not a desirable default). Some APIs resort to using pointer types in order to work around this limitation, but the result sacrifices usability since Go does not support declaring literals for pointers to primitive types (e.g. we cannot write &int(0)), requiring to declare separate variables that we pass the address of, or helper functions to return the address of the value they receive as argument.

A major advantage of configuration structs remains the ability to generate configuration from serialized data (e.g. the command line or a json file), as well as making it possible for the application to inspect and mutate the configuration value in different contexts; to illustrate, these qualities were called out by kafka-go users in https://github.com/segmentio/kafka-go/issues/706

parquet-go

The parquet-go package attempts to embrace the qualities of these various approaches by offering a model based on functional options applying to a configuration struct, the latter also implementing the option interface(s) itself for cases where the application as already constructed a configuration object by loading it from an external location.

Programs can take the approach of using functional parameters, configuration struct, or a mix of both if needed. All these forms would be equivalent and offer the same level of type safety:

sortFunc := parquet.SortFuncOf(
  parquet.Descending(true),
  parquet.NullsFirst(true),
)
sortFunc := parquet.SortFuncOf(
  &parquet.SortConfig{
    Descending: true,
  },
  parquet.NullsFirst(true),
)
sortFunc := parquet.SortFuncOf(
  &parquet.SortConfig{
    Descending: true,
    NullsFirst: true,
  },
)
config := &parquet.SortConfig{}
config.Apply(
  parquet.Descending(true),
  parquet.NullsFirst(true),
)
sortFunc := parquet.SortFuncOf(config)

The model optimizes for readability and usability at the call site, but trades off some cognitive overhead for the library maintainers when it comes to implementing the option interface on configuration structs. Those methods tend to have ad-hoc implementations with a signature which does not communicate well the intent:

func (c *SortConfig) ConfigureSort(config *SortConfig) { /* assign values from config to c */ }

End users of the library are not required to deal with this complexity, as they should be able to leverage the API without having to understand the implementation.

kevinburkesegment commented 2 years ago

This all sounds reasonable. I don't know. I guess just to me the ...Option style is too much overhead when all we're doing is setting single properties on a configuration struct - it reminds me of setters and getters in Java.

The other issue is that if you see it in e.g. Godoc, clicking on options ...Option takes you to the Option interface definition, which doesn't actually tell you which options you can configure... you then need to hunt around to actually find the things that implement that interface. If you just pass in a Config{} struct or whatever the options are right there.

I agree the Option() type is more useful when you are setting larger numbers of options or if they interact in unexpected ways, and maybe for a library this large it's worth being flexible about how we accept stuff.

For me there's just a lot of overhead getting up to speed with Parquet right now so my tendency is to try to simplify the need to think about all the other stuff (an extra interface indirection) as much as possible

achille-roussel commented 2 years ago

The part about Godoc is true, tho the summary groups functions that look like constructors under the type documentation, which actually highlights options at the top-level better than when they are struct fields:

image

I would expect Go developers to also be familiar with interfaces, when encountering one and looking for implementations, it is simple to search for the method name(s) (I believe some IDEs help with this as well).

This is all about navigating trade offs, neither model is better than the other, they're just good at different things, we just need to strike the balance of qualities we want to optimize for.