Refactor Codebase To Conform To Best Practices And Increase Stability

Overview

Nanobox Desktop has suffered from a general lackadaisical approach to design, architecture, stability, and correctness. In it's current form, there is no prevailing pattern or structure, no guidelines for how new features should be implemented, and very little tests. The lack of tests is largely a consequence of poor design, making the process of adding unit tests extremely difficult.

The good news is that the problems that Nanobox Desktop was created to address are largely solved, we just need a better implementation that leverages best practices and tests to confirm assumptions. It’s not like we have to throw everything away and start from scratch. It would essentially be establishing an agreed-upon pattern, and then moving things from one location into the other, cleaning them up as they come across.

Before continuing, please read the following articles that explain (in great detail) the current, best-practices approaches to Go design:

SOLID Go Design - https://dave.cheney.net/2016/08/20/solid-go-design

Go Best Practices - https://peter.bourgon.org/go-best-practices-2016/

Factory Pattern In Go - http://matthewbrown.io/2016/01/23/factory-pattern-in-golang/

Current anti-patterns

Heads up: Please read the aforementioned articles before continuing, to provide some context.

Globals

The entire design currently depends heavily on package level state, and global state. This anti-pattern will effectively ensure that testing at a granular level is nearly impossible. With this pattern, the only way to test is to test the entire system as a whole. Moving forward, we should not use global, or package level state for anything.

Coupling

Everything within the application is tightly coupled together. Refactoring is extremely difficult as every contract is explicit, and nothing can be modified without also changing the calling contracts. Moving forward, every component of this application should be designed with implicit contracts, and the components should be able to be replaced by a mock version to encourage testing.

Vertical Dependencies

From the entry-point of the application, a vertical dependency tree is established, wherein a component at the top level depends on another component at a layer just below that, and continues in a vertical fashion until the bottom-most layer. Every component has been designed to both consume a level below and be consumed by a higher layer. This makes re-use nearly impossible. Moving forward we will follow the Dependency Inversion principle, which enables a flat, non-vertical dependency structure which will also encourage re-use.

Multi-Purpose

Many, many functions and methods are written to encapsulate multiple responsibilities and purposes. As an example, a function that should only be responsible for coordination of logic and delegation to other components, actually performs the logic. This makes testing nearly impossible. Moving forward, each function or method should only ever have a single responsibility.

Bad Practices

Other facets are less-pronounced, and could also be considered more of a bad practice than an anti-pattern. Some examples include: incomplete or unhelpful comments, poor package naming, inconsistent conventions, packages with unrelated logic, or any other careless and sloppy approach. Moving forward, we will have no tolerance for any such practice.

Suggested Design

Directory Structure

One of the challenges of implementing this refactor, is that we'll need a way to build the new application alongside the existing, while still being able to reference the existing application to migrate pieces over. We don't want to blast the code and start fresh (and thus not have a reference), and we also don't want to complicate the directory structure and put new modules next to old ones, causing general confusion.

Fortunately, Peter Bourgon has suggested a better structure (in his Best Practices article above) that will give us the best of both worlds.

/cmd
  nanobox/
    nanobox.go
  update/
    update.go
/pkg
  /foo
    bar.go
  /baz
    boa.go

Essentially, the top-level commands will live in /cmd and the low-level modules/libraries will live in /pkg and will encourage re-use (potentially even outside of the project) as a sort of standard library.

Dependency Inversion

Assuming you read the articles above, the notion of Dependency Inversion should be somewhat familiar, so I really just wanted to highlight one aspect of this principle: High-Low. If followed correctly, this pattern should yield a flat, 2-level design of composable modules.

The high-level modules are primarily responsible for:

Initializing any dependencies (logger, display, etc)
Initializing the low-level modules, providing any dependencies during initialization
Executing the process and flow of the low-level modules (a.DoYourThing(), b.DoYourThing(), etc)

In contrast, the low-level modules are responsible for execution details only, not for assembling dependencies or even control flow.

In essence, you can think of the high-level modules as the control flow, or process of execution, and the low-level modules as the details of the execution. In other words, the high-level modules control the "what" of the program, and the low-level modules implement the "how".

The contract between the high and low modules should always be implicit, via interfaces. Additionally, one low-level component may be given another low-level component as a dependency, but the contract (dependency) between the two components should be via an interface. In this way we can make changes over time without having to refactor everything (like we're doing here).

Design

The High-Low pattern is pretty straight forward in most go applications, as the general flow would like this, within the program's Main() function:

Read Config
Parse Arguments
Setup Logger
Initialize Low-Level Components
Call Low-Level Components(passing in logger and config, etc)
Wait For Return

This flow works well with servers, or simple programs with little execution flow logic.

With Nanobox, specifically, there is a unique challenge that the primary function of this program is to conditionally chain sequences of logic (low-level modules), and potentially recurse over multiple sequences of logic, depending on boxfile configuration or host machine state.

In our case, I believe the best approach is to delegate this high-level module logic to a special "processor" package.

Please note, the existing codebase contains a processor package, but despite the name, the proposal here is fundamentally different and it just so happens that I think the name is fitting in this new design.

The general flow might look like this:

Main():

Initialize Logger
Initialize Configuration
Initialize Any Other Global Modules
Parse The Command/Args/Flags
Initialize A Processor (pass global dependencies like logger, etc)
processor.Process()

processor.Process():

Initialize/Process Any Dependency Processors (if any)
Initialize Any Low-Level Modules (providing any dependencies like logger or other low-level modules)
Step Through Flow Logic, Invoking Low-Level Modules Until Complete
Initialize/Process Any Dependency Processors (if any)

In this design, the Main() func is a high-level module, who's primary responsibility is to initialize the global dependencies, and delegate the remainder of the high-level functionality to a processor.

In this case, the processor is actually a low-level module to the Main() func, but a high-level module to the rest of the system. Ultimately, we're really just delegating the high-level logic to a flat, composable processor package, since the role of this program naturally invokes a lot of complex, high-level logic which would be extremely unwieldy to manage in the Main() func.

Processors

Processors are significant enough in this design, it's worth setting some ground rules and explanations.

First, they are fundamentally, high-level modules. They are primarily responsible for chaining together the sequence of events, ultimately carried out by low-level modules, from the command line request.

Second, there is exactly 1 processor for every nanobox command. nanobox run, nanobox dns add, nanobox stop, etc, will be matched to a processor. In this way, the processor is directly responsible for "processing" the requested command, and sequencing the flow of logic (low-level) modules.

Third, a processor should only have a single responsibility. For instance, if nanobox run is requested, the corresponding processor will not also be responsible for starting the VM, adding mounts, etc. Instead, the nanobox run processor will call any and all processors responsible for carrying out those responsibilities, explicitly, before running it's sequence. This will consequently expose a level of granularity to the available commands. This is a good thing, though not all of the subcommands need to be documented, and may primarily be used internally as dependencies of other commands.

Fourth, if a processor needs to run another processor, it won't be required to shell-out and run a subprocess. It will essentially just use the same interface that Main() uses to delegate to the primary processor in the first place. In this case, it will even provide the same dependencies that were originally provided by Main(), thus the processor itself won't know whether it was invoked from Main() or from some other process. In this manner, we can achieve a flat, composable design, where processors may be re-used as it makes sense.

Fifth, since a processor is primarily a top-level delegation of the Main() func, and wouldn't make sense to include in any project outside of this one, we'll place them within the /cmd/processor namespace.

Simplicity and Explicity

I think we should avoid trying to finagle frameworks like cobra into doing what we need. We should instead depend on patterns and best practices. In the past we spent weeks (if not months) trying to get cobra to work exactly as we need. As this pattern will not be the same with this design, I suggest we not try to spend time in areas like this, but instead create a simple low-level component to parse the command line how we expect, and then use a factory method pattern for dispatching to the appropriate processor.

Lastly, I would strongly suggest we prefer being explicit over using clever frameworks, even if the result of being explicit is a bit verbose. Ultimately, it should be very easy to follow a chain of logic, and make a necessary change, or at least understand what the program is doing.

Conclusion

Let's do this!

nanobox-io / nanobox