kubernetes-sigs / kubebuilder

Kubebuilder - SDK for building Kubernetes APIs using CRDs
http://book.kubebuilder.io
Apache License 2.0
7.76k stars 1.43k forks source link

Feature Request: Plugins Phase 2 #1378

Closed estroz closed 1 year ago

estroz commented 4 years ago

Overview

Phase 1 of the proposed scaffold plugin system is being implemented (#1290) at the time of writing. Progress on phase 2, which adds support for chaining plugins together (more below), will begin shortly afterwards, starting with a proposal doc. Before writing that doc, which will contain implementation details for chaining, I'd like to start a general discussion of how chaining will occur.

Phase 2 requirements, summarized from previous discussions 1. [mdbook](https://github.com/kubernetes-sigs/kubebuilder/tree/5187a15/docs/book)-like structure 1. A Plugin receives a JSON blob "universe" from upstream plugins containing file "nodes", each of which contain file metadata. 1. The plugin can add or delete nodes (nodes are not mutable), then returns the modified universe so downstream plugins can do the same. 1. A debug function dumps the universe with configurable verbosity and some intelligence as to which node failed (latter is a nice-to-have). 1. Pre-generate a project file (for complex plugin config use cases). 1. Enhanced ["layout"](https://github.com/kubernetes-sigs/kubebuilder/blob/5187a15/designs/extensible-cli-and-scaffolding-plugins-phase-1.md) PROJECT file key. 1. Either and ordered list of plugin `{name}/{version}`, or ordered map of `{name}/{version}` to plugin-specific config. 1. Order matters for certain plugins, ex. a bazel plugin would need to run after all Go files are generated.

Initial thoughts

Hypothetical CLI setup: a global --plugins flag that takes an ordered list of plugin names. The subcommand invoked with --plugins executes "downstream" plugins from that list that match those the kubebuilder binary knows about.

Given that we want to pass some initial universe between plugins, that universe should be initialized by the plugin invoked via CLI (the "base" plugin). For example, kubebuilder init --plugins addon will invoke an Init plugin, which passes its generated state to the addon plugin, modifying scaffolded files then writing them to disk.

Open questions

Assuming this flow is what we want, a few questions pop up:

  1. For the initial state: at what point do we pass a base plugin's state to downstream plugins?
    1. Approach 1: aggregate files being scaffolded and generated by the base plugin in a universe and pass that universe to a downstream plugin
      • Pro: fully encapsulates all plugin logic within Run().
      • Pro: all files have been scaffolded and post-scaffolding has been run by this point so we don't have to thread a bunch of plugin execution logic through scaffolds, just filepath.Walk the whole directory.
      • Con: requires reading the entire directory after scaffolding and post-scaffolding code runs, since external generators (controller-gen or go mod init) will have modified on-disk state.
    2. Approach 2: pass a universe subset to plugins on each scaffold event (each call to Scaffold.Execute())
      • Pro: plumbing already exists to do this; we'd need to change method signatures of existing plugins, some types etc. In this case, the universe is a model.Universe and a node is a model.File.
      • Pro: the scaffolder controls what is being written to disk, with all the features that the current plumbing provides.
      • Pro: writing a debug function may be easier than in approach 1, since we have more fine-grained control with per-scaffold event plugin execution.
      • Con: model.Universe will have to be modified to guarantee node immutability.
      • Con: A downstream plugin won't receive the full universe, just portions on each scaffold event, and will never "see" post-scaffolded files.
      • Con: each downstream plugin will be run on each scaffolding event, which may cause issues (in the naive implementation) if Run() is not idempotent. A non-naive implementation may require aggregation of scaffold events.
  2. How can we encourage, if not enforce, modification of a universe over just writing files to disk in Run()?
    • Perhaps we could modify Run() to return a list of files to add/delete from a universe rather than having Run() write such logic itself.
  3. Do we want PROJECT to be present in the universe passed between plugins?
    • Probably not, since we gate access via Read() vs. Load().

For 1., the first approach makes the most sense to me given its pros, but reading the full directory after Run() completes feels like a heavy cost to incur for one plugin execution.

More context

Related issues: #1249 Related PR's: #1250, #1290

/kind feature

estroz commented 4 years ago

/cc @DirectXMan12 @mengqiy @Adirio @joelanford @hasbro17

Adirio commented 4 years ago

A few comments on model.Universe. I'm working on a couple PRs that end up modifying it a bit. Basically, the file models will be stored in a map with strings keys (their paths) so that they can be accessed without having to iterate over all of them:

// Universe describes the entire state of file generation
type Universe struct {
    // Config stores the project configuration
    Config *config.Config `json:"config,omitempty"`

    // Boilerplate is the copyright comment added at the top of scaffolded files
    Boilerplate string `json:"boilerplate,omitempty"`

    // Resource contains the information of the API that is being scaffolded
    Resource *resource.Resource `json:"resource,omitempty"`

    // Files contains the model of the files that are being scaffolded
    Files map[string]*file.File `json:"files,omitempty"`
}

One of the main goals behind those PRs is to have file update logic inside Scaffold.Execute. This is, updating a file will also be done including it as the list of files in a Execute call. I don't know if you were considering file updates, but they are out-of-execute file modifications like controller-gen or go mod init. If they get included in Execute instead of directly writting the changes to disk, there will be one less out-of-execute file modifications. Could we also follow this approach for post-scaffolding scripts? Instead of actually running the script to generate the file on disk, running it to get the output as a []byte or string inside kubebuilder so that it can be inseted in a model.Universe. This will prevent the whole directory read that you are mentioning.

About passing the configuration file info, I do think that it should be passed in order to have the access to this file centralized. Also, this would allow plugins to update it, kubebuilder could make a diff of the passed and returned configuration to each plugin and decide to accept that change or not. For example, a plugin should probably not be able to update the repo, domain, version or plugins fields, but adding a resource may be allowed to some plugins, and thus we could have some sort of authorithation mechanism for plugins.

estroz commented 4 years ago

@Adirio great to hear that you're working on integrating file Update logic with Scaffold.Execute. I was doing something similar to test out some theories for (1ii).

Concerning out-of-execute file generation/updates: if we can figure out a general method of piping bytes from a binary into the Go runtime, sure. AFAIK that isn't possible for 2 reasons:

  1. We don't know what external (arbitrary) generators are being run so we can't control how plugins not implemented by kubebuilder write bytes. We could possibly run generators in a temp dir and read files, although this approach becomes problematic considering point 2 below.
  2. Generators often require some on-disk state to exist first (ex. controller-gen requires an APIs dir to exist) so we'd have to write everything in the universe to disk first, run the generators, then collect output.

Having kubebuilder control what exists in PROJECT after a plugin potentially modifies its structure makes sense. This exposes the config to arbitrary modification (outside of what kubebuilder allows of course), permitting a plugin to have their own section of the config.

Adirio commented 4 years ago

@Adirio great to hear that you're working on integrating file Update logic with Scaffold.Execute. I was doing something similar to test out some theories for (1ii).

@estroz feel free to take a look at #1375

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

estroz commented 4 years ago

/lifecycle frozen

hasbro17 commented 4 years ago

So to refresh, the main requirements of plugins phase 2 is to have a true plugin architecture that supports:

  1. Plugins as separate binaries that can be discovered by the Kubebuilder CLI binary via user specified plugin filepaths.
  2. Plugins that can be written in any language.

The existing experimental plugin interface outlines these aims for the future to replace the in-process demo plugin with an external process plugin.

I wanted to propose exploring the use of hasicorp's go-plugin system which allows you to write plugins that communicate over gRPC. This would fulfill our above requirements, plus it's been tested with other CLI tools like terraform, vault etc. The examples demonstrate some basic plugins for bidirectional communication and how to write a plugin in python.

Go has an official plugins package but as far as I am aware it does not have cross-language support and seems to be intended more for in-tree plugins than for external users to write their own plugin code.

DirectXMan12 commented 4 years ago

Go's plugin package is a bit of a mess, and basically not useful unless the plugins are built alongside the main process.

The only thing I'm hesitant about for the hashicorp go-plugin library is that it seems like it's built for long-running plugins and not short-lived plugins, but if it works well for that case, that seems fine. The alternative is just passing JSON/proto back and forth across stdin/stdout, and logging on stderr.

rashmigottipati commented 3 years ago

Based on the investigation on plugin libraries such as the built-in Go's plugin package and Hashicorp’s go-plugin library, we have come to the conclusion that it is more suitable to write our own custom typed plugin library.

The built-in plugin library seems to be a non-starter as it is more suitable for in-tree plugins rather than out-of-tree plugins and it doesn’t offer cross language support. Hashicorp’s go-plugin system seems more suitable than built-in go-plugin library as it enables cross language/platform support. However, it is more suited for long running plugins as opposed to short lived plugins and it could be overkill with the usage of protobuf. For the stated reasons, we would like to write our own plugin system that passes JSON blobs back and forth across stdin/stdout.

The plugin specification will include type metadata to allow the potential for using other plugin libraries in the future if the need arises. From an implementation standpoint, we could create a new implementation of our plugin interface based on the type. We can do a type switch on the type of plugin the data is being passed to. For e.g. if it is native go, then we pass the universe directly & if it is a binary wrapper, then we pass the serialized stream of JSON bytes to it.

camilamacedo86 commented 3 years ago

just to share; The RFE https://github.com/kubernetes-sigs/kubebuilder/issues/2198 seems more +1 motivation for this use case.

camilamacedo86 commented 1 year ago

It is partial done, and we have a meta issue which is required as follow up #2600 Therefore, I am closing this one.