Integrations CLI Design
Purpose
Provide the user a simple UI to work with integrations, including creation, packaging, and publishing.
Background
Users don’t have access to a simple workflow to initialize, deploy, and publish Integrations. Maintaining integrations is hard, as it requires a configuration file and multiple directories with correctly set-up resources. To make the process easier for users, we will develop a CLI tool that enables:
Creating a template project for a new integration from scratch.
Deploying a local integration template to an OpenSearch cluster.
The CLI needs to check that the cluster is healthy and has the integrations plugin installed.
Since we only need to push to the repository index, a cluster health status of YELLOW should suffice.
The integrations plugin needs its own healthcheck endpoint, unless there’s already a request that shows the installed plugins.
Packaging an integration to upload to their local cluster.
Packaged as a zip file, uploaded via a mechanism similar to plugins.
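The health check described above could be sketched as follows. This is a sketch, not the final implementation: the `_cluster/health` and `_cat/plugins` endpoints exist in OpenSearch today, but the doc leaves open whether a dedicated integrations healthcheck endpoint will be added, and the plugin name matched here is an assumption.

```python
import json
import urllib.request

def is_cluster_ready(health: dict, plugins: str, plugin_name: str = "integrations") -> bool:
    """Return True if the cluster can receive an integration.

    Since we only push to the repository index, a YELLOW (or GREEN)
    status suffices, but the integrations plugin must be installed.
    """
    status_ok = health.get("status") in ("green", "yellow")
    plugin_ok = any(plugin_name in line for line in plugins.splitlines())
    return status_ok and plugin_ok

def check_cluster(base_url: str) -> bool:
    """Fetch health and plugin info from a running cluster (hypothetical wiring)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
        health = json.load(resp)
    with urllib.request.urlopen(f"{base_url}/_cat/plugins") as resp:
        plugins = resp.read().decode()
    return is_cluster_ready(health, plugins)
```

Keeping the decision logic in a pure function (`is_cluster_ready`) separate from the network calls makes it easy to unit-test without a running cluster.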
The CLI is primarily meant for ease-of-use with regard to the integrations ecosystem. There are 3 use cases being targeted.
A new Integrations user who wants to make an integration for their toolkit. They know roughly what they want, but not an exact dependency list or the components they need to supply.
An experienced user who knows precisely what they want, and they want to be able to make it happen quickly and accurately.
An integration developer, who will have an integration set up, but they want to quickly iterate on it and share it with confidence that it won't cause breakage for them or their users.
For now, the integrations CLI is a part of the Observability repository. When integrations are moved to their own repo, the CLI will move with it.
Requirements
As a user, I can create a new integration with minimal difficulty, in roughly 15 minutes or less.
integrations-cli create
The user is presented with an interactive configuration that lets them select: integration name, license (SPDX-compatible), data source, schema, catalog, categories, repository.
The schema will be selected from a closed list; the list of valid schemas must be maintained somewhere.
Once the basic options are selected, the user is presented a list of collections. They can select multiple collections to add to their integration.
A collection contains info, dataset, labels, schema.
The collection accepts an input_type that the user further selects. The input type defines how the data should be interpreted, for example, a logfile.
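As an illustration, a collection entry might look like the following sketch. Only the field names (info, dataset, labels, schema, input_type) come from this document; the values and the exact on-disk shape are hypothetical.

```json
{
  "info": "Access logs collected from an NGINX server",
  "dataset": "nginx.access",
  "labels": ["web", "logs"],
  "schema": "1.0",
  "input_type": "logfile"
}
```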
As a user, I can check if the integration I’ve created is correct, such that it can be readily uploaded to my cluster without issue.
integrations-cli check
As a user, I can upload a local integration template to a remote OpenSearch instance and see the integration template in the repository.
integrations-cli package
Will we handle pushing it to their cluster or is that their responsibility?
Yes, we will POST to the _integration/repository endpoint.
If something goes wrong, how do we rollback?
Server’s responsibility, we should just listen for an error response.
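The deploy step could then be sketched as below. The `_integration/repository` endpoint comes from this document; everything else (content type, error body shape) is an assumption. Since rollback is the server's responsibility, the client only needs to interpret the response and surface errors.

```python
import json
import urllib.error
import urllib.request

class DeployError(Exception):
    """Raised when the server rejects an uploaded integration."""

def interpret_response(status: int, body: str) -> str:
    """Map the server's response to a result; rollback is the server's
    job, so the client only listens for and reports errors."""
    if 200 <= status < 300:
        return "deployed"
    try:
        reason = json.loads(body).get("error", body)
    except json.JSONDecodeError:
        reason = body
    raise DeployError(f"server rejected integration ({status}): {reason}")

def deploy(base_url: str, zip_path: str) -> str:
    """POST a packaged integration to the repository endpoint (hypothetical wiring)."""
    with open(zip_path, "rb") as f:
        req = urllib.request.Request(
            f"{base_url}/_integration/repository",
            data=f.read(),
            headers={"Content-Type": "application/zip"},
            method="POST",
        )
    try:
        with urllib.request.urlopen(req) as resp:
            return interpret_response(resp.status, resp.read().decode())
    except urllib.error.HTTPError as e:
        return interpret_response(e.code, e.read().decode())
```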
As an integration developer, I will be informed if my integration is invalid before pushing to a remote repository.
Done via a commit hook. This should be added to a git repository as part of integration generation.
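Installing that hook during generation could look like the following sketch. The `integrations-cli check` command is from this document; the hook file layout and installer function are assumptions.

```python
import stat
from pathlib import Path

# Hypothetical pre-commit hook: refuse the commit if `check` fails.
HOOK_BODY = """#!/bin/sh
# Installed by integrations-cli: validate the integration before committing.
integrations-cli check || exit 1
"""

def install_check_hook(repo_root: str) -> Path:
    """Write a pre-commit hook into the git repository created during
    integration generation."""
    hook = Path(repo_root) / ".git" / "hooks" / "pre-commit"
    hook.parent.mkdir(parents=True, exist_ok=True)
    hook.write_text(HOOK_BODY)
    # Git only runs hooks that are marked executable.
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook
```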
As a user, I should not need to have any language-specific tooling installed to run the CLI.
Decision still pending on how to distribute it.
One option is converting it to an executable with PyInstaller.
For now, we defer this requirement and require that the user have Python installed.
As a user, I can use the tool even if the OpenSearch cluster is running serverless.
Design Considerations
Consolidated Validation Logic
The project will have two components that need to work together. Consolidating the logic for the two components is important to avoid inconsistent results.
The integration consuming API
The CLI integration validator/template generator
To facilitate consistent logic for each component, we need to settle on a standard system for organizing that logic. After consideration, we've settled on JSON Schema, a mature declarative language for annotating and validating JSON documents.
It was selected for its wide cross-language support: JSON Schema is purpose-built for this task and requires little work to port. Swagger will still be used for API specification.
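To make the shared-validation idea concrete, here is a toy sketch. A real implementation would load the shared schema file and use a full JSON Schema library; the abridged schema below and the subset validator (handling only `type`, `required`, and `properties`) are illustrative assumptions.

```python
# Hypothetical (abridged) JSON Schema for an integration's config;
# in practice this would live in a shared file used by both the CLI
# and the server-side API.
INTEGRATION_SCHEMA = {
    "type": "object",
    "required": ["name", "license", "source", "schema"],
    "properties": {
        "name": {"type": "string"},
        "license": {"type": "string"},
        "source": {"type": "string"},
        "schema": {"type": "string"},
    },
}

_TYPES = {"object": dict, "string": str, "array": list}

def validate(instance, schema) -> list:
    """Check an instance against the 'type'/'required'/'properties'
    subset of JSON Schema; returns a list of errors (empty = valid)."""
    errors = []
    expected = _TYPES.get(schema.get("type"))
    if expected and not isinstance(instance, expected):
        return [f"expected {schema['type']}, got {type(instance).__name__}"]
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"missing required field: {key}")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(f"{key}: {e}" for e in validate(instance[key], subschema))
    return errors
```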
In addition to JSON validation, there is more complicated version checking logic that we’ll have to write. How do we maintain consistency with this logic? Some options:
Interop between the two tools.
Shared library that both tools depend on.
Duplicate logic.
Internal CLI that can be piped to.
Commands
There has historically been a lot of disagreement on what the different verbs regarding this process mean. Please see the glossary at the end of this document. The exact usage of these verbs must be communicated to the users.
For processing CLI arguments, we will be using the click library, for consistency with existing OS CLI tools.
CLI Precise Description
integrations-cli --help
Usage: integrations-cli [--version] [--help] <command> [<args>]
integrations-cli create Create a new Integration from a specified template
integrations-cli check Analyze the current Integration and report errors
integrations-cli package Zip the current Integration so it can be uploaded to the Integration Plugin
integrations-cli create --help
Usage: integrations-cli create [--help] [--presets] [--preset <preset>] [--directory <dir>] <name>
Create a new Integration from a specified template
Arguments:
name The name of the integration
Options:
--help Show this help page
--presets List available presets
--preset <preset> Generate using the provided preset
--directory <dir> Specify the directory in which to create the integration (default: './<name>')
When create is run without a preset, the user is interactively shown the following prompts:
Creating new integration '<name>'
Integration description (default: ''):
License (default: 'Apache-2.0'):
Data Source Examples:
- kubernetes
- nginx
- otel-collector
Select a Data Source:
Data Source Version (default: 'latest'):
Schema Version Options:
- 0.1
- 0.2
- 1.0
- latest
Schema Version (default: 'latest'):
Integration labels (comma-separated list, default: none):
Catalog for the Integration (default: 'observability'):
Would you like to add collections interactively? (y/n):
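The interactive flow above could be sketched as follows. The prompts and defaults are taken from this document; injecting the `ask` callable (rather than calling `input()` directly, or `click.prompt` in the real tool) is an assumption made so the flow can also be driven non-interactively and unit-tested.

```python
def collect_config(name: str, ask=input) -> dict:
    """Walk the user through the create prompts; `ask` defaults to the
    built-in input() but can be any callable, keeping the flow testable."""
    def prompt(text: str, default: str) -> str:
        answer = ask(f"{text} (default: '{default}'): ").strip()
        return answer or default

    print(f"Creating new integration '{name}'")
    return {
        "name": name,
        "description": prompt("Integration description", ""),
        "license": prompt("License", "Apache-2.0"),
        "source": ask("Select a Data Source: ").strip(),
        "source_version": prompt("Data Source Version", "latest"),
        "schema_version": prompt("Schema Version", "latest"),
        "labels": [l.strip() for l in prompt("Integration labels", "").split(",") if l.strip()],
        "catalog": prompt("Catalog for the Integration", "observability"),
    }
```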
integrations-cli check --help
Usage: integrations-cli check [--help] [<dir>] [<args>]
Analyze the current Integration and report errors
Arguments:
dir The directory of the integration to check (default: .)
Options:
--help Show this help page
integrations-cli package --help
Usage: integrations-cli package [--help] [<dir>] [<args>]
Zip the current Integration so it can be uploaded to the Integration Plugin
Arguments:
dir The directory of the integration to package (default: .)
Options:
--help Show this help page
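The package command could be sketched with the standard-library zipfile module. The archive name and layout (a sibling `<name>.zip` with paths stored relative to the integration root) are assumptions, not settled design.

```python
import zipfile
from pathlib import Path

def package(integration_dir: str = ".") -> Path:
    """Zip the integration directory so it can be uploaded to the
    Integration Plugin; returns the path of the created archive."""
    root = Path(integration_dir).resolve()
    archive = root.with_suffix(".zip")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Store files relative to the integration root; skip the
            # archive itself in case it lands inside the tree.
            if path.is_file() and path != archive:
                zf.write(path, path.relative_to(root))
    return archive
```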
Tests
A difficult part of testing will be ensuring that the front-end validator does not certify an integration that the API will reject. Using a stable JSON Schema library for the task will be critical, but there should also be integration tests that check the CLI upload against a running cluster with Observability.
To ensure validation is functioning as intended, we should find and include a fuzzing framework, such as JSON Schema to Elm.
New endpoints will need Pen Testing.
Implementation Plan
Short-Term Deliverable
What is our first deliverable?
CLI Library that can generate/zip a preset integration.
We may want to consider using a templating engine instead of the standard json module.
To research: which templating engine?
Some options: Jade, Pug, Mustache, HandlebarsJS, Jinja2
For now, use the standard module, no need to over-complicate this.
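With flat config files, the standard json module is indeed enough. A minimal sketch of generation, assuming a hypothetical layout (a top-level config.json plus empty resource directories; neither name is settled in this document):

```python
import json
from pathlib import Path

def write_template(directory: str, config: dict) -> Path:
    """Materialize a new integration skeleton using only the standard
    json module; no templating engine needed for flat config files."""
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)
    # Hypothetical layout: a top-level config plus empty resource dirs.
    for sub in ("schemas", "samples", "assets"):
        (root / sub).mkdir(exist_ok=True)
    config_path = root / "config.json"
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config_path
```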
Goals for 2.7
Initializing Integrations
Deploying Integrations
Goals for 2.8+
Publishing Integrations
Glossary
Build (= Package): Packaging a folder containing an integration into a zip file. The zip should be able to be deployed to the cluster as-is.
Check (= Validate): Ensuring that the integration is correct and will be accepted by the server.
Deploy (= Import): Moving a built integration from a local filesystem to a remote cluster. After this step, all references to the integration should be via a cluster index.
Install: Placing an integration on a local filesystem from which it will be loaded.
Install differs from Deploy by process: Deploy is through an API while Install is on the FS.
Install should not be used for adding new integrations at runtime; it primarily refers to pre-installed integrations.
Integration: a folder containing resources that define how to process and display information generated by a data source.
Creation: Creating a new integration in an empty directory, using a template and user input.
Publish: Committing and pushing a local integration as a PR.