Introduce JSON schemas for quickstart artifacts

zstix commented 2 weeks ago

This PR starts the process of introducing JSON Schemas for all of the quickstart "artifacts". The thinking is that, with a single source of truth for validation, we can more easily create and update these artifacts (and remove duplicate validation logic).

That said, there are a number of ways we could approach breaking this validation down. I've gone ahead and created an example of how we could define our schema, but there are a lot of open questions that we should align on first (listed below).

To build out these schemas, I used a combination of our Typescript types, GraphQL schema, Terraform definition, and common sense.

Questions

`answered` How do we want to organize the schema directory?

The proposed solution will work.

Prior discussion

I've opted for a single schema JSON file for artifact, all stored in a single directory. In my opinion, this should make it easy for discoverability and aren't really concerned about having too many schemas since we are not likely to be adding a lot of new artifacts.

`answered` How do we want to reference other schemas?

Let's avoid having to reference other schemas entirely by combining this into one single "artifact" schema. We can use defs (called definitions in our version of the spec) to avoid duplicate configuration (e.g. install directives).

Prior discussion

I've added an `$id` field to each schema so that they can reference each other with a `$ref` key: ```json { "$id": "foo.json", "definitions": { "bar": { "type": "string" } } } ``` ```json { ... "properties": { "baz": { "$ref": "foo.json$/definitions/bar" } } } ``` This is what [the documentation](https://ajv.js.org/guide/combining-schemas.html) recommends, but there is not prescription on what shape the `$id` field should take. In the documentation, they provide a full URI, but I've opted for just the directory and filename here.

`answered` How much detail do we want to provide?

Follow the catalog service's level of granularity with the schema validation. This will require a little bit more detail for alerts, but everything else seems to be in line.

Prior discussion

I have taken a few different approaches in this PR and I'm curious with we like best: - `alert.json` I provide _some_ of the fields, but have set `additionalProperties` to `true` so we can accept many more (e.g. timing functions or runbook URLs). - `dashboard.json` I left `pages` as a generic "object" type without a defined shape. This provides some flexibility within that field, but the overall dashboard artifact is locked down (`additionalProperties` is set to `false`). - `datasource.json` I explicitly type everything and provide no variation. I kinda lean towards defining everything _except_ for the dashboard pages, since the dashboard API just accepts a JSON glob anyway.

`answered` What version (and features) of JSON schema do we want to embrace?

The default is draft-07. We are going to use this. We should encode the version in the schema itself.

Prior discussion

In a number of places, I have defined the `items` property of an `array` type. It looks like there are [different specs](https://ajv.js.org/json-schema.html#items) for how this should work depending on which version of JSON schema you're using. I picked the style that felt most natural to write, but I'm not sure if that's "industry standard".

`answered` Are we alright using extra package(s) for improved validation

While there is good support for URI (and other formats) on the front-end, we're unable to reliably use this on the back-end. Use the stock library and string over uri.

Prior discussion

In a few spots, I have the following validation for strings: ```json { "type": "string", "format": "uri" }, ``` This is only possible with the [ajv-formats](https://github.com/ajv-validator/ajv-formats) package. I think this would be really great for validating URLs, UUIDs, etc...but I want to check with the group to see if we're comfortable going beyond "stock" JSON schema.

Next steps

[x] Combine all the schemas into a single file that uses def
[x] Encode the schema version in the schema
[x] Add the correct level of validation based on the catalog service
[x] Remove URI format in favor regular ol' strings
[x] Hard-code a list of "core" datasource IDs
[x] Write code that adds an enum for all the datasource IDs (including the "core" datasources)
[x] Documentation (ensure we call out that datasource feature flags must be removed before people reference them in quickstarts)

mickeyryan42 commented 2 weeks ago

How do we want to organize the schema directory?

This is what I imagined as well, unless we have a reason to go a different way this seems to be easiest to maintain.

How do we want to reference other schemas?

I may be misunderstanding this, but is this saying if we use $refs then those entire sub schemas wont be validated from the top level?
$ref property, the object is considered a reference, not a schema. Therefore, any other properties you put in that object will not be treated as JSON Schema keywords and will be ignored by the validator.

How much detail do we want to provide?

I agree with being mostly specific about details, it seems like the point of defining an artifact we can validate that we want to be able to fail on unexpected or missing fields. That said we can probably be more loose on certain things that we won't break on should they be. I could see additionalProperties being ok since we would just ignore additional fields on the backend.

What version (and features) of JSON schema do we want to embrace?

I'm not entirely sure about this and would be interested to see side by side differences. I do see this though, which is interesting: Optionally, you can add draft-07 meta-schema, to use both draft-07 and draft-2019-09 schemas in one Ajv instance:

Are we alright using extra package(s) for improved validation?

Ajv has a number of extensions/plugins they list in their docs https://ajv.js.org/packages/#extending-ajv Imo especially since we aren't worrying about a bundle size or package vulnerabilities in a production environment we should go for whatever we find useful as long as its maintained and credible. One other consideration is whether we would need the same type of extension on the catalog service side and whether its available in Elixir, or we can accomplish the same task without it easily enough.

zstix commented 2 weeks ago

Decisions from a zoom:

Maintain a list of "core" datasource IDs
When we go to build the artifact:
- Also build a schema that has an enum of all datasourceIds
- Add in the core datasource IDs to the enum
Publish both the artifact and the (lightly) generated schema

The schema is mostly static, the only thing we actually "generate" is the list of datasource IDs.

zstix commented 1 week ago

Some additional questions, and my attempts to answer them:

We seem to be using the capitalization of datasourceIds in the schema, but in the actual quickstart definitions, it's dataSourceIds (this always feels like a painpoint)

This was a conscious choice, but made purely out of my own personal preference. I suppose we probably should update this to be dataSourceIds since that's how it's represented in NerdGraph 😩

We require a datasource ID on a quickstart, but it seems like we currently have quickstarts that aren't associated to a datasource ID so the artifact is failing validation. I think it's valid for a quickstart to not have a datasource ID in the case that it's a docs-only quickstart?

datasourceIds are not one of the required fields for the quickstart schema.

d3caf commented 1 week ago

@zstix

datasourceIds are not one of the required fields for the quickstart schema.

Huh. I'm getting the following error in the validation.

Error: schema is invalid: data/properties/datasourceIds/enum must NOT have fewer than 1 items
    at Ajv.validateSchema (/Users/aanguiano/c/newrelic-quickstarts/utils/node_modules/ajv/lib/core.ts:519:18)
    at Ajv._addSchema (/Users/aanguiano/c/newrelic-quickstarts/utils/node_modules/ajv/lib/core.ts:721:30)
    at Ajv.compile (/Users/aanguiano/c/newrelic-quickstarts/utils/node_modules/ajv/lib/core.ts:384:22)
    at Ajv.validate (/Users/aanguiano/c/newrelic-quickstarts/utils/node_modules/ajv/lib/core.ts:361:16)
    at main (/Users/aanguiano/c/newrelic-quickstarts/utils/build-validate-quickstart-artifact.ts:31:7)
    at Object.<anonymous> (/Users/aanguiano/c/newrelic-quickstarts/utils/build-validate-quickstart-artifact.ts:37:3)
    at Module._compile (node:internal/modules/cjs/loader:1233:14)
    at Module.m._compile (/Users/aanguiano/.local/share/mise/installs/node/20.5.1/lib/node_modules/ts-node/src/index.ts:1618:23)
    at Module._extensions..js (node:internal/modules/cjs/loader:1287:10)
    at Object.require.extensions.<computed> [as .ts] (/Users/aanguiano/.local/share/mise/installs/node/20.5.1/lib/node_modules/ts-node/src/index.ts:1621:12)

I thought that was coming from the quickstart definition but I guess it's not. The top-level datasourceIds field has an array of datasource ID strings in it so I'm confused what the issue is.

[!NOTE] Resolved! Needed to make some tweaks to the pathing for the type references in the schema 👍 d1a7f255f2c3e5a20dc1e8112eb47d0bdf684990

zstix commented 1 week ago

@d3caf Enums require at least one entry. We plan on populating that field at build time (the script included in this PR) so that should always have items in it.

zstix commented 1 week ago

Next steps

[x] Cherry-pick commit from other branch for schema
[x] Ensure consistent dataSource naming (capital S)

newrelic / newrelic-quickstarts