meltano / sdk

Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com
Apache License 2.0

JSON Schema vocabulary for Singer schemas #2783

Open edgarrmondragon opened 4 hours ago

edgarrmondragon commented 4 hours ago

Stub.


For example, OpenAPI defines its own vocabulary on top of Draft 2020-12:

  1. Custom data types and formats: https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.1.0.md#data-types

  2. Examples: https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.1.0.md#schema-object-examples

  3. Dialect schema: https://spec.openapis.org/oas/3.1/dialect/base

{
    "$id": "https://spec.openapis.org/oas/3.1/dialect/base",
    "$schema": "https://json-schema.org/draft/2020-12/schema",

    "title": "OpenAPI 3.1 Schema Object Dialect",
    "description": "A JSON Schema dialect describing schemas found in OpenAPI documents",

    "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/core": true,
        "https://json-schema.org/draft/2020-12/vocab/applicator": true,
        "https://json-schema.org/draft/2020-12/vocab/unevaluated": true,
        "https://json-schema.org/draft/2020-12/vocab/validation": true,
        "https://json-schema.org/draft/2020-12/vocab/meta-data": true,
        "https://json-schema.org/draft/2020-12/vocab/format-annotation": true,
        "https://json-schema.org/draft/2020-12/vocab/content": true,
        "https://spec.openapis.org/oas/3.1/vocab/base": false
    },

    "$dynamicAnchor": "meta",

    "allOf": [
        { "$ref": "https://json-schema.org/draft/2020-12/schema" },
        { "$ref": "https://spec.openapis.org/oas/3.1/meta/base" }
    ]
}

  4. Base vocabulary meta-schema: https://spec.openapis.org/oas/3.1/meta/base

{
    "$id": "https://spec.openapis.org/oas/3.1/meta/base",
    "$schema": "https://json-schema.org/draft/2020-12/schema",

    "title": "OAS Base vocabulary",
    "description": "A JSON Schema Vocabulary used in the OpenAPI Schema Dialect",

    "$vocabulary": {
        "https://spec.openapis.org/oas/3.1/vocab/base": true
    },

    "$dynamicAnchor": "meta",

    "type": ["object", "boolean"],
    "properties": {
        "example": true,
        "discriminator": { "$ref": "#/$defs/discriminator" },
        "externalDocs": { "$ref": "#/$defs/external-docs" },
        "xml": { "$ref": "#/$defs/xml" }
    },

    "$defs": {
        "extensible": {
            "patternProperties": {
                "^x-": true
            }
        },

        "discriminator": {
            "$ref": "#/$defs/extensible",
            "type": "object",
            "properties": {
                "propertyName": {
                    "type": "string"
                },
                "mapping": {
                    "type": "object",
                    "additionalProperties": {
                        "type": "string"
                    }
                }
            },
            "required": ["propertyName"],
            "unevaluatedProperties": false
        },

        "external-docs": {
            "$ref": "#/$defs/extensible",
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "format": "uri-reference"
                },
                "description": {
                    "type": "string"
                }
            },
            "required": ["url"],
            "unevaluatedProperties": false
        },

        "xml": {
            "$ref": "#/$defs/extensible",
            "type": "object",
            "properties": {
                "name": {
                    "type": "string"
                },
                "namespace": {
                    "type": "string",
                    "format": "uri"
                },
                "prefix": {
                    "type": "string"
                },
                "attribute": {
                    "type": "boolean"
                },
                "wrapped": {
                    "type": "boolean"
                }
            },
            "unevaluatedProperties": false
        }
    }
}
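
As a rough sketch of how a consumer might read the "$vocabulary" declarations in the two documents above: in Draft 2020-12, a value of true means an implementation must recognize the vocabulary or refuse to process the schema, while false marks the vocabulary as optional. The helper name split_vocabularies is hypothetical and is not part of the SDK; the dialect below is a trimmed copy of the OpenAPI one.

```python
import json

# Trimmed copy of the OpenAPI 3.1 dialect's "$vocabulary" object.
DIALECT = json.loads("""
{
    "$id": "https://spec.openapis.org/oas/3.1/dialect/base",
    "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/core": true,
        "https://json-schema.org/draft/2020-12/vocab/validation": true,
        "https://spec.openapis.org/oas/3.1/vocab/base": false
    }
}
""")


def split_vocabularies(meta_schema: dict) -> tuple[list[str], list[str]]:
    """Return (required, optional) vocabulary URIs declared by a meta-schema.

    Hypothetical helper for illustration only.
    """
    declared = meta_schema.get("$vocabulary", {})
    required = [uri for uri, must_support in declared.items() if must_support]
    optional = [uri for uri, must_support in declared.items() if not must_support]
    return required, optional


required, optional = split_vocabularies(DIALECT)
print("required:", required)
print("optional:", optional)
```

Note that the OpenAPI dialect marks its own base vocabulary as optional, so a validator that does not understand keywords such as "discriminator" can still process the schema.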

Related:

edgarrmondragon commented 3 hours ago

https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-validation-00#section-7.2.3

Vocabularies do not support specifically declaring different value sets for keywords. Due to this limitation, and the historically uneven implementation of this keyword, it is RECOMMENDED to define additional keywords in a custom vocabulary rather than additional format attributes if interoperability is desired.

This means we probably should not use custom format values, but rather define custom keywords (e.g. dbtype).
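
A minimal sketch of what that could look like, mirroring the OpenAPI dialect/meta-schema pair. Everything here is an assumption for illustration: the example.com URIs are placeholders, nothing has been published for Singer, and treating dbtype as a plain string is a guess at the keyword's shape. The dbtype_errors function is a hand-rolled check of that one rule, not a JSON Schema validator.

```python
import json

# Hypothetical placeholder URIs; no Singer vocabulary is published at these addresses.
BASE = "https://example.com/singer"
VOCAB_URI = f"{BASE}/vocab/base"
META_URI = f"{BASE}/meta/base"
DIALECT_URI = f"{BASE}/dialect/base"
DRAFT = "https://json-schema.org/draft/2020-12/schema"
STANDARD_VOCABS = [
    "core", "applicator", "unevaluated", "validation",
    "meta-data", "format-annotation", "content",
]

# Vocabulary meta-schema: describes the custom keyword (assumed to be a string).
VOCAB_META = {
    "$id": META_URI,
    "$schema": DRAFT,
    "$vocabulary": {VOCAB_URI: True},
    "$dynamicAnchor": "meta",
    "type": ["object", "boolean"],
    "properties": {"dbtype": {"type": "string"}},
}

# Dialect: the standard 2020-12 vocabularies plus the custom one, marked optional
# (as OpenAPI does) so validators unaware of it can still process the schema.
DIALECT = {
    "$id": DIALECT_URI,
    "$schema": DRAFT,
    "$vocabulary": {
        **{f"https://json-schema.org/draft/2020-12/vocab/{name}": True
           for name in STANDARD_VOCABS},
        VOCAB_URI: False,
    },
    "$dynamicAnchor": "meta",
    "allOf": [{"$ref": DRAFT}, {"$ref": META_URI}],
}


def dbtype_errors(schema: dict, path: str = "#") -> list[str]:
    """Check only the assumed 'dbtype must be a string' rule, recursing into properties."""
    errors = []
    if "dbtype" in schema and not isinstance(schema["dbtype"], str):
        errors.append(f"{path}/dbtype must be a string")
    for name, sub in schema.get("properties", {}).items():
        if isinstance(sub, dict):
            errors.extend(dbtype_errors(sub, f"{path}/properties/{name}"))
    return errors


# A stream schema that uses the custom keyword instead of a custom "format" value.
STREAM_SCHEMA = {
    "$schema": DIALECT_URI,
    "type": "object",
    "properties": {"id": {"type": "integer", "dbtype": "bigint"}},
}

print(json.dumps(DIALECT, indent=4))
print(dbtype_errors(STREAM_SCHEMA))
```

Keeping "format" for standard values and moving database-specific hints into a separate keyword follows the recommendation quoted above, since a vocabulary can declare new keywords but cannot declare a new value set for an existing one.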