jupyter / telemetry

Configurable event-logging for Jupyter applications and extensions.
https://jupyter-telemetry.readthedocs.io
BSD 3-Clause "New" or "Revised" License

$id must be valid URI in json schema #56

Open kiendang opened 4 years ago

kiendang commented 4 years ago

Problem

Since JSON Schema Draft 6, $id is required to be a valid URI reference as defined in RFC 3986, section 4.1: either a URI (e.g., http://eventlogging.jupyter.org/event-schema) or a relative reference (e.g., /event-schema). Currently, the $ids used in event schemas across different Jupyter projects do not follow this rule:

binderhub.jupyter.org/launch
hub.jupyter.org/server-action

They lack the scheme part of the URI and thus are not valid URIs.
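One way to see how a generic URI parser reads these identifiers is to split them with Python's standard library. This is only a sketch: describe_id is a hypothetical helper written for this illustration, not part of jupyter-telemetry or jsonschema. Without a scheme, the host name ends up in the path component.

```python
from urllib.parse import urlsplit


def describe_id(schema_id: str) -> str:
    """Classify a $id by how urlsplit parses it (hypothetical helper)."""
    parts = urlsplit(schema_id)
    if parts.scheme:
        # e.g. http://eventlogging.jupyter.org/event-schema
        return "absolute URI"
    if parts.path.startswith("/"):
        # e.g. /event-schema
        return "absolute-path reference"
    # e.g. hub.jupyter.org/server-action: the host name is parsed as a path segment
    return "relative-path reference (no scheme)"


for schema_id in (
    "http://eventlogging.jupyter.org/event-schema",
    "/event-schema",
    "binderhub.jupyter.org/launch",
    "hub.jupyter.org/server-action",
):
    print(schema_id, "->", describe_id(schema_id))
```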

Because of this, the schemas are not guaranteed to work with every JSON Schema validator. One example is resolving a $ref:

import jsonschema

schema = {
    "$id": "hub.jupyter.org/example-schema",
    "properties": {
        "requester": {"$ref": "#/definitions/user"},
        "target_user": {"$ref": "#/definitions/user"}
    },
    "definitions": {
        "user": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "id": {"type": "string"}
            }
        }
    }
}

instance = {
    "requester": {"name": "a", "id": "1"},
    "target_user": {"name": "b", "id": "2"}
}

jsonschema.validate(instance, schema)

This fails with:

jsonschema.exceptions.RefResolutionError: unknown url type: 'hub.jupyter.org/hub.jupyter.org/example-schema'

Change to "$id": "http://hub.jupyter.org/example-schema" or "$id": "/example-schema" and it validates fine.
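The doubled host name in the error message is consistent with the scheme-less $id being joined against itself as a relative path. The sketch below reproduces that with urllib.parse.urljoin alone. It assumes the validator resolves references in roughly this way, which the traceback suggests but the issue does not confirm.

```python
from urllib.parse import urljoin

bad_id = "hub.jupyter.org/example-schema"
good_id = "http://hub.jupyter.org/example-schema"

# Joining the scheme-less $id against itself treats it as a relative path,
# so the last segment of the base is replaced by the whole $id.
doubled = urljoin(bad_id, bad_id)
print(doubled)

# With a scheme, the same join leaves the $id unchanged, and a fragment-only
# $ref resolves to a URL under that $id.
stable = urljoin(good_id, good_id)
ref_target = urljoin(good_id, "#/definitions/user")
print(stable)
print(ref_target)
```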

There are potentially other undiscovered problems as well.

Proposed solution

Either change $id to a fully formed URI or keep it as a relative reference. The latter is what MediaWiki eventlogging uses (example).

kiendang commented 3 years ago

$id can just start with /, e.g. /hub/server-event is a valid $id. This is also how MediaWiki sets the $ids for its schemas. Example: https://schema.wikimedia.org/repositories/primary/jsonschema/mediawiki/user/blocks-change/current.yaml See first comment.

yuvipanda commented 3 years ago

Thanks for writing this up! Looks like MediaWiki's $ids aren't intended to be URIs, so it makes sense they would use the relative path. Since we have a domain component already, let's add a scheme and do https://?

kiendang commented 3 years ago

MediaWiki's $id are actual relative URIs though. They resolve against the origin https://schema.wikimedia.org/repositories/primary/jsonschema. For example https://schema.wikimedia.org/repositories/primary/jsonschema/mediawiki/user/blocks-change/1.1.0 contains the schema with $id /mediawiki/user/blocks-change/1.1.0. Relative URIs might provide us with some flexibility down the line.
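As a rough illustration of the mapping described above, the snippet below builds the schema URL by prefixing the $id with the repository base. locate_by_prefix is a hypothetical helper, and the prefixing convention is an assumption drawn from the example URL in this comment. For comparison, strict RFC 3986 resolution of an absolute-path reference (urljoin) replaces the whole base path, so the two approaches give different results.

```python
from urllib.parse import urljoin

BASE = "https://schema.wikimedia.org/repositories/primary/jsonschema"
SCHEMA_ID = "/mediawiki/user/blocks-change/1.1.0"


def locate_by_prefix(base: str, schema_id: str) -> str:
    """Prefix the $id with the repository base (assumed convention)."""
    return base.rstrip("/") + schema_id


# Matches the URL quoted in the comment above.
prefixed = locate_by_prefix(BASE, SCHEMA_ID)
print(prefixed)

# RFC 3986 resolution keeps only the scheme and host of the base,
# because the $id starts with "/".
resolved = urljoin(BASE, SCHEMA_ID)
print(resolved)
```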

Since we have a domain component already, let's add a scheme and do https://?

I'm OK with this. I do think absolute URIs work better for us since we have multiple domains for different schemas, especially with client events. Thanks!