singer-io / singer-python

Writes the Singer format from Python
https://singer.io
Apache License 2.0
544 stars 129 forks source link

JSONSchema Draft 7 array Tuple Validation unsupported? Schema.from_dict() raise exception #145

Open adrien-mg opened 3 years ago

adrien-mg commented 3 years ago

From the JSON Schema draft 7.0 specification: array types can be used to validate tuple as such:

{
  "type": "array",
  "items": [
    {
      "type": "something"
    },
    {
      "type": "otherthing"
    }
  ]
}

... and as such, the items property of a "type": "array" property can be a python list.

Using the above schema will raise an exception with singer-python==5.1.5 when entering the @singer.utils.handle_top_exception(LOGGER) decorator, although I also can see the error in the master version of the file: https://github.com/singer-io/singer-python/blob/master/singer/schema.py#L107 where the items variable is expected to be a dict (it can also be a list, as stated above)

I think I have a trivial fix which would be to add a isinstance(items, dict) check:

    @classmethod
    def from_dict(cls, data, **schema_defaults):
        '''Initialize a Schema object based on the JSON Schema structure.
        :param schema_defaults: The default values to the Schema
        constructor.'''
        kwargs = schema_defaults.copy()
        properties = data.get('properties')
        items = data.get('items')

        if properties is not None:
            kwargs['properties'] = {
                k: Schema.from_dict(v, **schema_defaults)
                for k, v in properties.items()
            }
        if items is not None and isinstance(items, dict):
            kwargs['items'] = Schema.from_dict(items, **schema_defaults)
        for key in STANDARD_KEYS:
            if key in data:
                kwargs[key] = data[key]
        return Schema(**kwargs)

and I'm happy to raise a PR for it. However I wanted to have the opinion of someone who's closer to the library to know if we even want to support Tuple Validation from the JSON Schema draft 7.0 specifications?

edit: made a bit more readable

modularTaco commented 3 years ago

I think jsonschema draft7 is fully unsupported, as of https://github.com/singer-io/singer-python/blob/66688d736fa699beeed63af91db2fdc1f8914bb1/setup.py#L14

and the jsonschema lib in version 2.6.0 only includes the Draft3Validator and Draft4Validator, no validators for draft 5, 6 or 7.

So if there is a specific reason for this jsonschema version, there might be other screws that need to be adjusted for a full draft 7 support.

But i started experimenting with singer-python yesterday, so i may be wrong.

Nozima29 commented 4 weeks ago

Is this issue not solved yet? I mean jsonschema==2.6.0 is quite old, some integrated systems (e.g, airflow:v2) require > 4.18. Does the suggested PR even work?