sbrunner / jsonschema-gentypes

Tool to generate Python types based on TypedDict from a JSON Schema
BSD 2-Clause "Simplified" License

Can't handle .. paths + ref path resolution properly #850

Open dbushong opened 9 months ago

dbushong commented 9 months ago

If you run jsonschema-gentypes with a path like --json=../path/to/some.json, and that file has relative $ref entries in it, resolution will fail.

Repro:

$ cat base.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$ref": "sub.json"
}
$ cat sub.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type":"object",
  "properties": { "x": { "type": "string" } }
}
$ jsonschema-gentypes --json=base.json --python=out.py
Processing base.json
$ cd subdir
$ jsonschema-gentypes --json=../base.json --python=out.py
Processing ../base.json
Traceback (most recent call last):
  File "/Users/dbushong/.pyenv/versions/3.8.14/lib/python3.8/site-packages/referencing/_core.py", line 417, in get_or_retrieve
    resource = registry._retrieve(uri)
  File "/Users/dbushong/.pyenv/versions/3.8.14/lib/python3.8/site-packages/jsonschema_gentypes/resolver.py", line 83, in _open_uri_resolver
    my_resource = referencing.Resource.from_contents(_open_uri(uri))
  File "/Users/dbushong/.pyenv/versions/3.8.14/lib/python3.8/site-packages/jsonschema_gentypes/resolver.py", line 63, in _open_uri
    with open(uri, encoding="utf-8") as open_file:
FileNotFoundError: [Errno 2] No such file or directory: 'sub.json'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/dbushong/.pyenv/versions/3.8.14/lib/python3.8/site-packages/referencing/_core.py", line 667, in lookup
    retrieved = self._registry.get_or_retrieve(uri)
  File "/Users/dbushong/.pyenv/versions/3.8.14/lib/python3.8/site-packages/referencing/_core.py", line 424, in get_or_retrieve
    raise exceptions.Unretrievable(ref=uri) from error
referencing.exceptions.Unretrievable: 'sub.json'

Note that it works fine if you don't have .. in your path.

passchieri commented 1 month ago

I believe this is because RefResolver does not take relative references into account, i.e., it does not look for the referenced file relative to the file the reference comes from. This could be fixed by adding the base location to RefResolver, by supplying the base location when calling resolver.lookup from API, or by resolving relative locations in API before calling resolver.lookup.
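A minimal sketch of the last option (resolving relative locations in API before calling resolver.lookup), using only the Python standard library. It assumes refs should be resolved per RFC 3986 against the file:// URI of the schema that contains them; absolute_ref is a hypothetical helper, not part of jsonschema-gentypes or referencing. The last line shows behavior consistent with the traceback: joined against an empty base, a relative ref comes back unchanged ('sub.json'), so it ends up being opened relative to the current directory.

```python
from pathlib import Path
from urllib.parse import urljoin


def absolute_ref(schema_path: str, ref: str) -> str:
    """Hypothetical helper: make a $ref absolute relative to the file containing it."""
    # Path.resolve() turns "../base.json" into an absolute path once, so later
    # file opens no longer depend on the current working directory.
    base_uri = Path(schema_path).resolve().as_uri()
    # RFC 3986 resolution: relative refs are joined against the base,
    # absolute refs (https://..., file://...) are returned unchanged, and
    # fragment-only refs ("#/$defs/x") stay within the same document.
    return urljoin(base_uri, ref)


# With no base location, the ref passes through untouched:
print(urljoin("", "sub.json"))  # sub.json
```

Doing this in API keeps RefResolver unchanged, but only for refs that appear in the schema file passed on the command line.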

For the simple case of a relative reference from the provided schema file, it does not make a big difference, but fixing it in API keeps the changes localized to API rather than RefResolver, which I imagine is preferred.

This will, however, only provide a partial solution: if a referenced file itself contains relative references, those references need to be resolved relative to a new base. That needs a fix such as instantiating a new RefResolver for each external reference, but I can't see what that would do to the caching. Alternatively, we would need to track the full reference tree in API and flatten it to a single URI before calling resolver.lookup. But in that case too, I can't foresee the complexity of mixing absolute and relative references.
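A minimal sketch of tracking the reference tree with a new base per document, using only the standard library. iter_refs and load_reference_tree are hypothetical names, not jsonschema-gentypes or referencing APIs. The sketch assumes that only local file:// refs are followed and ignores $id, which in JSON Schema can also change the base URI. It shows one way to keep caching intact: key the cache by absolute URI, so a file reached through several relative paths is loaded once.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, List
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import url2pathname


def iter_refs(node: Any) -> Iterator[str]:
    """Yield every "$ref" string found anywhere in a parsed JSON document."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            yield ref
        for value in node.values():
            yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def load_reference_tree(schema_path: str) -> Dict[str, Any]:
    """Hypothetical loader returning {absolute file URI: parsed schema}.

    Each document's own URI is the base for the refs it contains, so a file
    reached through another file resolves its refs relative to itself.
    Assumption: "$id" (which can also change the base URI) is ignored.
    """
    cache: Dict[str, Any] = {}  # keyed by absolute URI, so each file loads once
    pending: List[str] = [Path(schema_path).resolve().as_uri()]
    while pending:
        uri = pending.pop()
        if uri in cache:
            continue
        with open(url2pathname(urlparse(uri).path), encoding="utf-8") as f:
            cache[uri] = json.load(f)
        for ref in iter_refs(cache[uri]):
            # Join against *this* document's URI, then drop any "#..." part.
            target, _fragment = urldefrag(urljoin(uri, ref))
            # Only local files are followed in this sketch.
            if urlparse(target).scheme == "file" and target not in cache:
                pending.append(target)
    return cache
```

Once every ref has been made absolute this way, resolver.lookup would only ever see absolute URIs, whatever the mix of absolute and relative references in the source files.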

A much more fundamental change would be to mirror the complete reference tree in the generated Python code: every schema source file would generate its own Python file, with those files referencing each other in the same way the schema files do. But that would be such a major (breaking) change that it is probably out of scope anyway.