sbrunner / jsonschema-gentypes

Tool to generate Python types based on TypedDict from a JSON Schema
BSD 2-Clause "Simplified" License
38 stars 12 forks source link

Invalid Python Code and Type Definitions Generated #998

Open holmanb opened 3 months ago

holmanb commented 3 months ago

First of all, great work. The idea of generating type definitions from a jsonschema is exciting, and I'd like to be able to put this to good use.

I noticed two bugs, and as well as some other issues that are likely less impactful right now.

1) When I feed this schema to jsonschema-gentypes, it produces invalid Python.

2) If I work around 1), one of the schema keys is defined as a variable rather than a type alias, which introduces lots of noisy warnings when used with mypy.

For the first issue, I noticed that the invalid Python is on lines that are whitespace with a comment. I'm not sure what the issue is, but it goes away when I delete the line.

For the second issue, mypy produced a convenient warning message which hinted at the issue with this link: https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases

When I checked the line that it was complaining about I noticed that it was just a variable definition something = None. Once I redefined this variable as a TypeAlias, those issues went away.

Reproducer:

git clone https://github.com/canonical/cloud-init.git
cd cloud-init
python3 -m venv .venv
. .venv/bin/activate
pip3 install jsonschema-gentypes tox -r test-requirements.txt
jsonschema-gentypes --json-schema=./cloudinit/config/schemas/schema-cloud-config-v1.json --python=cloudinit/config/config_type.py
echo "from cloudinit.config.config_type import _Root as Config" > cloudinit/config/__init__.py
tox -e mypy

The syntax failure:

$ tox -e mypy
mypy: commands[0]> .tox/mypy/bin/python -m mypy cloudinit/ tests/ tools/
cloudinit/config/config_type.py:1329: error: invalid syntax
Found 1 error in 1 file (errors prevented further checking)
mypy: exit 2 (0.35 seconds) /home/holmanb/ci> .tox/mypy/bin/python -m mypy cloudinit/ tests/ tools/ pid=561085
  mypy: FAIL code 2 (0.39=setup[0.04]+cmd[0.35] seconds)
  evaluation failed :( (0.50 seconds)

When I delete all comment lines it gets a little further:

$ sed -i 's/^\s*#.*//' cloudinit/config/config_type.py
$ tox -e mypy
cloudinit/config/config_type.py:1762: SyntaxWarning: invalid escape sequence '\w'
  """
cloudinit/config/config_type.py:2136: SyntaxWarning: invalid escape sequence '\$'
  """
cloudinit/config/config_type.py:3049: SyntaxWarning: invalid escape sequence '\+'
  """
cloudinit/config/config_type.py:3660: SyntaxWarning: invalid escape sequence '\.'
  """ pattern: ^([0-9]+)?\.?[0-9]+[BKMGT]$ """
cloudinit/config/config_type.py:3671: SyntaxWarning: invalid escape sequence '\.'
  """ pattern: ^([0-9]+)?\.?[0-9]+[BKMGT]$ """
cloudinit/config/config_type.py:1: error: Module "typing" has no attribute "Required"
cloudinit/config/config_type.py:9: error: Name "_DEFSALLMODULES_APK_CONFIGURE" already defined on line 7
...
cloudinit/config/config_type.py:589: error: Variable "cloudinit.config.config_type._DefsUsersGroupsFullStopUserObjectGroupsOneof2Type" is not valid as a type
cloudinit/config/config_type.py:589: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases

...
...
cloudinit/config/config_type.py:589: error: Variable "cloudinit.config.config_type._DefsUsersGroupsFullStopUserObjectGroupsOneof2Type" is not valid as a type
cloudinit/config/config_type.py:589: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
cloudinit/config/cc_zypper_add_repo.py:589: error: Name "_DefsUsersGroupsFullStopUserObjectGroupsOneof2Type" is not defined
...
cloudinit/config/cc_yum_add_repo.py:589: error: Name "_DefsUsersGroupsFullStopUserObjectGroupsOneof2Type" is not defined
...

I'm running 3.12, which probably explains the syntax warnings. The "has no attribute" error has me puzzled, I can't reproduce it just by running that import inside of the venv. The "already defined on line" error is due to some keys that we have which are duplicates, we might be able to change this but I know that there was a good reason to add them when those were introduced.

The is not valid as a type error is due to a line which contains:

_DefsUsersGroupsFullStopUserObjectGroupsOneof2Type = None

It goes away if I redefine it as a TypeAlias

_DefsUsersGroupsFullStopUserObjectGroupsOneof2Type: TypeAlias = str

With that issue resolved, I think that most if the issues I see are due mypy reporting valid issues.

I'll be happy to test a fix if you figure out the issues here. My project could potentially even use jsonschema-gentypes as part of a CI job for my project if these issues get resolved.

sbrunner commented 3 months ago

The issue I see with your schema file:

mypy don't like when we have a comment with ^ *type: then I think that we should add something like | before the code blocks.

The "has no attribute" error has me puzzled, I can't reproduce it just by running that import inside of the venv. It's looks like that you do a check with Python 3.10 (this Class is new in Python 3.11)

The "already defined on line" error is due to some keys that we have which are duplicates, we might be able to change Effectively, this is an error. We shouldn't define tow times the same constant, the second one should be postfixes with a random number or a counter.

The is not valid as a type error is due to a line which contains:... In fact, I don't understand this part of the JSON schema:

{
"type": "object",
"patternProperties": {
"^.+$": {
"label": "<group_name>",
"description": "When providing an object for users.groups the ``<group_name>`` keys are the groups to add this user to",
"deprecated": true,
"deprecated_version": "23.1",
"type": [
"null"
],
"minItems": 1
}
},
"hidden": [
"patternProperties"
]
}

We can add the TypeAlias just to make it passing.

holmanb commented 3 months ago

Thanks for the response @sbrunner!

The "already defined on line" error is due to some keys that we have which are duplicates, we might be able to change

Effectively, this is an error. We shouldn't define tow times the same constant, the second one should be postfixes with a random number or a counter.

That sounds like a reasonable fix to me. As long as the correct class ends up in the top level schema class (the name is something like _Root... if I recall correctly), that should work.

The is not valid as a type error is due to a line which contains:...

In fact, I don't understand this part of the JSON schema:

              {
                "type": "object",
                "patternProperties": {
                  "^.+$": {
                    "label": "<group_name>",
                    "description": "When providing an object for users.groups the ``<group_name>`` keys are the groups to add this user to",
                    "deprecated": true,
                    "deprecated_version": "23.1",
                    "type": [
                      "null"
                    ],
                    "minItems": 1
                  }
                },
                "hidden": [
                  "patternProperties"
                ]
              }

Thanks for drawing my attention to that. The "null" doesn't make sense here I think that this was supposed to be "string" or "array".

holmanb commented 3 months ago

Thanks for drawing my attention to that. The "null" doesn't make sense here I think that this was supposed to be "string" or "array".

Update: Apparently the only valid value is "null", and this is deprecated. Supporting adding the TypeAlias here makes sense, I think, and would be much appreciated!

holmanb commented 3 months ago

@sbrunner Thanks for the prompt responses!

For someone that is new to the codebase, do you have any suggestions about where in the codebase changes might need to be made to resolve duplicate keys and ensure that TypeAlias is set?

sbrunner commented 3 months ago

This line https://github.com/sbrunner/jsonschema-gentypes/blob/master/jsonschema_gentypes/__init__.py#L441 should probably be replaced by:

_type = ": TypeAlias" if isinstance(self.subtype, BuiltinType) and self.subtype.name() == "None" else ""
result.append(f"{self._name}{_type} = {self.sub_type.name()}")