koxudaxi / datamodel-code-generator

Pydantic model and dataclasses.dataclass generator for easy conversion of JSON, OpenAPI, JSON Schema, and YAML data sources.
https://koxudaxi.github.io/datamodel-code-generator/
MIT License

Code generation crashes with "black.parsing.InvalidInput: Cannot parse" #1969

Open jstasiak opened 4 months ago

jstasiak commented 4 months ago

Describe the bug

datamodel-codegen crashed with an error instead of generating code:

% poetry run datamodel-codegen --input openapi.yml --input-file-type openapi --output models.py
Traceback (most recent call last):
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/spreadsheet-offset-tool-L93AmRO5-py3.12/lib/python3.12/site-packages/datamodel_code_generator/__main__.py", line 447, in main
    generate(
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/spreadsheet-offset-tool-L93AmRO5-py3.12/lib/python3.12/site-packages/datamodel_code_generator/__init__.py", line 468, in generate
    results = parser.parse()
              ^^^^^^^^^^^^^^
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/spreadsheet-offset-tool-L93AmRO5-py3.12/lib/python3.12/site-packages/datamodel_code_generator/parser/base.py", line 1304, in parse
    body = code_formatter.format_code(body)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/spreadsheet-offset-tool-L93AmRO5-py3.12/lib/python3.12/site-packages/datamodel_code_generator/format.py", line 226, in format_code
    code = self.apply_black(code)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/spreadsheet-offset-tool-L93AmRO5-py3.12/lib/python3.12/site-packages/datamodel_code_generator/format.py", line 234, in apply_black
    return black.format_str(
           ^^^^^^^^^^^^^^^^^
  File "src/black/__init__.py", line 1225, in format_str
  File "src/black/__init__.py", line 1239, in _format_str_once
  File "src/black/parsing.py", line 90, in lib2to3_parse
black.parsing.InvalidInput: Cannot parse: 12:0: class NullEnum(BaseModel):

To Reproduce

The OpenAPI schema used:

openapi: 3.0.1

components:
  schemas:
    NullEnum:
      # Yes, this is a weird type. It has a reason to exist in this form.
      type: string
      enum:
        - null
      nullable: true

Command line used:

$ datamodel-codegen --input openapi.yml --input-file-type openapi --output models.py

Expected behavior

I'd expect the code to be generated successfully.

Version:

% pip freeze
annotated-types==0.7.0
argcomplete==3.3.0
black==24.4.2
click==8.1.7
coverage==7.5.1
datamodel-code-generator==0.25.6
dnspython==2.6.1
email_validator==2.1.1
genson==1.3.0
idna==3.7
inflect==5.6.2
iniconfig==2.0.0
isort==5.13.2
Jinja2==3.1.4
MarkupSafe==2.1.5
mypy==1.10.0
mypy-extensions==1.0.0
packaging==24.0
pathspec==0.12.1
platformdirs==4.2.2
pluggy==1.5.0
pydantic==2.7.1
pydantic_core==2.18.2
pytest==8.2.1
pytest-cov==5.0.0
PyYAML==6.0.1
ruff==0.4.4
typing_extensions==4.11.0

Additional context

If I modify datamodel-code-generator locally to skip the formatting step, this is the code it generates:

# generated by datamodel-codegen:
#   filename:  openapi.yml
#   timestamp: 2024-05-23T11:52:03+00:00

from __future__ import annotations

from enum import Enum
from typing import Optional

from pydantic import BaseModel

class NullEnumEnum(Enum):

class NullEnum(BaseModel):
    __root__: Optional[NullEnumEnum] = None
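
The unformatted output above has a class with no body, which is not valid Python on its own, so the failure can be reproduced without black. A minimal sketch using only the standard library `ast` module; the helper name `first_syntax_error` and the `pass` placeholder are illustrative assumptions, not part of datamodel-code-generator and not a proposed fix:

```python
import ast

# The two class definitions from the unformatted output, copied as-is.
GENERATED = '''\
class NullEnumEnum(Enum):

class NullEnum(BaseModel):
    __root__: Optional[NullEnumEnum] = None
'''


def first_syntax_error(source: str):
    """Return (lineno, msg) for the first syntax error, or None if the source parses.

    ast.parse only checks syntax, so the undefined names (Enum, BaseModel,
    Optional) do not matter here.
    """
    try:
        ast.parse(source)
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return exc.lineno, exc.msg
    return None


error = first_syntax_error(GENERATED)
print("original:", error)

# Illustration only: giving the empty enum class a placeholder body makes the
# module parse, which points at the member-less enum as the cause.
PATCHED = GENERATED.replace(
    "class NullEnumEnum(Enum):\n",
    "class NullEnumEnum(Enum):\n    pass\n",
    1,
)
print("patched:", first_syntax_error(PATCHED))
```

The parser only notices the missing body when it reaches the next statement, which is consistent with black reporting the error at the `class NullEnum(BaseModel):` line.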

iodbh commented 4 months ago

I have the same issue:

black.parsing.InvalidInput: Cannot parse: 107:55:     title: constr(pattern=r'[\s\w\{\}\$\-\(\)\.\[\]"\\'_/\\,\*\+\#:@!?;=]*') = Field(..., description='Human readable title of the case enquiry')

In this case, it is caused by regexes containing escape sequences that are not properly escaped in the generated code.
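
The pattern in that error message contains a single quote, so wrapping it in `r'...'` ends the string literal early. A minimal sketch of the difference between that naive wrapping and letting Python's built-in `repr()` produce the literal; the `parses` helper and the variable names are hypothetical, and this is not a description of how datamodel-code-generator renders patterns:

```python
import ast

# The pattern text from the error message above (the part between r' and ').
PATTERN = r"""[\s\w\{\}\$\-\(\)\.\[\]"\\'_/\\,\*\+\#:@!?;=]*"""


def parses(expression: str) -> bool:
    """Return True if the text is a syntactically valid Python expression."""
    try:
        ast.parse(expression, mode="eval")
    except SyntaxError:
        return False
    return True


# Naive rendering: the quote inside the pattern terminates the raw string.
naive = "constr(pattern=r'" + PATTERN + "')"

# repr() escapes quotes and backslashes, so the result is always a valid literal.
safe = "constr(pattern=" + repr(PATTERN) + ")"

print("naive parses:", parses(naive))
print("safe parses:", parses(safe))

# Check that the safely rendered literal evaluates back to the same pattern.
call = ast.parse(safe, mode="eval").body
round_tripped = ast.literal_eval(call.keywords[0].value)
print("round trip ok:", round_tripped == PATTERN)
```

The `repr()` form is less readable than a raw string because every backslash is doubled, but it evaluates to the same pattern.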