Open spyoungtech opened 1 year ago
Hey @spyoungtech, @koxudaxi
I'm also experiencing this issue: generating a data model from a JSON schema gets stuck in an infinite loop inside datamodel_code_generator.generate().
System: Ubuntu 22.04, Python 3.10, datamodel-code-generator v0.17.2
I've tried to parse the OpenLabel schema json downloaded here.
The module works as expected up to v0.17.1. The issue seems to have been introduced in v0.17.2. Note that this differs from the comment above, where the problematic version already seems to be v0.14.1.
@lhmwtum v0.17.2 is too old. Can you try the latest version, 0.25.1?
Hey @koxudaxi, thanks for this awesome generator!
I just tried to generate models for the authentik blueprints schema and hit the same RecursionError using the latest version, 0.25.1, of datamodel-codegen.
If I remove the line
"$id": "https://goauthentik.io/blueprints/schema.json",
from the schema, the recursion error is gone.
Deleting the "$id" field also "solves" the problem with my JSON schema.
I've tried to figure out what's wrong and it seems that the comparison of this if statement is always True. Therefore, the same function is called over and over again resulting in the described infinite loop. I've also watched the size of self.results and it grows infinitely.
@koxudaxi do you have any idea why this happens? Could you please provide some details or hints in order to understand what's happening here and why this check is necessary? Thank you :)
EDIT: I also found out that the reserved_refs variable looks different with/without the id. The id is put at the beginning of each ref. When running the code without the id, reference.loaded == True, with id this variable is False. Furthermore, there is a different number of reserved_refs in both cases.
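To illustrate why refs look different with and without the id: under standard JSON Schema rules, a `$ref` is resolved against the schema's `$id` as a base URI. The sketch below shows that resolution using only Python's standard library. It is not the generator's internal code, and the ref name `#/definitions/Foo` is a made-up example.

```python
from urllib.parse import urljoin


def resolve_ref(schema_id: str, ref: str) -> str:
    """Resolve a `$ref` against a schema `$id` using RFC 3986 rules.

    Illustration only: this mirrors the standard base-URI resolution,
    not how datamodel-code-generator stores its references internally.
    """
    return urljoin(schema_id, ref)


local_ref = "#/definitions/Foo"  # hypothetical ref, not taken from either schema

# Without an "$id" the ref stays a plain local pointer.
print(resolve_ref("", local_ref))

# With an "$id" the id ends up in front of the ref.
print(resolve_ref("https://dummy/test/path", local_ref))
```

With an id present, every local ref becomes an absolute URI, which matches the observation that the id is put at the beginning of each entry in reserved_refs.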
I've figured out that the path composition of the id field makes the difference. When there are more than two parts, the code ends up in an infinite loop.
Examples
These paths make the code fail:
"$id": "https://openlabel.asam.net/V1-0-0/schema#"
"$id": "https://goauthentik.io/blueprints/schema.json"
"$id": "https://dummy/test/path"
"$id": "https://dummy/test/test2/path"
These paths work:
"$id": "https://openlabel.asam.net/schema#"
"$id": "https://goauthentik.io/schema.json"
"$id": "https://dummy/path"
Can you confirm that @benedikt-bartscher ?
@lhmwtum thanks for investigating. I just tested the paths you provided and I can confirm the behavior.
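The pattern in the examples above can be checked quickly before running the generator. The following is a minimal sketch that counts the path segments of an `$id`. The helper names and the threshold of two segments are assumptions derived only from the seven examples in this thread, not from the library's code.

```python
from urllib.parse import urlsplit


def id_path_depth(schema_id: str) -> int:
    """Count the non-empty path segments of a schema "$id" URL."""
    path = urlsplit(schema_id).path
    return len([segment for segment in path.split("/") if segment])


def matches_failing_pattern(schema_id: str) -> bool:
    """Heuristic from this thread: ids with two or more path segments failed.

    Assumption: the threshold is inferred from the reported examples only.
    """
    return id_path_depth(schema_id) >= 2


for schema_id in (
    "https://openlabel.asam.net/V1-0-0/schema#",
    "https://dummy/test/test2/path",
    "https://goauthentik.io/schema.json",
    "https://dummy/path",
):
    print(schema_id, id_path_depth(schema_id), matches_failing_pattern(schema_id))
```

If the helper flags an id, shortening the path or removing the `$id` field is the workaround reported so far.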
Sorry for the late reply everyone. And thank you for your detailed research. From what I have read of your findings, it seems that the Path resolution is not working.
> I've figured out that the path composition of the id field makes the difference. when there are more than two parts, the code ends up in an infinite loop.
Thank you. We will try to add more than 3 hierarchical paths to the test case.
~~I guess the unit test is broken. The request URL doesn't have the definitions prefix.
https://github.com/koxudaxi/datamodel-code-generator/blob/b3fbbcade9814d4080098ae61ba69e6f8dd018f5/tests/test_main.py#L3090-L3153~~
@benedikt-bartscher @lhmwtum How do you reproduce the error?
https://github.com/koxudaxi/datamodel-code-generator/issues/986#issuecomment-1878607726
I loaded the schema from this post with --url, or saved it with wget and ran it with --input, and the file was created without error.
However, when I ran it like cat blueprints.json | datamodel-codegen, I got stuck in an infinite loop. :thinking:
Hi @koxudaxi, I am currently using a small Python script like this:
import json
import logging
from pathlib import Path
import requests
from datamodel_code_generator import DataModelType, InputFileType, generate
from datamodel_code_generator.format import PythonVersion
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# renovate: datasource=github-tags depName=goauthentik/authentik extractVersion=version/(?<version>.*)
authentik_version = "2024.2.2"
url = f"https://raw.githubusercontent.com/goauthentik/authentik/version/{authentik_version}/blueprints/schema.json"
logger.info(f"Fetching schema from {url}")
schema = requests.get(url).text
logger.info("Fetching schema done")
# logger.info("Loading schema from file")
# with open("schema.json", "r") as f:
#     schema = f.read()
# load schema in dict
schemadict = json.loads(schema)
del schemadict["$id"]
# # write schema to file
# logger.info("Writing modified schema to file")
# with open("schema.json", "w") as f:
#     f.write(json.dumps(schemadict, indent=4))
logger.info("Loading modified schema back to string")
schema = json.dumps(schemadict)
aliases = {
    "resource": "resource_",
}
outpath = Path("src/authentik_blueprints")
logger.info(f"Creating output directory {outpath}")
outpath.mkdir(parents=True, exist_ok=True)
logger.info("Start generating models")
generate(
    schema,
    aliases=aliases,
    # reuse_model=True,
    input_file_type=InputFileType.JsonSchema,
    target_python_version=PythonVersion.PY_312,
    output=outpath,
    output_model_type=DataModelType.PydanticV2BaseModel,
    use_default_kwarg=True,
    # modern python
    use_union_operator=True,
    use_standard_collections=True,
    use_generic_container_types=True,
    use_annotated=True,
    field_constraints=True,
)
I have the schema stored locally as a json file. This is my code:
import json
from pathlib import Path
from datamodel_code_generator import InputFileType, generate
filename_openlabel_json_schema = "openlabel_json_schema_v1-0-0.json"
# get absolute path to repository
abspath_repo = Path(__file__).parent
abspath_json_schema = (abspath_repo / filename_openlabel_json_schema)
abspath_output = abspath_repo / "openlabel_annotation_schema.py"
# Load OpenLABEL JSON schema file
with open(abspath_json_schema) as fp:
    json_schema = json.load(fp)
generate(
    str(json_schema),
    input_filename=filename_openlabel_json_schema,
    input_file_type=InputFileType.JsonSchema,
    reuse_model=True,
    output=abspath_output,
    # NOTE: set to False to suppress auto-generated doc strings which do not
    # meet pep257 standards.
    use_schema_description=False,
    class_name="OpenLabelAnnotationSchema",
)
Describe the bug
I am trying to produce pydantic models from a JSONSchema file I have. When I try to do this, the process never finishes and just accumulates memory without end. I let it run for a while and it ended up taking up 8+GB of memory. The schema itself is a good handful of megabytes with probably over 10,000 discrete components, which could be a problem, but I believe it should stop eventually.
Eventually this stack trace is produced with a RecursionError:
To Reproduce
The schema is too large to put in the issue, but it can be found in this gist.
Used commandline:
Expected behavior
The expectation is that the model generation eventually completes.
Version:
Additional context
The schema itself was created by dynamically generating pydantic models and dumping model.json_schema(). Not sure if that's relevant, but in my mind I guess it's not out of the realm of possibility that this could matter.