Closed pederhan closed 1 year ago
Somewhat big problem: model overrides don't affect generated models that reference them. The generated models will still reference the original (generated) model, not the override. To work around this, we also have to override every model that references a model we have overridden.
This is not ideal, and introduces a lot of room for human error. Certain tests are added to ensure models behave as expected, but tests are also written manually, and they, too, can miss important changes.
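The reference problem described above can be sketched with plain dataclasses standing in for the Pydantic models. The class names below are hypothetical and are not taken from the generated code; the point is only that a subclass override does not change the annotation on a model that already references the original class.

```python
from dataclasses import dataclass
from typing import Optional, get_type_hints


@dataclass
class Tag:
    """Hypothetical stand-in for an auto-generated model."""
    name: Optional[str] = None


@dataclass
class Artifact:
    """Hypothetical generated model that references the generated Tag."""
    tag: Optional[Tag] = None


@dataclass
class TagOverride(Tag):
    """Manual override that adds a field to the generated Tag."""
    immutable: bool = False


# Artifact was defined against the generated Tag, so overriding Tag
# afterwards does not change what Artifact.tag refers to.
generated_hint = get_type_hints(Artifact)["tag"]


@dataclass
class ArtifactOverride(Artifact):
    """The referencing model must be overridden as well."""
    tag: Optional[TagOverride] = None


overridden_hint = get_type_hints(ArtifactOverride)["tag"]
print(generated_hint, overridden_hint)
```

Every model that references an overridden model needs the same treatment, which is where the room for human error comes from.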
All these overrides really make it necessary to document these models clearly in the documentation. We can't expect users to jump through the whole class hierarchy to get a decent idea of how the model is structured.
To that end, we need to find out how we can use mkdocstrings or something similar to generate documentation from Pydantic models, because as it stands, it's very messy.
A new problem has come to light: datamodel-code-generator has introduced a breaking change some time between version 0.13.0 and 0.17.0.
Models such as ExtraAttrs are generated as models with a single field __root__: T instead of an empty model with extra = "allow". This is technically a more correct approach, as it attempts to follow the spec more closely, but it is a huge problem in our case.
Specifically, ExtraAttrs (and probably others too) is specified as follows in swagger.yaml:
ExtraAttrs:
  type: object
  additionalProperties:
    type: object
Which in datamodel-code-generator 0.13.0 generates the following model:
class ExtraAttrs(BaseModel):
    pass

    class Config:
        extra = Extra.allow
But in 0.17.0 generates this model:
class ExtraAttrs(BaseModel):
    __root__: Optional[Dict[str, Dict[str, Any]]] = None
The latter (0.17.0) is technically more correct when you take into consideration the spec, but the only problem is: THE SPEC IS WRONG!
Here is an example of what the API returns from GET /projects/library/repositories/ubuntu/artifacts/latest:
{
  "id": 2,
  "type": "IMAGE",
  "media_type": "application/vnd.docker.container.image.v1+json",
  "manifest_media_type": "application/vnd.docker.distribution.manifest.v2+json",
  "project_id": 1,
  "repository_id": 2,
  "digest": "sha256:965fbcae990b0467ed5657caceaec165018ef44a4d2d46c7cdea80a9dff0d1ea",
  "size": 30430700,
  "icon": "sha256:0048162a053eef4d4ce3fe7518615bef084403614f8bca43b40ae2e762e11e06",
  "push_time": "2023-02-05T12:20:35.997000+00:00",
  "pull_time": "2023-02-06T10:50:38.829000+00:00",
  "extra_attrs": {
    "config": {
      "Cmd": [
        "bash"
      ],
      "Env": [
        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
      ]
    },
    "created": "2022-12-09T01:20:31.321639501Z",
    "os": "linux",
    "author": "",
    "architecture": "amd64"
  },
  "annotations": null,
  "references": null,
  "tags": [
    {
      "id": 2,
      "repository_id": 2,
      "artifact_id": 2,
      "name": "latest",
      "push_time": "2023-02-05T12:20:36.013000+00:00",
      "pull_time": "2023-02-05T17:38:27.844000+00:00",
      "immutable": false,
      "signed": false
    }
  ],
  "addition_links": {
    "build_history": {
      "absolute": false,
      "href": "/api/v2.0/projects/library/repositories/ubuntu/artifacts/sha256:965fbcae990b0467ed5657caceaec165018ef44a4d2d46c7cdea80a9dff0d1ea/additions/build_history"
    },
    "vulnerabilities": {
      "absolute": false,
      "href": "/api/v2.0/projects/library/repositories/ubuntu/artifacts/sha256:965fbcae990b0467ed5657caceaec165018ef44a4d2d46c7cdea80a9dff0d1ea/additions/vulnerabilities"
    }
  },
  "labels": null,
  "scan_overview": null,
  "accessories": null
}
Notice how extra_attrs is NOT a dict of dicts, but rather a dict of Any. We observe the values to be both dicts and strings. As such, the definition generated by 0.13.0 is more correct, as it doesn't make incorrect assumptions about the shape of the data. I need to read up on OpenAPI/Swagger to understand whether this is a problem with the spec or with datamodel-code-generator, but either way, for now we will have to continue using 0.13.0.
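The mismatch can be checked with a few lines of plain Python. This is only a sketch: the dictionary is a trimmed copy of the extra_attrs object from the response above, and non_dict_values is a hypothetical helper, not part of harborapi or datamodel-code-generator.

```python
from typing import Any, Dict

# Trimmed copy of "extra_attrs" from the API response shown above.
extra_attrs: Dict[str, Any] = {
    "config": {"Cmd": ["bash"]},
    "created": "2022-12-09T01:20:31.321639501Z",
    "os": "linux",
    "author": "",
    "architecture": "amd64",
}


def non_dict_values(attrs: Dict[str, Any]) -> Dict[str, str]:
    """Map each key whose value is not a dict to that value's type name.

    Under Dict[str, Dict[str, Any]] every value would have to be a dict,
    so any key returned here contradicts that annotation.
    """
    return {
        key: type(value).__name__
        for key, value in attrs.items()
        if not isinstance(value, dict)
    }


violations = non_dict_values(extra_attrs)
print(violations)
```

Only "config" is a dict here; the other four keys hold strings, which is why a model typed as Dict[str, Dict[str, Any]] does not fit the real data.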
Another thing to note about __root__ models is that they are going away in Pydantic V2, so we should steer clear of these as much as we can. Hopefully datamodel-code-generator introduces something to avoid generating root models altogether in the future.
As of e06b6cd8933807348fd933c8d85b47fac547c114, auto-generated Harbor API models are located in harborapi.models.models and harborapi.models.scanner. In these modules, we have made manual modifications to the model definitions to fix errors in the spec and/or add new functionality to the models. This has a major drawback: updating the models from updated spec definitions is a cumbersome and manual process, as the modifications are overwritten by the new auto-generated definitions.

Changing the models directly makes it very difficult to generate new model definitions whenever the Harbor API spec and the Scanner API spec are updated. This is because any new definitions will overwrite the changes we have made, and thus every update would have to be manually incorporated into the models.models and models.scanner modules.

Changes
To remedy the problem outlined above, this pull request puts the auto-generated models into their own modules (harborapi.models._models and harborapi.models._scanner), and modifies them in harborapi.models.models and harborapi.models.scanner respectively. These modules re-export the models from _models and _scanner, thus maintaining backwards compatibility by keeping the API for users unchanged.

Auto generating new models going forward
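Regeneration is safe because the generated definitions and the overrides live in separate modules. The layout can be sketched in a single file as follows. This is an illustration under assumptions, not the actual harborapi code: the two modules are simulated with types.ModuleType, and the Tag and Artifact classes are hypothetical placeholders.

```python
import types

# Stand-in for the auto-generated module (the role of _models in this PR).
generated = types.ModuleType("_models")


class Tag:
    name = None


class Artifact:
    digest = None


generated.Tag = Tag
generated.Artifact = Artifact

# Stand-in for the public module (the role of models in this PR).
# Step 1: re-export every public name, roughly what a star import does.
public = types.ModuleType("models")
for attr_name, obj in vars(generated).items():
    if not attr_name.startswith("_"):
        setattr(public, attr_name, obj)


# Step 2: redefine only the models that need changes. The subclass
# replaces the re-exported class under the same public name.
class TagOverride(generated.Tag):
    immutable = False


public.Tag = TagOverride

print(public.Artifact is generated.Artifact)  # untouched models pass through
print(issubclass(public.Tag, generated.Tag))  # overrides extend the generated model
```

Replacing the contents of the generated module leaves the override definitions in place, so only overrides that no longer fit the new definitions need attention.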
Auto-generating models only requires running just genapi and just genscanner. Since the definitions and modifications/overrides are stored separately, generating new models should have no effect on the overrides, unless the models are changed by the update in a way that is not compatible with the current overrides. Re-run tests each time models are generated to verify that the new definitions are compatible with the application, and make changes accordingly.

Tests
New tests have been added to ensure that the modified models are compatible with the generated models. The tests check the modified fields for deviations between the generated models and the modified models. For now, these tests are quite specific and strict, wherein the modified fields are manually specified. This is potentially not optimal, and we could look into adding some sort of _overrides field to modified models, so we can automate the testing process:
https://github.com/pederhan/harborapi/blob/f6bfa6f981c2d6f410d379d49c7e8d76f813ec29/tests/models/test_scanner.py#L26-L33
We manually specify the modified fields in the tests, along with the field attributes we expect to diverge (if any). Default field values are not compared, since the default is the attribute most likely to be overridden.
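A rough sketch of how such a comparison could be automated is shown below, again using dataclasses as a stand-in for Pydantic models. The model names, the diverging_fields helper, and the shape of the _overrides attribute are all assumptions made for illustration; the linked test file is the authoritative version.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional, Set, Type


@dataclass
class GeneratedReport:
    """Hypothetical stand-in for an auto-generated model."""
    severity: Optional[str] = None
    vulnerabilities: Optional[List[str]] = None


@dataclass
class Report(GeneratedReport):
    """Hypothetical override: the list field is no longer optional."""
    vulnerabilities: List[str] = field(default_factory=list)

    # Unannotated, so it is a plain class attribute rather than a field.
    # Assumed shape of the proposed _overrides registry.
    _overrides = {"vulnerabilities"}


def diverging_fields(generated: Type, modified: Type) -> Set[str]:
    """Return names of fields whose annotated type differs.

    Default values are deliberately ignored.
    """
    generated_types = {f.name: f.type for f in fields(generated)}
    modified_types = {f.name: f.type for f in fields(modified)}
    return {
        name
        for name, annotation in generated_types.items()
        if modified_types.get(name) != annotation
    }


diverging = diverging_fields(GeneratedReport, Report)
# Anything that diverges without being declared is an unexpected change.
unexpected = diverging - Report._overrides
print(diverging, unexpected)
```

With a registry like this, a test could loop over all modified models and fail whenever unexpected is non-empty, instead of listing the fields by hand in each test.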
In pull request
In the future
The testing regime will be outlined in a future pull request adding a development guide.