yanyongyu / githubkit

The modern, all-batteries-included GitHub SDK for Python, including rest api, graphql, webhooks, like octokit!
MIT License
159 stars 21 forks source link

Should `Missing` type implicitly allow `None`? #47

Closed frankie567 closed 9 months ago

frankie567 commented 9 months ago

GitHub API schema is not always consistent with the required and nullable fields. Currently, this is solved by maintaining the schema_overrides table and report them to GitHub (hoping they'll solve it).

Unfortunately, this is fragile and can break code if GitHub suddenly decides to pop-up nullable fields in its API response. For example, this was working well some days ago:

from githubkit import GitHub
github = GitHub()
r = github.rest.orgs.get("frankie567-test-org-renamed")
r.parsed_data

But not anymore:

Traceback (most recent call last): File "", line 1, in File "/Users/fvoron/Development/githubkit/githubkit/response.py", line 50, in parsed_data return parse_raw_as(self._data_model, self._response.content) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "pydantic/tools.py", line 82, in pydantic.tools.parse_raw_as File "pydantic/tools.py", line 38, in pydantic.tools.parse_obj_as File "pydantic/main.py", line 341, in pydantic.main.BaseModel.init pydantic.error_wrappers.ValidationError: 2 validation errors for ParsingModel[OrganizationFull] root -> name none is not an allowed value (type=type_error.none.not_allowed) root -> blog none is not an allowed value (type=type_error.none.not_allowed)


That's why I was wondering if we could tweak the Missing type so it implicitly allows None. So basically this part:

https://github.com/yanyongyu/githubkit/blob/fd80590ffd0efbed97398421f5e614075b740f75/githubkit/utils.py#L42

becomes:

Missing: TypeAlias = Union[Literal[UNSET], T, None]

Admittedly, this is not perfectly accurate regarding the schema but it seems more bullet-proof while GitHub make up their mind about this.

yanyongyu commented 9 months ago

The Missing type is designed for field not exist in the data not only for API response but also for the request schema. Modifying this type may affect the entire behavior.

GitHub API may return None due to permission issues or some other server-side errors. The parsed_data is fully validated and designed to provide complete type hints to avoid programing errors that caused by type errors. If you do not need such feature, response.json() is also available for accessing the raw response data.

frankie567 commented 9 months ago

Thank you for your answer, but that's not exactly my point. Validation is useful and desired, of course.

The thing is, GitHub schema is not really reliable regarding nullable or missing fields, as witnessed by the long list of overrides you maintain: https://github.com/yanyongyu/githubkit/blob/fd80590ffd0efbed97398421f5e614075b740f75/pyproject.toml#L105-L205

If I take my example again, calling the following API https://api.github.com/orgs/frankie567-test-org-renamed returns this:

{
  "login": "frankie567-test-org-renamed",
  "id": 145430071,
  "node_id": "O_kgDOCKsWNw",
  "url": "https://api.github.com/orgs/frankie567-test-org-renamed",
  "repos_url": "https://api.github.com/orgs/frankie567-test-org-renamed/repos",
  "events_url": "https://api.github.com/orgs/frankie567-test-org-renamed/events",
  "hooks_url": "https://api.github.com/orgs/frankie567-test-org-renamed/hooks",
  "issues_url": "https://api.github.com/orgs/frankie567-test-org-renamed/issues",
  "members_url": "https://api.github.com/orgs/frankie567-test-org-renamed/members{/member}",
  "public_members_url": "https://api.github.com/orgs/frankie567-test-org-renamed/public_members{/member}",
  "avatar_url": "https://avatars.githubusercontent.com/u/145430071?v=4",
  "description": null,
  "name": null,
  "company": null,
  "blog": null,
  "location": null,
  "email": null,
  "twitter_username": null,
  "is_verified": false,
  "has_organization_projects": true,
  "has_repository_projects": true,
  "public_repos": 3,
  "public_gists": 0,
  "followers": 0,
  "following": 0,
  "html_url": "https://github.com/frankie567-test-org-renamed",
  "created_at": "2023-09-19T06:57:34Z",
  "updated_at": "2023-09-19T08:25:55Z",
  "archived_at": null,
  "type": "Organization"
}

name and blog are null. It's not a problem of permission or server error, it's just that they are not set for this organization. And it's indeed not compliant with the schema of GitHub:

{
  //...
  "organization-full": {
    "title": "Organization Full",
    "description": "Organization Full",
    "type": "object",
    "properties": {
      // ...
      "name": {
        "type": "string",
        "examples": [
          "github"
        ]
      },
      // ...
      "blog": {
        "type": "string",
        "format": "uri",
        "examples": [
          "https://github.com/blog"
        ]
      },
      // ...
    },
    "required": [
      "login",
      "url",
      "id",
      "node_id",
      "repos_url",
      "events_url",
      "hooks_url",
      "issues_url",
      "members_url",
      "public_members_url",
      "avatar_url",
      "description",
      "html_url",
      "has_organization_projects",
      "has_repository_projects",
      "public_repos",
      "public_gists",
      "followers",
      "following",
      "type",
      "created_at",
      "updated_at",
      "archived_at"
    ]
    // ...
  }
}

So, for this specific case, we could add another override in the pyproject.toml file, but it feels like cat and mouse game, even more since GitHub doesn't seem really in a hurry to fix the schema (https://github.com/github/rest-api-description/issues/1639, https://github.com/github/rest-api-description/issues/1812...)

That's why I suggested something to workaround this. Maybe not the best idea, but I think it's worth to open a discussion about it.

yanyongyu commented 9 months ago

There are indeed many inaccuracies in the rest api description file. UNSET and None are both falsy value, changing this type may have little impact programmingly (like if boolean condition). PR welcome :).
But for me, I am more inclined to correct the schema to ensure that the validation is performed correctly.

For this example (https://api.github.com/orgs/frankie567-test-org-renamed), it is caused by the errors in github's schema. The name and blog can be unset but the schema provides type string. I have submitted many issues to the description repo :joy:.

yanyongyu commented 9 months ago

From https://github.com/github/rest-api-description#contributing

This repository is automatically kept up to date with the description used to validate GitHub API requests as well as powering contract tests.

😟

frankie567 commented 9 months ago

There are indeed many inaccuracies in the rest api description file. UNSET and None are both falsy value, changing this type may have little impact programmingly (like if boolean condition). PR welcome :).

Ok 👍

But for me, I am more inclined to correct the schema to ensure that the validation is performed correctly.

Totally understandable! If you want, I can also make a PR to add the override for the error above with OrganizationFull.