yanyongyu / githubkit

The modern, all-batteries-included GitHub SDK for Python, including rest api, graphql, webhooks, like octokit!
MIT License
158 stars 21 forks source link

Getting "RuntimeError: GitHub client has already been collected." error while using API from a Task #109

Closed FeodorFitsner closed 3 weeks ago

FeodorFitsner commented 3 weeks ago

I have the following module:

import logging

from githubkit import AppAuthStrategy, AppInstallationAuthStrategy, GitHub
from githubkit.exception import RequestFailed

from textmd import config

logger = logging.getLogger("pull_request_opened")

def installation_github(installation_id: int):
    return GitHub(
        AppInstallationAuthStrategy(
            config.GitHubAppID,
            config.GithubAppPrivateKey.replace("\\n", "\n"),
            installation_id,
        )
    )

async def pull_request_opened(
    installation_id: int,
    owner: str,
    repo: str,
    pull_request_id: int,
    settings: dict[str, str],
):
    logger.info(
        f"Start: installation_id={installation_id}, pull_request_id={pull_request_id}"
    )

    pr = (
        await installation_github(installation_id).rest.pulls.async_get(
            owner, repo, pull_request_id
        )
    ).parsed_data

    logger.info(f"Pull request title: {pr.title}")

While trying to call pull_request_opened() in a Task:

asyncio.create_task(
            pull_request_opened(
                event.installation.id,
                event.repository.owner.login,
                event.repository.name,
                event.pull_request.id,
                {},
            )

I'm getting error:

RuntimeError: GitHub client has already been collected. Do not use the namespace after the client has been collected.
FeodorFitsner commented 3 weeks ago

It gives the same error if invoked as a background task in FastAPI as well:

@app.post("/github/webhook")
async def github_webhook(request: Request, background_tasks: BackgroundTasks):
    ...
     if isinstance(event, WebhookPullRequestReopened):
        background_tasks.add_task(
            pull_request_opened,
            event.installation.id,
            event.repository.owner.login,
            event.repository.name,
            event.pull_request.id,
            {},
        )
FeodorFitsner commented 3 weeks ago

Found the solution. It works with with:

    with github.installation_github(installation_id) as installation_github:
        pr = (
            await installation_github.rest.pulls.async_get(
                owner, repo, pull_request_number
            )
        ).parsed_data

        logger.info(f"Pull request title: {pr.title}")

But could you please explain why?

yanyongyu commented 3 weeks ago

The error occurs when the github client instance has been collected by the gc. I'm not sure why the instance is collected in your code, could you please try something like this:

def installation_github(installation_id: int):
    return GitHub(
        AppInstallationAuthStrategy(
            config.GitHubAppID,
            config.GithubAppPrivateKey.replace("\\n", "\n"),
            installation_id,
        )
    )

async def pull_request_opened(
    installation_id: int,
    owner: str,
    repo: str,
    pull_request_id: int,
    settings: dict[str, str],
):
    logger.info(
        f"Start: installation_id={installation_id}, pull_request_id={pull_request_id}"
    )

    # keep a reference to the instance
    g = installation_github(installation_id)
    pr = (
        await g.rest.pulls.async_get(
            owner, repo, pull_request_id
        )
    ).parsed_data

    logger.info(f"Pull request title: {pr.title}")

and which python version you are using?

yanyongyu commented 3 weeks ago

I reproduce the error with a minimal example. Now i'm sure that this error is caused by not holding a reference to the installation_github() function call return value. GC immediately collect the return value after you get the .rest attribute. The github instance is not valid any more.

Following is the minimal example:

import weakref

class A:
    def __init__(self) -> None:
        weakref.finalize(self, lambda: print("finalize"))

    @property
    def b(self) -> "B":
        return B(self)

class B:
    def __init__(self, a: A) -> None:
        self._a = weakref.ref(a)

    @property
    def a(self) -> A:
        if a := self._a():
            return a
        raise RuntimeError("a is dead")

def func() -> A:
    return A()

# This will cause error
# func return value is collected immediately after getting b
func().b.a

# This is ok
a = func()
a.b.a
FeodorFitsner commented 3 weeks ago

Yep, this works:

    g = installation_github(installation_id)
    return (await g.rest.pulls.async_get(owner, repo, pull_request_number)).parsed_data

Thanks for checking on this. You would probably need to mention that behaviour in the readme to help others.

Still enjoying the library - probably the best GitHub API client for Python with async support!