GitHub rate limit hit when running MFE image building in CI/CD pipelines

gabor-boros commented 1 year ago

It seems that the tutor-mfe plugin uses the GitHub API 1 that is limited to 60 requests per hour for unauthenticated requests (and 5k for authenticated ones).

Adding credentials to these requests seems to require tutor-mfe PR to add a curl or wget command call instead of the ADD defined in the Dockerfile at the moment. Docker documentation says:

If your URL files are protected using authentication, you need to use RUN wget, RUN curl or use another tool from within the container as the ADD instruction does not support authentication. – reference

Also, this would require adding optional build arguments to the Dockerfile to set the credentials.

gabor-boros commented 1 year ago

@regisb @arbrandes

In my opinion a possible solution could be (what I already mentioned in the issue description) to use RUN curl ... instead of ADD in the Dockerfile. We could have extra arguments for the Dockerfile to define the auth credentials for the call.

By default the auth credential argument could be empty, hence being backward compatible.

arbrandes commented 1 year ago

Presuming curl or wget is already in the image, it could work. Note the implied doubt: we're trying to second-guess the caching mechanism by forcing the file to be re-fetched via ADD at every single run. I don't know if RUN will have the same effect - it pretty much doesn't for the git clone line immediately following.

To test the change, you have to re-run the build after pushing a single commit to the target branch. If the commit makes it into the build, it worked. If it didn't... it didn't.

gabor-boros commented 1 year ago

@arbrandes It probably won't as the RUN command will be cached. The only way to work that around (and not disabling the cache) is to pass the commit reference directly to the Dockerfile instead of the branch name (i.e. do the api call right before building the image). That way we would still have the possibility to authenticate the API calls and to have proper layer caching.

FTR: I don't know how many Tutor plugins (or even the main Tutor repo) are using GitHub APIs like this, but the issue should be fixed at all occurrences.

gabor-boros commented 1 year ago

@regisb & @arbrandes Would you mind I'm opening a PR to work this around using the method I mentioned above?

regisb commented 1 year ago

@gabor-boros the whole point of this ADD statement is that it is not cached. Cache for subsequent steps will be cleared if there was a change upstream. This feature has proved very useful in re-building the mfe image when we target a branch, and not a tag. Such a use case happens for instance when running on nightly, or building the candidate release branch (open-release/quince.master).

Please see this PR for more information: https://github.com/overhangio/tutor-mfe/pull/131

I understand that hitting the GitHub rate limit is an issue, and we need to address that. Can we do that in a way that preserves this feature?

gabor-boros commented 1 year ago

@regisb how about what I wrote above?

pass the commit reference directly to the Dockerfile instead of the branch name (i.e. do the api call right before building the image). That way we would still have the possibility to authenticate the API calls and to have proper layer caching.

regisb commented 1 year ago

I usually lean very strongly towards pinned dependencies, and this includes explicit commit sha1 or tags. But in the present case we do not have access to the commit sha1, precisely because we are building the image for a branch. This is unusual, but a useful feature nonetheless. Does that make sense? If not, maybe we can talk about it in a call.

gabor-boros commented 1 year ago

[..] in the present case we do not have access to the commit sha1 [...]

@regisb the API response explicitly contains the commit SHA1. At least the call to https://api.github.com/repos/openedx/frontend-app-profile/git/refs/heads/master returns the following:

{
  "ref": "refs/heads/master",
  "node_id": "MDM6UmVmMTY1MTA2NDMwOnJlZnMvaGVhZHMvbWFzdGVy",
  "url": "https://api.github.com/repos/openedx/frontend-app-profile/git/refs/heads/master",
  "object": {
    "sha": "c6c5521ecf720dafe292874327716bf3b70ac7c0",
    "type": "commit",
    "url": "https://api.github.com/repos/openedx/frontend-app-profile/git/commits/c6c5521ecf720dafe292874327716bf3b70ac7c0"
  }
}

So the following would do the trick to retrieve the SHA1 from the API:

$ curl -s https://api.github.com/repos/openedx/frontend-app-profile/git/refs/heads/master | jq -r '.object.sha'
c6c5521ecf720dafe292874327716bf3b70ac7c0

Did I misunderstand something? If you feel a call would be beneficial, it is good for me too.

regisb commented 1 year ago

Just got out of a chat with @gabor-boros. Thanks for taking the time Gábor!

@arbrandes The ADD statement is causing two important issues:

GitHub API rate limits during build.
Breaks the build with private GitHub repositories.

For these reasons, I would lean towards the solution proposed by Gábor: move the API call outside of the build, run it on the host (only for openedx/ repos and open.release/.master branches) and pass the git commit sha1 to the Docker build as an argument that would bust the cache.

Am I making sense? @gabor-boros would you like to open a PR or should we do it?

gabor-boros commented 1 year ago

@regisb It was a really useful discussion I believe!

Am I making sense?

Absolutely, just one question (I probably missed something): Why only for openedx/* repos and open.release/*.master? Although that's not an issue yet, that wouldn't fix the second point: private GitHub repositories.

So if the private repo is github.com/example/frontend-app-something you wouldn't do an (optionally) authenticated call? How would you handle those cases?

regisb commented 1 year ago

Why only for openedx/ repos and open.release/.master?

The goal would be to address the case of GitHub private repositories. Basically, we would use that caching feature only for open source repositories and branches.

So if the private repo is github.com/example/frontend-app-something you wouldn't do an (optionally) authenticated call?

That is correct.

How would you handle those cases?

I wouldn't. Private repositories would not benefit from the caching feature.

gabor-boros commented 1 year ago

I wouldn't. Private repositories would not benefit from the caching feature.

@regisb Fair enough, though it would mean that the API calls can still be authenticated, right?

regisb commented 1 year ago

I wouldn't even try to make authenticated calls. It's just not worth the trouble. Would you still be affected by the rate limit though?

gabor-boros commented 1 year ago

So you mean to build the image, it will be the responsibility of the caller to provide the commit sha1 as a build argument for the Dockerfile. Right? By caller, I mean the CI job or developer who is calling tutor MFE build using CLI.

If that's the case, then I guess that covers everyone's needs as authentication can be done when getting the commit sha1 from GitHub API.

gabor-boros commented 1 year ago

would you like to open a PR or should we do it?

I'm happy to do it.

@regisb I'm back to work. Would you have some time tomorrow to sync about the exact solution to be on the same page before getting started on the implementation?

regisb commented 1 year ago

We agreed to change the refs dynamically in plugin.py, such that GitHub API calls are only made when we build the "open-release/*.master" or "master" branches from the openedx upstream repos. Thus, something like:

@MFE_APPS.add(priority=tutor_hooks.priorities.LOW)
def _add_mfe_apps_refs(apps: dict[str, MFE_ATTRS_TYPE]) -> dict[str, MFE_ATTRS_TYPE]:
    for mfe, app in apps.items():
        if app["repository"].startswith("https://github.com/openedx/") and app["version"] == "master":
            app["refs"] = app["repository"].replace("github.com", "api.github.com/repos") + "/git/refs/heads/" + app["version"]
    return apps

regisb commented 11 months ago

Closed by https://github.com/overhangio/tutor-mfe/commit/c4ed70cf4e25a1c16b6945e2b73af47d82d31000

overhangio / tutor-mfe

GitHub rate limit hit when running MFE image building in CI/CD pipelines #161