Closed gabor-boros closed 11 months ago
@regisb @arbrandes
In my opinion a possible solution could be (what I already mentioned in the issue description) to use RUN curl ...
instead of ADD
in the Dockerfile. We could have extra arguments for the Dockerfile to define the auth credentials for the call.
By default the auth credential argument could be empty, hence being backward compatible.
Presuming curl
or wget
is already in the image, it could work. Note the implied doubt: we're trying to second-guess the caching mechanism by forcing the file to be re-fetched via ADD
at every single run. I don't know if RUN
will have the same effect - it pretty much doesn't for the git clone line immediately following.
To test the change, you have to re-run the build after pushing a single commit to the target branch. If the commit makes it into the build, it worked. If it didn't... it didn't.
@arbrandes It probably won't as the RUN
command will be cached. The only way to work that around (and not disabling the cache) is to pass the commit reference directly to the Dockerfile instead of the branch name (i.e. do the api call right before building the image). That way we would still have the possibility to authenticate the API calls and to have proper layer caching.
FTR: I don't know how many Tutor plugins (or even the main Tutor repo) are using GitHub APIs like this, but the issue should be fixed at all occurrences.
@regisb & @arbrandes Would you mind I'm opening a PR to work this around using the method I mentioned above?
@gabor-boros the whole point of this ADD statement is that it is not cached. Cache for subsequent steps will be cleared if there was a change upstream. This feature has proved very useful in re-building the mfe image when we target a branch, and not a tag. Such a use case happens for instance when running on nightly, or building the candidate release branch (open-release/quince.master).
Please see this PR for more information: https://github.com/overhangio/tutor-mfe/pull/131
I understand that hitting the GitHub rate limit is an issue, and we need to address that. Can we do that in a way that preserves this feature?
@regisb how about what I wrote above?
pass the commit reference directly to the Dockerfile instead of the branch name (i.e. do the api call right before building the image). That way we would still have the possibility to authenticate the API calls and to have proper layer caching.
I usually lean very strongly towards pinned dependencies, and this includes explicit commit sha1 or tags. But in the present case we do not have access to the commit sha1, precisely because we are building the image for a branch. This is unusual, but a useful feature nonetheless. Does that make sense? If not, maybe we can talk about it in a call.
[..] in the present case we do not have access to the commit sha1 [...]
@regisb the API response explicitly contains the commit SHA1. At least the call to https://api.github.com/repos/openedx/frontend-app-profile/git/refs/heads/master
returns the following:
{
"ref": "refs/heads/master",
"node_id": "MDM6UmVmMTY1MTA2NDMwOnJlZnMvaGVhZHMvbWFzdGVy",
"url": "https://api.github.com/repos/openedx/frontend-app-profile/git/refs/heads/master",
"object": {
"sha": "c6c5521ecf720dafe292874327716bf3b70ac7c0",
"type": "commit",
"url": "https://api.github.com/repos/openedx/frontend-app-profile/git/commits/c6c5521ecf720dafe292874327716bf3b70ac7c0"
}
}
So the following would do the trick to retrieve the SHA1 from the API:
$ curl -s https://api.github.com/repos/openedx/frontend-app-profile/git/refs/heads/master | jq -r '.object.sha'
c6c5521ecf720dafe292874327716bf3b70ac7c0
Did I misunderstand something? If you feel a call would be beneficial, it is good for me too.
Just got out of a chat with @gabor-boros. Thanks for taking the time Gábor!
@arbrandes The ADD statement is causing two important issues:
For these reasons, I would lean towards the solution proposed by Gábor: move the API call outside of the build, run it on the host (only for openedx/ repos and open.release/.master branches) and pass the git commit sha1 to the Docker build as an argument that would bust the cache.
Am I making sense? @gabor-boros would you like to open a PR or should we do it?
@regisb It was a really useful discussion I believe!
Am I making sense?
Absolutely, just one question (I probably missed something): Why only for openedx/* repos and open.release/*.master
? Although that's not an issue yet, that wouldn't fix the second point: private GitHub repositories.
So if the private repo is github.com/example/frontend-app-something
you wouldn't do an (optionally) authenticated call? How would you handle those cases?
Why only for openedx/ repos and open.release/.master?
The goal would be to address the case of GitHub private repositories. Basically, we would use that caching feature only for open source repositories and branches.
So if the private repo is github.com/example/frontend-app-something you wouldn't do an (optionally) authenticated call?
That is correct.
How would you handle those cases?
I wouldn't. Private repositories would not benefit from the caching feature.
I wouldn't. Private repositories would not benefit from the caching feature.
@regisb Fair enough, though it would mean that the API calls can still be authenticated, right?
I wouldn't even try to make authenticated calls. It's just not worth the trouble. Would you still be affected by the rate limit though?
So you mean to build the image, it will be the responsibility of the caller to provide the commit sha1 as a build argument for the Dockerfile. Right? By caller, I mean the CI job or developer who is calling tutor MFE build using CLI.
If that's the case, then I guess that covers everyone's needs as authentication can be done when getting the commit sha1 from GitHub API.
would you like to open a PR or should we do it?
I'm happy to do it.
@regisb I'm back to work. Would you have some time tomorrow to sync about the exact solution to be on the same page before getting started on the implementation?
We agreed to change the refs
dynamically in plugin.py, such that GitHub API calls are only made when we build the "open-release/*.master" or "master" branches from the openedx upstream repos. Thus, something like:
@MFE_APPS.add(priority=tutor_hooks.priorities.LOW)
def _add_mfe_apps_refs(apps: dict[str, MFE_ATTRS_TYPE]) -> dict[str, MFE_ATTRS_TYPE]:
for mfe, app in apps.items():
if app["repository"].startswith("https://github.com/openedx/") and app["version"] == "master":
app["refs"] = app["repository"].replace("github.com", "api.github.com/repos") + "/git/refs/heads/" + app["version"]
return apps
It seems that the tutor-mfe plugin uses the GitHub API 1 that is limited to 60 requests per hour for unauthenticated requests (and 5k for authenticated ones).
If your URL files are protected using authentication, you need to use RUN wget, RUN curl or use another tool from within the container as the ADD instruction does not support authentication. – reference
Also, this would require adding optional build arguments to the Dockerfile to set the credentials.
-- related forum discussion: https://discuss.openedx.org/t/tutor-mfe-build-github-api-authentication/11542