Closed by dduportal 1 year ago
First of all read #938 (reverted by #2047); I am not sure offhand which infra repo had the actual proxy configuration that you could use as a starting point. You would need to do a bit of digging. I recall it being nginx configured with a simple LRU cache of 2xx results, i.e., successful retrieval of release or *-SNAPSHOT
artifacts or metadata XML files from public URLs. I suppose the K8s equivalent would be a StatefulSet
with a cache volume.
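Based on that description, a minimal nginx sketch might look like the following. This is not the original configuration: the cache zone name, sizes, retention, and paths are illustrative assumptions.

```nginx
# Caching-proxy sketch: cache successful (2xx) upstream responses on disk,
# evicting least-recently-used entries when max_size is reached.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=repo_cache:10m
                 max_size=50g inactive=30d use_temp_path=off;

server {
    listen 80;

    location / {
        proxy_pass https://repo.jenkins-ci.org/;
        proxy_cache repo_cache;
        proxy_cache_valid 200 30d;   # keep successful retrievals for 30 days
        proxy_cache_use_stale error timeout;
        # Expose HIT/MISS for debugging cache behavior
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

In a K8s StatefulSet, `/var/cache/nginx` would be backed by the persistent cache volume mentioned above.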
> What are the "ways" to use such a caching proxy in Maven builds?
As a first approximation, revert https://github.com/jenkins-infra/pipeline-library/pull/135 + https://github.com/jenkins-infra/pipeline-library/pull/216 + https://github.com/jenkins-infra/pipeline-library/pull/219 (while keeping some positive things from those PRs, such as the removal of obsolete JDK 7 support).
Many thanks for the pointers @jglick !
We've started refreshing https://github.com/jenkins-infra/docker-repo-proxy (https://github.com/jenkins-infra/docker-repo-proxy/pull/5), which has the behavior you describe, so it sounds like we are heading in the right direction! (I'm currently trying this with a local build of a plugin before trying to deploy to production.)
With the information you gave, it sounds like we have enough to produce a first version soon.
Oh https://github.com/jenkins-infra/docker-repo-proxy, I see.
If you get the service running, I can help draft a pipeline-library
PR to use it. Just specify the URL. (Or would we have two URLs, one public via ingress and one cluster-internal for efficiency?) Not sure how we test such PRs prior to use; I guess you can override the version in a @Library
annotation in some draft plugin PR.
yeah you can access it via @Library('pipeline-library@refs/pull/number')
or just push an origin branch
I was wondering if we would have a mirror per cloud, and then determine which cloud we are running on, to minimise bandwidth use. But I guess that can be added on top.
Putting this on pause (not enough bandwidth in the team for now), plus JFrog works again as expected.
Slow again today AFAICT.
I don't know if it's related but for the record, there is a maintenance in progress: https://github.com/jenkins-infra/helpdesk/issues/2806#issuecomment-1060862749
Working on this, we realized we didn't need a custom nginx image as only its configuration was modified.
Consequently, I'm archiving jenkins-infra/docker-repo-proxy.
Note: we'll probably use https://plugins.jenkins.io/config-file-provider/ in order to have a specific settings.xml for each provider/region.
I'll create an env var with the provider/region at agent initialization so we can use it in the shared pipeline to choose the correct settings.xml (e.g. repo.azure.jenkins.io, repo.aws.jenkins.io, repo.do.jenkins.io), like what was done before in https://github.com/jenkins-infra/pipeline-library/pull/216/files
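Choosing the settings file from such an env var could be sketched like this in shell. The variable name `ARTIFACT_CACHING_PROXY_PROVIDER`, the file names, and the default provider are assumptions for illustration, not the final implementation:

```shell
#!/bin/sh
# Map a provider/region name to a provider-specific Maven settings file.
# Unknown providers fall back to a default (assumed to be azure here).
pick_settings() {
  case "$1" in
    azure|aws|do) echo "settings-$1.xml" ;;
    *)            echo "settings-azure.xml" ;;
  esac
}

# The provider/region would be exposed as an env var at agent initialization
# (variable name is a placeholder for illustration).
PROVIDER="${ARTIFACT_CACHING_PROXY_PROVIDER:-azure}"
SETTINGS="$(pick_settings "$PROVIDER")"
echo "Using mirror https://repo.${PROVIDER}.jenkins.io/public/ via ${SETTINGS}"
# A build would then run e.g.: mvn -s "$SETTINGS" clean verify
```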
Regarding https://github.com/jenkins-infra/digitalocean/pull/63, I've manually added a do.jenkins.io
NS record in jenkins.io DNS zone on Azure, pointing to DigitalOcean nameservers:
To be reimported as code with https://github.com/jenkins-infra/helpdesk/issues/2924 & https://github.com/jenkins-infra/helpdesk/issues/2981
We initially wanted to protect access to these proxies with basic authentication and an IP allowlist.
Unfortunately, allowlisting all IPs used by the different agents will need some work, as currently (for example) every VM agent has its own IP.
We'd need to control network resources with a non-default network setup in order to control public IPs.
For now I'll keep only the basic auth.
is it a problem if people can access it? could be useful for debugging for developers.
> is it a problem if people can access it? could be useful for debugging for developers.
Yes it is: we are paying for the outbound bandwidth and the storage for this new service, and it's not cheap (currently, without the proxy, we have 2 to 3k€ per month of outbound bandwidth on AWS, and the same on Azure).
Also, we must decrease the outbound bandwidth on repo.jenkins-ci.org (JFrog) by a factor of 5 for JFrog to continue sponsoring us: the main pain point being people using our infra as a free public mirror, which is not what we intend.
(PS : GitHub is drunk: I posted a comment and it edited your message 🤔 . I've edited it back)
I mean: is it a problem if people can access these mirrors for debugging? It's not like we would be advertising them.
> I mean is it a problem if people can access these mirrors for debugging? it's not like we would be advertising them.
Yep, it is still a problem as the URLs are stored in public code so any bot or abusive user could use it as a "free" mirror. Adding a user/password auth seems a nice proposal by @lemeurherve : it avoids the "allow/deny list of IP", and we can debug if we have access to the Kubernetes cluster (as the auth is only for the ingress: a port-forward to the service would bypass the auth).
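Such an ingress-level basic auth could be sketched with the standard ingress-nginx annotations; the secret name, service name, host, and port below are illustrative assumptions:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: artifact-caching-proxy
  annotations:
    # Standard ingress-nginx basic-auth annotations. Only the ingress
    # enforces auth, so a kubectl port-forward to the Service bypasses it,
    # which is what allows debugging with cluster access.
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: acp-basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Artifact Caching Proxy"
spec:
  rules:
    - host: repo.aws.jenkins.io   # illustrative host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: artifact-caching-proxy
                port:
                  number: 8080
```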
Created a CNAME record in the jenkins.io DNS zone via the Azure portal, from repo.aws.jenkins.io to a0b8dc2af4aa74c9f8c27f542db939f1-1791101266.us-east-2.elb.amazonaws.com (the load balancer URL I obtained from the installation of ingress-nginx on cik8s)
Status:
Todo:
Additionally:
- Use `<mirrorOf>*</mirrorOf>` to mirror every repository
- Test carefully, e.g. https://github.com/jenkinsci/stapler/pull/404#issuecomment-1238327013 / #3115
> mirror every repository
> Test carefully, e.g. jenkinsci/stapler#404 (comment) / #3115
Thanks for the pointers, really useful for us to test!
Please note that, in its current state and first version, it is only a "caching proxy": if you are able to make a given Maven project work, then it will be OK, as it is not repo.jenkins-ci.org directly but a layer in between that is able to reach the internet without going through repo.jenkins-ci.org and its mirroring.
Status:
Now that every provider has a proxy configured and running, and the functionality has been integrated into the shared pipeline library as opt-in, I've opened PRs on the following plugins (advised by @MarkEWaite) to check it in situ:
These PRs activate the use of an Artifact Caching Proxy, which caches the requests made to repo.jenkins-ci.org (sponsored by JFrog), in order to reduce our bandwidth consumption and be more resilient.
Apart from an additional build log entry showing the proxy provider configured for Maven (depending on the agent location), there shouldn't be any change for the maintainers of these plugins.
There will be another PR to remove these changes as soon as the functionality has been approved and switched to opt-out.
Moving this issue in "infra-team-sync-next" because work is done on https://github.com/jenkins-infra/helpdesk/issues/2844 to solve https://github.com/jenkins-infra/helpdesk/issues/3221.
Next steps (in order):
Update on today's team work by @lemeurherve, @smerle33 and me on the ACP tasks:
Azure ACP debugging topic. TL;DR: now it works © (but we don't know why it was so slow)
Next steps:
ACP setup to nominal configuration:
- Revert the Puppet hotfix defaulting to AWS: https://github.com/jenkins-infra/jenkins-infra/pull/2621
- Revert the pipeline-library hotfix defaulting to AWS: https://github.com/jenkins-infra/pipeline-library/pull/573
- PR to add the global env var `$ARTIFACT_CACHING_AVAILABLE_PROVIDERS` on ci.jenkins.io
- PR to define/update the 3 settings.xml files on ci.jenkins.io to mirror everything: https://github.com/jenkins-infra/jenkins-infra/pull/2622
- PR to set ACP to 2 replicas (for HA when operating clusters) everywhere: https://github.com/jenkins-infra/kubernetes-management/pull/3536#pullrequestreview-1275657364
Preparing the "opt-in using ACP by default for all plugins":
- PR on pipeline-library to check for the "skip-artifact-caching-proxy" label: https://github.com/jenkins-infra/pipeline-library/pull/552
- Write a runbook to operate ACP on ci.jenkins.io (how to switch on/off, how to enable/disable providers)
Improvements for sustainability:
- PR on Puppet + pipeline-library to add a new global env var defining the "default fallback" ACP (instead of having a raw value in the pipeline library, which led me to hotfixes)
- PRs on the ACP helm-chart:
Reopening to include more builds like jenkins, bom, etc. (list to be completed)
I also noticed in e.g. https://ci.jenkins.io/job/Core/job/jenkins/job/master/4585/flowGraphTable/ that Windows tests take more than twice as long as Linux tests, accounting for the majority of clock time. Using a repository cache should reduce the overhead time for a branch (time spent downloading deps & building rather than running tests), which would make it more practical to aggressively apply https://plugins.jenkins.io/parallel-test-executor/ (currently used only in acceptance-test-harness
and kubernetes-plugin
AFAICT). CC @jtnord @Vlatombe
> mirror every repository
> Test carefully, e.g. jenkinsci/stapler#404 (comment) / #3115
We forgot about this comment, resulting in #3382, fixed by https://github.com/jenkins-infra/jenkins-infra/pull/2630 & https://github.com/jenkinsci/stapler/pull/441
Is there a way to identify similar cases of artifacts not published in Maven Central?
All the successful plugin bill of materials jobs run over the weekend were run with the artifact caching proxy disabled. When the artifact caching proxy is enabled for plugin bill of materials jobs, there is a high overall failure rate of the job. The failure often does not become visible until 90 minutes or more into the job.
Some examples are visible at:
In particular, search for repo.do.jenkins.io
from the bottom of each log upwards. You'll see a bunch of I/O errors, socket read timeouts, "Premature end of Content-Length delimited message body" errors, etc.
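The bottom-up search can be done with a quick shell pipeline; the log file below is a tiny placeholder standing in for a real build log:

```shell
#!/bin/sh
# Create a small sample log standing in for a real build log (placeholder data).
cat > build.log <<'EOF'
[INFO] Downloading from proxy: https://repo.do.jenkins.io/public/org/acme/acme.pom
[ERROR] Premature end of Content-Length delimited message body
EOF

# Read the log bottom-up (tac) and keep only proxy and error lines.
tac build.log | grep -E 'repo\.do\.jenkins\.io|Premature end of Content-Length|Read timed out'
```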
MNG-714 would be helpful. I was hoping to use this trick but it did not seem to work. Created
```xml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <mirrors>
    <mirror>
      <id>proxy</id>
      <url>https://repo.do.jenkins.io/public/</url>
      <mirrorOf>*,!repo.jenkins-ci.org</mirrorOf>
    </mirror>
  </mirrors>
  <profiles>
    <profile>
      <id>fallback</id>
      <activation>
        <activeByDefault>true</activeByDefault>
      </activation>
      <repositories>
        <repository>
          <id>repo.jenkins-ci.org</id>
          <url>https://repo.jenkins-ci.org/public/</url>
        </repository>
      </repositories>
      <pluginRepositories>
        <pluginRepository>
          <id>repo.jenkins-ci.org</id>
          <url>https://repo.jenkins-ci.org/public/</url>
        </pluginRepository>
      </pluginRepositories>
    </profile>
  </profiles>
</settings>
```
where the mirror is expected to fail (since I am providing no authentication) and ran with
```shell
docker run --rm -ti --entrypoint bash -v /tmp/settings.xml:/usr/share/maven/conf/settings.xml maven:3-eclipse-temurin-17 -c 'git clone --depth 1 https://github.com/jenkinsci/build-token-root-plugin /src && cd /src && mvn -Pquick-build install'
```
but it fails immediately and does not fall back. additional-identities-plugin, which does not use an extension from Central, builds OK but does not use the proxy.
After clearing the cache of the DigitalOcean provider, a BOM build exclusively on DigitalOcean finished with success: https://ci.jenkins.io/job/Tools/job/bom/job/master/1564/
The fact that the BOM builds failed only on DO, with "Premature end of Content-Length delimited message body" each time, and passed after clearing the cache on this provider, makes me think the error came from corrupted cache data.
I'll check to either find a way to clear the cache for a specific artifact, or to reduce the cache retention, currently set to one month.
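Reducing the retention would amount to tightening the nginx cache parameters. A sketch with illustrative values, assuming the current one-month retention corresponds to something like `inactive=30d`:

```nginx
# Shorter retention sketch: entries unused for 7 days are evicted, and
# cached 200 responses expire after 7 days instead of a month.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=repo_cache:10m
                 max_size=50g inactive=7d use_temp_path=off;
proxy_cache_valid 200 7d;
```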
@MarkEWaite @basil could you try your next BOM builds without the skip-artifact-caching-proxy
label please?
FYI https://issues.apache.org/jira/browse/MNG-7708 (probably not relevant if the cache errors were persistent).
Closing as the "unreliable" behavior (which is BOM-only) is tracked in https://github.com/jenkins-infra/helpdesk/issues/3481
Service
ci.jenkins.io
Summary
As part of #2733, the subject of hosting a caching proxy for ci.jenkins.io builds (at least; maybe also for trusted.ci, release.ci and infra.ci) has been re-triggered in https://groups.google.com/g/jenkins-infra/c/laSsgPOH9qs.
This issue tracks the work related to deploying this service.
Why
What
We want each build run by ci.jenkins.io (and eventually trusted.ci and release.ci) that involves Maven (and eventually Gradle) to use our caching proxy service instead of directly hitting repo.jenkins-ci.org.
As per https://maven.apache.org/settings.html#mirrors, we should be able to use the user-level settings.xml for Maven. There are different methods to provide this settings.xml to the build:

- Adding it to the agent images in jenkins-infra/packer-images (assuming we have finished the "Docker and VMs" tasks, ref. https://github.com/jenkins-infra/packer-images/issues/282 for Linux and https://github.com/jenkins-infra/packer-images/issues/285)
- Using the Jenkins plugin "config-file-provider", which supports Pipeline (https://plugins.jenkins.io/config-file-provider/#plugin-content-using-the-configuration-files-in-jenkins-pipelines), so we could set it up in jenkins-infra/pipeline-library (easier to opt out of and faster to disable in case of outage)
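With the config-file-provider route, a pipeline would retrieve the managed settings.xml roughly like this; the `configFileProvider` step is documented by the plugin, but the file ID here is a hypothetical placeholder:

```groovy
// Sketch: fetch a managed Maven settings.xml and pass it to the build.
// 'acp-maven-settings' is an assumed config file ID, not a real one.
node {
    configFileProvider([configFile(fileId: 'acp-maven-settings', variable: 'MAVEN_SETTINGS')]) {
        sh 'mvn -s "$MAVEN_SETTINGS" clean verify'
    }
}
```

Keeping this in the pipeline library (rather than baked into agent images) is what makes it easy to opt out per job or disable globally during an outage.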
The main challenge is to provide multiple caching proxies, one in each cloud region that we use. The rationale is that with only a single proxy, we would have to pay for cross-cloud and/or cross-region bandwidth, which we do not want. We could either:
Definition of Done
How
See the associated PRs as they come.