jenkins-infra / helpdesk


Massive re-downloads of the same files #3930

Closed C-Otto closed 9 months ago

C-Otto commented 9 months ago

Service(s)

mirrors.jenkins.io

Summary

Several plugin files (HPI) are downloaded far more frequently than seems intended.

Reproduction steps

I run a public mirror server (ftp.halifax.rwth-aachen.de) that also serves the Jenkins project files. This includes both regular Jenkins installations (*.war) and plugin files (*.hpi). I had a closer look at the log files and, after a quick exchange with James Nord (@jtnord), decided to open an issue with my findings.

Several plugin files are downloaded more often than usual. This is especially true for kubernetes-client-api, which dominates the downloads (for three versions: 5.11.2-182, 6.4.1-215, 6.10.0-240). For just the three corresponding HPI files my mirror served 160 GByte in the past 16 hours.

Other plugins are also in high demand:

For comparison, the file redhat-stable/jenkins-2.426.3-1.1.noarch.rpm tops the list of non-HPI files with around 35 GByte, the WAR for 2.426.3 clocks in at 30 GByte.

Limiting to the previous 16 hours and just one HPI file (kubernetes-client-api version 6.10.0-240, 33 MByte), I see a few hosts that consume a lot of bandwidth:

The top host (24 GByte, i.e. roughly 700 downloads in 16 hours) also downloads quite a lot of other Jenkins files, which to me look like a list of rather useful/common plugins (judging by their names; I don't really use Jenkins myself): 69 GByte in total in those 16 hours, including the 24 GByte explained above.

The numbers aren't much different on other days. I have logs starting from January 21st, and I'd be happy to provide the logs or run some grep/awk/wc/... analyses for you. For the kubernetes-client-api file I only see completed downloads (whole file size, no partial downloads). The downloads seem to happen continuously (the X axis is supposed to show Unix timestamp values):

[Plot: downloads of kubernetes-client-api 6.10.0-240 over time (Unix timestamps on the X axis), showing a continuous stream of downloads]

Some more data, showing the top three files (in bytes):

today (16 hours):
6.7571e+10   /jenkins/plugins/kubernetes-client-api/6.4.1-215.v2ed17097a_8e9/kubernetes-client-api.hpi
7.14586e+10   /jenkins/plugins/durable-task/547.vd1ea_007d100c/durable-task.hpi
8.47074e+10   /jenkins/plugins/kubernetes-client-api/6.10.0-240.v57880ce8b_0b_2/kubernetes-client-api.hpi

yesterday (24h):
1.01377e+11   /jenkins/plugins/kubernetes-client-api/6.4.1-215.v2ed17097a_8e9/kubernetes-client-api.hpi
1.04957e+11   /jenkins/plugins/durable-task/547.vd1ea_007d100c/durable-task.hpi
1.17029e+11   /jenkins/plugins/kubernetes-client-api/6.10.0-240.v57880ce8b_0b_2/kubernetes-client-api.hpi

January 30th (24h):
9.31869e+10   /jenkins/plugins/kubernetes-client-api/6.10.0-240.v57880ce8b_0b_2/kubernetes-client-api.hpi
9.89445e+10   /jenkins/plugins/kubernetes-client-api/6.4.1-215.v2ed17097a_8e9/kubernetes-client-api.hpi
1.01015e+11   /jenkins/plugins/durable-task/547.vd1ea_007d100c/durable-task.hpi

January 29th (24h):
9.51754e+10   /jenkins/plugins/kubernetes-client-api/6.4.1-215.v2ed17097a_8e9/kubernetes-client-api.hpi
1.10755e+11   /jenkins/plugins/kubernetes-client-api/6.10.0-240.v57880ce8b_0b_2/kubernetes-client-api.hpi
1.17757e+11   /jenkins/plugins/aws-java-sdk/1.12.633-430.vf9a_e567a_244f/aws-java-sdk.hpi
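
For reference, per-file totals like the ones above can be produced from a standard access log with a short script. This is only a minimal sketch: it assumes a combined-log-format file, and the "access.log" path and field positions are assumptions rather than details of the mirror's actual setup.

```python
#!/usr/bin/env python3
"""Sum the bytes served per requested .hpi file from a web server access log."""
import sys
from collections import defaultdict

log_path = sys.argv[1] if len(sys.argv) > 1 else "access.log"  # assumed path
totals = defaultdict(int)

with open(log_path, encoding="utf-8", errors="replace") as log:
    for line in log:
        fields = line.split()
        # Combined log format: host ident user [time] "METHOD path proto" status bytes ...
        if len(fields) < 10:
            continue
        path, size = fields[6], fields[9]
        if path.endswith(".hpi") and size.isdigit():
            totals[path] += int(size)

# Print the three largest consumers, smallest first, mirroring the listings above.
for path, total in sorted(totals.items(), key=lambda item: item[1])[-3:]:
    print(f"{total:>15}  {path}")
```
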
MarkEWaite commented 9 months ago

As an initial guess, I suspect that those IP addresses are Kubernetes clusters where a user has misconfigured the Jenkins controller to download plugins every time that Jenkins starts, instead of following the recommended practice of creating a container image with the correct plugins. As an additional guess, I suspect there has been some change in the Jenkins controller definition at that location that is causing it to be in a restart loop. It attempts to start, fails to start, and then tries again.
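
For illustration, baking plugins into the controller image usually looks something like the Dockerfile sketch below; the base image tag and the plugins.txt contents are assumptions, not taken from any particular setup.

```dockerfile
# Bake plugins into the controller image so they are downloaded once at
# build time rather than on every restart of the pod.
FROM jenkins/jenkins:2.426.3-lts-jdk17   # example tag, adjust to your LTS line

# plugins.txt lists one "plugin-id:version" per line, e.g.
#   kubernetes-client-api:6.10.0-240.v57880ce8b_0b_2
#   durable-task:547.vd1ea_007d100c
COPY plugins.txt /usr/share/jenkins/ref/plugins.txt
RUN jenkins-plugin-cli --plugin-file /usr/share/jenkins/ref/plugins.txt
```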

I suspect that the only alternative for the mirror provider is to block those IP addresses until the user investigates and resolves the failures themselves.

An IP address lookup of those IP addresses may also provide a link to the cloud provider that is hosting those computers. You could report those IP addresses to the abuse reporting system of the cloud provider.

Our experience with abuse reporting has been mixed. Our reports to Alibaba Cloud resulted in no action. We finally had to ban an Alibaba Cloud IP address from a different server because they would not take action on an obvious abuse case. Our report to the abuse reporting group at a large video game company resulted in prompt action.

C-Otto commented 9 months ago

Thanks, I'll add some patterns to my fail2ban setup. In my experience (over the past 10 years or so), reporting abuse isn't worth it, especially as these downloads aren't that harmful.

C-Otto commented 9 months ago

The fail2ban rules blocked some IPs and I see a reduction in traffic. Analyzing the logs gives rather sane results, although some more-or-less abusive hosts still manage to work around my rules. That's good enough.
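
In case it helps anyone else running a mirror, a rule along these lines is one way to express such a block in fail2ban. This is only a sketch: the filter name, log path, regex, and thresholds are assumptions and would need to be adapted to the actual web server and log format.

```ini
# /etc/fail2ban/filter.d/jenkins-mirror-abuse.conf (hypothetical name)
[Definition]
# Match successful downloads of Jenkins plugin files; <HOST> is fail2ban's
# placeholder for the client IP address.
failregex = ^<HOST> .* "GET /jenkins/plugins/\S+\.hpi[^"]*" 200
ignoreregex =

# /etc/fail2ban/jail.local
[jenkins-mirror-abuse]
enabled  = true
port     = http,https
filter   = jenkins-mirror-abuse
logpath  = /var/log/nginx/access.log
# Ban a host that fetches more than 50 plugin files within one hour.
maxretry = 50
findtime = 3600
bantime  = 86400
```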

dduportal commented 9 months ago

Thanks @C-Otto for reporting this and acting on it.

We are really grateful for your contribution and support of Jenkins!

lemeurherve commented 9 months ago

Hello @C-Otto,

I've sent a request about https://github.com/jenkins-infra/helpdesk/issues/3935 to the "ftp" email address attached to your mirror in our config. When you have a moment, could you please take a look at it?

And thanks again for your contribution and support!

C-Otto commented 9 months ago

In addition to the discussion in #3935 I got contacted by another team. In their case, a Jenkins instance was stuck in a crash loop which caused frequent reinstalls ("while testing the Jenkins operator in Kubernetes" - I don't really know what that means, though).

MarkEWaite commented 9 months ago

a Jenkins instance was stuck in a crash loop which caused frequent reinstalls

I think that is a good reason why your fail2ban rule (or something like it) is the right choice. If the administrators of a system leave it in a crash loop that causes wasted bandwidth, it seems very reasonable that a mirror provider should ban them.

lemeurherve commented 9 months ago

In addition to the discussion in #3935 I got contacted by another team. In their case, a Jenkins instance was stuck in a crash loop which caused frequent reinstalls ("while testing the Jenkins operator in Kubernetes" - I don't really know what that means, though).

They're speaking about https://github.com/jenkinsci/kubernetes-operator

Note that we don't use this operator ourselves, so I don't have more info about its internal workings; it might be worth a closer look.

cc @brokenpip3 in case you happen to know more about it?

brokenpip3 commented 9 months ago

@lemeurherve thanks for catching my attention here!

The problem is that when deploying Jenkins in k8s (with or without the operator, for instance with the Helm chart), a misconfigured Jenkins instance can end up in a CrashLoopBackOff state, where Kubernetes keeps retrying the pod forever with an increasing back-off pause (this is how Kubernetes self-healing has always worked), so there isn't much we can do about it.

I can try to add to the kubernetes operator the capability to run containers with already-downloaded plugins, but it will require time and I'm the solo maintainer there; I'll try to do it as soon as I have free time to allocate.