jenkinsci / azure-ad-plugin

Authentication and Authorization with Azure AD
https://plugins.jenkins.io/azure-ad/
MIT License

Azure-AD-plugin never updates its proxy IP address following DNS TTL and record changes. #558

Status: Open. lukolszewski opened this issue 2 months ago.

lukolszewski commented 2 months ago

Jenkins and plugins versions report

Environment

```text
Jenkins: 2.426.1
OS: Linux - 3.10.0-1160.105.1.el7.x86_64
Java: 11.0.21 - Red Hat, Inc. (OpenJDK 64-Bit Server VM)
---
Parameterized-Remote-Trigger:3.2.0
ace-editor:1.1
allure-jenkins-plugin:2.31.1
ansicolor:1.0.4
ant:497.v94e7d9fffa_b_9
antisamy-markup-formatter:162.v0e6ec0fcfcf6
apache-httpcomponents-client-4-api:4.5.14-208.v438351942757
atlassian-bitbucket-server-integration:4.0.0
aws-credentials:218.v1b_e9466ec5da_
aws-java-sdk:1.12.606-418.vce5b_4cd017c6
aws-java-sdk-cloudformation:1.12.606-418.vce5b_4cd017c6
aws-java-sdk-codebuild:1.12.606-418.vce5b_4cd017c6
aws-java-sdk-ec2:1.12.606-418.vce5b_4cd017c6
aws-java-sdk-ecr:1.12.606-418.vce5b_4cd017c6
aws-java-sdk-ecs:1.12.606-418.vce5b_4cd017c6
aws-java-sdk-efs:1.12.606-418.vce5b_4cd017c6
aws-java-sdk-elasticbeanstalk:1.12.606-418.vce5b_4cd017c6
aws-java-sdk-iam:1.12.606-418.vce5b_4cd017c6
aws-java-sdk-kinesis:1.12.606-418.vce5b_4cd017c6
aws-java-sdk-logs:1.12.606-418.vce5b_4cd017c6
aws-java-sdk-minimal:1.12.606-418.vce5b_4cd017c6
aws-java-sdk-secretsmanager:1.12.606-418.vce5b_4cd017c6
aws-java-sdk-sns:1.12.606-418.vce5b_4cd017c6
aws-java-sdk-sqs:1.12.606-418.vce5b_4cd017c6
aws-java-sdk-ssm:1.12.606-418.vce5b_4cd017c6
azure-ad:442.v355cca_6b_c169
azure-sdk:157.v855da_0b_eb_dc2
bitbucket:241.v6d24a_57f9359
bootstrap4-api:4.6.0-6
bootstrap5-api:5.3.2-3
bouncycastle-api:2.30.1.77-225.v26ea_c9455fd9
branch-api:2.1135.v8de8e7899051
build-timeout:1.31
caffeine-api:3.1.8-133.v17b_1ff2e0599
checks-api:2.0.2
cloudbees-folder:6.858.v898218f3609d
command-launcher:107.v773860566e2e
commons-lang3-api:3.13.0-62.v7d18e55f51e2
commons-text-api:1.11.0-95.v22a_d30ee5d36
config-file-provider:959.vcff671a_4518b_
copyartifact:722.v0662a_9b_e22a_c
credentials:1311.vcf0a_900b_37c2
credentials-binding:642.v737c34dea_6c2
data-tables-api:1.13.8-2
display-url-api:2.200.vb_9327d658781
durable-task:523.va_a_22cf15d5e0
ec2-fleet:3.2.0
echarts-api:5.4.3-2
email-ext:2.102
extended-choice-parameter:376.v2e02857547b_a_
font-awesome-api:6.5.1-1
gcloud-sdk:0.0.3
generic-webhook-trigger:1.88.2
git:5.2.1
git-client:4.6.0
git-parameter:0.9.19
git-server:99.va_0826a_b_cdfa_d
github:1.37.3.1
github-api:1.318-461.v7a_c09c9fa_d63
github-branch-source:1752.vc201a_0235d80
google-metadata-plugin:0.5
google-oauth-plugin:1.322.v7b_940a_a_1a_5c8
google-storage-plugin:1.347.ve4723b_556ea_0
gradle:2.9
h2-api:11.1.4.199-12.v9f4244395f7a_
handlebars:3.0.8
hidden-parameter:202.vb_964799875d7
htmlpublisher:1.32
http_request:1.18
instance-identity:185.v303dc7c645f9
ionicons-api:56.v1b_1c8c49374e
jackson2-api:2.15.3-372.v309620682326
jacoco:3.3.5
jakarta-activation-api:2.0.1-3
jakarta-mail-api:2.0.1-3
javadoc:243.vb_b_503b_b_45537
javax-activation-api:1.2.0-6
javax-mail-api:1.6.2-9
jaxb:2.3.9-1
jdk-tool:73.vddf737284550
jjwt-api:0.11.5-77.v646c772fddb_0
job-import-plugin:3.6
jobConfigHistory:1229.v3039470161a_d
jquery:1.12.4-1
jquery-detached:1.2.1
jquery3-api:3.7.1-1
jsch:0.2.8-65.v052c39de79b_2
junit:1240.vf9529b_881428
junit-attachments:205.vc0677977deb_0
ldap:711.vb_d1a_491714dc
leastload:3.0.0
lockable-resources:1218.va_3dd45e2b_fa_7
m2release:0.16.4
mailer:463.vedf8358e006b_
matrix-auth:3.2.1
matrix-project:818.v7eb_e657db_924
maven-plugin:3.23
mercurial:1260.vdfb_723cdcc81
mina-sshd-api-common:2.11.0-86.v836f585d47fa_
mina-sshd-api-core:2.11.0-86.v836f585d47fa_
momentjs:1.1.1
naginator:1.436.vb_e769dcb_cdf6
nodejs:1.6.1
oauth-credentials:0.646.v02b_66dc03d2e
okhttp-api:4.11.0-157.v6852a_a_fa_ec11
pam-auth:1.10
parameterized-scheduler:255.v73827fcdf618
pipeline-aws:1.43
pipeline-build-step:539.v8c889169451f
pipeline-github-lib:42.v0739460cda_c4
pipeline-graph-analysis:202.va_d268e64deb_3
pipeline-groovy-lib:689.veec561a_dee13
pipeline-input-step:477.v339683a_8d55e
pipeline-maven:1362.vee39a_d4b_02b_1
pipeline-maven-api:1362.vee39a_d4b_02b_1
pipeline-milestone-step:111.v449306f708b_7
pipeline-model-api:2.2151.ve32c9d209a_3f
pipeline-model-definition:2.2151.ve32c9d209a_3f
pipeline-model-extensions:2.2151.ve32c9d209a_3f
pipeline-rest-api:2.34
pipeline-stage-step:305.ve96d0205c1c6
pipeline-stage-tags-metadata:2.2151.ve32c9d209a_3f
pipeline-stage-view:2.34
pipeline-utility-steps:2.16.0
plain-credentials:143.v1b_df8b_d3b_e48
plugin-util-api:3.6.0
popper-api:1.16.1-3
popper2-api:2.11.6-4
rebuild:330.v645b_7df10e2a_
remote-file:1.24
resource-disposer:0.23
s3:466.vf5b_3db_8e3eb_2
scm-api:683.vb_16722fb_b_80b_
script-security:1294.v99333c047434
snakeyaml-api:2.2-111.vc6598e30cc65
sonar:2.16.1
ssh-agent:346.vda_a_c4f2c8e50
ssh-credentials:308.ve4497b_ccd8f4
ssh-slaves:2.916.vd17b_43357ce4
sshd:3.312.v1c601b_c83b_0e
stashNotifier:1.464.va_9203f84a_417
structs:325.vcb_307d2a_2782
test-results-analyzer:0.4.1
testng-plugin:835.v51ed3da_fcc35
throttle-concurrents:2.14
timestamper:1.26
token-macro:384.vf35b_f26814ec
trilead-api:2.84.v72119de229b_7
uno-choice:2.8.1
variant:60.v7290fc0eb_b_cd
veracode-scan:23.7.22.0
workflow-aggregator:596.v8c21c963d92d
workflow-api:1283.v99c10937efcb_
workflow-basic-steps:1042.ve7b_140c4a_e0c
workflow-cps:3826.v3b_5707fe44da_
workflow-cps-global-lib:609.vd95673f149b_b
workflow-durable-task-step:1289.v4d3e7b_01546b_
workflow-job:1385.vb_58b_86ea_fff1
workflow-multibranch:756.v891d88f2cd46
workflow-scm-step:415.v434365564324
workflow-step-api:639.v6eca_cd8c04a_a_
workflow-support:865.v43e78cc44e0d
ws-cleanup:0.45
```

What Operating System are you using (both controller, and any agents involved in the problem)?

Linux RedHat 7.8

Reproduction steps

Configure an unauthenticated HTTP proxy server in Jenkins, specifying it by its DNS name. Configure the Azure AD plugin for Azure authentication. Leave Jenkins running. Then change the proxy's IP address and update its DNS A record (or, for simpler testing, its /etc/hosts entry).

Result: Azure AD authentication stops working because the plugin keeps connecting to the old proxy IP. After Jenkins is restarted, the plugin performs a new DNS resolution and picks up the correct IP.

Expected Results

The Azure AD plugin honours the DNS record's TTL and re-resolves the proxy's IP address when it expires.

Alternatively, it re-resolves the proxy hostname at a configurable interval.

Actual Results

Errors such as: `Error sending HTTP request: connection timed out: proxy-name/X.X.X.X:3128`, where X.X.X.X is the old IP.

Anything else?

When the proxy's DNS A record resolves to two IP addresses (for redundancy) and only one of them changes (with the DNS record updated accordingly), Azure AD fails on some Jenkins servers because it keeps trying to talk to the old IP. On other servers it appears to work fine, at least for a while, if it happened to resolve the record to the IP that did not change.
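The partial-failure case can be pictured as follows: a client that resolved the name once is pinned to whichever address it got, so it only breaks if that particular address is withdrawn from the record. The `PinnedAddressCheck` class and its `isStillAdvertised` methods are hypothetical illustrations, not part of the plugin or the SDK; they compare a pinned `InetAddress` against a fresh `InetAddress.getAllByName` result. The addresses in `main` are made-up examples.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

/** Hypothetical helper: is the IP a client is pinned to still in the DNS answer? */
final class PinnedAddressCheck {

    /** True if the pinned address is among the addresses DNS currently returns. */
    static boolean isStillAdvertised(InetAddress pinned, InetAddress[] current) {
        return Arrays.asList(current).contains(pinned);
    }

    /** Looks the host up again (subject to the JVM's DNS cache) and checks the pinned address. */
    static boolean isStillAdvertised(InetAddress pinned, String host) throws UnknownHostException {
        return isStillAdvertised(pinned, InetAddress.getAllByName(host));
    }

    public static void main(String[] args) throws UnknownHostException {
        // Example record before the change: {10.0.0.5, 10.0.0.6}; 10.0.0.5 is then replaced by 10.0.0.7.
        InetAddress replaced = InetAddress.getByAddress(new byte[] {10, 0, 0, 5});
        InetAddress kept = InetAddress.getByAddress(new byte[] {10, 0, 0, 6});
        InetAddress added = InetAddress.getByAddress(new byte[] {10, 0, 0, 7});
        InetAddress[] afterChange = {kept, added};

        System.out.println(isStillAdvertised(kept, afterChange));     // true: this server keeps working
        System.out.println(isStillAdvertised(replaced, afterChange)); // false: this server times out
    }
}
```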

I've tracked the problem to how the plugin uses azure-sdk-for-java, and I've also opened an issue there in case the maintainers want to resolve it in the library: https://github.com/Azure/azure-sdk-for-java/issues/38963 In that issue I demonstrate the problem with a simple example app. It would be good if the library fixed it, but I decided to open an issue here as well, since it mostly affects this plugin and causes significant disruption every time the Jenkins masters need to be restarted.
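One plausible mechanism, shown with plain JDK classes only: `new InetSocketAddress(host, port)` resolves the hostname at construction time, and the resulting object keeps that IP for its whole lifetime. If the proxy address is built this way once, when the HTTP client is created (an assumption about the plugin/SDK wiring, not something confirmed in this thread), only creating a new address or a new client triggers a fresh lookup. `ProxyAddressDemo` and `reResolve` are hypothetical names; `networkaddress.cache.ttl` is the standard JVM security property that limits how long successful lookups are cached.

```java
import java.net.InetSocketAddress;
import java.security.Security;

/** Illustration only: how a hostname-based proxy address can end up pinned to one IP. */
final class ProxyAddressDemo {

    /** Hypothetical helper: builds a new address from the same hostname, forcing a new lookup. */
    static InetSocketAddress reResolve(InetSocketAddress old) {
        // getHostString() returns the hostname (or IP literal) without a reverse DNS lookup.
        return new InetSocketAddress(old.getHostString(), old.getPort());
    }

    public static void main(String[] args) {
        // Caps how long the JVM caches successful lookups. It is read when the DNS cache
        // is first used, so it has to be set before any resolution happens.
        Security.setProperty("networkaddress.cache.ttl", "30");

        // Resolved immediately: getAddress() stays the same for the life of this object,
        // even if the A record for the host changes later.
        InetSocketAddress pinned = new InetSocketAddress("localhost", 3128);
        System.out.println("pinned: " + pinned.getAddress());

        // Not resolved: a client that accepts this form has to look the name up itself.
        InetSocketAddress unresolved = InetSocketAddress.createUnresolved("localhost", 3128);
        System.out.println("unresolved? " + unresolved.isUnresolved());

        // Only a new InetSocketAddress (that is, a new lookup) can pick up a changed record.
        InetSocketAddress refreshed = reResolve(pinned);
        System.out.println("refreshed: " + refreshed.getAddress());
    }
}
```

Note that lowering `networkaddress.cache.ttl` on its own would not help a client that holds an already-resolved address; it only affects lookups that are actually repeated.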

Are you interested in contributing a fix?

I can help in testing.

timja commented 2 months ago

I guess we could expire the Graph client every now and then in the cache (ensuring it's shut down cleanly as well) here: https://github.com/jenkinsci/azure-ad-plugin/blob/cc82c54af8c28ec67697925a5bcbe97bb989d616/src/main/java/com/microsoft/jenkins/azuread/GraphClientCache.java#L29-L31
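A minimal sketch of the expiry idea, using only the JDK: each cached client is stamped with its creation time, and a lookup after `maxAge` has elapsed builds a replacement (which would resolve the proxy hostname again) and closes the old one. `ExpiringClientCache` is hypothetical and is not the plugin's actual `GraphClientCache`; it assumes clients implement `AutoCloseable`, and the clock is injectable so expiry can be tested without sleeping.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical cache that recreates clients older than maxAge and closes the old ones. */
final class ExpiringClientCache<K, V extends AutoCloseable> {
    private static final class Entry<V> {
        final V client;
        final long createdNanos;

        Entry(V client, long createdNanos) {
            this.client = client;
            this.createdNanos = createdNanos;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> factory;
    private final long maxAgeNanos;
    private final LongSupplier clock; // e.g. System::nanoTime

    ExpiringClientCache(Function<K, V> factory, Duration maxAge, LongSupplier clock) {
        this.factory = factory;
        this.maxAgeNanos = maxAge.toNanos();
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        List<V> expired = new ArrayList<>(1);
        // compute() runs atomically per key, so only one caller replaces an expired client.
        Entry<V> entry = entries.compute(key, (k, old) -> {
            if (old != null && now - old.createdNanos < maxAgeNanos) {
                return old; // still fresh: reuse it
            }
            if (old != null) {
                expired.add(old.client); // close it after leaving compute()
            }
            return new Entry<>(factory.apply(k), now);
        });
        // A real implementation would also need to let in-flight requests on the old client finish.
        expired.forEach(ExpiringClientCache::closeQuietly);
        return entry.client;
    }

    private static void closeQuietly(AutoCloseable client) {
        try {
            client.close();
        } catch (Exception e) {
            // Best effort: a failed shutdown should not block handing out a fresh client.
        }
    }
}
```

In production the clock would simply be `System::nanoTime`.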

I'm not too keen on it, though. I would really expect either Netty or azure-sdk-for-java to handle this; a library user shouldn't have to do anything when DNS changes.

How often do you change your proxy DNS? In my experience, proxies are generally fairly static, which I guess is why this hasn't come up before now.

lukolszewski commented 2 months ago

> How often do you change your proxy DNS? From my experience proxies are generally fairly static which I guess is why this hasn't come up before now.

Sadly, we change it quite frequently (roughly every 2 to 3 weeks). Since we migrated from LDAP to Azure AD authentication (and there is no way back), our Jenkins server, which runs jobs 24/7 on 120+ nodes for dozens of users across 3 time zones, has suddenly started requiring restarts at least once a month.

These DNS updates are inherent to how AWS works, so depending on their setup, many people will be in the same situation. For example, our Squid proxy is an AWS Auto Scaling group (ASG) behind Application Load Balancers. AWS assigns IPs to the load balancers' network interfaces, and clients reach them by resolving the load balancer's DNS name. Whenever a network interface of an ELB fails, AWS replaces it with one that has a new IP (these are all internal IPs, so no Elastic IPs are involved) and updates the DNS record, which triggers this issue.

Regarding whether this belongs in the library: that was my first thought, which is why I logged it with them, but I'm not so sure anymore. A more elaborate TTL-based mechanism seems like it should live in the library, but a quick fix is much easier to implement in the client. If the client could be recreated on a defined (or even hard-coded) schedule, that would be great.

timja commented 2 months ago

Reading the code, it looks like the SDK is where this fits best: it has a lot of code around proxying, and this plugin just delegates to the library.

If someone wants to contribute a workaround here, that's fine, although I probably wouldn't enable it by default.
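Combining the quick fix requested earlier in the thread (recreate the client on a schedule) with the preference that any workaround be off by default, a sketch might read the interval from a system property and do nothing unless it is set. The property name `azuread.graphClient.refreshMinutes`, the `ScheduledClientRefresher` class, and the `Supplier`-based client factory are all hypothetical; none of them is an existing plugin setting or API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical opt-in refresher: swaps in a freshly built client every N minutes. */
final class ScheduledClientRefresher<C extends AutoCloseable> implements AutoCloseable {
    static final String PROPERTY = "azuread.graphClient.refreshMinutes"; // hypothetical name

    private final Supplier<C> factory;
    private final AtomicReference<C> current;
    private final ScheduledExecutorService scheduler;

    ScheduledClientRefresher(Supplier<C> factory, long refreshMinutes) {
        this.factory = factory;
        this.current = new AtomicReference<>(factory.get());
        if (refreshMinutes > 0) {
            scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "graph-client-refresh");
                t.setDaemon(true); // never keep the JVM alive just for refreshing
                return t;
            });
            scheduler.scheduleAtFixedRate(this::refresh, refreshMinutes, refreshMinutes, TimeUnit.MINUTES);
        } else {
            scheduler = null; // disabled: behaves like a single long-lived client
        }
    }

    /** Reads the interval from the system property; 0 (the default) disables refreshing. */
    static long configuredMinutes() {
        return Long.getLong(PROPERTY, 0L);
    }

    C get() {
        return current.get();
    }

    /** Builds a new client (which resolves the proxy hostname again) and closes the old one. */
    void refresh() {
        C fresh;
        try {
            fresh = factory.get();
        } catch (RuntimeException e) {
            return; // keep the old client; an uncaught exception would cancel the periodic task
        }
        closeQuietly(current.getAndSet(fresh));
    }

    @Override
    public void close() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
        closeQuietly(current.get());
    }

    private static void closeQuietly(AutoCloseable client) {
        try {
            client.close();
        } catch (Exception e) {
            // Best effort; a real implementation should log this and let in-flight requests finish.
        }
    }
}
```

Usage would be along the lines of `new ScheduledClientRefresher<>(factory, ScheduledClientRefresher.configuredMinutes())`, so nothing changes for installations that never set the property.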