Replace the default node launcher - Githubissues

jfrog / jenkins-artifactory-plugin

Jenkins artifactory plugin

http://jenkins-ci.org/

115 stars 186 forks source link

Replace the default node launcher #853

Closed swamygoud18 closed 10 months ago

swamygoud18 commented 11 months ago

Replacing the default node launcher with new remote launcher to avoid gradle-based-publishing-hangs when the launcher is type org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator

[x] This pull request is created in the jfrog/jenkins-artifactory-plugin repository.
[x] All tests passed. If this feature is not already covered by the tests, I added new tests.
[x] All static analysis checks passed.

github-actions[bot] commented 11 months ago

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

swamygoud18 commented 11 months ago

I have read the CLA Document and I hereby sign the CLA

RobiNino commented 10 months ago

Hi @swamygoud18 , Thank you for your contribution. We'll review this PR in the near future. I apologize for the delay in our response.

For reference - related to JENKINS-71708

swamygoud18 commented 10 months ago

Hello @RobiNino , do we know how long it may take.?. I am really sorry to bug you - but I am getting worst with the impacted users and becoming more problematic with more java version builds different than the Jenkins JNLP java versions.

swamygoud18 commented 10 months ago

Thanks for the progress on this @yahavi , I have updated as suggested.

swamygoud18 commented 10 months ago

@yahavi - I see few tests are failed.? were those not ran part of feature branch or before merging on to master branch .? would you able to look at them please.?

yahavi commented 10 months ago

@swamygoud18, we're encountering concurrency problems in the tests. After examining the failed tests, I can confirm that there haven't been any new failures. You can find all the tests that ran in the pull requests here: https://github.com/jfrog/jenkins-artifactory-plugin/actions/runs/6135676309.

swamygoud18 commented 10 months ago

@yahavi - I did not get you fully as I am surprised if the tests failed on PR , how the merge was allowed.?

do I need to do anything from me.? Thanks for helping me on this PR process.

yahavi commented 10 months ago

@swamygoud18 - at the moment, we aren't enforcing branch protection rules actively. Some preparatory work is necessary to ensure the tests are perfect. The Jenkins Artifactory plugin version 3.18.10 has been published, and it incorporates your fix. Feel free to inform us if it resolved the issue or if you require any further assistance.

swamygoud18 commented 10 months ago

Thanks for all your support @yahavi .. we will test and update you back.

davies147 commented 10 months ago

The Jenkins Artifactory plugin version 3.18.10 has been installed on our server, and rtMavenRun is now broken. It no longer uses 'docker exec' to run the maven commend inside the requested docker agent, so fails since none of the toolchain is present in the parent agent. Downgrading to 3.18.9 immediately fixed this.

Please revert the change until this has been catered for?

swamygoud18 commented 10 months ago

@yahavi - I do see issue with rtMavenRun. is it possible to revert the changes to not broke more.?

yahavi commented 10 months ago

Thanks for bringing up this problem, @davies147! We're rolling out a patch to undo it. @swamygoud18, would you be open to creating a new Pull Request?

swamygoud18 commented 10 months ago

@yahavi - some how I see the maven builder got hard coded launcher - CLASSWORLDS_LAUNCHER = "org.codehaus.plexus.classworlds.launcher.Launcher" which is breaking the changes - I don't know if your R&D team got to verify anything from my changes on the gradle hanging issue and which was resolved after having the new launcher. Is it possible to review with your R&D team for further inputs.? I also will try to figure to lower scope the new changes with only gradle impacted builds.?

ctuncan commented 9 months ago

We also see this with rtGradleRun. Which doesn't not interact with CLASSWORLDS_LAUNCHER.

I believe the issue is as follows:

ActionableHelper.getNode(nodeLauncher) tries to find the node we're launching on.
It does this by looping over j.getComputers() and finding one that has the same channel (c.getChannel() == launcher.getChannel())
This computer is the parent agent
Then we create a new launcher on that node.
Therefore all commands are run on the parent agent.

I am not sure if the "inside docker container" context would ever be returned by j.getComputers(). Even if it did, the channel would compare equal between the "inside docker container" and "parent agent" instances, because of how the docker plugin is implemented. This decoration does not override getChannel, so it will default to the inner launcher.

The only thing the docker WithContainerStep does is mutate the argv when it's asked to launch something.

Creating a new launcher from the node is stripping the decoration. Which in turn strips the docker exec ... and runs the command directly.

In general I am uncomfortable switching the launcher like this, as you cannot account for all possible launcher decorations. I would prefer to unwrap specific launchers that have known issues. (e.g. perhaps org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator. However from the name this is also exec-ing in a container, there is no guarantee that the parent agent has the environment needed to even run the gradle publishing).

ctuncan commented 9 months ago

Our specific issue is that all our rtGradleRun commands started failing with complains about: missing android SDK, incorrect Java versions, etc. Because the Jenkins agent uses a different JDK than our build, and the SDK doesn't exist on the parent agent.

Switching to just sh "/gradlew task" solved all of these issues, but also defeats the point of this plugin.