timja / jenkins-gh-issues-poc-06-18

[JENKINS-46709] Jenkins pipeline jobs get locked on master executor and leads to master restart #9356

Closed timja closed 6 years ago

timja commented 7 years ago

Dear community,

For a couple of months now we have had a recurring issue on our master: Pipeline jobs get stuck on the master executor and can no longer run. We are then forced to restart the master server.

Each time, we find nothing specific in the Jenkins logs. Please find below the latest thread dump taken on the master executor.

Our server runs more than 8000 different jobs, most of them freestyle projects.
We therefore can't afford to have the server become unstable because of Pipeline behaviour.

Generally speaking, I don't understand how the Jenkins engine allows Pipeline jobs to block the main server in this way. Isn't this a blocking design flaw?

 

Thank you for your feedback if you have similar issues.

./Frederic

===========

 

threaddump.txt


Originally reported by fredericmeyrou, imported from: Jenkins pipeline jobs get locked on master executor and leads to master restart
  • status: Closed
  • priority: Critical
  • resolution: Duplicate
  • resolved: 2017-11-03T16:45:11+00:00
  • imported: 2022/01/10
timja commented 7 years ago

abayer:

Any part of your Pipeline job that isn't within a node block runs on a flyweight executor on the master - is that what you're talking about? Or are you seeing node blocks ending up on master when you've specified a different label?
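
The distinction drawn above can be sketched with a minimal scripted Pipeline. This is an illustration only, not the reporter's job: the label 'linux' and the echo messages are hypothetical placeholders.

```groovy
// No node has been allocated yet: at this point the build only occupies
// a flyweight executor on the master.
echo 'Outside node: flyweight executor on the master'

// 'linux' is a placeholder label (an assumption); use a label that exists on your agents.
node('linux') {
    // Steps inside the node block use a regular executor on a matching agent.
    stage('Checkout') {
        // 'scm' is only defined for multibranch or "Pipeline script from SCM" jobs.
        checkout scm
    }
}

// Outside the node block again: back on the flyweight executor.
echo 'Done'
```

If builds hang while only the first or last echo would be reached, the problem is on the master side; if they hang inside the node block, the allocated agent is the place to look.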

timja commented 7 years ago

fredericmeyrou:

Dear Andrew,

 

I don't know the code of all the Pipeline jobs, of course, but in the attached example the code starts with a node allocation and, according to the job log, the build was stuck in the SCM step. I found most of the Pipeline jobs stuck at the same step.

I have no logs on the master about SCM problems, and the SVN SCM server is running without any problems. Nonetheless, I don't understand why the master executors would get locked in such a case.

Log of the Pipeline job:

Started by user ...(axpru)
Opening connection to http://jirasvnprod.agfahealthcare.com/svn/imedical/fluidmanagement/
Checking out svn http://jirasvnprod.agfahealthcare.com/svn/imedical/fluidmanagement/branches/fluidmanagement-84.2900 into K:\JENKINS\jobs\HCC\jobs\HCC_DEV_fluidmanagement_RELEASE\branches\branches-flu.k0nk34.ment-84-2900\workspace@script to read Jenkinsfile
Updating http://jirasvnprod.agfahealthcare.com/svn/imedical/fluidmanagement/branches/fluidmanagement-84.2900@30350 at revision 30350
Using sole credentials bob/****** (Please use for SVN / Git / Windows service and Linux) in realm '<http://jirasvnprod.agfahealthcare.com:80> CollabNet Subversion Repository'
U fluidmanagement\fm-services\fm-services.iml
U fluidmanagement\fm-services\src\main\resources\fluidmanagement_Resources_de.json
U fluidmanagement\fm-services\pom.xml
U fluidmanagement\packaging\pom.xml
U fluidmanagement\packaging\fluidmanagement-upgrade-package\pom.xml
U fluidmanagement\packaging\fm-war\pom.xml
U fluidmanagement\fm-importset\pom.xml
U fluidmanagement\orbis-component.properties
U fluidmanagement\pom.xml
U fluidmanagement\apis\pom.xml
U fluidmanagement\apis\fm-collector-service-api\pom.xml
U fluidmanagement\apis\fm-api-docs\pom.xml
U fluidmanagement\apis\fm-pump-info-service-api\pom.xml
U fluidmanagement\apis\fm-association-service-api\pom.xml
U fluidmanagement\apis\fm-event-consumer-api\pom.xml
U fluidmanagement\apis\fm-barcode-image-service-api\pom.xml
U fluidmanagement\apis\fm-event-reporter-api\pom.xml
U fluidmanagement\fm-dbrep\pom.xml
U fluidmanagement\fm-ui\src\main\angular\src\app\reconnection\reconnection-overview.component.ts
U fluidmanagement\fm-ui\src\main\angular\package-lock.json
U fluidmanagement\fm-ui\src\main\angular\package.json
U fluidmanagement\fm-ui\pom.xml
U fluidmanagement\fm-ui-integration\fm-ui-integration.iml
U fluidmanagement\fm-ui-integration\src\main\java\com\agfa\orbis\medical\fluidmanagement\forms\PumpSetupGUIController.java
U fluidmanagement\fm-ui-integration\src\main\java\com\agfa\orbis\medical\fluidmanagement\forms\PumpReassociationGUIController.java
U fluidmanagement\fm-ui-integration\src\main\java\com\agfa\orbis\medical\fluidmanagement\forms\EmbeddedBrowserController.java
U fluidmanagement\fm-ui-integration\src\main\forms\com\agfa\orbis\medical\fluidmanagement\forms\PumpReassociationGUI.oat
U fluidmanagement\fm-ui-integration\pom.xml
U fluidmanagement\.idea\artifacts\fm_services_war_exploded.xml
At revision 30350

Using sole credentials .... (Please use for SVN / Git / Windows service and Linux) in realm '<http://jirasvnprod.agfahealthcare.com:80> CollabNet Subversion Repository'
Resuming build at Fri Sep 01 10:14:38 CEST 2017 after Jenkins restart
Hard kill!
[BFA] Scanning build for known causes...
[BFA] No failure causes found
[BFA] Done. 0s
Finished: ABORTED

 

Jenkinsfile: http://jenkins-hcis-main.agfahealthcare.com/job/HCC/job/HCC_DEV_fluidmanagement_RELEASE/job/branches%252Ffluidmanagement-84.2900/50/execution/node/2/ws/Jenkinsfile/*view*/

 

#!groovy
import hudson.scm.subversion.CheckoutUpdater

node('SHARED&&WIEN&&WINDOWS64') {
    timestamps {
        def mvnHome = tool 'Maven 3.3.x'
        def jdkHome = tool 'JDK 1.8 Orbis 64 bits'
        def pomFile = 'fluidmanagement\\pom.xml'
        def mailingList = emailextrecipients([[$class: 'CulpritsRecipientProvider'],
                                              [$class: 'DevelopersRecipientProvider'],
                                              [$class: 'RequesterRecipientProvider']])
        def extendMailingList = ", amar.bhatia@agfa.com"

        def workspace = env.JOB_NAME.replace('branches%2F', '')
        env.WORKSPACE = "D:\\DEV\\CI\\WS\\${workspace}"
        env.JAVA_HOME = "${jdkHome}"

        ws(env.WORKSPACE) {
            try {
                echo "Building ${env.JOB_NAME} with ${env.BRANCH_NAME} and ${pomFile} using ${env.WORKSPACE}"

                if (env.JOB_NAME =~ 'INTEGRATION') {
                    integrate(mvnHome, pomFile)
                } else if (env.JOB_NAME =~ 'TEST') {
                    test(mvnHome, pomFile)
                } else if (env.JOB_NAME =~ 'QUALITY') {
                    analyse(mvnHome, pomFile)
                } else if (env.JOB_NAME =~ 'DEPLOY') {
                    deploy(mvnHome, pomFile)
                } else if (env.JOB_NAME =~ 'SNAPSHOT') {
                    snapshot(mvnHome, pomFile)
                } else if (env.JOB_NAME =~ 'RELEASE' && env.BRANCH_NAME != 'trunk') {
                    release(mvnHome, pomFile)
                }
            } catch (any) {
                currentBuild.result = 'FAILURE'
                throw any // rethrow exception to prevent the build from proceeding
            } finally {
                // wipe workspace
                step([$class: 'WsCleanup', cleanWhenFailure: false])
                step([$class: 'Mailer', notifyEveryUnstableBuild: true, recipients: mailingList + extendMailingList, sendToIndividuals: true])
            }
        }
    }
}

// return the artifact version of a pom file
@NonCPS
def getPomVersion(pomFile) {
    def matcher = pomFile =~ '<version>(.+)</version>'
    matcher ? matcher[1][1] : null
}

// get build artifacts and return a list of their hyperlinks
@NonCPS
def getArtifactsAsList(artifacts) {
    def artifactList = []
    def artifactListAsString = ""

    if(artifacts != null && artifacts.size() > 0) {
        artifacts.each { artifact ->
            artifactList << artifact.toString().replace('-SNAPSHOT', '')
        }
    }

    if(artifactList != null && artifactList.size() > 0) {
        artifactList.each { artifact ->
            artifactListAsString += "* ${artifact}\n"
        }
    }

    artifactListAsString += "\nDownload: ${env.BUILD_URL}artifact/\n"

    return artifactListAsString
}

// determine last successful build
def lastSuccessfulBuild(passedBuilds, build) {
    if ((build != null) && (build.result != 'SUCCESS')) {
        passedBuilds.add(build)
        lastSuccessfulBuild(passedBuilds, build.getPreviousBuild())
    }
}

// get change log
@NonCPS
def getChangeLog(passedBuilds) {
    echo "Get log"
    def log = ""
    for (int x = 0; x < passedBuilds.size(); x++) {
        def currentBuild = passedBuilds[x];
        def changeLogSets = currentBuild.rawBuild.changeSets
        for (int i = 0; i < changeLogSets.size(); i++) {
            def entries = changeLogSets[i].items
            for (int j = 0; j < entries.length; j++) {
                def entry = entries[j]
                log += "* ${entry.msg} by ${entry.author} \n"
            }
        }
    }

    if (log == "") {
        return "No changes\n\n"
    }

    return log + "\n";
}

// check out stage: scm is defined in multibranch pipeline configuration
void checkOut(pomFile) {
    stage('Checkout') {
        scm.setWorkspaceUpdater(new CheckoutUpdater())
        checkout scm

        def v = getPomVersion(readFile(pomFile))
        if (v) {
            echo "Version ${v}"
        }
    }
}

// update stage: update versions in pom file
void update(mvnHome, pomFile) {
    stage('Update') {
        bat "${mvnHome}\\bin\\mvn -f ${pomFile} versions:update-parent versions:update-properties -B"
    }
}

// checkin stage: if some dependency was updated
void checkIn(mvnHome, pomFile) {
    stage('Checkin') {
        def pomFileMainBackup = pomFile + '.versionsBackup'

        if (fileExists (pomFileMainBackup)) {
            echo "${pomFileMainBackup} exists: " + fileExists (pomFileMainBackup)

            def message = '"[versions-maven-plugin] update parent and/or properties version to latest release"'
            bat "${mvnHome}\\bin\\mvn -f ${pomFile} versions:commit scm:checkin -Dmessage=${message} -B"
        } else {
            echo "No changes detected, skip checkin"
        }
    }
}

// build stage: package project
void build(mvnHome, pomFile) {
    checkOut(pomFile)
    update(mvnHome, pomFile)

    stage('Build') {
        bat "${mvnHome}\\bin\\mvn -f ${pomFile} clean install -B"
    }

    checkIn(mvnHome, pomFile)
}

// archive stage: archive artifacts on jenkins
void archive(pomFile) {
    stage('Archive') {
        archiveArtifacts allowEmptyArchive: true, artifacts: '**/target/*.jar, **/target/*.war, **/target/*.zip', excludes: null, fingerprint: true, onlyIfSuccessful: true
    }
}

// create release mail
void notify(version) {
    passedBuilds = []
    lastSuccessfulBuild(passedBuilds, currentBuild)

    def releaseSubject = "ORBIS Fluid Management ${version}"
    def releaseMail = 'fm-releasereports@elink.agfahealthcare.com, amar.bhatia@agfa.com, michael.auss@agfa.com, carsten.schlichting@agfa.com'

    echo "Sending emails to ${releaseMail}"
    emailext mimeType: 'text/plain', body: "${releaseSubject}\n\n" + "Changelog:\n\n" + getChangeLog(passedBuilds) + "Artifacts:\n\n" + getArtifactsAsList(manager.build.artifacts), subject: "${releaseSubject}", to: "${releaseMail}"
}

// test pipeline
void test(mvnHome, pomFile) {
    build(mvnHome, pomFile)

    stage('Test') {
        bat "${mvnHome}\\bin\\mvn -f ${pomFile} verify -Pcoverage -Dtest.log4jlevel=DEBUG -Dmaven.test.failure.ignore=true -DdownloadSources=false -DdownloadJavadocs=false"
        junit healthScaleFactor: 5.0, allowEmptyResults: true, testDataPublishers: [[$class: 'ClaimTestDataPublisher']], testResults: '**/target/surefire-reports/*.xml, **/target/karma-reports/*.xml, **/target/protractor-reports/*.xml'
        publishHTML([allowMissing: true, alwaysLinkToLastBuild: false, keepAll: true, reportDir: 'fluidmanagement/fm-ui/target/protractor-reports', reportFiles: 'htmlReport.html', reportName: 'Protractor'])
        publishHTML([allowMissing: true, alwaysLinkToLastBuild: false, keepAll: true, reportDir: 'fluidmanagement/fm-ui/target/coverage/html', reportFiles: 'index.html', reportName: 'Istanbul'])
    }
}

// analyse pipeline
void analyse(mvnHome, pomFile) {
    test(mvnHome, pomFile)

    stage('Analysis') {
        withSonarQubeEnv('sonar-hcis-vie-prod') {
            bat "${mvnHome}\\bin\\mvn -f ${pomFile} sonar:sonar -Pcoverage -Dsonar.branch=${env.BRANCH_NAME} -Dsonar.scm.provider=svn"
            jacoco()
        }
    }
}

// integration pipeline
void integrate(mvnHome, pomFile) {
    build(mvnHome, pomFile)

    def profile = 'oas-jenkins'
    def version = getPomVersion(readFile(pomFile))

    stage('Undeploy') {
        bat "${mvnHome}\\bin\\mvn -f ${pomFile} clean -Ddeploy=${profile}"
    }

    stage('Deploy') {
        bat "${mvnHome}\\bin\\mvn -f ${pomFile} install -DskipTests=true -Ddeploy=${profile}"
    }

    currentBuild.description = "${version} @ ${profile}"
}

// deploy pipeline
void deploy(mvnHome, pomFile) {
    def profiles = 'oas-remote-vie\noas-remote-vie85'
    def profile = ''
    def timeOutTime = 5
    def timeOutUnit = 'MINUTES'

    try {
        echo ("wating ${timeOutTime} ${timeOutUnit} for user input")
        timeout(time: timeOutTime, unit: timeOutUnit) {
            profile = input id: 'profile', message: 'Choose OAS Profile', parameters: [choice(choices: profiles, description: 'OAS profiles defined in POM', name: 'OAS')]
        }
        echo ('input profile: ' + profile)
    } catch (err) {
        profile = 'oas-remote-vie'
        echo ('input timeout, default profile: ' + profile)
    }

    build(mvnHome, pomFile)
    def version = getPomVersion(readFile(pomFile))

    stage('Deploy') {
        bat "${mvnHome}\\bin\\mvn -f ${pomFile} install -DskipTests=true -Ddeploy=${profile}"
    }

    currentBuild.description = "${version} @ ${profile}"
}

// snapshot pipeline
void snapshot(mvnHome, pomFile) {
    build(mvnHome, pomFile)
    def version = getPomVersion(readFile(pomFile))

    stage('Snapshot') {
        bat "${mvnHome}\\bin\\mvn -f ${pomFile} deploy -B -Porbis-dev"
        publishHTML([allowMissing: true, alwaysLinkToLastBuild: false, keepAll: true, reportDir: 'fluidmanagement/apis/target/apidoc', reportFiles: 'index.html', reportName: 'API Docs'])
    }

    currentBuild.description = version
}

// release pipeline
void release(mvnHome, pomFile) {
    build(mvnHome, pomFile)
    def version = getPomVersion(readFile(pomFile)).replace('-SNAPSHOT', '')
    archive(pomFile)

    stage('Release') {
        bat "${mvnHome}\\bin\\mvn -f ${pomFile} -DdownloadSources=false -DdownloadJavadocs=false -Dmaven.test.failure.ignore=true release:prepare release:perform  orbiscomponent:reset-start-revision -B -Porbis-dev"
        publishHTML([allowMissing: true, alwaysLinkToLastBuild: false, keepAll: true, reportDir: 'fluidmanagement/apis/target/apidoc', reportFiles: 'index.html', reportName: 'API Docs'])
    }

    currentBuild.description = "ORBIS Fluid Management ${version}"
    notify(version)
}

 

 

timja commented 7 years ago

fredericmeyrou:

Looks like https://issues.jenkins-ci.org/browse/JENKINS-43197 might be related

timja commented 6 years ago

svanoort:

fredericmeyrou Per https://issues.jenkins-ci.org/browse/JENKINS-33358 and JENKINS-43197, which you linked, this should be resolved by upgrading to LTS 2.73.x, because it is caused by a Groovy class metadata bug that was fixed in the later Groovy version used in that core. It MIGHT be possible to work around it by changing your setting for the Java argument "-Dgroovy.use.classvalue=true" (if present, remove it; if absent, add it).

I would also STRONGLY encourage use of Script Security version 1.35 in conjunction with this, because it now armors many plugins against memory leaks from Groovy (after a feature I released yesterday). I've seen several instances on the same scale as yours which were having to restart regularly due to running out of memory.

Since this Groovy bug is resolved by the core upgrade, I'm going to go ahead and close this as a duplicate.
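
Before and after applying the suggestions above, it may help to confirm what the running instance actually uses. A minimal sketch for the Script Console (Manage Jenkins > Script Console), assuming administrator access; the printed labels are arbitrary, and 'script-security' is assumed to be the short name of the Script Security plugin:

```groovy
import jenkins.model.Jenkins

// Current value of the Groovy flag discussed above; prints null if it was not set on the JVM command line.
println "groovy.use.classvalue = ${System.getProperty('groovy.use.classvalue')}"

// Jenkins core version, to compare against the 2.73.x LTS line.
println "Jenkins core: ${Jenkins.VERSION}"

// Installed Script Security version, looked up by the plugin's short name.
def plugin = Jenkins.instance.pluginManager.getPlugin('script-security')
println "Script Security: ${plugin ? plugin.version : 'not installed'}"
```

The script only reads values and changes nothing, so it should be safe to run on a production master.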

timja commented 2 years ago

[Duplicates: JENKINS-43197]