jenkinsci / configuration-as-code-plugin

Jenkins Configuration as Code Plugin
https://plugins.jenkins.io/configuration-as-code
MIT License
2.68k stars 717 forks source link

Jenkins Master fails to boot after CASC Plugin upgrade to release 1.48 #1595

Closed lenaing closed 3 years ago

lenaing commented 3 years ago

Hello,

We rebuild our own Jenkins Master from the base Jenkins LTS image, incorporating many plugins and the configuration-as-code one. Since the release of 1.48, our masters fails to boot without any relevant logs and just hangs at jenkins.InitReactorRunner$1#onAttained: System config loaded.

The only diff in our plugins list is :

When we rebuild with version 1.47, masters boot as expected.

Please find attached our CASC YAML, full plugin list and Jenkins logs.

Kind regards,

jenkins_casc.yaml.txt jenkins_stderr.txt jenkins_stdout.txt jenkins_plugins_list.txt

timja commented 3 years ago

@jtnord you able to take a look? ^^

lenaing commented 3 years ago

We had to strip down the CASC Yaml for obvious confidentiality issues, but if required I can provide a more complete one ;)

bguerin commented 3 years ago

Also reported here : https://issues.jenkins.io/browse/JENKINS-65518

jtnord commented 3 years ago

looking

jtnord commented 3 years ago

I can not reproduce this. I have download all plugins mentioned to JENKINS_HOME/plugins (with the exception of pipeline-build-step:2.14-SNAPSHOT (private-21cb2b18-root) which I replaced with pipeline-build-step:2.13 fixed up the YAML so it was valid ([redacted] needed to be wrapped in quotes, and there where several duplicate sections that I merged) then did the following.

  1. start jenkins normally
  2. in the wizard create an admin user so I can log in :)
  3. stop jenkins
  4. set CASC_JENKINS_CONFIG to point to the fixed up config
  5. start jenkins again
  6. observe it start up
  7. stop it and remove all filesin JENKINS_HOME except the plugin directory
  8. start it again.
Running from: C:\workarea\wars\jenkins-2.277.3.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
2021-04-30 12:49:45.737+0000 [id=1]     INFO    org.eclipse.jetty.util.log.Log#initialized: Logging initialized @456ms to org.eclipse.jetty.util.log.JavaUtilLog
2021-04-30 12:49:45.812+0000 [id=1]     INFO    winstone.Logger#logInternal: Beginning extraction from war file
2021-04-30 12:49:47.167+0000 [id=1]     WARNING o.e.j.s.handler.ContextHandler#setContextPath: Empty contextPath
2021-04-30 12:49:47.209+0000 [id=1]     INFO    org.eclipse.jetty.server.Server#doStart: jetty-9.4.39.v20210325; built: 2021-03-25T14:42:11.471Z; git: 9fc7ca5a922f2a37b84ec9dbc26a5168cee7e667; jvm 1.8.0_265-b01
2021-04-30 12:49:47.571+0000 [id=1]     INFO    o.e.j.w.StandardDescriptorProcessor#visitServlet: NO JSP Support for /, did not find org.eclipse.jetty.jsp.JettyJspServlet
2021-04-30 12:49:47.599+0000 [id=1]     INFO    o.e.j.s.s.DefaultSessionIdManager#doStart: DefaultSessionIdManager workerName=node0
2021-04-30 12:49:47.599+0000 [id=1]     INFO    o.e.j.s.s.DefaultSessionIdManager#doStart: No SessionScavenger set, using defaults
2021-04-30 12:49:47.600+0000 [id=1]     INFO    o.e.j.server.session.HouseKeeper#startScavenging: node0 Scavenging every 660000ms
2021-04-30 12:49:47.916+0000 [id=1]     INFO    hudson.WebAppMain#contextInitialized: Jenkins home directory: C:\workarea\wars\CASC_HANG found at: EnvVars.masterEnvVars.get("JENKINS_HOME")
2021-04-30 12:49:47.980+0000 [id=1]     INFO    o.e.j.s.handler.ContextHandler#doStart: Started w.@6ab72419{Jenkins v2.277.3,/,file:///C:/workarea/wars/CASC_HANG/war/,AVAILABLE}{C:\workarea\wars\CASC_HANG\war}
2021-04-30 12:49:47.997+0000 [id=1]     INFO    o.e.j.server.AbstractConnector#doStart: Started ServerConnector@6f15d60e{HTTP/1.1, (http/1.1)}{127.0.0.1:8080}
2021-04-30 12:49:47.998+0000 [id=1]     INFO    org.eclipse.jetty.server.Server#doStart: Started @2717ms
2021-04-30 12:49:47.998+0000 [id=34]    INFO    winstone.Logger#logInternal: Winstone Servlet Engine running: controlPort=disabled
2021-04-30 12:49:48.811+0000 [id=41]    INFO    jenkins.InitReactorRunner$1#onAttained: Started initialization
2021-04-30 12:49:49.279+0000 [id=54]    INFO    jenkins.InitReactorRunner$1#onAttained: Listed all plugins
2021-04-30 12:49:57.301+0000 [id=66]    INFO    jenkins.InitReactorRunner$1#onAttained: Prepared all plugins
null
2021-04-30 12:49:58.474+0000 [id=43]    INFO    h.p.c.CopyArtifactConfiguration#load: CopyArtifact is set to Production mode.
2021-04-30 12:49:59.258+0000 [id=59]    INFO    jenkins.InitReactorRunner$1#onAttained: Started all plugins
2021-04-30 12:49:59.280+0000 [id=59]    INFO    jenkins.InitReactorRunner$1#onAttained: Augmented all extensions
2021-04-30 12:49:59.338+0000 [id=53]    INFO    jenkins.InitReactorRunner$1#onAttained: System config loaded
2021-04-30 12:49:59.833+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'HTTP_PROXY'. Will default to empty string
2021-04-30 12:49:59.844+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'HTTPS_PROXY'. Will default to empty string
2021-04-30 12:49:59.845+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'PROXY_HOST'. Will default to empty string
2021-04-30 12:49:59.845+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'PROXY_PORT'. Will default to empty string
2021-04-30 12:49:59.846+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'NO_PROXY'. Will default to empty string
2021-04-30 12:49:59.847+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'NO_PROXY_HOSTS'. Will default to empty string2021-04-30 12:49:59.848+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'GLOBAL_MAVEN_MIRROR'. Will default to empty string
2021-04-30 12:49:59.849+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'CONTAINER_MAVENCACHE_ROOTDIR'. Will default to empty string
2021-04-30 12:49:59.850+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'NPM_CONFIG_REGISTRY'. Will default to empty string
2021-04-30 12:49:59.851+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'CONTAINER_NPMCACHE_ROOTDIR'. Will default to empty string
2021-04-30 12:49:59.852+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'PIP_INDEX_URL'. Will default to empty string
2021-04-30 12:49:59.854+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'CURL_CA_BUNDLE'. Will default to empty string2021-04-30 12:49:59.855+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'CURL_CA_BUNDLE'. Will default to empty string2021-04-30 12:49:59.882+0000 [id=53]    WARNING i.j.p.casc.BaseConfigurator#createAttribute: Can't handle class javaposse.jobdsl.plugin.GlobalJobDslSecurityConfiguration#metaClass: type is abstract but not Describable.
2021-04-30 12:49:59.886+0000 [id=53]    WARNING i.j.p.casc.BaseConfigurator#createAttribute: Can't handle class javaposse.jobdsl.plugin.GlobalJobDslSecurityConfiguration#metaClass: type is abstract but not Describable.
2021-04-30 12:50:00.558+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'PLATEFORME_NAME'. Will default to empty string
2021-04-30 12:50:00.559+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'APPLICATION_NAME'. Will default to empty string
2021-04-30 12:50:00.559+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'ROOT_URL'. Will default to empty string
2021-04-30 12:50:00.573+0000 [id=53]    WARNING i.j.p.casc.BaseConfigurator#createAttribute: Can't handle class jenkins.plugins.git.GitSCMSource#owner: type is abstract but not Describable.
2021-04-30 12:50:00.593+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'HOST'. Will default to empty string
2021-04-30 12:50:00.604+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'MATOMO_HOST'. Will default to empty string
2021-04-30 12:50:00.604+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'MATOMO_SITE_ID'. Will default to empty string2021-04-30 12:50:00.605+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'MATOMO_PATH'. Will default to empty string
2021-04-30 12:50:00.606+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'GRAPHITE_HOST'. Will default to empty string
2021-04-30 12:50:00.606+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'GRAPHITE_PORT'. Will default to empty string
2021-04-30 12:50:00.606+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'GRAPHITE_KEY'. Will default to empty string
2021-04-30 12:50:00.852+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'HTTP_PROXY'. Will default to empty string
2021-04-30 12:50:00.853+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'HTTPS_PROXY'. Will default to empty string
2021-04-30 12:50:00.854+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'PROXY_HOST'. Will default to empty string
2021-04-30 12:50:00.855+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'PROXY_PORT'. Will default to empty string
2021-04-30 12:50:00.856+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'NO_PROXY'. Will default to empty string
2021-04-30 12:50:00.856+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'NO_PROXY_HOSTS'. Will default to empty string2021-04-30 12:50:00.857+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'GLOBAL_MAVEN_MIRROR'. Will default to empty string
2021-04-30 12:50:00.858+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'CONTAINER_MAVENCACHE_ROOTDIR'. Will default to empty string
2021-04-30 12:50:00.858+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'NPM_CONFIG_REGISTRY'. Will default to empty string
2021-04-30 12:50:00.859+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'CONTAINER_NPMCACHE_ROOTDIR'. Will default to empty string
2021-04-30 12:50:00.860+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'PIP_INDEX_URL'. Will default to empty string
2021-04-30 12:50:00.861+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'CURL_CA_BUNDLE'. Will default to empty string2021-04-30 12:50:00.862+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'CURL_CA_BUNDLE'. Will default to empty string2021-04-30 12:50:00.887+0000 [id=53]    INFO    o.s.s.l.DefaultSpringSecurityContextSource#<init>:  URL 'ldap://ldap.example.com/DC=COM', root DN is 'DC=COM'
2021-04-30 12:50:00.901+0000 [id=53]    INFO    o.s.s.l.s.FilterBasedLdapUserSearch#<init>: SearchBase not set. Searches will be performed from the root: dc=COM
2021-04-30 12:50:00.944+0000 [id=53]    INFO    o.s.s.l.u.DefaultLdapAuthoritiesPopulator#<init>: groupSearchBase is empty. Searches will be performed from the context source base
2021-04-30 12:50:01.066+0000 [id=53]    WARNING i.j.p.casc.BaseConfigurator#createAttribute: Can't handle class javaposse.jobdsl.plugin.GlobalJobDslSecurityConfiguration#metaClass: type is abstract but not Describable.
2021-04-30 12:50:01.068+0000 [id=53]    WARNING i.j.p.casc.BaseConfigurator#createAttribute: Can't handle class javaposse.jobdsl.plugin.GlobalJobDslSecurityConfiguration#metaClass: type is abstract but not Describable.
2021-04-30 12:50:01.721+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'PLATEFORME_NAME'. Will default to empty string
2021-04-30 12:50:01.721+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'APPLICATION_NAME'. Will default to empty string
2021-04-30 12:50:01.723+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'ROOT_URL'. Will default to empty string
2021-04-30 12:50:01.766+0000 [id=53]    WARNING i.j.p.casc.BaseConfigurator#createAttribute: Can't handle class jenkins.plugins.git.GitSCMSource#owner: type is abstract but not Describable.
2021-04-30 12:50:01.793+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'HOST'. Will default to empty string
2021-04-30 12:50:01.810+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'MATOMO_HOST'. Will default to empty string
2021-04-30 12:50:01.812+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'MATOMO_SITE_ID'. Will default to empty string2021-04-30 12:50:01.812+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'MATOMO_PATH'. Will default to empty string
2021-04-30 12:50:01.817+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'GRAPHITE_HOST'. Will default to empty string
2021-04-30 12:50:01.817+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'GRAPHITE_PORT'. Will default to empty string
2021-04-30 12:50:01.818+0000 [id=53]    WARNING i.j.p.c.SecretSourceResolver$UnresolvedLookup#lookup: Configuration import: Found unresolved variable 'GRAPHITE_KEY'. Will default to empty string
2021-04-30 12:50:01.825+0000 [id=53]    INFO    j.m.impl.graphite.PluginImpl#updateReporters: Starting Graphite reporter to :0 with prefix localhost
2021-04-30 12:50:01.867+0000 [id=67]    INFO    jenkins.InitReactorRunner$1#onAttained: System config adapted
2021-04-30 12:50:01.868+0000 [id=51]    INFO    jenkins.InitReactorRunner$1#onAttained: Loaded all jobs
2021-04-30 12:50:01.868+0000 [id=62]    INFO    jenkins.InitReactorRunner$1#onAttained: Configuration for all jobs updated
2021-04-30 12:50:01.869+0000 [id=92]    INFO    c.c.s.QuickDiskUsagePlugin$3#run: Waiting for Jenkins to be up before computing disk usage
2021-04-30 12:50:01.874+0000 [id=94]    INFO    hudson.model.AsyncPeriodicWork#lambda$doRun$0: Started Download metadata
2021-04-30 12:50:01.884+0000 [id=94]    INFO    hudson.util.Retrier#start: Attempt #1 to do the action check updates server
2021-04-30 12:50:07.198+0000 [id=52]    INFO    jenkins.InitReactorRunner$1#onAttained: Completed initialization
2021-04-30 12:50:07.235+0000 [id=31]    INFO    hudson.WebAppMain$3#run: Jenkins is fully up and running
2021-04-30 12:50:08.802+0000 [id=94]    INFO    h.m.DownloadService$Downloadable#load: Obtained the updated data file for hudson.tasks.Maven.MavenInstaller
2021-04-30 12:50:08.931+0000 [id=94]    INFO    h.m.DownloadService$Downloadable#load: Obtained the updated data file for hudson.tasks.Ant.AntInstaller
2021-04-30 12:50:09.167+0000 [id=94]    INFO    h.m.DownloadService$Downloadable#load: Obtained the updated data file for hudson.plugins.gradle.GradleInstaller
2021-04-30 12:50:09.297+0000 [id=94]    INFO    h.m.DownloadService$Downloadable#load: Obtained the updated data file for hudson.plugins.groovy.GroovyInstaller
2021-04-30 12:50:09.428+0000 [id=94]    INFO    h.m.DownloadService$Downloadable#load: Obtained the updated data file for hudson.plugins.sonar.MsBuildSonarQubeRunnerInstaller
2021-04-30 12:50:09.551+0000 [id=94]    INFO    h.m.DownloadService$Downloadable#load: Obtained the updated data file for hudson.plugins.sonar.SonarRunnerInstaller
2021-04-30 12:50:10.129+0000 [id=94]    INFO    h.m.DownloadService$Downloadable#load: Obtained the updated data file for hudson.tools.JDKInstaller
2021-04-30 12:50:10.130+0000 [id=94]    INFO    hudson.util.Retrier#start: Performed the action check updates server successfully at the attempt #1
2021-04-30 12:50:10.133+0000 [id=94]    INFO    hudson.model.AsyncPeriodicWork#lambda$doRun$0: Finished Download metadata. 8,258 ms
2021-04-30 12:50:11.879+0000 [id=104]   INFO    hudson.model.AsyncAperiodicWork#lambda$doAperiodicRun$0: Started Update IdP Metadata from URL PeriodicWork
2021-04-30 12:50:11.880+0000 [id=104]   INFO    hudson.model.AsyncAperiodicWork#lambda$doAperiodicRun$0: Finished Update IdP Metadata from URL PeriodicWork. 1 ms
jglick commented 3 years ago

just hangs

Capture a thread dump. (kill -QUIT $pid or jstack $pid)

jtnord commented 3 years ago

The mesos cloud won't create correctly for me so I had to comment it out - but this is the config I used. jenkins_casc.yaml.txt

jtnord commented 3 years ago

FTR works for me with the same file and plugins in docker also. docker run --env CASC_JENKINS_CONFIG=/var/jenkins_home/jenkins_casc.yaml --publish 8080:8080 --volume C:\workarea\wars\CASC_HANG_DOCKER:/var/jenkins_home jenkins/jenkins:lts

lenaing commented 3 years ago

On an empty Jenkins configuration, first run is OK but second run hangs. On an existing configuration, it just hangs.

Please find attached 3 Threads dumps on the same process.

This part seems to indicate a thread waiting for the RoleStrategy plugin, I'll try to switch back to 1.47 on this same instance to check if it starts up:

"ConfigurationAsCode.init" #28 daemon prio=5 os_prio=0 tid=0x00007fb630207000 nid=0x976 runnable [0x00007fb68015e000]
   java.lang.Thread.State: RUNNABLE
        at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1934)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2373)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2356)
        at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
        at com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:54)
        at io.jenkins.plugins.casc.impl.DefaultConfiguratorRegistry.lookup(DefaultConfiguratorRegistry.java:82)
        at io.jenkins.plugins.casc.impl.DefaultConfiguratorRegistry.internalLookup(DefaultConfiguratorRegistry.java:119)
        at io.jenkins.plugins.casc.impl.DefaultConfiguratorRegistry.access$000(DefaultConfiguratorRegistry.java:42)
        at io.jenkins.plugins.casc.impl.DefaultConfiguratorRegistry$1.load(DefaultConfiguratorRegistry.java:90)
        at io.jenkins.plugins.casc.impl.DefaultConfiguratorRegistry$1.load(DefaultConfiguratorRegistry.java:87)
        at com.github.benmanes.caffeine.cache.LocalLoadingCache.lambda$newMappingFunction$2(LocalLoadingCache.java:141)
        at com.github.benmanes.caffeine.cache.LocalLoadingCache$$Lambda$243/160341044.apply(Unknown Source)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2375)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache$$Lambda$296/1426841286.apply(Unknown Source)
        at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
        - locked <0x00000005c0079828> (a java.util.concurrent.ConcurrentHashMap$ReservationNode)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2373)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2356)
        at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
        at com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:54)
        at io.jenkins.plugins.casc.impl.DefaultConfiguratorRegistry.lookupOrFail(DefaultConfiguratorRegistry.java:67)
        at io.jenkins.plugins.casc.ConfigurationContext.lookupOrFail(ConfigurationContext.java:112)
        at io.jenkins.plugins.casc.impl.configurators.DataBoundConfigurator.tryConstructor(DataBoundConfigurator.java:140)
        at io.jenkins.plugins.casc.impl.configurators.DataBoundConfigurator.instance(DataBoundConfigurator.java:76)
        at io.jenkins.plugins.casc.BaseConfigurator.configure(BaseConfigurator.java:267)
        at io.jenkins.plugins.casc.impl.configurators.DataBoundConfigurator.configure(DataBoundConfigurator.java:82)
        at org.jenkinsci.plugins.rolestrategy.casc.RoleBasedAuthorizationStrategyConfigurator.instance(RoleBasedAuthorizationStrategyConfigurator.java:58)
        at org.jenkinsci.plugins.rolestrategy.casc.RoleBasedAuthorizationStrategyConfigurator.instance(RoleBasedAuthorizationStrategyConfigurator.java:34)
        at io.jenkins.plugins.casc.BaseConfigurator.configure(BaseConfigurator.java:267)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.lambda$doConfigure$16668e2$1(HeteroDescribableConfigurator.java:277)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator$$Lambda$320/1357238254.apply(Unknown Source)
        at io.vavr.CheckedFunction0.lambda$unchecked$52349c75$1(CheckedFunction0.java:247)
        at io.vavr.CheckedFunction0$$Lambda$301/1017878535.apply(Unknown Source)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.doConfigure(HeteroDescribableConfigurator.java:277)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.lambda$configure$2(HeteroDescribableConfigurator.java:86)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator$$Lambda$319/1361207555.apply(Unknown Source)
        at io.vavr.control.Option.map(Option.java:392)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.lambda$configure$3(HeteroDescribableConfigurator.java:86)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator$$Lambda$302/1628738105.apply(Unknown Source)
        at io.vavr.Tuple2.apply(Tuple2.java:238)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.configure(HeteroDescribableConfigurator.java:83)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.check(HeteroDescribableConfigurator.java:92)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.check(HeteroDescribableConfigurator.java:55)
        at io.jenkins.plugins.casc.BaseConfigurator.configure(BaseConfigurator.java:351)
        at io.jenkins.plugins.casc.BaseConfigurator.check(BaseConfigurator.java:287)
        at io.jenkins.plugins.casc.ConfigurationAsCode.lambda$checkWith$8(ConfigurationAsCode.java:750)
        at io.jenkins.plugins.casc.ConfigurationAsCode$$Lambda$280/2074231074.apply(Unknown Source)
        at io.jenkins.plugins.casc.ConfigurationAsCode.invokeWith(ConfigurationAsCode.java:686)
        at io.jenkins.plugins.casc.ConfigurationAsCode.checkWith(ConfigurationAsCode.java:750)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:735)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:611)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configure(ConfigurationAsCode.java:297)
        at io.jenkins.plugins.casc.ConfigurationAsCode.init(ConfigurationAsCode.java:289)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:104)
        at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:175)
        at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:296)
        at jenkins.model.Jenkins$5.runTask(Jenkins.java:1131)
        at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:214)
        at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
        at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

jenkins_jstack1.log jenkins_jstack2.log jenkins_jstack3.log

lenaing commented 3 years ago

I confirm that by using 1.47 I do not have any issue with the aformentioned instance. I'll try to remove the RoleStrategy CASC part to check if this is the blocking part, but this is not something we want to deactivate :)

jtnord commented 3 years ago

if it is in the RoleBasedAuthorizationStrategyConfigurator then it is likely dependant on your configuration which was redacted.

Could you create a reproducable test case with https://github.com/rroemhild/docker-test-openldap as your AuthorizationStrategy ao that you can supply the full config unredacted?

lenaing commented 3 years ago

I'll try to do so. In the meantime, please find attached the CASC Part related to the Authorization Strategy. I just switched our users to generic ones.

jenkins_authorization_strategy.yaml.txt

jglick commented 3 years ago

Clearly caused by #1594. Can we just revert at least the Caffeine-related part of that patch until this is better studied?

lenaing commented 3 years ago

I removed the Authorization Strategy part and it still fails to load. Find attached another Thread Dump. jenkins_jstack4.log

Here is the new locked part :

"ConfigurationAsCode.init" #28 daemon prio=5 os_prio=0 tid=0x00007f7168207000 nid=0x97a runnable [0x00007f7176424000]
   java.lang.Thread.State: RUNNABLE
        at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1934)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2373)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2356)
        at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
        at com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:54)
        at io.jenkins.plugins.casc.impl.DefaultConfiguratorRegistry.lookup(DefaultConfiguratorRegistry.java:82)
        at io.jenkins.plugins.casc.impl.DefaultConfiguratorRegistry.internalLookup(DefaultConfiguratorRegistry.java:119)
        at io.jenkins.plugins.casc.impl.DefaultConfiguratorRegistry.access$000(DefaultConfiguratorRegistry.java:42)
        at io.jenkins.plugins.casc.impl.DefaultConfiguratorRegistry$1.load(DefaultConfiguratorRegistry.java:90)
        at io.jenkins.plugins.casc.impl.DefaultConfiguratorRegistry$1.load(DefaultConfiguratorRegistry.java:87)
        at com.github.benmanes.caffeine.cache.LocalLoadingCache.lambda$newMappingFunction$2(LocalLoadingCache.java:141)
        at com.github.benmanes.caffeine.cache.LocalLoadingCache$$Lambda$260/1237706199.apply(Unknown Source)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2375)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache$$Lambda$300/296894824.apply(Unknown Source)
        at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
        - locked <0x000000079de609d8> (a java.util.concurrent.ConcurrentHashMap$ReservationNode)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2373)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2356)
        at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
        at com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:54)
        at io.jenkins.plugins.casc.impl.DefaultConfiguratorRegistry.lookupOrFail(DefaultConfiguratorRegistry.java:67)
        at io.jenkins.plugins.casc.ConfigurationContext.lookupOrFail(ConfigurationContext.java:112)
        at io.jenkins.plugins.casc.impl.configurators.DataBoundConfigurator.tryConstructor(DataBoundConfigurator.java:140)
        at io.jenkins.plugins.casc.impl.configurators.DataBoundConfigurator.instance(DataBoundConfigurator.java:76)
        at io.jenkins.plugins.casc.BaseConfigurator.configure(BaseConfigurator.java:267)
        at io.jenkins.plugins.casc.impl.configurators.DataBoundConfigurator.configure(DataBoundConfigurator.java:82)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.lambda$doConfigure$16668e2$1(HeteroDescribableConfigurator.java:277)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator$$Lambda$324/1888781293.apply(Unknown Source)
        at io.vavr.CheckedFunction0.lambda$unchecked$52349c75$1(CheckedFunction0.java:247)
        at io.vavr.CheckedFunction0$$Lambda$305/2069169341.apply(Unknown Source)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.doConfigure(HeteroDescribableConfigurator.java:277)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.lambda$configure$2(HeteroDescribableConfigurator.java:86)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator$$Lambda$323/1494742461.apply(Unknown Source)
        at io.vavr.control.Option.map(Option.java:392)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.lambda$configure$3(HeteroDescribableConfigurator.java:86)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator$$Lambda$306/2046340190.apply(Unknown Source)
        at io.vavr.Tuple2.apply(Tuple2.java:238)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.configure(HeteroDescribableConfigurator.java:83)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.check(HeteroDescribableConfigurator.java:92)
        at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.check(HeteroDescribableConfigurator.java:55)
        at io.jenkins.plugins.casc.BaseConfigurator.configure(BaseConfigurator.java:344)
        at io.jenkins.plugins.casc.BaseConfigurator.check(BaseConfigurator.java:287)
        at io.jenkins.plugins.casc.ConfigurationAsCode.lambda$checkWith$8(ConfigurationAsCode.java:750)
        at io.jenkins.plugins.casc.ConfigurationAsCode$$Lambda$293/1577146948.apply(Unknown Source)
        at io.jenkins.plugins.casc.ConfigurationAsCode.invokeWith(ConfigurationAsCode.java:686)
        at io.jenkins.plugins.casc.ConfigurationAsCode.checkWith(ConfigurationAsCode.java:750)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:735)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:611)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configure(ConfigurationAsCode.java:297)
        at io.jenkins.plugins.casc.ConfigurationAsCode.init(ConfigurationAsCode.java:289)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:104)
        at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:175)
        at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:296)
        at jenkins.model.Jenkins$5.runTask(Jenkins.java:1131)
        at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:214)
        at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
        at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
jtnord commented 3 years ago

1596

jglick commented 3 years ago

What Java version? A fully updated 8, or something else?

jglick commented 3 years ago

https://github.com/ben-manes/caffeine/commits/master/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java has a bunch of activity after 2.9.0. Unclear to me what is at fault here.

ben-manes commented 3 years ago

Caffeine does not support recursive computations, as this is not allowed by ConcurrentHashMap. In jdk8 this will livelock, while later releases will throw an exception if detected. The faq discusses a workaround, but it is not wonderful.

Guava does not support this either, but will more often work and can detect only when the same entry is recursed to. It is susceptible to deadlocks if A=>B and B=>A. If a stack overflow occurs, it can lead to ReentrantLock being corrupted and never be released.

The alternative is to not compute through the cache and use a racy load. Even better if this recursion can be removed.

jglick commented 3 years ago

Recursion is probably difficult to avoid in this case. Suggest #1596 today, then switch to using simpler “racy load” logic without a cache library if that is possible. Or it may be possible to use HashMap.computeIfAbsent and just keep a lock on the cache singleton—I would imagine only one thread is using this stuff anyway (should all be inside ConfigurationAsCode.init at least during Jenkins startup, and later CasC reload presumably does not need to run concurrently).

jtnord commented 3 years ago

then switch to using simpler “racy load” logic without a cache library if that is possible

I'm not sure why things are using the cache but I guess because we can throw away loads of things after Jenkins has started up rather than keep them around.

That said the type of thing we are configuring is likely already in memory as is the thing we are using to configure it (as Configurator is an ExtensionPoint).

ben-manes commented 3 years ago

Note that HashMap.computeIfAbsent also does not support recursive computations. In JDK8 that will corrupt the hash table, while in later releases it will throw an exception.

The workaround in Caffeine is to mimic Guava, which the FAQ uses AsyncCache as a simple approach. Instead of computing through the map, the entry holds a future value and it is computed outside of the map. This can be done without threads with the following pseudo code.

var supplier = cache.get(key, k -> Suppliers.memoize(() -> load(key)));
return supplier.get();

where Guava's Supplier.memoize uses double-checked locking to compute the value once.

If A=>B=>A then this will loop with the lock held, causing a stack overflow or other issues. In Guava it resulted in a deadlock, so I detected this case using Thread.holdsLock. If you go down this approach and care about that, the something like this could be done.

That said, cache recursion is tricky so you might decide it is too errorprone to support.

jglick commented 3 years ago

Recursion is probably difficult to avoid in this case.

Changed my mind after actually looking at the code here. #1597 may do the trick, at the expense of sometimes doing a bit more work.

jglick commented 3 years ago

(and many thanks to @ben-manes for the near-real-time tech support)

timja commented 3 years ago

Is someone who can reproduce this able to test the hpi from https://repo.jenkins-ci.org/incrementals/io/jenkins/configuration-as-code/1.50-rc1193.ce2ce287cf96/ please?

jtnord commented 3 years ago

Would still love a reproducible test case for this.. must be somethign about the configuration and plugin that triggers the re-entry

lenaing commented 3 years ago

@timja I hope to have some time today to check this HPI, I'll keep you posted. @jtnord I'll try to make a minimal test case early next week but I have to disable a lot of specific setup beforehand, so I don't promise anything ;)

ben-manes commented 3 years ago

I updated the FAQ entry to show an example of using Suppliers.memoize to lazily (and atomically) compute the value outside of the cache's atomic scope. This may or may not be useful to you, but was cleaner than the async approach previously described on that wiki page. Guava's utility is a trivial implementation of double-checked locking, so that is easy to extract if you don't want the dependency. Of course doing this requires being very careful to set it up perfectly.

LoadingCache<Integer, Supplier<Integer>> cache = Caffeine.newBuilder()
    .build(x -> Suppliers.memoize(() -> (x == 1) ? 1 : x * factorial(x - 1)));

int factorial(int x) {
  var memoizer = cache.get(x);
  return memoizer.get();
}
lenaing commented 3 years ago

Here are the results of my tests :

Test case to reproduce : plugins.txt jenkins.yaml.txt You'll need to change the IP to reach your LDAP test server

# In a shell
docker run --rm -p 10389:10389 -p 10636:10636 rroemhild/test-openldap
# In another shell
mkdir -p jenkins_home
# Initialize Jenkins and run it once to copy plugins in Jenkins Home
docker run -ti --entrypoint "/bin/bash" -p 8080:8080 -p 50000:50000 -v ${PWD}/jenkins_home:/var/jenkins_home -v ${PWD}/plugins.txt:/usr/share/jenkins/ref/plugins.txt -v ${PWD}/jenkins.yaml:/var/jenkins_home/jenkins.yaml jenkins/jenkins:lts -c "/usr/local/bin/install-plugins.sh < /usr/share/jenkins/ref/plugins.txt && echo 2 > /usr/share/jenkins/ref/jenkins.install.UpgradeWizard.state && /usr/local/bin/jenkins.sh"
# Subsequent runs that triggers the lock
docker run -ti -p 8080:8080 -p 50000:50000 -v ${PWD}/jenkins_home:/var/jenkins_home  -v ${PWD}/jenkins.yaml:/var/jenkins_home/jenkins.yaml jenkins/jenkins:lts

I can't trigger the lock each time however, but I get it fairly frequently with this configuration (~2 times out of 20 tries).

jetersen commented 3 years ago

Hopefully resolved with v1.50 :)