jenkinsci / jms-messaging-plugin

https://plugins.jenkins.io/jms-messaging
12 stars 37 forks source link

Jobs not subscribing after unsubscribing from time to time #227

Open ukulekek opened 2 years ago

ukulekek commented 2 years ago

Hello, our Jenkins jobs are being unsubscribed without subscribing back from time to time. I have found two different reasons for not subscribing back in logs. We are using the latest 1.1.24 version of the plugin.

1:

2021-11-04 18:12:26.639+0000 [id=1256]  INFO    c.r.j.p.c.m.ActiveMqMessagingWorker#unsubscribe: Unsubscribing job 'ci-brew-docker-dispatcher' from the 'Consumer.rh-jenkins-ci-plugin.60aa7a6
b-df3b-5d9b-b08d-198beeb4db36.VirtualTopic.eng.brew.build.complete.>' topic.
2021-11-04 18:12:26.727+0000 [id=1256]  SEVERE  h.i.i.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler#uncaughtException: A thread (CIBuildTrigger-ci-brew-docker-dispatcher-1-
ActiveMqMessagingProvider/1256) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.lang.StackOverflowError
        at org.acegisecurity.context.SecurityContext$1.getAuthentication(SecurityContext.java:46)
        at org.acegisecurity.context.SecurityContext$2.getAuthentication(SecurityContext.java:60)
        at org.acegisecurity.context.SecurityContext$1.getAuthentication(SecurityContext.java:46)
        at org.acegisecurity.context.SecurityContext$2.getAuthentication(SecurityContext.java:60)
        at org.acegisecurity.context.SecurityContext$1.getAuthentication(SecurityContext.java:46)
...

No messages from the job appeared after that. We resolved it by disabling and enabling the job back.

2:

2021-11-10 00:53:32.818+0000 [id=33]    INFO    c.r.j.plugins.ci.CIBuildTrigger#stopTriggerThreads: Removed thread lock for 'ci-dispatch-github_pull_request_opened'.
2021-11-10 00:53:32.818+0000 [id=33]    INFO    c.r.j.plugins.ci.CIBuildTrigger#startTriggerThreads: Starting thread (83) for 'ci-dispatch-github_pull_request_opened'.
2021-11-10 00:53:32.830+0000 [id=33]    INFO    jenkins.model.Jenkins$5#runTask: Took 514ms for Loading item ci-dispatch-github_pull_request_opened by pool-6-thread-6
2021-11-10 00:53:33.020+0000 [id=83]    INFO    c.r.j.p.c.m.FedMsgMessagingWorker#subscribe: Subscribing job 'ci-dispatch-github_pull_request_opened' to org.fedoraproject.prod.github.pull_request.opened topic.
2021-11-10 00:53:33.021+0000 [id=83]    INFO    c.r.j.p.c.m.FedMsgMessagingWorker#subscribe: Successfully subscribed job 'ci-dispatch-github_pull_request_opened' to topic 'org.fedoraproject.prod.github.pull_request.opened'.
2021-11-10 00:53:36.225+0000 [id=28]    WARNING    j.a.s.AdvancedQueueSorterQueueListener#onEnterBuildable: onEnterBuilding() called without prior call to onEnterWaiting() for 'ci-dispatch-github_pull_request_opened'
2021-11-10 00:53:36.226+0000 [id=28]    WARNING    j.a.s.AdvancedQueueSorterQueueListener#onEnterBuildable: onEnterBuilding() called without prior call to onEnterWaiting() for 'ci-dispatch-github_pull_request_opened'
2021-11-10 01:53:33.124+0000 [id=83]    INFO    c.r.j.p.c.m.FedMsgMessagingWorker#receive: No message received for the past 3600000 ms, re-subscribing for job 'ci-dispatch-github_pull_request_opened'.
2021-11-10 01:53:33.131+0000 [id=83]    INFO    c.r.j.p.c.m.FedMsgMessagingWorker#unsubscribe: Un-subscribing job 'ci-dispatch-github_pull_request_opened' from org.fedoraproject.prod.github.pull_request.opened topic.
2021-11-10 01:53:33.162+0000 [id=83]    INFO    c.r.j.p.c.m.FedMsgMessagingWorker#subscribe: Subscribing job 'ci-dispatch-github_pull_request_opened' to org.fedoraproject.prod.github.pull_request.opened topic.
2021-11-10 01:53:33.166+0000 [id=83]    INFO    c.r.j.p.c.m.FedMsgMessagingWorker#subscribe: Successfully subscribed job 'ci-dispatch-github_pull_request_opened' to topic 'org.fedoraproject.prod.github.pull_request.opened'.
2021-11-10 02:06:03.269+0000 [id=7312]    INFO    c.r.j.plugins.ci.CIBuildTrigger#stopTriggerThreads: Stopping thread (83) for 'ci-dispatch-github_pull_request_opened'.
2021-11-10 02:06:03.269+0000 [id=7312]    INFO    c.r.j.p.c.m.FedMsgMessagingWorker#prepareForInterrupt: Un-subscribing job 'ci-dispatch-github_pull_request_opened' from org.fedoraproject.prod.github.pull_request.opened topic.
2021-11-10 02:06:03.270+0000 [id=83]    INFO    c.r.j.p.c.m.FedMsgMessagingWorker#receive: We have been interrupted...
2021-11-10 02:06:03.270+0000 [id=83]    INFO    c.r.j.p.c.m.FedMsgMessagingWorker#unsubscribe: We are being interrupted. Skipping unsubscribe...

Again, no messages appeared after that. With this one, even if it looks that unsubscribing was skipped, however, it wasn't as the job hasn't been triggered after that. In this case, disable/enable didn't work, there was no trigger thread running. Only Jenkins restart did help.

Version report

Jenkins and plugins versions report:

Jenkins: 2.289.3
OS: Linux - 3.10.0-1062.4.3.el7.x86_64
---
ColumnPack-plugin:1.0.3
ColumnsPlugin:1.0
PrioritySorter:4.0.0
ace-editor:1.1
ansicolor:1.0.0
ant:1.11
antisamy-markup-formatter:2.1
apache-httpcomponents-client-4-api:4.5.13-1.0
authentication-tokens:1.4
badge:1.8
bootstrap4-api:4.6.0-3
bootstrap5-api:5.1.1-1
bouncycastle-api:2.25
branch-api:2.7.0
build-failure-analyzer:2.1.0
build-name-setter:2.2.0
build-pipeline-plugin:1.5.8
build-timeout:1.20
build-view-column:0.3
build-with-parameters:1.5.1
buildresult-trigger:0.17
built-on-column:1.1
caffeine-api:2.9.2-29.v717aac953ff3
checks-api:1.7.2
cloudbees-disk-usage-simple:0.10
cloudbees-folder:6.16
command-launcher:1.6
compact-columns:1.13
conditional-buildstep:1.4.1
configuration-as-code:1.53
configure-job-column-plugin:1.0
console-column-plugin:1.5
console-tail:1.1
copy-to-slave:1.4.4
copyartifact:1.46.2
credentials:2.6.1
credentials-binding:1.27
cron_column:1.4
cvs:2.19
dashboard-view:2.17
depgraph-view:1.0.5
description-column-plugin:1.3
description-setter:1.10
disk-usage:0.28
display-url-api:2.3.5
docker-commons:1.17
docker-workflow:1.26
dtkit-api:3.0.0
dumpling:2.2
durable-task:1.39
dynamicparameter:0.2.0
echarts-api:5.1.2-11
email-ext:2.83
email-ext-recipients-column:1.0
envinject:2.4.0
envinject-api:1.7
extended-read-permission:3.2
external-monitor-job:1.7
extra-columns:1.24
ez-templates:1.3.4
flexible-publish:0.16.1
font-awesome-api:5.15.4-1
gerrit-trigger:2.35.1
git:4.8.2
git-client:3.9.0
git-server:1.10
github:1.34.1
github-api:1.133
greenballs:1.15.1
groovy:2.4
groovy-postbuild:2.5
handlebars:3.0.8
icon-shim:3.0.0
instant-messaging:1.46
ircbot:2.40
jackson2-api:2.12.4
javadoc:1.6
jdk-tool:1.5
jenkins-multijob-plugin:1.36
jms-messaging:1.1.24
jobConfigHistory:2.28.1
jobtype-column:1.3
jquery:1.12.4-1
jquery-detached:1.2.1
jquery-ui:1.0.2
jquery3-api:3.6.0-2
jsch:0.1.55.2
junit:1.53
kerberos-sso:1.7
ldap:2.7
locale:1.4
lockable-resources:2.11
mailer:1.34
mapdb-api:1.0.9.0
mask-passwords:3.0
material-theme:0.4.1
matrix-auth:2.6.8
matrix-project:1.19
maven-plugin:3.13
metrics:4.0.2.8
momentjs:1.1.1
monitoring:1.88.0
multiple-scms:0.8
nested-view:1.21
nodelabelparameter:1.9.2
nodenamecolumn:1.2
okhttp-api:3.14.9
ownership:0.13.0
pam-auth:1.6
parameterized-trigger:2.41
pipeline-build-step:2.15
pipeline-graph-analysis:1.11
pipeline-input-step:2.12
pipeline-milestone-step:1.3.2
pipeline-model-api:1.9.2
pipeline-model-declarative-agent:1.1.1
pipeline-model-definition:1.9.2
pipeline-model-extensions:1.9.2
pipeline-rest-api:2.19
pipeline-stage-step:2.5
pipeline-stage-tags-metadata:1.9.2
pipeline-stage-view:2.19
pipeline-utility-steps:2.10.0
plain-credentials:1.7
plugin-util-api:2.4.0
popper-api:1.16.1-2
popper2-api:2.10.1-1
postbuildscript:3.0.0
progress-bar-column-plugin:1.0
project-build-times:1.2.1
prometheus:2.0.10
python:1.3
rebuild:1.32
redhat-ci-plugin:1.5.11
resource-disposer:0.16
role-strategy:3.2.0
run-condition:1.5
saferestart:0.3
schedule-build:0.5.1
scm-api:2.6.5
script-security:1.78
scriptler:3.3
scripttrigger:0.34
sectioned-view:1.25
shiningpanda:0.24
skip-certificate-check:1.0
snakeyaml-api:1.29.1
sonar:2.13.1
sonargraph-plugin:1.6.4
ssh-agent:1.23
ssh-credentials:1.19
ssh-slaves:1.33.0
sshd:3.1.0
structs:1.23
subversion:2.14.5
support-core:2.76
swarm:3.28
test-stability:2.3
theme-manager:0.6
thinBackup:1.10
throttle-concurrents:2.4
timestamper:1.13
tmpcleaner:1.3
token-macro:266.v44a80cf277fd
toolenv:1.2
translation:1.16
trilead-api:1.0.13
update-sites-manager:2.0.0
variant:1.4
windows-slaves:1.8
workflow-aggregator:2.6
workflow-api:2.46
workflow-basic-steps:2.24
workflow-cps:2.94
workflow-cps-global-lib:2.21
workflow-durable-task-step:2.40
workflow-job:2.41
workflow-multibranch:2.26
workflow-scm-step:2.13
workflow-step-api:2.24
workflow-support:3.8
ws-cleanup:0.39
xunit:3.0.3
xvnc:1.25
zmq-event-publisher:0.0.5
RHEL-7.7

Reproduction steps

I'm unable to reproduce the issue on purpose.

Results

Expected result:

The job is subscribed back.

Actual result:

The job keeps being unsubscribed.

olivergondza commented 2 years ago

@ukulekek, thanks for the report. Please attach full (or at least relevant part of the) StackOverflowError stacktrace - we need to know how did it manage to get into that infinite recursion.

ukulekek commented 2 years ago

@olivergondza the full StackOverflowError consists of the same two lines repeating 1024 times, starting and ending with:

2021-11-04 18:12:26.727+0000 [id=1256]  SEVERE  h.i.i.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler#uncaughtException: A thread (CIBuildTrigger-ci-brew-docker-dispatcher-1-ActiveMqMessagingProvider/1256) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.lang.StackOverflowError
        at org.acegisecurity.context.SecurityContext$1.getAuthentication(SecurityContext.java:46)
        at org.acegisecurity.context.SecurityContext$2.getAuthentication(SecurityContext.java:60)
        at org.acegisecurity.context.SecurityContext$1.getAuthentication(SecurityContext.java:46)
        at org.acegisecurity.context.SecurityContext$2.getAuthentication(SecurityContext.java:60)
...
        at org.acegisecurity.context.SecurityContext$1.getAuthentication(SecurityContext.java:46)
        at org.acegisecurity.context.SecurityContext$2.getAuthentication(SecurityContext.java:60)
        at org.acegisecurity.context.SecurityContext$1.getAuthentication(SecurityContext.java:46)
        at org.acegisecurity.context.SecurityContext$2.getAuthentication(SecurityContext.java:60)
olivergondza commented 2 years ago

Unfortunately, that does not help us much in finding out the cause of it. No idea why is the stacktrace shortened.

However, I found https://issues.jenkins.io/browse/JENKINS-65590 describing the same symptoms addressed by update in the API they use. It is something we eventually must do here as well, so it might be worth prioritizing it.

olivergondza commented 2 years ago

Filing API update: https://github.com/jenkinsci/jms-messaging-plugin/pull/228

It is strange we did not see this on 2.289.2 or 3.303.3. Is there any particular way to reproduce this?