jenkinsci / ec2-fleet-plugin

The EC2 Fleet plugin launches EC2 instances as worker nodes for Jenkins CI server, automatically scaling the capacity with the load.
https://plugins.jenkins.io/ec2-fleet/
Apache License 2.0

Unable to set label on node caused by NullPointerException #414

Status: Closed (nwilliams-lw closed this issue 1 year ago)

nwilliams-lw commented 1 year ago

Issue Details

Describe the bug

The fleet plugin is not starting instances even when there is demand. We see "No excess workload, provisioning not needed." even though we have a queue of jobs waiting to run. Before we upgraded from 2.7.1 to v3.0.1, provisioning worked as expected.

I am not sure exactly why this is happening yet, so to start off I am trying to clear any warnings/errors from the plugin. This issue is one of the errors we now see from the fleet plugin.

From the attached stack trace, it looks like the Java code is trying to unbox a null Integer into an int in getInitOnlineCheckIntervalSec.
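To illustrate the suspected failure mode (a null `Integer` being converted to a primitive `int`), here is a minimal, self-contained sketch. It is not the plugin's code: the class `UnboxingDemo`, both methods, and the default of 15 seconds are hypothetical names and values chosen only for illustration.

```java
// Hypothetical stand-in for a config getter; NOT the plugin's real class.
class UnboxingDemo {
    // Assumed fallback value, for illustration only.
    static final int DEFAULT_INTERVAL_SEC = 15;

    // Mimics a getter that returns a primitive from a nullable wrapper value.
    static int unsafeInterval(Integer configured) {
        // Implicit configured.intValue(): throws NullPointerException when configured is null.
        return configured;
    }

    // Null-safe variant: falls back to a default instead of unboxing null.
    static int safeInterval(Integer configured) {
        return configured == null ? DEFAULT_INTERVAL_SEC : configured;
    }

    public static void main(String[] args) {
        Integer missing = null; // e.g. a setting absent from a config saved by an older version
        try {
            unsafeInterval(missing);
        } catch (NullPointerException e) {
            System.out.println("unsafe getter threw NullPointerException");
        }
        System.out.println("safe getter returned " + safeInterval(missing));
    }
}
```

If the real getter has a similar shape, a setting that was never written by the older plugin version would be one plausible way to end up with a null value here, though the stack trace alone does not confirm that.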

To Reproduce

  1. Upgrade to ec2 fleet plugin 3.0.1 from 2.7.1

Logs

Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: 2023-10-30 09:53:42.055+0000 [id=41]#011WARNING#011c.a.j.ec2fleet.EC2FleetCloud#warning: FleetCloud [dev] Unable to set label on node
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: java.lang.NullPointerException
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at com.amazon.jenkins.ec2fleet.EC2FleetCloud.getInitOnlineCheckIntervalSec(EC2FleetCloud.java:301)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at com.amazon.jenkins.ec2fleet.EC2FleetCloud.addNewAgent(EC2FleetCloud.java:1001)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at com.amazon.jenkins.ec2fleet.EC2FleetCloud.access$200(EC2FleetCloud.java:74)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at com.amazon.jenkins.ec2fleet.EC2FleetCloud$2.run(EC2FleetCloud.java:830)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at hudson.model.Queue._withLock(Queue.java:1397)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at hudson.model.Queue.withLock(Queue.java:1271)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at com.amazon.jenkins.ec2fleet.EC2FleetCloud.updateByState(EC2FleetCloud.java:825)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at com.amazon.jenkins.ec2fleet.EC2FleetCloud.update(EC2FleetCloud.java:637)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at com.amazon.jenkins.ec2fleet.CloudNanny.doRun(CloudNanny.java:55)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:92)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Oct 30 09:53:42 ip-172-37-3-168 jenkins[248973]: #011at java.base/java.lang.Thread.run(Thread.java:829)

Possibly related: after this warning, we see messages like:

label [null]: No excess workload, provisioning not needed.

The null here looks suspicious.

Environment Details

Plugin Version? 3.0.1

Jenkins Version? 2.4143

Spot Fleet or ASG? ASG

Label based fleet? Yes

Linux or Windows? Linux

EC2Fleet Configuration as Code n/a

Anything else unique about your setup? No

Screenshot of the cloud config values:

[Image: Screenshot 2023-10-30 at 11 29 13]

vineeth-bandi commented 1 year ago

This bug appears similar to the one observed in #411. We could not reproduce these issues or fully determine why they were appearing, so we have decided to revert the changes that were part of the previous release. Issue https://github.com/jenkinsci/ec2-fleet-plugin/issues/417 will track these changes if we decide to reintroduce them. Feel free to move the discussion to that issue, or reopen this one if the problem persists after updating to the newest release (version 3.0.2), which contains the revert.