ros-infrastructure / ros_buildfarm

ROS buildfarm based on Docker
Apache License 2.0
77 stars 95 forks source link

Avoid jenkinsapi trying to fetch all jobs. #1044

Closed nuclearsandwich closed 2 months ago

nuclearsandwich commented 2 months ago

There's a malinteraction with the jenkinsapi library and Python convention which has been causing extremely large HTTP requests whenever the truthiness of an instance of the Jenkins class is tested.

Jenkins defines a __len__ method which is used to determine truthiness. But this method triggers a fetch of every job on our Jenkins host which can take 2 - 5 minutes to return due to the sheer volume of jobs.

By avoiding this in favor of an explicit check of is False or is None as appropriate we can save taking a really big runtime hit and cut down on spurious failures due to timeouts.

By default the jenkinsapi Jenkins class will trigger an eager loading of all Jenkins jobs via the API. Our Jenkins server buckles under the weight of this and so we do not want to do this until we need to and ideally future changes will replace unfiltered polling completely. In order to prevent an initial poll when we first create the local Jenkins instance we also set the lazy kwarg.

nuclearsandwich commented 2 months ago

I think I either got all of the conditional logic correct or I reversed all of them.

cottsay commented 2 months ago

Is there a reason some comparisons are with None and others with False?

nuclearsandwich commented 2 months ago

Is there a reason some comparisons are with None and others with False?

Yeah, it's based on what the "fallback" value is in the surrounding code. In some cases the jenkins / _cached_jenkins value is unset and so None is what we expect to find when not a Jenkins class instance but in the _job.py in particular there's an inline conditional based on whether the job is generating a groovy script or running on a workstation that determines whether it connects to a remote Jenkins or not and the fallback value used there is False rather than None.

I would be fine refactoring those to use None as well if preferred.

The ROS 1 Release CI failures are a legitimate regression and should be addressed.

Agreed. I will look into them.

nuclearsandwich commented 2 months ago

Well https://github.com/ros-infrastructure/ros_buildfarm/pull/1044/commits/e935e5ea6f87526394a4ec2ea345723562a55b2a now has a different breakage but at least its one I can reproduce locally unlike the last one which I just had to suss out.

nuclearsandwich commented 2 months ago

1045 avoids this same problem but without attempting to clean up the tri-state logic. I'm going to proceed with that one.