ros-infrastructure / ros_buildfarm

ROS buildfarm based on Docker
Apache License 2.0

Don't tell apt which versions of debian packages to install #775

Closed by sloretz 4 years ago

sloretz commented 4 years ago

It looks like two nightlies failed because the versions of the Debian packages that are test dependencies are determined at the start of the job, but by the time they were actually installed, those versions had been superseded in the upstream Focal repos. This PR makes it so only the names of the dependencies are given, not the versions.

http://build.ros2.org/view/Fci/job/Fci__nightly-cyclonedds_ubuntu_focal_amd64/58 https://build.ros2.org/view/Fci/job/Fci__nightly-fastrtps_ubuntu_focal_amd64/68/

Reading package lists...
Invoking 'apt-get install -q -y -o Debug::pkgProblemResolver=yes clang-format=1:9.0-49 clang-tidy=1:9.0-49 cppcheck=1.90-4build1 libcppunit-dev=1.15.1-2 libcunit1-dev=2.1-3-dfsg-2build1 libxml2-utils=2.9.10+dfsg-1ubuntu3 pydocstyle=2.1.1-1 pyflakes3=2.1.1-2 python3-babeltrace=1.5.8-1build1 python3-cryptography=2.8-3 python3-flake8=3.7.9-2 python3-ifcfg=0.18-2osrf~focal python3-lxml=4.5.0-1 python3-matplotlib=3.1.2-1ubuntu3 python3-mypy=0.761-1build1 python3-nose=1.3.7-5 python3-psutil=5.5.1-1ubuntu4 python3-pycodestyle=2.5.0-2 python3-pydot=1.4.1-3 python3-pygraphviz=1.5-4build1 python3-pytest-mock=1.10.4-3 python3-rosdistro-modules=0.8.0-1'
Reading package lists...
Building dependency tree...
Reading state information...
E: Version '2.9.10+dfsg-1ubuntu3' for 'libxml2-utils' was not found

https://packages.ubuntu.com/focal/libxml2-utils is now 2.9.10+dfsg-4build1 upstream.
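With this change, the same step would instead install by name only, so apt resolves whatever version is current at install time. Roughly (an abbreviated illustration, not actual job output):

apt-get install -q -y -o Debug::pkgProblemResolver=yes clang-format clang-tidy cppcheck libcppunit-dev libcunit1-dev libxml2-utils pydocstyle pyflakes3 ...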

cottsay commented 4 years ago

This will break the Dockerfile cache invalidation. We'll end up with stale packages in the builds, and will need to wait for the daily forced invalidation to see the new packages show up in CI builds.

Including the versions was very intentional.

I think the devel jobs do the same thing, though the version numbers get embedded directly in the Dockerfile instead of in a separate install list like the CI jobs use.
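For context, the cache invalidation in question works roughly like this; a simplified, hypothetical sketch of the CI install step (the file name and exact commands are assumptions, not the real template):

# Docker rebuilds the COPY layer, and every layer after it, only when the
# checksum of the copied file changes.
COPY install_list.txt /tmp/install_list.txt
RUN apt-get update && \
    xargs -a /tmp/install_list.txt apt-get install -q -y

If install_list.txt contains only package names, a new upstream release doesn't change the file, so Docker reuses the cached layer and the old packages stay in the image until the daily forced invalidation.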

dirk-thomas commented 4 years ago

As @cottsay already described, the current behavior is intentional and the version numbers must not be removed.

sloretz commented 4 years ago

This will break the Dockerfile cache invalidation. We'll end up with stale packages in the builds, and will need to wait for the daily forced invalidation to see the new packages show up in CI builds.

Roger. @cottsay @dirk-thomas Mind having a second look? IIUC, the Docker cache is invalidated by the hash of the file given to the COPY command changing, and for that purpose it doesn't matter what is actually passed to apt-get install. This PR now includes the version numbers in the file, but strips them out with sed before installing.
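A rough sketch of what that looks like (hypothetical file name and simplified commands, not the actual template):

COPY install_list.txt /tmp/install_list.txt
RUN apt-get update && \
    sed 's/=.*$//' /tmp/install_list.txt | xargs apt-get install -q -y

The name=version pairs in the copied file still change whenever upstream publishes a new version, so the COPY checksum still invalidates the cache, but only the bare names reach apt-get, so whatever version is available at install time gets installed.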

dirk-thomas commented 4 years ago

I am not sure what exact case this patch is trying to address. I guess the race condition where the apt repository changes while the job is running (between the determination of the version numbers and the installation of those packages)?

The job basically requires that the state of the apt repo the dependencies are installed from stays stable. If that is not the case, the logic which ran at the beginning of the job might be out of sync with the logic run later in the job. So I am not sure this race is worth fixing, since I assume there are still similar race conditions left.

sloretz commented 4 years ago

I am not sure what exact case this patch is trying to address. I guess the race condition that the apt repository changes while this job is running (between the determination of the version numbers and the installation of these packages)?

Yup. In the case of the nightlies, there is a long time between when the versions are determined (the very start of the job) and when the packages are installed (when the tests start running).

The job basically requires that the state of the apt repo where dependencies are installed from stays stable. If that is not the same the logic which ran at the beginning of the job might be out of the sync with the logic run later in the job.

Why do the test dependencies need to be constant if they're not installed until later?

So I am not sure this race is worth fixing since I assume there are still similar race conditions left.

I suppose it's possible other race conditions exist, but this one caused two nightlies to fail last night.

cottsay commented 4 years ago

FWIW, it looks like this is basically the same approach that was taken for the devel jobs: https://github.com/ros-infrastructure/ros_buildfarm/blob/41d696cded31842dfed1dc90383b1ef9ac2c1959/ros_buildfarm/templates/snippet/install_dependencies.Dockerfile.em#L26-L34

I'm not over the moon that the version numbers in the install list won't necessarily reflect the package that actually gets installed, but I don't have any better suggestions for a short-term mitigation of this issue. At least the behavior will be in line with the other job types.

I addressed this years ago in my RPM prototype buildfarm by keeping packages on disk for some amount of time after they were removed from the repository. It worked really well, but I had to implement the invalidation by hand, and I'm not sure that reprepro supports anything like this.

dirk-thomas commented 4 years ago

At least the behavior will be in-line with the other job types.

That sounds like a good reason to do the same here. Unfortunately, the logic that does this seems to be duplicated between the job types instead of shared.

I just want to make sure that it is clear that a changing apt state during the execution of a build isn't something the logic will necessarily be able to handle.