ros-infrastructure / buildfarm_deployment


reprepro removing dependencies is going much slower #165

Closed (tfoote closed this issue 6 years ago)

tfoote commented 6 years ago

reprepro removefilter is much slower now.


From another thread:

It looks like the new version of reprepro underneath import_upstream is doing a few more things. I pulled the console outputs and diffed them, with some notes, here: https://gist.github.com/tfoote/63ef7af733c1c25be5d674369ef0220a

The entry point into the subprocess call also seems slower, and the "calculating packages to get" step is slower for each instance.


It also looks like import_package goes much slower when invalidating packages: it takes approximately 1 second per invocation of removefilter. Invalidating an rqt_bag run took 29 seconds and invoked removefilter 24 times.

[screenshot: rqt_bag_import_time]

Invalidating cpp_common took 25 minutes and ran removefilter 1361 times.

[screenshot: cpp_common_import_time]

A typical near-leaf package takes 2-3 seconds.
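
For reference, the pattern described above amounts to one reprepro subprocess call per invalidated package, roughly like the sketch below (the repository path, codename, and package names are illustrative assumptions, not the production configuration):

```python
import subprocess

# Illustrative values only; the real repository path and codename come from
# the reprepro-updater configuration, not from this sketch.
REPO_DIR = "/var/repos/ubuntu/building"
CODENAME = "xenial"

def remove_package(package_name):
    """Remove a single package with its own reprepro invocation (the slow pattern)."""
    subprocess.check_call([
        "reprepro", "-b", REPO_DIR,
        "removefilter", CODENAME,
        "Package (== %s)" % package_name,
    ])

# Invalidating a low-level package triggers one call like this per dependent
# package, so the ~1 second of per-invocation overhead is paid every time.
for pkg in ["ros-kinetic-rqt-bag", "ros-kinetic-rqt-bag-plugins"]:
    remove_package(pkg)
```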

Looking at the machine while this job is running, the load average is ~1 but there's minimal CPU, memory, or disk IO going on. I'm not sure what's blocking the process. This used to go much faster.
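
One generic way to see what a stalled reprepro process is waiting on (this is a diagnostic sketch, not part of the farm's tooling) is to inspect the process state and kernel wait channel while the job runs:

```python
import glob

# Report the state and wait channel of any running reprepro processes.
# A state of 'D' (uninterruptible sleep) would point at disk or lock waits.
for status_path in glob.glob("/proc/[0-9]*/status"):
    pid = status_path.split("/")[2]
    try:
        with open(status_path) as f:
            fields = dict(line.split(":", 1) for line in f if ":" in line)
        if "reprepro" not in fields.get("Name", ""):
            continue
        with open("/proc/%s/wchan" % pid) as f:
            wchan = f.read().strip()
        print(pid, fields["State"].strip(), wchan)
    except (IOError, OSError):
        continue  # the process exited between the scan and the read
```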

Here are a few metrics.

One outlier is that New Relic is reporting that we're using 20 GB of RAM??

[screenshots: repo_disk_usage, repo_metrics, repositories_20g]

tfoote commented 6 years ago

Looking at the hosting, it has 1800/3000 IOPS available. We could get a little more throughput, but from the stats it doesn't look like IO is the limiting factor.

nuclearsandwich commented 6 years ago

Yesterday I took a look at the debian package built by https://github.com/ros-infrastructure/buildfarm_deployment/blob/master/dependencies/deb/build-reprepro.sh and the binary deb from zesty that @mikaelarguedas had deployed on the Trusty farm. I couldn't identify much in the way of substantive differences, certainly no source differences, which is to be expected since the debian package built by our script comes from the source package associated with the zesty binary package.

We could decide to deploy the zesty binary deb directly and see if that affects the reprepro behavior and startup times.

Beyond that, the other mitigation discussed with @tfoote is to consolidate the removefilter invocations rather than running one per package.
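
A rough sketch of what that consolidation could look like, assuming reprepro's filter formulas accept `|` to OR package terms together (the helper name and paths below are illustrative; the actual change lives in the reprepro-updater pull request linked in the next comment):

```python
import subprocess

REPO_DIR = "/var/repos/ubuntu/building"  # illustrative path
CODENAME = "xenial"                      # illustrative codename

def remove_packages(package_names):
    """Remove a batch of packages with a single reprepro invocation."""
    # Build one filter formula, e.g.
    #   Package (== ros-kinetic-cpp-common) | Package (== ros-kinetic-roscpp)
    formula = " | ".join("Package (== %s)" % name for name in package_names)
    subprocess.check_call([
        "reprepro", "-b", REPO_DIR,
        "removefilter", CODENAME,
        formula,
    ])

# One subprocess call instead of one per package, so the per-invocation
# startup and locking cost is paid once for the whole batch.
remove_packages(["ros-kinetic-cpp-common", "ros-kinetic-roscpp", "ros-kinetic-rosbag"])
```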

nuclearsandwich commented 6 years ago

Changes meant to address this issue (https://github.com/ros-infrastructure/reprepro-updater/pull/56) are deployed and working smoothly. Since we haven't rebuilt from a low-level package yet, we don't know for certain that this is resolved, nor do we have exact timing information from the production farm.

nuclearsandwich commented 6 years ago

Got a relatively large sample from production: http://build.ros.org/job/Krel_import-package/258995/ (that link will 404 pretty quickly as the import-package jobs fall off the cliff, but the output is archived here: https://gist.github.com/nuclearsandwich/a8f5be14bbc203fd38e8b80cf638ee94).

Total build time is 3.8 seconds to invalidate 515 packages, which is just 1.5 seconds above our new mode of 2.3 seconds. Again, we no longer have concrete data for how long comparable jobs used to take, but as long as the performance holds (and I have no reason to expect it to degrade) this should prevent the severe bogging down and timeouts we were seeing when low-level packages were importing.

> Invalidating cpp_common took 25 minutes and ran removefilter 1361 times.

For the different architectures of dynamic_reconfigure, which invalidated 515 packages, runtimes varied between 3.2 and 3.8 seconds, which is a marked improvement on the above.