Speed up topological_order by using a per-package cache of all run_depends

seeraven commented 4 years ago

When using a large set of packages (e.g., more than 1000), catkin spends a lot of time calculating all the dependencies of all found packages. In my case, it was about 26 seconds. By caching the dependencies of the run_depends, it is now down to 4 seconds.

seeraven commented 4 years ago

Please also include reproducible steps how you tested / benchmarked the change.

I've created the repository https://github.com/seeraven/catkin_pkg_benchmark that contains an anonymous version of my workspace and simulates the steps taken by catkin. Actually catkin calls the topological_order_packages function multiple times, but the benefit of caching can be already seen when calling the function only once. On my workstation, the timings are:

Original catkin_pkg:
Run 1: Found 1460 packages in 0.4 s
Run 1: Topological order took 7.5 s
Run 2: Found 1460 packages in 0.4 s
Run 2: Topological order took 7.4 s
--------------------------------------------------------------

Modified catkin_pkg:
Run 1: Found 1460 packages in 0.3 s
Run 1: Topological order took 0.9 s
Run 2: Found 1460 packages in 0.3 s
Run 2: Topological order took 0.9 s
--------------------------------------------------------------

dirk-thomas commented 4 years ago

Thank you for the detailed benchmark - the improvements for larger workspace are great!

I also double checked that the actually returned list of ordered packages is the same before and after the patch - looking good.

ros-infrastructure / catkin_pkg

Speed up topological_order by using a per-package cache of all run_depends #280