ros-infrastructure / rosdoc2

Command-line tool for generating documentation for ROS 2 packages.
Apache License 2.0

Generated content uses a lot of disk space for a small subset of packages. #56

Closed tfoote closed 3 months ago

tfoote commented 1 year ago

We got a disk space alert from our hosts. A quick look at our documentation usage showed most packages well under 100MB, but quite a few had much larger generated content.

Here are the top offenders in rolling:

99M smacc2
100M rmf_traffic
102M rosidl_runtime_c
105M osrf_testing_tools_cpp
106M sm_dance_bot_warehouse_2
132M nav2z_client
133M eigenpy
135M sm_advanced_recovery_1
145M rviz_default_plugins
160M rmw
167M rcl
169M rmf_utils
319M hpp-fcl
869M rclcpp
1.2G sm_multi_stage_1
1.3G sm_pack_ml
1.8G proxsuite
2.0G vitis_common
7.1G fastrtps
22G total

It would be good to understand why these are blowing up and keep that from happening.

mikeferguson commented 5 months ago

Out of curiosity - I did some poking around here in rviz_default_plugins:

11M   ./.doctrees/generated
25M   ./.doctrees
3.2M  ./_static/css/fonts
3.4M  ./_static/css
44K   ./_static/collapsible-lists/css
12K   ./_static/collapsible-lists/js
64K   ./_static/collapsible-lists
28K   ./_static/js
3.6M  ./_static
2.0M  ./_sources/generated
2.0M  ./_sources
776K  ./generated/doxygen/html/search
12M   ./generated/doxygen/html
6.2M  ./generated/doxygen/xml
18M   ./generated/doxygen
292M  ./generated
326M  .

In the generated folder, we have 401 generated files, each of which is at least 692KB in size. That 692KB is the navigation menu on the side of the page - it's nearly the same for every one of those 401 files, other than specifying which portions are default open/closed.
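As a rough sanity check on those numbers, here is a minimal sketch of the arithmetic. It assumes the 692KB figure is in the same 1024-based units that du reports, and the function name is only illustrative.

```python
def duplicated_nav_mib(page_count: int, nav_kib_per_page: float) -> float:
    """Total size in MiB of a navigation menu that is repeated on every page."""
    return page_count * nav_kib_per_page / 1024


# Figures quoted above for rviz_default_plugins: 401 pages, about 692KB of menu each.
total = duplicated_nav_mib(401, 692)
print(f"{total:.0f} MiB of repeated navigation markup")  # prints "271 MiB of repeated navigation markup"
```

That is close to the 292M reported for ./generated once the 18M of doxygen output is set aside, which suggests the repeated menu accounts for nearly all of the space.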

tfoote commented 3 months ago

I can confirm that this appears to be the problem. In rolling there is now about 1.3MB of toctree content in every rclcpp page, while most pages have only a few dozen other lines, and there are now 890 generated files.

Looking inside one of them, it appears to have some 5000 toctree entries spanning 9000 lines, alongside the few dozen other elements of content. It looks like this is coming from the exhale Clickable Hierarchies: https://exhale.readthedocs.io/en/latest/reference/configs.html#clickable-hierarchies
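For anyone who wants to repeat that inspection, here is a minimal sketch that measures how much of a page is toctree markup. It assumes sidebar entries can be recognised by the substring `class="toctree` in the generated HTML; the function names are hypothetical and not part of rosdoc2.

```python
from pathlib import Path

# Substring used to spot toctree entries in the generated HTML (an assumption
# about how the theme labels sidebar items).
MARKER = 'class="toctree'


def toctree_stats(html_text: str) -> dict:
    """Count lines containing the toctree marker versus total lines in one page."""
    lines = html_text.splitlines()
    toctree_lines = sum(1 for line in lines if MARKER in line)
    return {
        "total_lines": len(lines),
        "toctree_lines": toctree_lines,
        "toctree_fraction": toctree_lines / len(lines) if lines else 0.0,
    }


def report(output_dir: str, top: int = 5) -> list:
    """Return the `top` HTML files with the most toctree lines under output_dir."""
    rows = []
    for path in Path(output_dir).rglob("*.html"):
        stats = toctree_stats(path.read_text(encoding="utf-8", errors="replace"))
        rows.append((stats["toctree_lines"], path.stat().st_size, str(path)))
    return sorted(rows, reverse=True)[:top]
```

Calling `report("docs_output/rclcpp")` would list the heaviest pages as (toctree lines, file size in bytes, path) tuples, which makes it easier to see whether every page carries the same menu.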

This is one of the core features of the system, but finding a way to include the content once instead of duplicating it on every page would be very valuable. There may be some options to explore related to this. Hopefully someone else has run into this issue before us.

$ grep -rI 'class="toctree' docs_output/rclcpp/ | wc -l
4921466
$ grep -rI 'class="toctree' docs_output/sensor_msgs/ | wc -l
66706

rkent commented 3 months ago

The sphinx-rtd-theme page has the following note:

Setting collapse_navigation to False and using a high value for navigation_depth on projects with many files and a deep file structure can cause long compilation times and can result in HTML files that are significantly larger in file size.

That matches our issue to some extent. I'd like to try adjusting those values to see what the effect is.
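A minimal sketch of what that experiment could look like in a Sphinx conf.py is below. The two option names come from the sphinx-rtd-theme note quoted above; the values are illustrative assumptions, not tested settings, and how this would be wired into the configuration rosdoc2 generates is not shown here.

```python
# conf.py fragment (sketch): values are illustrative assumptions, not tested settings.
html_theme = "sphinx_rtd_theme"

html_theme_options = {
    # Navigation entries are not expandable when this is True
    # (the theme's documented default).
    "collapse_navigation": True,
    # Maximum depth of the table of contents tree written into the sidebar;
    # a lower value should mean fewer toctree entries per page.
    "navigation_depth": 2,
}
```

Comparing the size of the generated output before and after such a change, for example on rclcpp, would show how much of the duplication these options control.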