Document generation has been failing on the buildfarm for certain pkgs such as launch or rclpy despite running successfully locally.
In all cases, the error arises when the pkg has a module file with the same name as another python3 module. In the examples below, the python interpreter tires import logging which is provided by the dist pkg but the docs_build/my_pkg/ folder has logging.py copied over from src/my_pkg/my_pkg/logging.py.
23:41:26 import logging
23:41:26 File "/tmp/ws/docs_build/rclpy/rclpy/logging.py", line 18, in <module>
23:41:26 from rclpy.impl.implementation_singleton import rclpy_implementation as _rclpy
23:41:26 ModuleNotFoundError: No module named 'rclpy'
or
21:11:15 import logging
21:11:15 File "/tmp/ws/docs_build/launch/launch/logging/__init__.py", line 21, in <module>
21:11:15 import logging.handlers
21:11:15 File "/tmp/ws/docs_build/launch/launch/logging/handlers.py", line 79, in <module>
21:11:15 sys.modules[__name__] = _module_wrapper(sys.modules[__name__])
21:11:15 File "/tmp/ws/docs_build/launch/launch/logging/handlers.py", line 61, in __init__
21:11:15 for module in (logging, logging.handlers):
21:11:15 AttributeError: partially initialized module 'logging' has no attribute 'handlers' (most likely due to a circular import)
The shutil.copytree method invoked here copies the contents of the pkg_src_directory and not the directory itself. So the contents of src/launch/launch/launch/* are copied within docs_build/launch/launch/. Thus when the conf.py here invokes sphinx to generate documentation of launch/logging/handlers.py module, when encountering such imports, the python interpreter will try to import the copied over module leading to circular imports. The combination of this and 2) leads to the error on the buildfarm.
The docs_build/launch/ folder is inserted at index 0 to the system path within conf.py as seen here
Why it cannot be reproduced locally
I'm not too certain about the exact cause but my hunch is that its because of how rosdoc2 gets installed in the doc job. The build_rosdoc2.py script uses --use-deprecated=out-of-tree-build. The rosdoc2 executable gets installed to ~/.local/bin and is explicitly added to PATH envar within the same script here. The envar is then passed to the subprocess call. So maybe somehow the PATH envar contains a different order of paths when running locally vs on the buildfarm? If the path to docs_build/pkg/ is inserted before the path to pip dist modules, that could explain why the local modules are found first?
How this PR fixes the problem
It modifies the source dir path passed to the shutil.copytree function such that the module directory is copied over.
The downside of this approach is that all contents from the pkg's root dir is copied over given the nature of shutil.copytree. To minimize the impact of the copy, i've wrapped the copy in a if condition to only copy if the pkg's build type is ament_python.
(Credits to @nuclearsandwich for the instructions to run the doc job locally)
git clone https://github.com/ros-infrastructure/ros_buildfarm.git
cd ros_buildfarm
python3 -m pip install -e .
# Make a tmp dir for building doc
mkdir /tmp/rclpy-doc
generate_doc_script.py https://raw.githubusercontent.com/ros2/ros_buildfarm_config/ros2/index.yaml rolling default rclpy ubuntu jammy amd64 > /tmp/rclpy-doc/run.sh
# Modify the /tmp/rclpy-doc/run.sh script to clone this branch of rosdoc2. Update the command in "Build step 6"
/usr/bin/python3 -u $WORKSPACE/ros_buildfarm/scripts/wrapper/git.py clone https://github.com/ros-infrastructure/rosdoc2.git -b yadu/fix_out_of_tree_build rosdoc2
# run the script
cd /tmp/rclpy-doc
sh run.sh
# The doc job should succeed. If you revert to main branch of rosdoc2 it will fail with the error mentioned above.
The problem
Document generation has been failing on the buildfarm for certain pkgs such as
launch
orrclpy
despite running successfully locally.In all cases, the error arises when the pkg has a module file with the same name as another python3 module. In the examples below, the python interpreter tires import
logging
which is provided by the dist pkg but thedocs_build/my_pkg/
folder haslogging.py
copied over fromsrc/my_pkg/my_pkg/logging.py
.or
Detailed logs:
This happens because:
shutil.copytree
method invoked here copies the contents of thepkg_src_directory
and not the directory itself. So the contents ofsrc/launch/launch/launch/*
are copied withindocs_build/launch/launch/
. Thus when theconf.py
here invokes sphinx to generate documentation oflaunch/logging/handlers.py
module, when encountering such imports, the python interpreter will try to import the copied over module leading to circular imports. The combination of this and 2) leads to the error on the buildfarm.docs_build/launch/
folder is inserted at index 0 to the system path withinconf.py
as seen hereWhy it cannot be reproduced locally
I'm not too certain about the exact cause but my hunch is that its because of how
rosdoc2
gets installed in the doc job. The build_rosdoc2.py script uses--use-deprecated=out-of-tree-build
. Therosdoc2
executable gets installed to~/.local/bin
and is explicitly added toPATH
envar within the same script here. The envar is then passed to the subprocess call. So maybe somehow thePATH
envar contains a different order of paths when running locally vs on the buildfarm? If the path todocs_build/pkg/
is inserted before the path topip
dist modules, that could explain why the local modules are found first?How this PR fixes the problem
It modifies the source dir path passed to the
shutil.copytree
function such that the module directory is copied over. The downside of this approach is that all contents from the pkg's root dir is copied over given the nature ofshutil.copytree
. To minimize the impact of the copy, i've wrapped the copy in aif
condition to only copy if the pkg's build type isament_python
.How to test
(Credits to @nuclearsandwich for the instructions to run the doc job locally)