ruffsl opened 1 month ago
This pull request is in conflict. Could you fix it @ruffsl?
@ruffsl just FYI tried to run it and got:
0.367 E: Unable to locate package ros-rolling-nav2-minimal-tb3-sim
0.367 E: Unable to locate package ros-rolling-nav2-minimal-tb4-sim
@tonynajjar , yeah, looks like we have another un-released dependency back in our underlay.repos file:
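For reference, pulling such an unreleased dependency from source is done with an entry in the underlay `.repos` file. A sketch of what that kind of entry looks like (the repository URL and branch below are my assumptions for illustration, not taken from the actual file):

```yaml
repositories:
  # Hypothetical entry: fetches the unreleased minimal TB3/TB4 sim
  # packages from source until they are released to the rosdistro.
  nav2_minimal_turtlebot_simulation:
    type: git
    url: https://github.com/ros-navigation/nav2_minimal_turtlebot_simulation.git
    version: main
```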
@ruffsl new error
2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-control-msgs/ros-rolling-control-msgs_5.1.0-1noble.20240429.102647_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-hardware-interface/ros-rolling-hardware-interface_4.11.0-1noble.20240514.082551_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
[2024-06-16T16:22:53.607Z] 2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-controller-interface/ros-rolling-controller-interface_4.11.0-1noble.20240514.083301_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-diff-drive-controller/ros-rolling-diff-drive-controller_4.8.0-1noble.20240514.114350_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-gz-common-vendor/ros-rolling-gz-common-vendor_0.1.0-1noble.20240503.181130_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-gz-msgs-vendor/ros-rolling-gz-msgs-vendor_0.1.0-1noble.20240503.181547_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
[2024-06-16T16:22:53.607Z] 2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-gz-fuel-tools-vendor/ros-rolling-gz-fuel-tools-vendor_0.1.0-1noble.20240503.182511_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-gz-rendering-vendor/ros-rolling-gz-rendering-vendor_0.1.0-1noble.20240507.212408_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-gz-transport-vendor/ros-rolling-gz-transport-vendor_0.1.0-1noble.20240503.182514_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-gz-gui-vendor/ros-rolling-gz-gui-vendor_0.1.0-1noble.20240507.214434_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
[2024-06-16T16:22:53.607Z] 2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-sdformat-vendor/ros-rolling-sdformat-vendor_0.1.0-1noble.20240503.181458_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-gz-physics-vendor/ros-rolling-gz-physics-vendor_0.1.0-1noble.20240503.182124_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-gz-sensors-vendor/ros-rolling-gz-sensors-vendor_0.1.0-1noble.20240507.214434_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-gz-sim-vendor/ros-rolling-gz-sim-vendor_0.1.0-1noble.20240507.215704_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
[2024-06-16T16:22:53.607Z] 2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-joint-state-broadcaster/ros-rolling-joint-state-broadcaster_4.8.0-1noble.20240514.114403_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-ros-gz-bridge/ros-rolling-ros-gz-bridge_1.0.0-1noble.20240507.145005_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-ros-gz-image/ros-rolling-ros-gz-image_1.0.0-1noble.20240507.151109_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
2.902 E: Failed to fetch http://packages.ros.org/ros2/ubuntu/pool/main/r/ros-rolling-ros-gz-sim/ros-rolling-ros-gz-sim_1.0.0-1noble.20240507.225051_amd64.deb 404 Not Found [IP: 140.211.166.134 80]
@tonynajjar , are you partially re-building the image from a prior cache? At present, the Dockerfile only runs apt update once for the entire build.
This speeds up all the apt install steps, allows later layers to be rebuilt offline if the local apt cache has already downloaded the debians, and ensures that all packages installed across the layers originate from the same sync. But if there are deb versions you haven't downloaded locally that no longer exist on the apt repo, then it's probably best to rebuild the apt-update layer so all the following layers are on the same sync.
If the ROS repos receive a new sync, the apt lists baked into the earlier layers can become stale, pointing to package versions that the ROS repos have since purged, as, aside from the ROS snapshot repos, older packages are not yet archived.
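A sketch of that layering pattern (the base image and package names below are illustrative, not the actual nav2 Dockerfile):

```dockerfile
# Single apt update for the whole build: every later layer installs
# from the same package index (sync) baked into this layer.
FROM ros:rolling
RUN apt-get update

# Later layers reuse the cached index; no further `apt-get update`,
# so all installs resolve against the same sync of the repo. If the
# repo has purged a listed version since, these steps 404.
RUN apt-get install -y --no-install-recommends ros-rolling-nav2-bringup
RUN apt-get install -y --no-install-recommends ros-rolling-slam-toolbox
```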
So, we could either:
rebuild the dev container with --no-cache to ensure all packages are installed from the same sync. While I see there are snapshots for ROS 2 Jazzy, there don't seem to be any for Rolling:
We could also pin the rolling image by image ID/sha to automate cache busting via dependabot, though that needs some more work to complete the upstream docker build automation:
I think I may just go with the ENV ROS_SYNC_DATE= approach in the meantime for the local Dockerfile.
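That approach leans on Docker's layer caching semantics: changing the value of an ENV (or ARG) invalidates that layer and every layer after it. A minimal sketch (the variable's date value is illustrative):

```dockerfile
# Hypothetical cache-busting knob: bumping this value invalidates
# this layer and everything below it, forcing a fresh `apt-get update`
# so later installs resolve against the current repo sync.
ENV ROS_SYNC_DATE=2024-06-16
RUN apt-get update
```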
I see, yes building without cache fixes it. On to the next error, basically all the nav2 packages are failing to build in the updateContentCommand because of this:
[2024-06-16T17:19:24.311Z] Failed <<< nav2_velocity_smoother [0.00s, exited with code 1]
[2024-06-16T17:19:24.311Z] colcon cache [13/39 done] Starting >>> nav2_costmap_2d
[2024-06-16T17:19:24.312Z] --- stderr: nav2_costmap_2d
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/colcon_core/executor/__init__.py", line 91, in __call__
rc = await self.task(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/colcon_core/task/__init__.py", line 93, in __call__
return await task_method(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/colcon_cache/task/lock/dirhash.py", line 179, in lock
assert lockfile.lock_type == ENTRY_TYPE
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
---
It seems to be because of colcon cache lock
Looks like you may be trying to combine two different colcon cache lock files, either derived from git revision control hashes, or from dirhash, which hashes the files directly. You could try deleting all the colcon cache lock files in the colcon build base path, or just delete the workspace volume, or rename it to make a new one from the dev container config JSON.
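A sketch of that cleanup, assuming the lockfiles live in per-package `cache` directories under the colcon build base (the workspace layout below is fabricated purely for demonstration):

```shell
# Fabricate a throwaway workspace layout for demonstration; in a real
# workspace these would be the lockfiles colcon-cache wrote earlier.
ws="$(mktemp -d)"
mkdir -p "$ws/build/nav2_costmap_2d/cache" "$ws/build/nav2_util/cache"
touch "$ws/build/nav2_costmap_2d/cache/colcon_build.json"

# Remove every per-package cache directory under the build base, so the
# next `colcon cache lock` regenerates lockfiles with one consistent task.
find "$ws/build" -mindepth 2 -maxdepth 2 -type d -name cache -exec rm -rf {} +
```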
rebuild dev container with --no-cache to ensure all packages are installed from the same sync

With this you mean "Rebuild Container Without Cache"? It still seems to build with cache. Maybe because you're using image instead of Dockerfile in devcontainer.json.
Regarding the colcon cache, I cleaned out a bunch of things and it works now. I'll keep an eye out if it reproduces as part of a "normal workflow".
Can we somehow have the option to not rebuild the packages, to save time, since the image is built quite often? For me that's a big plus. I guess commenting out the updateContentCommand from the devcontainer would do it? I even think this should be the default. What do you think?
bash: /usr/share/colcon_argcomplete/hook/colcon-argcomplete.bash: No such file or directory
FYI
bash: /usr/share/colcon_argcomplete/hook/colcon-argcomplete.bash: No such file or directory
Yeah, I filed a ticket for that earlier this week. Looks like it may be an upstream packaging issue for jazzy on noble:
Could you confirm by commenting on that ticket using the example?
I cleaned out a bunch of things and it works now. I'll keep an eye out if it reproduces as part of a "normal workflow".
My guess is that you tried re-using a colcon workspace built using the prior dev container setup. Previously, colcon cache was allowed to use whatever TaskExtensionPoint it preferred, with the GitLockTask given preference over DirhashLockTask, as re-using git to check the source state for package directories is faster, and allows ignoring files on a per-repo basis via their own .gitignore config. However, it's not as invariant, given that commit SHAs can differ even if the HEAD states are the same at the file system level, e.g. a change committed to a package followed by a revert commit for that change.
Internally, I've been using colcon-cache with projects that use git submodules, which more often encounter cases like the above. So, to only use the dirhash approach, I've blocklisted the git task, so that changes are only tracked at a per-file level, rather than a revision-control-history level. Not sure if that's still warranted here though, so I may revert this.
In any case, when you mix and match lockfile types from different tasks, colcon-cache raises an error for such inconsistencies.
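For context, colcon-core lets you blocklist extensions via the COLCON_EXTENSION_BLOCKLIST environment variable; the exact entry-point name for the git lock task below is my assumption, so verify the real one with `colcon extensions` before relying on it:

```shell
# Block the git-based lock task so colcon-cache falls back to dirhash.
# The entry-point path shown is an assumption; check your local
# `colcon extensions` output for the actual name.
export COLCON_EXTENSION_BLOCKLIST=colcon_cache.task.lock_task.git
```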
Can we somehow have the option to not rebuild the packages to save time since the image is build quite often? For me that's a big plus.
You can build up to any stage in the Dockerfile by passing it as the target name for the bake command. All the stages in the Dockerfile currently have respective bake targets in the bake file. E.g. building only up to the tooler stage will not invoke the build directives that then commence the colcon build commands for the builder stage:
docker buildx bake tooler
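For context, per-stage bake targets map onto Dockerfile stages roughly like the following sketch (target names mirror the stages mentioned above; the file contents are illustrative, not the actual bake file):

```hcl
# Illustrative bake file: one target per Dockerfile stage, so
# `docker buildx bake tooler` stops before the builder stage.
target "tooler" {
  dockerfile = "Dockerfile"
  target     = "tooler"   # build only up to this stage
}

target "builder" {
  inherits = ["tooler"]
  target   = "builder"    # continues on to run the colcon build
}
```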
I guess commenting out the updateContentCommand from the devcontainer would do it?
If by "rebuilding", you mean rebuilding the dev container (rather than merely the docker image), then yes, you could also just comment out the updateContentCommand life cycle script, or do what I do and just comment out the final colcon build line in that script.
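Concretely, that would look something like the following in the dev container config (a sketch; the script path shown is hypothetical and may differ from the repo's actual layout):

```jsonc
{
  "name": "nav2",
  // Comment out this hook to skip rebuilding the workspace when the
  // container content updates; the image itself still rebuilds as usual.
  // "updateContentCommand": ".devcontainer/update-content-command.sh"
}
```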
Then, on startup, the container still prints out ENVs describing the validity of the cached colcon workspace vs the current source checkout of nav2, which I find a useful reminder of what I need to rebuild, given what has changed since I last rebuilt the workspace inside the currently mounted named volume.
I even think this should be the default. What do you think?
I had it build the workspace by default to onboard novice students with as few steps on their part as possible. All they need to do is start the dev container rebuild and walk away for some coffee, while the script will attempt to cache what it can. Very helpful when someone is just starting out and simply wants to see nav2 in action via a gazebo simulation to know what is possible.
You and I, or other experienced maintainers, can just manually edit the life cycle script to fit our personal preferences and dev container behaviors, then use something like git worktrees to keep track of and check out our own customizations to just the .devcontainer folder:
With this you mean "Rebuild Container Without Cache"? it still seems to build with cache. Maybe because you're using image instead of Dockerfile in devcontainer.json
Well, instead of specifying the Dockerfile in the dev container config, a static docker tag is used to specify which docker image to use as the basis for running the resulting dev container. This static docker tag is built and tagged by the initializeCommand life cycle script, which in turn calls the docker bake command.
This is primarily because most of the advanced BuildKit features are made more ergonomic to configure via bake files; however, the dev container spec does not yet natively support such bake files, so the initializeCommand provides a suitable workaround, while also being much more customizable. E.g. we could add custom logic on how to rebuild the image under different conditions.
What exact stages we want to cache or bust can of course be further controlled via the bake file itself:
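For example, bake targets support per-target cache controls; a sketch (exact attribute support depends on your buildx version, and the registry cache ref below is hypothetical):

```hcl
target "tooler" {
  target = "tooler"
  # Reuse a remote build cache when available (hypothetical registry ref).
  cache-from = ["type=registry,ref=ghcr.io/example/nav2-cache"]
}

target "builder" {
  inherits = ["tooler"]
  target   = "builder"
  # Force this stage to rebuild from scratch on every bake.
  no-cache = true
}
```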
[planner_server-10] [WARN] [1720332420.098450618] [planner_server]: GridBased plugin failed to plan from (-2.00, -0.50) to (100.00, 100.00): "Goal Coordinates of(100.000000, 100.000000) was outside bounds"
@SteveMacenski , is there some kind of floating-point precision issue with the bounds check here? Just trying to get the new CI to roll over completely.
TBD