hmakelin / gisnav

Estimates airborne drone global position by matching video to map retrieved from onboard GIS server.
https://hmakelin.github.io/gisnav
MIT License

Failed to Install GISNav, Multiple packages found #92

Closed JiatengSun closed 6 months ago

JiatengSun commented 11 months ago

~/ws_sensor_combined/src$ rosdep install --from-paths . -y --ignore-src

ERROR: Rosdep experienced an error: Multiple packages found with the same name "gisnav_msgs":

rosdep version: 0.22.2

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/rosdep2/main.py", line 146, in rosdep_main
    exit_code = _rosdep_main(args)
  File "/usr/lib/python3/dist-packages/rosdep2/main.py", line 446, in _rosdep_main
    return _package_args_handler(command, parser, options, args)
  File "/usr/lib/python3/dist-packages/rosdep2/main.py", line 501, in _package_args_handler
    pkgs = find_catkin_packages_in(path, options.verbose)
  File "/usr/lib/python3/dist-packages/rosdep2/catkin_packages.py", line 35, in find_catkin_packages_in
    packages = find_packages(path)
  File "/usr/lib/python3/dist-packages/catkin_pkg/packages.py", line 103, in find_packages
    raise RuntimeError('\n'.join(duplicates))
RuntimeError: Multiple packages found with the same name "gisnav_msgs":

hmakelin commented 11 months ago

Thank you for reporting the issue - I will probably have some time to look at this one next week. I will try to get back to you then.

My initial hunch is that you perhaps have an older copy of gisnav_msgs (~/ws_sensor_combined/src/gisnav_msgs) in your colcon workspace. The message definitions used to live in a separate repo but were moved into the main repo a while ago. It could be that the GitHub Pages instructions published for the old v0.64 tag of gisnav still tell you to download the message definitions into src/gisnav_msgs, which would conflict with the gisnav master branch, which puts them at src/gisnav/gisnav_msgs. I would have to check if that is the case.

If you want to build the newer message definitions, you could try removing ~/ws_sensor_combined/src/gisnav_msgs from the workspace; colcon would then likely find them at ~/ws_sensor_combined/src/gisnav/gisnav_msgs.
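
To check whether the workspace really contains two copies of the same package, a small script along these lines could help. It is only a sketch: it assumes each package is identified by the `<name>` element of its `package.xml`, the function name `find_duplicate_packages` is made up for illustration, and unlike the real ROS tooling it does not honor ignore-marker files or stop descending at package boundaries.

```python
import os
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_duplicate_packages(workspace_src):
    """Map each package name found more than once to the directories holding it."""
    locations = defaultdict(list)
    for root, _dirs, files in os.walk(workspace_src):
        if "package.xml" not in files:
            continue
        manifest = os.path.join(root, "package.xml")
        try:
            name = ET.parse(manifest).getroot().findtext("name")
        except ET.ParseError:
            continue  # this sketch simply skips malformed manifests
        if name:
            locations[name.strip()].append(root)
    # Keep only the names that occur in more than one directory
    return {n: sorted(paths) for n, paths in locations.items() if len(paths) > 1}
```

Calling `find_duplicate_packages(os.path.expanduser("~/ws_sensor_combined/src"))` would list both directories if gisnav_msgs is present twice, which tells you which copy to remove.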

JiatengSun commented 11 months ago

I figured the same thing. I tried deleting src/gisnav/gisnav_msgs, deleting src/gisnav_msgs, and replacing src/gisnav/gisnav_msgs with src/gisnav_msgs. All three methods give the same error:

ERROR: the following packages/stacks could not have their rosdep keys resolved to system dependencies:
gisnav_msgs: Cannot locate rosdep definition for [geographic_msgs]
gisnav: Cannot locate rosdep definition for [tf_transformations]

hmakelin commented 11 months ago

I tried this in a fresh ROS 2 Humble container (docker run -it ros:humble bash) - seems like on a fresh system it is sufficient to do the following (no sudo since we are already root in the container):

apt-get update
mkdir -p colcon_ws/src && cd $_
git clone https://github.com/hmakelin/gisnav.git
cd gisnav/gisnav
rosdep update
rosdep install --from-paths . -y -r --ignore-src

Todo:

* [ ]  The current instructions for a local installation are missing the `-r` option for `rosdep install` which will make it continue installing the other dependencies even when it finds out `gisnav_msgs` is not in the rosdep index (it will get built from the local workspace sources by `colcon build` later). The option is only missing from the instructions but not from `docker/gisnav/Dockerfile`.

Notes:

* If you are using ROS 2 Foxy you might also need to add the `--include-eol-distros` option to your `rosdep` commands since Foxy recently reached its end of life.

* Some ROS tutorials I have found also do an `apt dist-upgrade` and it's also done in the `docker/gisnav/Dockerfile` but it might not be required here.

JiatengSun commented 10 months ago

Interesting, I had always been trying to set up GISNav on Ubuntu 20.04 (as suggested in the documentation). I saw the choice between Foxy and Humble on the documentation page and assumed that on 20.04 I should be working with Foxy. So just to be clear, should I be setting up a Humble environment on an Ubuntu 20.04 machine?

Edit: I just tried to install Humble on 20.04, and I think I have to build it from source? Or should I switch to Ubuntu 22.04? I also tried your way of setting up GISNav, which worked, but when I keep following the documentation to download the LoFTR submodule and weights, it gives me: error: pathspec 'LoFTR' did not match any file(s) known to git

hmakelin commented 10 months ago

Some more context for the Foxy/Humble situation:

The current GitHub pages guide (gisnav.org) for the latest release 0.64 recommends Ubuntu 20.04 and ROS 2 Foxy in line with PX4 recommendations from one year ago when the release was made - that was also my development system back then.

Since then Foxy has reached end of life, but we still need Foxy for the PX4 "Gazebo Classic" SITL simulation. This is because PX4's Gazebo Classic SITL simulation has the Typhoon H480 model with a simulated 3-axis gimbal and video stream that is difficult to replace if you want to make a nice mock GPS demo. PX4 seems to already support Ubuntu 22.04 with their 1.14 release and once the SITL simulation situation is somehow resolved I think GISNav should also go with that version of Ubuntu for the px4 Docker Compose service.

So while the GISNav ROS 2 package itself already works with Ubuntu 22.04 and ROS 2 Humble, for the PX4 SITL simulation we need an Ubuntu 20.04 / ROS 2 Foxy system.

With Docker containers this should not be a major issue - I think GISNav has made a lot of progress in moving all its supporting services from a monolithic single container into dedicated containers since that last 0.64 release was made. These services are all orchestrated via docker/docker-compose.yaml and docker/Makefile. For development I've noticed that it's convenient to run all the supporting services inside Docker containers (make start spins everything up in just a few seconds on my development machine once you have built, created and run your containers at least once) and then run GISNav locally (on the host machine) e.g. with ros2 launch gisnav px4.dev.launch.py

I'm currently working on a major refactoring on the gisnav-gpu branch but plan to do another pass through the documentation at the end of the refactoring - there's probably a lot of these issues hiding there.

hmakelin commented 10 months ago

The 0.64 release had LoFTR as a git submodule but it has been removed in the current master branch. You might be cloning the master branch and not checking out the 0.64.0 tag? You can follow the gisnav.org instructions for the 0.64.0 release, or if you want to grab the latest master branch you can try generating the latest docs with make docs. You need to have sourced your colcon workspace and built gisnav in it to be able to build the documentation.

On the current master branch the deep learning dependency is isolated into its own torch-serve Docker Compose service. The idea for this was to decouple the difficult-to-setup deep learning dependency chain into a separate service but it ended up introducing too much latency (serialization of each image via HTTPS) so in the gisnav-gpu branch I am removing the torch-serve service and bringing the LoFTR model back inside a ROS node so that we can pass the images around via shared memory. There the installation of the model is a bit easier since LoFTR now comes with kornia and does not have to be included as a submodule.

JiatengSun commented 10 months ago

Thank you for the detailed explanation, looking forward to the updated documentation!

Of course! I forgot to checkout with the 0.64.0 tag. Thank you!

JiatengSun commented 10 months ago

Another issue came up: when pip installing the requirements, pyproj-3.2.1 always raises errors. I previously dealt with this by installing an older version of Cython (0.26.0), but I can't seem to install older versions of Cython on this Docker image (which might be a Python version issue). I hope the revised documentation will also include the recommended version of Python!

JiatengSun commented 10 months ago

I have successfully got the Docker containers running, but in QGC the bottom-left window says it is waiting for video, and the log shows:

qgc-1 | VideoReceiverLog: gst_element_factory_make() for data source failed
qgc-1 | VideoReceiverLog: _makeSource() failed
qgc-1 | VideoReceiverLog: Failed

hmakelin commented 10 months ago

Thank you for finding this issue!

To study this I updated the QGC service to use Ubuntu Jammy instead of Focal and managed to get the video feed displaying in the QGC UI. In this case this was undesirable since with QGC occupying the UDP port, gscam was now unable to bridge that video feed into ROS which is what we actually want. So I might hold off on updating the QGC service to Jammy for now even though it technically fixes this bug.

Generally to troubleshoot these kinds of issues you could check that the video stream is available on your host with gstreamer. Run the following command when the PX4 SITL simulation is running (this is the same launch string that is used by the gscam service):

gst-launch-1.0 udpsrc port=5600 caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)H264" ! rtph264depay ! avdec_h264 ! videoconvert ! autovideosink

You should be seeing the simulated video feed.
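
If GStreamer is not installed on the host, a rougher check is to see whether any RTP-looking UDP packets arrive on the port at all. The sketch below assumes the stream is plain RTP over UDP on port 5600 as in the launch string above; the function names are hypothetical, and it only inspects the RTP version bits rather than decoding video. Because it binds the port itself, it will normally fail with an "address already in use" error while QGC or gscam is listening on the same port.

```python
import socket


def looks_like_rtp(packet: bytes) -> bool:
    """RTP has a 12-byte fixed header whose top two bits carry version 2."""
    return len(packet) >= 12 and packet[0] >> 6 == 2


def wait_for_rtp(port: int = 5600, timeout: float = 5.0, host: str = "0.0.0.0") -> bool:
    """Return True if an RTP-looking UDP packet arrives on the port in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))      # raises OSError if the port is taken
        sock.settimeout(timeout)
        try:
            packet, _sender = sock.recvfrom(65535)
        except socket.timeout:
            return False             # nothing arrived before the timeout
    return looks_like_rtp(packet)
```

Running `wait_for_rtp()` while the PX4 SITL simulation is up should return True if the stream reaches the host; False means no packet arrived in time or the first packet did not look like RTP.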

Some commands to help you verify the video stream is being published to ROS by gscam:

ros2 topic list | grep camera
ros2 topic echo /camera/image_raw
ros2 topic echo /camera/camera_info

If you feel like some GUIs or windows are not appearing, you might also want to check that your X server is exposed to your containers, you can do that with

cd docker
make expose-xhost

Or if you are feeling dangerous, you can also just do

xhost +

JiatengSun commented 9 months ago

After running: gst-launch-1.0 udpsrc port=5600 caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)H264" ! rtph264depay ! avdec_h264 ! videoconvert ! autovideosink

I can see the video feed.

So to resolve this for now, should I update the QGC service to use Ubuntu Jammy instead of Focal? How should I do this?

hmakelin commented 9 months ago

I opened another MR to solve this issue here: https://github.com/hmakelin/gisnav/pull/94

To view the video stream in QGC one would just need to configure QGC to use port 5601 instead of 5600. I will next see if I can do that by default in the Docker container.

JiatengSun commented 9 months ago

Sorry, but I'm still a little confused: does the bug affect the GISNav demo? I successfully started the containers with "docker compose up....". Should I run the demo using "make -C docker demo"?

hmakelin commented 9 months ago

This bug should not affect the demo, only the visibility of the video feed in the QGC UI. make -C docker demo should start all the required Docker Compose services for the demo. I've recently had some issues with ROS shared memory not working with Docker containers so you might also want to check this out if that's also the case for you https://www.gisnav.org/pages/developer_guide/offboard/troubleshooting.html#disable-sharedmemory-for-fast-dds

JiatengSun commented 9 months ago

Does sudo docker compose up mapserver px4 micro-ros-agent qgc gisnav also start the demo? I notice make -C docker demo is composing everything again.

After running sudo docker compose up mapserver px4 micro-ros-agent qgc gisnav and running failure gps off in MavLink console, the drone started descending and I got this error from gisnav:

gisnav-1 | [gis_node-1] [WARN] [1706753676.873355326] [gisnav.gis_node]: Unexpected input argument types for _vehicle_geopose: nav_sat_fix (expected <class 'sensor_msgs.msg._nav_sat_fix.NavSatFix'>, got <class 'NoneType'>), pose_stamped (expected <class 'geometry_msgs.msg._pose_stamped.PoseStamped'>, got <class 'NoneType'>)
gisnav-1 | [gis_node-1] [WARN] [1706753676.874889760] [gisnav.gis_node]: Unexpected input argument types for _camera_quaternion: geopose (expected <class 'geographic_msgs.msg._geo_pose_stamped.GeoPoseStamped'>, got <class 'NoneType'>)
gisnav-1 | [gis_node-1] [WARN] [1706753676.875957447] [gisnav.gis_node]: Unexpected input argument types for _fov_and_principal_point_on_ground_plane: camera_quaternion (expected <class 'geometry_msgs.msg._quaternion.Quaternion'>, got <class 'NoneType'>), vehicle_pose (expected <class 'geometry_msgs.msg._pose_stamped.PoseStamped'>, got <class 'NoneType'>), camera_info (expected <class 'sensor_msgs.msg._camera_info.CameraInfo'>, got <class 'NoneType'>)
gisnav-1 | [gis_node-1] [WARN] [1706753676.877023513] [gisnav.gis_node]: Unexpected input argument types for _orthoimage_size: camera_info (expected <class 'sensor_msgs.msg._camera_info.CameraInfo'>, got <class 'NoneType'>)
gisnav-1 | [gis_node-1] [WARN] [1706753676.878332920] [gisnav.gis_node]: Unexpected input argument types for _request_orthoimage_for_bounding_box: bounding_box (expected <class 'geographic_msgs.msg._bounding_box.BoundingBox'>, got <class 'NoneType'>), size (expected typing.Tuple[int, int], got <class 'NoneType'>)

JiatengSun commented 9 months ago

I have managed to run make -C docker demo successfully, but now I get this error and the UAV lands right after I enter failure gps off in the MAVLink console:

failure gps off
WARN [failure] inject failure unit: gps (4), type: off (1), instance: 0
WARN [simulator_mavlink] CMD_INJECT_FAILURE, GPS off
pxh> WARN [mc_pos_control] invalid setpoints
WARN [mc_pos_control] Failsafe: blind land
WARN [failsafe] Failsafe activated
INFO [gimbal] Configured primary gimbal control sysid/compid from 1/1 to 0/0
INFO [tone_alarm] battery warning (fast)
ERROR [timesync] Time jump detected. Resetting time synchroniser.

and the terminal is outputting:

gisnav-1 | [gis_node-1] [INFO] [1707028616.316356544] [gisnav.gis_node]: Sending GetMap request for bbox: BBox(left=-122.25607107767837, bottom=37.52134352310396, right=-122.25007690232293, top=37.52611766351677), layers: ['osm-buildings-dem'].
gisnav-1 | [cv_node-2] [WARN] [1707028616.911203359] [gisnav.cv_node]: Could not estimate pose, status code 503
gisnav-1 | [cv_node-2] [WARN] [1707028616.912565405] [gisnav.cv_node]: Unexpected input argument types for _post_process_pose: pose (expected typing.Tuple[numpy.ndarray, numpy.ndarray], got <class 'NoneType'>)
gisnav-1 | [cv_node-2] [WARN] [1707028616.913595017] [gisnav.cv_node]: Could not complete post-processing for pose estimation
gisnav-1 | [cv_node-2] [WARN] [1707028617.598098401] [gisnav.cv_node]: Could not estimate pose, status code 503

Also, the RViz window, which I assume should display the orthoimage, is just a black scene with a grid.

hmakelin commented 9 months ago

Nice to hear you got the simulation running!

Unfortunately I currently do not have a solution for the failsafes triggering when you turn off the simulated GPS. Some time after I made the demo video the PX4 EKF was updated so that it no longer worked with the old mock GPS messages - I suspect it's because the EKF now requires a velocity estimate here https://github.com/PX4/PX4-Autopilot/blame/main/src/modules/ekf2/EKF2.cpp#L2375-L2382 which GISNav's MockGPSNode does not compute. I've thought about adding a filter to compute the velocity estimate and variances from the position estimates based on some simple state model in GISNav but no concrete plans yet since these are downstream of what I'm now working on (I'm currently working on setting up Docker bridge networks to isolate related services so that they would not have to share the same host network to have a bit better security and modularity).
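
As a rough illustration of the velocity idea mentioned above, the simplest possible "state model" is a finite difference between two consecutive position estimates. This is a sketch under stated assumptions, not what GISNav or PX4 implements: positions are taken to be in a local metric frame (for example ENU, in meters), the errors of the two fixes are assumed independent, and the names `Fix` and `velocity_from_fixes` are made up for the example. A proper filter (for instance a constant-velocity Kalman filter) would likely be needed in practice because consecutive estimates are noisy and correlated.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Fix:
    t: float    # timestamp in seconds
    pos: Vec3   # position in a local metric frame, meters
    var: Vec3   # per-axis position variance, m^2


def velocity_from_fixes(prev: Fix, curr: Fix) -> Tuple[Vec3, Vec3]:
    """Finite-difference velocity and its per-axis variance.

    With independent position errors:
    Var[v] = (Var[p_curr] + Var[p_prev]) / dt^2
    """
    dt = curr.t - prev.t
    if dt <= 0:
        raise ValueError("fixes must be strictly increasing in time")
    vel = tuple((c - p) / dt for c, p in zip(curr.pos, prev.pos))
    var = tuple((cv + pv) / dt ** 2 for cv, pv in zip(curr.var, prev.var))
    return vel, var
```

Note how the variance grows as the time step shrinks, which is one reason raw differencing of position estimates tends to produce a noisy velocity signal.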

On some of the other issues you mentioned:

* For v0.65 the RViz window should display the tf2 transformations tree. There is also an RVizNode that was used earlier to publish the drone flight path in RViz but it will not work for v0.65.

* Those warnings about unexpected input argument types you mentioned earlier are expected and you will probably get a lot of them - they are generated by the runtime type narrowing decorator https://www.gisnav.org/pages/api_documentation/private/decorators.html#gisnav._decorators.narrow_types

JiatengSun commented 9 months ago

I see, so if I run v0.65, should the demo be working? What is your suggestion if I want a working demo to play with?

hmakelin commented 9 months ago

The solution is probably a combination of modifying the PX4 EKF2 related parameters to try to increase tolerances (responsibly and in moderation) to different kinds of variation and delays in the GPS signal, and updating what goes in the MockGPSNode outgoing message. The latter I think requires more thought and will ultimately depend on what version of PX4 you will be targeting in the future. I am unfortunately not working on this yet since I expect there to be multiple releases of PX4 before GISNav is mature enough to spend more development effort on the integration.

JiatengSun commented 8 months ago

I see, but since the demo used to work, is it possible to roll back to an older version of PX4 to fix this issue for now?

hmakelin commented 8 months ago

I created https://github.com/hmakelin/gisnav/issues/109 to track this issue since it is not related to installation. Please see my response there.

hmakelin commented 6 months ago

Release 0.67.0 should fix these installation issues and improve the situation with the failsafes triggering in SITL simulation. So I am closing this issue but of course feel free to open a new issue if similar issues appear again. I also recommend rebuilding any Docker images you might have. Thank you for reporting!