ros-infrastructure / catkin_pkg

Standalone Python library for the catkin build system.
https://github.com/ros/catkin

parse_package_string should check dependency conditions before checking for redundant tags #303

Closed abrandemuehl closed 3 years ago

abrandemuehl commented 3 years ago

Currently I'm on Ubuntu 18.04 with ROS Melodic, and python-catkin-pkg is version 0.4.23-100. The issues I've been running into only started showing up with the new ignition release (0.9.4-2bionic) of the packages listed below.

It looks to me like parse_package_string never evaluates or checks the conditions placed on a package's dependency tags. Downstream, this results in an InvalidPackage exception being raised for ros_ign_gazebo, ros_ign_bridge and ros_ign_image (which all use condition attributes) when I try to generate message code using geneus. That in turn results in some kind of deadlock, which is what prompted me to investigate in the first place.

I've logged some of what I've been trying and finding over in catkin_tools: https://github.com/catkin/catkin_tools/issues/379

My current theory is that because geneus uses find_packages, which calls parse_package_string directly, parsing fails when condition attributes are used in a seemingly redundant way (different depend tags with different conditions but the same package name), and that error causes whatever deadlock I'm experiencing in find_packages.

There is also the question of why raising InvalidPackage causes a deadlock instead of the exception propagating properly, but I don't have many leads on that front.
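
One guess, which I haven't confirmed against catkin_pkg's actual code: under Python 2, an exception raised in a multiprocessing pool worker whose class can't be reconstructed by pickle in the parent process kills the pool's result-handler thread, and the pool then blocks forever instead of re-raising. A minimal sketch of that failure mode, with a made-up BadPackageError standing in for the real exception:

from multiprocessing import Pool


class BadPackageError(Exception):
    # A constructor whose signature doesn't match Exception.args: the
    # pickled exception can't be re-instantiated in the parent process.
    def __init__(self, msg, path):
        Exception.__init__(self, '%s (%s)' % (msg, path))


def work(_):
    raise BadPackageError('bad package', '/some/path')


if __name__ == '__main__':
    pool = Pool(2)
    # Under Python 2, the result-handler thread dies with a TypeError
    # while unpickling BadPackageError, so this call blocks forever
    # instead of raising the exception.
    pool.map(work, range(200))
    pool.close()
    pool.join()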

cottsay commented 3 years ago

Could you please post a way to reproduce the error you're seeing, and maybe a backtrace or other error message?

Looking at parse_package_string, it does perform some dependency redundancy detection, but it does so in a way that is safe in the presence of conditionals: when a condition attribute is present, dependencies are only flagged as redundant if their condition attributes are identical strings. I tested this locally, and it appears to hold.
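
For example, a manifest along these lines (a hypothetical package, not one of the ros_ign ones) parses cleanly, because the two depend tags on the same package name carry different condition strings:

from catkin_pkg.package import parse_package_string

PACKAGE_XML = """\
<?xml version="1.0"?>
<package format="3">
  <name>demo_pkg</name>
  <version>0.0.1</version>
  <description>Hypothetical package exercising condition attributes.</description>
  <maintainer email="someone@example.com">Someone</maintainer>
  <license>Apache-2.0</license>
  <depend condition="$ROS_VERSION == 1">ignition-msgs6</depend>
  <depend condition="$ROS_VERSION == 2">ignition-msgs6</depend>
</package>
"""

# No InvalidPackage is raised: both dependencies are kept, each with
# its own (not yet evaluated) condition string.
pkg = parse_package_string(PACKAGE_XML)
print([(d.name, d.condition) for d in pkg.exec_depends])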

Since evaluating conditionals is a potentially expensive operation that isn't actually necessary when the caller doesn't need dependency information (maybe they only want the version number or an author's email address), I don't believe that implicitly evaluating the conditionals in the call to parse_package_string is really the best approach.

Additionally, parse_package_string doesn't have the conditional context used for evaluation. The topological ordering code implicitly uses the environment variables for this, but some applications (like bloom) need to specify the context explicitly, so there would need to be an API change to allow the context to be passed in.

Bottom line: if the caller wants the conditionals to be evaluated, could they not just call Package.evaluate_conditions on the package returned by parse_package_string?
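
Roughly like this (a sketch, using the environment as the condition context the way the topological ordering code does):

import os

from catkin_pkg.package import parse_package_string

with open('package.xml') as f:
    pkg = parse_package_string(f.read(), filename='package.xml')

# Evaluate each dependency's condition against the environment; afterwards
# Dependency.evaluated_condition is True or False for every dependency.
pkg.evaluate_conditions(os.environ)
active = [d for d in pkg.exec_depends if d.evaluated_condition]
print([d.name for d in active])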

abrandemuehl commented 3 years ago

You can reproduce it by installing the new ignition ROS packages (https://github.com/ignitionrobotics/ros_ign) that use the condition attribute (ros_ign_gazebo, ros_ign_image, ros_ign_bridge) and that have redundant depend tags if conditions are ignored. Then try to build any ROS package that generates EusLisp with gen_eus (which calls find_packages). That build will hang and drop to zero CPU usage if the parallel code in find_packages is activated (which happens when there are more than 100 packages to go through), because the InvalidPackage exception gets raised inside the multiprocessing pool's map call.

I don't think calling Package.evaluate_conditions afterwards would work in this case, because parse_package_string itself raises an exception for what it considers a duplicate tag. You can see that the packages I mentioned (https://github.com/ignitionrobotics/ros_ign/blob/melodic/ros_ign_gazebo/package.xml#L15) have duplicates if you don't evaluate the conditions.

Here's the stack trace at the hanging spot:

(Pdb) bt
  /usr/lib/python2.7/pdb.py(1314)main()
-> pdb._runscript(mainpyfile)
  /usr/lib/python2.7/pdb.py(1233)_runscript()
-> self.run(statement)
  /usr/lib/python2.7/bdb.py(400)run()
-> exec cmd in globals, locals
  <string>(1)<module>()
  /opt/ros/melodic/lib/geneus/gen_eus.py(39)<module>()
-> import geneus
  /opt/ros/melodic/lib/python2.7/dist-packages/geneus/geneus_main.py(137)genmain()
-> pkg_map = get_pkg_map()
  /opt/ros/melodic/lib/python2.7/dist-packages/geneus/geneus_main.py(56)get_pkg_map()
-> pkgs = packages.find_packages(ws)
  /usr/local/lib/python2.7/dist-packages/catkin_pkg/packages.py(89)find_packages()
-> packages = find_packages_allowing_duplicates(basepath, exclude_paths=exclude_paths, exclude_subspaces=exclude_subspaces, warnings=warnings)
  /usr/local/lib/python2.7/dist-packages/catkin_pkg/packages.py(160)find_packages_allowing_duplicates()
-> pool.join()
  /usr/lib/python2.7/multiprocessing/pool.py(479)join()
-> p.join()
  /usr/lib/python2.7/multiprocessing/process.py(148)join()
-> res = self._popen.wait(timeout)
  /usr/lib/python2.7/multiprocessing/forking.py(154)wait()
-> return self.poll(0)
> /usr/lib/python2.7/multiprocessing/forking.py(135)poll()
-> pid, sts = os.waitpid(self.pid, flag)

Here's the error that's raised if the parallel code is turned off:

InvalidPackage: Error(s) in package '/opt/ros/melodic/share/ros_ign_bridge/package.xml':
Error(s):
- The generic dependency on 'ignition-msgs6' is redundant with: build_depend, build_export_depend, exec_depend
- The generic dependency on 'ignition-transport9' is redundant with: build_depend, build_export_depend, exec_depend
ERROR:  Error(s) in package '/opt/ros/melodic/share/ros_ign_bridge/package.xml':
Error(s):
- The generic dependency on 'ignition-msgs6' is redundant with: build_depend, build_export_depend, exec_depend
- The generic dependency on 'ignition-transport9' is redundant with: build_depend, build_export_depend, exec_depend
cottsay commented 3 years ago

Here's the stack trace at the hanging spot

Thanks - this helps a lot. This seems like a real bug to me, but I can't reproduce it using Python 2 or Python 3.

Here's the error that's raised if the parallel code is turned off

I was able to reproduce this error only when using a catkin_pkg version prior to 0.4.23; the issue was fixed in that release by #294. Please run grep __version__ /usr/local/lib/python2.7/dist-packages/catkin_pkg/__init__.py to see which version you have installed there. Judging by the line numbers in that backtrace, it's prior to 0.4.18.

abrandemuehl commented 3 years ago

Looks like I do have an old version, 0.4.9. I guess this is a problem for whoever maintains the Ubuntu repository, then?

gavanderhoorn commented 3 years ago

Are buildfarm-distributed Python packages ever installed in /usr/local/lib?

@abrandemuehl: have you perhaps installed the upstream version? No, that would be 0.3.9 according to apt-cache policy.

cottsay commented 3 years ago

I guess this is a problem for whoever maintains the Ubuntu repository, then?

Typically stuff installed to /usr/local is not a system package. I'm guessing it was installed by pip.
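
One quick way to check which copy Python actually imports (a diagnostic sketch, not something run in this thread):

import catkin_pkg

# A path under /usr/local/lib usually means a pip install, while
# /usr/lib/python2.7/dist-packages means the Debian/Ubuntu package.
print(catkin_pkg.__file__)
print(catkin_pkg.__version__)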

gavanderhoorn commented 3 years ago

Yes, I had the same idea.

root@2e3977fd7fab:/# apt-cache policy python-catkin-pkg
python-catkin-pkg:
  Installed: 0.4.23-100
  Candidate: 0.4.23-100
  Version table:
 *** 0.4.23-100 500
        500 http://packages.ros.org/ros/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status
     0.3.9-1 500
        500 http://archive.ubuntu.com/ubuntu bionic/universe amd64 Packages
root@2e3977fd7fab:/# dpkg -L python-catkin-pkg
/.
/usr
/usr/bin
/usr/bin/catkin_create_pkg
/usr/bin/catkin_find_pkg
/usr/bin/catkin_generate_changelog
/usr/bin/catkin_package_version
/usr/bin/catkin_prepare_release
/usr/bin/catkin_tag_changelog
/usr/bin/catkin_test_changelog
/usr/lib
/usr/lib/python2.7
/usr/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages/catkin_pkg-0.4.23.egg-info
/usr/lib/python2.7/dist-packages/catkin_pkg-0.4.23.egg-info/PKG-INFO
/usr/lib/python2.7/dist-packages/catkin_pkg-0.4.23.egg-info/dependency_links.txt
/usr/lib/python2.7/dist-packages/catkin_pkg-0.4.23.egg-info/entry_points.txt
/usr/lib/python2.7/dist-packages/catkin_pkg-0.4.23.egg-info/requires.txt
/usr/lib/python2.7/dist-packages/catkin_pkg-0.4.23.egg-info/top_level.txt
/usr/share
/usr/share/doc
/usr/share/doc/python-catkin-pkg
/usr/share/doc/python-catkin-pkg/changelog.Debian.gz
/usr/share/doc/python-catkin-pkg/copyright
abrandemuehl commented 3 years ago

Okay. So to conclude: I just have an old version, installed directly via pip at some point.

gavanderhoorn commented 3 years ago

:+1: for the backtraces btw and the sleuthing. Really nice.

gavanderhoorn commented 3 years ago

Okay. So to conclude: I just have an old version, installed directly via pip at some point.

What does pip list | grep catkin output?

abrandemuehl commented 3 years ago

It listed catkin-pkg at 0.4.9, so I've uninstalled it and am seeing if I can reproduce now.

EDIT: Looks like the problem has been solved. Thank you for your help!