ros-infrastructure / catkin_pkg

Standalone Python library for the catkin build system.
https://github.com/ros/catkin

Fix parsing package.xml files with Unicode characters #252

Closed zultron closed 5 years ago

zultron commented 5 years ago

Unicode characters in package.xml break parse_package_string(). Here's an example traceback caused by a package where the author's name contained an umlaut:

Traceback (most recent call last):
  [...]
  File "/usr/lib/python2.7/dist-packages/catkin_pkg/package.py", line 540, in parse_package_string
    raise InvalidPackage('The manifest contains invalid XML:\n%s' % ex, filename)
catkin_pkg.package.InvalidPackage: The manifest contains invalid XML:
'ascii' codec can't encode character u'\xf6' in position 256: ordinal not in range(128)

dirk-thomas commented 5 years ago

Please include the package.xml file in question and your locale settings.

zultron commented 5 years ago

package.xml

$ locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=C.UTF-8

zultron commented 5 years ago

This was totally my fault. My apologies. Closing PR.

gavanderhoorn commented 5 years ago

> This was totally my fault. My apologies. Closing PR.

What was the cause in the end?

zultron commented 5 years ago

I'm tracking it down to a problem in rosdistro: package.xml sources downloaded from GitHub are decoded immediately and are not re-encoded before being passed to the parser. A PR is coming shortly.
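
For readers following along, here is a minimal sketch of the decode-without-re-encode pattern described above. It is not the actual rosdistro or catkin_pkg code: the manifest content and the helper names (`implicit_ascii_encode`, `to_parser_input`, `author_of`) are hypothetical, and the explicit `.encode('ascii')` call only imitates the implicit unicode-to-bytes conversion that Python 2 performs, which is assumed to be what produced the `'ascii' codec can't encode character` error in the traceback.

```python
import xml.dom.minidom

# Hypothetical manifest bytes, as they might arrive from a download.
# The author name contains an umlaut (U+00F6).
RAW_BYTES = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<package><name>demo_pkg</name>'
    '<author>J\u00f6rg Example</author></package>'
).encode('utf-8')


def implicit_ascii_encode(text):
    """Imitate Python 2's implicit unicode -> bytes conversion (ASCII codec)."""
    return text.encode('ascii')


def to_parser_input(text):
    """Re-encode decoded text as UTF-8 bytes before handing it to the parser."""
    return text.encode('utf-8')


def author_of(xml_bytes):
    """Parse manifest bytes and return the text of the first <author> element."""
    dom = xml.dom.minidom.parseString(xml_bytes)
    return dom.getElementsByTagName('author')[0].firstChild.data


# The download step decodes the payload right away.
decoded = RAW_BYTES.decode('utf-8')

# Passing the decoded text on through an ASCII conversion fails on the umlaut.
try:
    implicit_ascii_encode(decoded)
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Re-encoding explicitly as UTF-8 lets the XML parser handle the manifest.
author = author_of(to_parser_input(decoded))
```

Under these assumptions, the fix amounts to keeping the payload as bytes, or explicitly re-encoding it as UTF-8, before it reaches the XML parser.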