ros-infrastructure / catkin_pkg

Standalone Python library for the catkin build system.
https://github.com/ros/catkin
Other
47 stars 89 forks source link

python_setup.py: fix build with setuptools v59.0.0 and newer #316

Closed shr-project closed 2 years ago

shr-project commented 2 years ago

Signed-off-by: Martin Jansa martin.jansa@lge.com

schlagenhauf commented 2 years ago

Newlines in the package description currently break the ros-noetic-desktop installation on Arch Linux, where python-setuptools 59.1.0 is in the system packages. Merging this PR would be greatly appreciated.

shr-project commented 2 years ago

I no longer have test environment for this, won't be able to update the PR properly now.

nuclearsandwich commented 2 years ago

As noted in #326 I think the approach taken there is preferable. All the same, thanks @shr-project and the other contributors for both raising the issue and proposing patches!

nuclearsandwich commented 2 years ago

@cottsay noted that there are packages with different linebreak strategies which would be adversely affected by #326, leading to both of us swinging back to prefer an implementation more like this patch. However this has kicked up a bit of muck and raised deficiencies with even the current handling of package descriptions before either patch.

One example he raised was the description

<description>
  This is a package which
  does stuff.

  It does it very efficiently.
  You should use it.
</description>

which 326 would truncate to

This is a package which

whereas this patch would transform it into

This is a package which   does stuff.    It does it very efficiently.   You should use it.

The reason for the extra spaces is that no work is done to handle the interior indentation. The current implementation strips all exterior whitespace from the description via _get_node_value but does not touch interior whitespace.

https://github.com/ros-infrastructure/catkin_pkg/blob/55db7cd10fe65f3e13f629de35fd4a7c3d96584f/src/catkin_pkg/package.py#L799

When figuring out what we should do we turned to both the XML specification which states:

2.10 White Space Handling

In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code. An XML processor MUST always pass all characters in a document that are not markup through to the application. A validating XML processor MUST also inform the application which of these characters constitute white space appearing in element content.

...

2.11 End-of-Line Handling XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters CARRIAGE RETURN (#xD) and LINE FEED (#xA). To simplify the tasks of applications, the XML processor MUST behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.

and REP-149

The description of the package. It can consist of multiple lines and may contain XHTML. But depending on where the description is used XML tags and multiple whitespaces might be stripped.

With both of these in mind, I see this as meaning that once we receive the description contents from the XML parser catkin_pkg as the "application" decides how to handle whitespace within the desciption and that according to REP-149 we should be treating it as a block of XHTML, which, like HTML should not treat any whitespace as significant outside of pre-formatted text blocks.

On top of that, we do know that https://index.ros.org is dumping the contents of <description/> directly into an html <div> tag so there is already at least one place where HTML formatting rules are being applied to the description.

To that end, I've opened #329 which starts adding test cases for xhtml-formatted descriptions. From that PR I expect we'll arrive at a standardized way to get a "plaintext" description which we can then use in setup.py files and other places where HTML-formatted descriptions are not expected.

nuclearsandwich commented 2 years ago

329 has been merged which adds a separate plaintext_description attribute for packages that includes a very minimal plaintext rendering of the xhtml description which is much better suited to being used in setuptools.

nuclearsandwich commented 2 years ago

326 has been updated to use the plaintext_description field as the base and will take the first line of the plaintext description. I think once that PR is merged, we can close this one as having been superseded.

Since most package.xml descriptions do not include <br> tags it's quite likely that the combined usage of the plaintext-rendered description and keeping only up to the first rendered newline will mean that most packages won't see a difference between #316 and #326 as the newlines in the raw description will not be seen by #326 at all.

I know this one took some time and had a lot of back and forth. My thanks to everyone for their contributions here as I think we not only arrived at a very good solution in #326 but also found an axis for improvement in our description handling everywhere.

cottsay commented 2 years ago

Closed via #326