ros-infrastructure / catkin_pkg

Standalone Python library for the catkin build system.
https://github.com/ros/catkin
Other
47 stars 89 forks source link

Validate license values #296

Open shr-project opened 3 years ago

shr-project commented 3 years ago

This PR was motivated by this issue in superflore: https://github.com/ros-infrastructure/superflore/issues/271 superflore tries to unify at least some values to more common license identifier, but because https://www.ros.org/reps/rep-0149.html#license-multiple-but-at-least-one isn't very strict about the values and people got quite creative here and the regular expressions in superflore are in some cases "inventing" extra information like mapping "LGPL" to "LGPL-2" and in worse cases plain wrong mapping "LGPLv3" to "LGPL-2".

Superflore or any other tool, cannot guess what LGPL version developer had in mind when it isn't specified in the package.xml. Because it would need to at least investigate the actual sources for license headers there.

Instead of trying to improve the license value later, we need to warn the developer when the provided value cannot be unambiguously parsed and ask him to improve it in the package.xml in the source repo. To help with that we can show the correct value (of corresponding SPDX identifier) we've assigned from the provided license value.

And to make sure that these assigned SPDX identifiers don't do any guessing or match to too wide range of values (like in superflore) lets map them by exact string from the values currently being used in any component currently listed in rosdistro index. That way these assignments are easy to maintain and not error-prone.

Then we can take the same set of exact assignments from here and add it "temporarily" in superflore, that way the licenses which are currently incorrectly guessed will be fixed while not regressing to completely free form of license values. This is implemented in https://github.com/ros-infrastructure/superflore/pull/279

rosdistro CI validation already checks for license tag existence, we can easily extend it to call this validation as well as implemented in: https://github.com/ros/rosdistro/pull/26601 I've used this validation to easily get a list of all license values from all components currently listed in rosdistro index.

Some commonly used values (and some more creative):

cottsay commented 3 years ago

This seems like it might be better suited for inclusion in catkin_lint. Any thoughts on that?

shr-project commented 3 years ago

This seems like it might be better suited for inclusion in catkin_lint. Any thoughts on that?

I've added it to catkin_pkg only because rosdistro CI github action uses it to validate version, so I've used the same mechanism for license validation, but if more people will notice the warnings from catkin_lint then definitely it should be there (as well).

dirk-thomas commented 3 years ago

@shr-project Atm the PR is in draft state and has no description. Therefore I am not going to review the patch. Please add a very detailed description about what the patch is doing, why it is doing so, what the side effects are, why you think those are justified, etc.

On a first look this adds new warnings to a lot of existing packages. This on its own for already released distros is an almost guaranteed no-go.

shr-project commented 3 years ago

@shr-project Atm the PR is in draft state and has no description. Therefore I am not going to review the patch. Please add a very detailed description about what the patch is doing, why it is doing so, what the side effects are, why you think those are justified, etc.

It was in draft state, because I wanted to add separate warning for license values which include multiple licenses (instead of using multiple separate license tags). I've added another commit for that now and added description.

Please let me know if it makes more sense to you now.

On a first look this adds new warnings to a lot of existing packages. This on its own for already released distros is an almost guaranteed no-go.

Are developers used to use https://github.com/fkie/catkin_lint ? Would showing warning about strange license texts in components from released distros be OK if done in catkin_lint?

cottsay commented 3 years ago

Would showing warning about strange license texts in components from released distros be OK if done in catkin_lint?

It's certainly not a hill I'm willing to die on, but in my opinion, the warning and error messages coming out of catkin_pkg should be more along the lines of syntax, structure, or schema-level problems in the manifests - problems that could reasonably prevent accurate processing of the manifest by catkin_pkg itself, or another system that depends on catkin_pkg. It isn't a perfect parallel, but I'll note that many Linux distributions document what precise representation should be used to refer to specific licenses, but few enforce this in the packaging system itself (i.e. deb/rpm as opposed to linters and test automation).

To be clear: I think this is an excellent thing to do, I'm just not convinced catkin_pkg is the right tool for the job. I would love to see it in catkin_lint, and I would love to see widespread adoption of catkin_lint in a GitHub Action.

shr-project commented 3 years ago

I'm working on very similar PR for catkin_lint as well now :).

I only work with ROS when working on meta-ros, I've never used ROS as a regular developer, so I don't know what's the usual work flow for people who write package.xml. If it's common that they run catkin_lint (or that Github Actions runs it for them), then I agreed that's definitely the place where it should be validated.

gavanderhoorn commented 3 years ago

@cottsay wrote:

I would love to see it in catkin_lint, and I would love to see widespread adoption of catkin_lint in a GitHub Action.

industrial_ci supports catkin_lint runs natively and can be used as a GH Action.

@shr-project wrote:

If it's common that they run catkin_lint

can't speak for everyone, but I have quite a few industrial_ci configurations which have CATKIN_LINT=true.