raimon49 / pip-licenses

Dump the license list of packages installed with pip.
MIT License
314 stars 45 forks source link

Packages with multiple LICENSES files not shown in --with-license-file #71

Open johnthagen opened 4 years ago

johnthagen commented 4 years ago
$ pip install -U pip pip-licenses
$ pip install opencv-python-headless==4.2.0.34
$ pip-licenses --from=mixed --with-license-report

The opencv-python-headless wheels include two license files, LICENSE.txt and LICENSE-3RD-PARTY.txt, but only LICENSE-3RD-PARTY.txt is displayed. Could support be added to display both license files since some packages like opencv have multiple files covering different parts of the library.


Format LicensePath LicenseText
Plain/Plain Vertical \n \n\n
JSON JSON List [] JSON List []
CSV ;
HTML
raimon49 commented 4 years ago

@johnthagen Yes, I am aware of the problem.

This issue needs careful consideration.The type may change in structured formats like JSON.

For example, I have the idea of introducing a new option --with-multiple-license-files to separate the display from the existing --with-license-file.

Do you have any other ideas?

johnthagen commented 4 years ago

Another idea could be to put multiple license files in the same column and separate them them with some kind of deliminator. That way there would still be single rows for each package, but LicenseFile and LicenseText could just contain concatenated information.

The issue I see with --with-multiple-license-files is that really getting all the license files needs to be the default state, or else people will invariably miss license files. For something like licensing, I personally feel it's important that the defaults be as all-encompassing as possible. Missing a license file could technically mean you are violating a license and wouldn't know if you didn't know the internals of pip-licenses.

Maybe #60 and this together could be part of a 3.0 release that tries to make the default experience for pip-licenses a little more "correct"?

raimon49 commented 4 years ago

Certainly, I think it is desirable for many users to change the behavior in the 3.0 release. It seems that the idea of connecting with a delimiter and outputting as the default is good.

johnthagen commented 4 years ago

One easy idea for the license file contents themselves could be to concatenate them together with a newline in-between.

johnthagen commented 4 years ago

Another popular package with multiple license files is Matplotlib: https://github.com/matplotlib/matplotlib/tree/master/LICENSE

(Note that currently matplotlib doesn't bundle it's LICENSE files correctly (https://github.com/matplotlib/matplotlib/issues/18296) but once that is corrected, it would be another example that pip-licenses could support if updated).

raimon49 commented 4 years ago

@johnthagen Thanks for all the ideas!

I have started work today on the release of version 3.0.0.

This release allows for breaking changes.

If you have ideas for changes to the --with-license-file option, please make a Pull Request for the release-3.0.0 branch.

Version 3.0.0 is expected to ship around the time Python 3.5 becomes EOL. That's 2020-09-13, according to the Ptyhon team's announcement.

johnthagen commented 4 years ago

@raimon49 After thinking about this issue, the only way I could think of to make this work while maintaining a "one row per package" in the table layout would be to concatenate LicenceFile and LicenseText results together with \n.

This would look something like:

my/license/path/LICENSE1
my/license/path/LICENSE2

and

MIT License
...
End of license

Second License
...
End of license 2

What do you think of this technique?

raimon49 commented 4 years ago

@johnthagen I agree with you about the Plain or Plain Vertical format.

The difficulty is the formatting, where \n are meaningful and structured, as in the following

We have to support users who use these formats.

johnthagen commented 4 years ago

@raimon49 For the LICENSE files themselves using --with-license-file, the files themselves already have lots of \n in them, so appending more LICENSE files together wouldn't change the status quo, would it?

johnthagen commented 4 years ago

We could also consider what you suggested --with-multiple-license-files, but the problem is that the end user really has no way of knowing ahead of time if there will be multiple LICENSE files or not. If they miss LICENSE files they likely can inadvertently violate the LICENSE requirements by not including all of the license files.

johnthagen commented 4 years ago

For HTML, JSON, and CSV, could we use a different deliminator or just use a raw \n rather than the actual newline ASCII code?

[
  {
    "licenses": ["BSD"],
    "name": "Django",
    "version": "2.0.2",
    "LicensePath": "/path/to/LICENSE1\n/path/to/LICENSE2"
  },
]

Then users can split on the deliminator. We could also use another character, such as ; or for non Plain/Plain Vertical?

[
  {
    "licenses": ["BSD"],
    "name": "Django",
    "version": "2.0.2",
    "LicensePath": "/path/to/LICENSE1;/path/to/LICENSE2"
  },
]
raimon49 commented 4 years ago

JSON can be represented as data in a list.

[
  {
    "licenses": ["BSD"],
    "name": "Django",
    "version": "2.0.2",
    "LicensePath": ["/path/to/LICENSE1", "/path/to/LICENSE2"],
    "LicenseText ": ["LICENSE Text 1", "ICENSE Text 2"]
  },
]

Users will find it useful to be able to pass the output to the | jq command.

Some kind of delimiter is required in CSV. But for example ; also appears in the Apache License 2.0. https://choosealicense.com/licenses/apache-2.0/

It's a very vexing issue...

johnthagen commented 4 years ago

JSON can be represented as data in a list.

Ah yes, that is even better for JSON, nice. Perhaps we should try to enumerate a list of workable solutions as we go. I will start a list in the first comment to track this.

johnthagen commented 4 years ago

It seems hard to imagine someone could use --with-license-file with CSV output effectively. The LICENSE files will have lots of \n that will destroy the formatting. Same could be said for Markdown, reST, and Confluence.

For example, when I first tried to use pip-licenses I wanted to use RST -> Sphinx. I quickly realized that RST using --with-license-file never actually worked as the tables were a complete mess. What we actually did instead (and works really nicely) is to:

$ pip-licenses --from=mixed --format=rst --output-file summary.rst
$ pip-licenses --from=mixed --format=plain-vertical --with-license-file --no-license-path --output-file with_license.txt

And then the RST and plaintext are ingested by Sphinx: the RST is rendered nicely, and the plain-vertical is just shown as monospace text. The end result is nice.

All that to say, maybe we should consider specifically not supporting certain formats with --with-license-file, basically any of the formats that will break from multi-line LICENSE files anyway. I think they are basically broken currently for --with-license-file anyway.

johnthagen commented 4 years ago

I propose --with-license-file be replaced with --with-license-files so that it is more accurate, and --with-license-files changed to only support formats that can support license files (they need to be able to handle \n without breaking the table/output).

Formats supported with --with-license-files:

<LICENSE File 2 Contents>



Other formats will result in a CLI error if combined with `--with-license-files`.