raimon49 / pip-licenses

Dump the license list of packages installed with pip.
MIT License

.egg directories not considered in get_pkg_included_file #97

Open philipaxer opened 3 years ago

philipaxer commented 3 years ago

Hi All,

I noticed that some license files are not correctly identified. This seems to happen because only .dist-info directories are searched; .egg directories are never tried.

The code below assumes the data will always reside in .dist-info, which is not always true:

    pkg_dirname = "{}-{}.dist-info".format(
        pkg.project_name.replace("-", "_"), pkg.version)

In my .venv I have numpy-1.20.1-py3.9-win-amd64.egg, which is not detected and is skipped. The same happens with other packages.
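To illustrate the mismatch, here is a minimal sketch. The helper name `dist_info_dirname` is hypothetical; it only mirrors the format string quoted above and is not part of pip-licenses.

```python
def dist_info_dirname(project_name, version):
    """Build the directory name the lookup expects (mirrors the snippet above)."""
    return "{}-{}.dist-info".format(project_name.replace("-", "_"), version)


expected = dist_info_dirname("numpy", "1.20.1")
actual = "numpy-1.20.1-py3.9-win-amd64.egg"  # directory name reported above

print(expected)            # numpy-1.20.1.dist-info
print(expected == actual)  # False
```

Because the two names never match, a glob rooted at the expected .dist-info directory finds nothing for an egg install.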

Regards, Philip

philipaxer commented 3 years ago

Changing the function as follows fixes the issue (no time to create a pull request, sorry).

    def get_pkg_included_file(pkg, file_names):
        """
        Attempt to find the package's included file on disk and return the
        tuple (included_file_path, included_file_contents).
        """
        included_file = LICENSE_UNKNOWN
        included_text = LICENSE_UNKNOWN
        pkg_dirname = "{}-{}.dist-info".format(
            pkg.project_name.replace("-", "_"), pkg.version)
        patterns = []
        # Search the .dist-info directory first, then the EGG-INFO
        # directory that .egg installs use for their metadata.
        for metadata_dir in (pkg_dirname, 'EGG-INFO'):
            for f in file_names:
                patterns.extend(sorted(glob.glob(
                    os.path.join(pkg.location, metadata_dir, f))))

        for test_file in patterns:
            if os.path.exists(test_file):
                included_file = test_file
                with open(test_file, encoding='utf-8',
                          errors='backslashreplace') as included_file_handle:
                    included_text = included_file_handle.read()
                break
        return (included_file, included_text)

raimon49 commented 3 years ago

@philipaxer Thanks for the report. This issue will be resolved in the next patch version release.

raimon49 commented 3 years ago

The egg format is legacy and I haven't used it much.

Looking at the specs, there appear to be two types.

There are two basic formats currently implemented for Python eggs:

  1. .egg format: a directory or zipfile containing the project’s code and resources, along with an EGG-INFO subdirectory that contains the project’s metadata
  2. .egg-info format: a file or directory placed adjacent to the project’s code and resources, that directly contains the project’s metadata.
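As a rough sketch of how the two formats differ on disk, the hypothetical helper below (`metadata_dir_for` is not part of pip-licenses) maps an installed entry to the directory expected to hold its metadata. It assumes the directory forms only; a zipped .egg or a single-file .egg-info would need separate handling.

```python
import os


def metadata_dir_for(path):
    """Return the directory expected to hold metadata for an installed entry.

    Returns None when the entry is not a recognised metadata layout.
    """
    entry = path.rstrip("/\\")
    if entry.endswith(".egg"):
        # Format 1: metadata lives in an EGG-INFO subdirectory.
        return os.path.join(entry, "EGG-INFO")
    if entry.endswith((".egg-info", ".dist-info")):
        # Format 2 (and wheel installs): the entry itself is the metadata.
        return entry
    return None
```

A license lookup could then glob for its file name patterns inside whatever directory this returns.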

@philipaxer Please provide the full path of the egg package you want to explore for the license file.

Of course, you can leave out any private information from your machine that you don't want to share.

philipaxer commented 3 years ago

This is interesting: I recreated the venv and reinstalled the packages, and now numpy shows up as a dist-info package. Any idea when it would come up as EGG-INFO?

Going through my native site-packages, I can pick out some examples. These are the paths that should contain the LICENSE* files (see the note below):

site-packages\lxml-4.6.2-py3.9-win-amd64.egg\EGG-INFO

and

site-packages\pefile-2019.4.18-py3.9.egg-info

Interestingly, I cannot find any py3.9.egg-info directory that contains a LICENSE file. The pefile directory above contains the following files:

$ ls -lha
total 109K
drwxr-xr-x 1 XYZ 1049089    0 Mar 21 19:11 ./
drwxr-xr-x 1 XYZ 1049089    0 Mar 25 10:26 ../
-rw-r--r-- 1 XYZ 1049089    1 Mar 21 19:11 dependency_links.txt
-rw-r--r-- 1 XYZ 1049089  404 Mar 21 19:11 installed-files.txt
-rw-r--r-- 1 XYZ 1049089 1.5K Mar 21 19:11 PKG-INFO
-rw-r--r-- 1 XYZ 1049089    7 Mar 21 19:11 requires.txt
-rw-r--r-- 1 XYZ 1049089  291 Mar 21 19:11 SOURCES.txt
-rw-r--r-- 1 XYZ 1049089   25 Mar 21 19:11 top_level.txt

XYZ@XYZ MINGW64 /c/Python39/Lib/site-packages/pefile-2019.4.18-py3.9.egg-info
$

Perhaps only option 1 from your list has LICENSE as an explicit file.

raimon49 commented 3 years ago

OK, thanks for your information.

raimon49 commented 3 years ago

@philipaxer Hi, I have attempted a fix for the issue you reported.

I don't have any egg packages installed in my environment, so I have published the fix as a release candidate.

Could you please report back whether this version works in your environment?

# Install the release candidate in your environment
$ pip install 'pip-licenses==3.3.2rc1'  

If it doesn't work well, please create a pull request against the following branch. You are always welcome to do so. https://github.com/raimon49/pip-licenses/tree/release-3.3.2

cdce8p commented 3 years ago

Thought I would add some background information.

By default no version of setuptools includes license files inside the .egg-info folder. It's up to the individual developer to do so. I would recommend installing wheel first. This will create the dist-info folder for each newly installed package which includes license files.
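To check what a given install actually recorded, a minimal sketch using the standard-library `importlib.metadata` (Python 3.8+) might look like this. The pattern list and both helper names are assumptions for illustration, not pip-licenses internals.

```python
import fnmatch
from importlib import metadata

# Assumed patterns for license-like file names; adjust as needed.
LICENSE_PATTERNS = ("LICENSE*", "LICENCE*", "COPYING*", "NOTICE*")


def license_like(file_names):
    """Return the names that match one of the license patterns."""
    return [
        name for name in file_names
        if any(fnmatch.fnmatchcase(name.upper(), p) for p in LICENSE_PATTERNS)
    ]


def recorded_license_files(dist_name):
    """List license-like file names recorded for an installed distribution."""
    try:
        # .files is None when the install recorded no file list.
        files = metadata.distribution(dist_name).files or []
    except metadata.PackageNotFoundError:
        return []
    return license_like([f.name for f in files])
```

Applied to the pefile .egg-info listing earlier in this thread, `license_like` returns an empty list, which matches the observation that no license file was installed there.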

As for the .egg: AFAIK this format is deprecated and has been replaced by wheel.