pypa / setuptools

Official project repository for the Setuptools build system
https://pypi.org/project/setuptools/
MIT License
2.49k stars 1.18k forks source link

Generated files must be present before invocation to be included in package_data #1064

Open JonathonReinhart opened 7 years ago

JonathonReinhart commented 7 years ago

I've created the following repository to demonstrate the problem: https://github.com/JonathonReinhart/setuptools-package_data-issue

Copying what I've shown there:

This repository is an MCVE for the following issues:

Problem statement

When python setup.py sdist bdist_wheel is invoked from a clean starting point, data files (identified in package_data), which are generated during setup.py hooks, will not be included in the wheel.

Steps to reproduce

First:

Now that mypkg/generated_data is present:

Clean the directory, and run other variations of sdist and bdist_wheel (as mentioned above) and observe that the problem does not manifest.

Workarounds that do not work

JonathonReinhart commented 7 years ago

Sigh, it turns out setting include_package_data to true "fixes" the problem.

(again copied from my example project)

Solution

Setting include_package_data = True caused setup.py to behave as intended.

Note that the setuptools documentation is not particularly clear on why this works:

include_package_data If set to True, this tells setuptools to automatically include any data files it finds inside your package directories that are specified by your MANIFEST.in file. For more information, see the section below on Including Data Files.

package_data A dictionary mapping package names to lists of glob patterns. For a complete description and examples, see the section below on Including Data Files. You do not need to use this option if you are using include_package_data, unless you need to add e.g. files that are generated by your setup script and build process. (And are therefore not in source control or are files that you don’t want to include in your source distribution.)

...

In summary, ...

include_package_data Accept all data files and directories matched by MANIFEST.in.

package_data Specify additional patterns to match files and directories that may or may not be matched by MANIFEST.in or found in source control.

Nothing in this documentation describes the behavior I was seeing, however. In my case, it seemed like the files discovered by sdist (using MANIFEST.in) somehow overrode those that bdist_wheel would normally find.

For whatever reason, specifying include_package_data seems to have two, somewhat inverted purposes.

JonathonReinhart commented 7 years ago

command/build_py.py has the following ("positive") reference to include_package_data:

    def analyze_manifest(self):
        self.manifest_files = mf = {}
        if not self.distribution.include_package_data:
            return
        src_dirs = {}
        for package in self.packages or (): 
            # Locate package source directory
            src_dirs[assert_relative(self.get_package_dir(package))] = package

        self.run_command('egg_info')
        ei_cmd = self.get_finalized_command('egg_info')
        for path in ei_cmd.filelist.files:
           # ...

It seems that even though this is named analyze_manifest, invoking egg_info must be actually looking at more than just the manifest. It must also find files present in the packages "which may or may not be because they're listed in package_data.

Ugh. This is so complicated.

alexshpilkin commented 5 years ago

There are no less than three places where a generated file gets kicked out of SOURCES.txt (not the sdist itself, but still required): FileList.append refuses to add nonexistent files, then manifest_maker.prune_file_list weeds them out, then manifest_maker.write_manifest weeds them out again.

All ultimately chain up to FileList._safe_path.

alexshpilkin commented 5 years ago

Huh. Actually, SOURCES.txt is what matters — if a file gets there, it’ll get into the sdist if sdist.make_release_tree generates it.

The _safe_path magic sauce in FileList looks like it needs to be ripped out and replaced with bytes paths, to be honest, now that CPython 3.5 is EOL.