pypa / setuptools

Official project repository for the Setuptools build system
https://pypi.org/project/setuptools/
MIT License

package data in subdirectory causes warning #3340

Open isuruf opened 2 years ago

isuruf commented 2 years ago

setuptools version

62.3.2

Python version

3.10

OS

Debian with conda

Additional environment information

No response

Description

pyopencl ships OpenCL files and some headers in the subdirectory pyopencl/cl, and they are included as package_data so that the Python module can find them.

package_data={
    "pyopencl": [
        "cl/*.cl",
        "cl/*.h",
        "cl/pyopencl-random123/*.cl",
        "cl/pyopencl-random123/*.h",
    ]
},

With recent setuptools, this produces the following warning:


    ############################
    # Package would be ignored #
    ############################
    Python recognizes 'pyopencl.cl' as an importable package, however it is
    included in the distribution as "data".
    This behavior is likely to change in future versions of setuptools (and
    therefore is considered deprecated).

    Please make sure that 'pyopencl.cl' is included as a package by using
    setuptools' `packages` configuration field or the proper discovery methods
    (for example by using `find_namespace_packages(...)`/`find_namespace:`
    instead of `find_packages(...)`/`find:`).

    You can read more about "package discovery" and "data files" on setuptools
    documentation page.

cc @inducer

Expected behavior

No warning

How to Reproduce

  1. clone https://github.com/inducer/pyopencl
  2. install numpy
  3. Run python setup.py install

Output

$ python setup.py install
running install
/home/idf2/miniforge3/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/home/idf2/miniforge3/lib/python3.10/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing pyopencl.egg-info/PKG-INFO
writing dependency_links to pyopencl.egg-info/dependency_links.txt
writing requirements to pyopencl.egg-info/requires.txt
writing top-level names to pyopencl.egg-info/top_level.txt
reading manifest file 'pyopencl.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'pyopencl.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
/home/idf2/miniforge3/lib/python3.10/site-packages/setuptools/command/build_py.py:153: SetuptoolsDeprecationWarning:     Installing 'pyopencl.cl' as data is deprecated, please list it in `packages`.
    !!

    ############################
    # Package would be ignored #
    ############################
    Python recognizes 'pyopencl.cl' as an importable package, however it is
    included in the distribution as "data".
    This behavior is likely to change in future versions of setuptools (and
    therefore is considered deprecated).

    Please make sure that 'pyopencl.cl' is included as a package by using
    setuptools' `packages` configuration field or the proper discovery methods
    (for example by using `find_namespace_packages(...)`/`find_namespace:`
    instead of `find_packages(...)`/`find:`).

    You can read more about "package discovery" and "data files" on setuptools
    documentation page.

!!

  check.warn(importable)
running build_ext

shakfu commented 8 months ago

It does not seem good if docs have to recommend the definition of namespace packages for projects that are not using namespace packages.

This is the essence of the issue: the deprecation makes it necessary to use a more complex solution (find_namespace_packages) to a problem (bundling a data folder in one's package) that was previously relatively simple to resolve.

My case above is illustrative.

abravalheri commented 8 months ago

It does not seem good if docs have to recommend the definition of namespace packages for projects that are not using namespace packages.

@merwok, please note that if a distribution installs a directory nested somewhere under sys.path and this directory does not contain an __init__.py, then it is indeed a namespace package, and the project is, effectively, using namespace packages. You can use your REPL to check that the directory is importable: printing the imported object will show something like <module '...' (namespace)>.

The assumption that directories that don't contain Python files are exempt from this behaviour is not backed by the implementation, nor, as far as I know, by the Python docs (the PEP introducing implicit namespaces pretty much says the opposite).
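The REPL check described above can be sketched as a small script. This is only an illustration: the package name demo_pkg and the file kernel.cl are made up for the example, and the layout is created in a temporary directory rather than taken from any real project.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_data_dir_as_module():
    """Build a throwaway package with a data-only subdirectory and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Regular package: it has an __init__.py.
        (root / "demo_pkg").mkdir()
        (root / "demo_pkg" / "__init__.py").write_text("")
        # Data-only subdirectory: no __init__.py, only a non-Python file.
        (root / "demo_pkg" / "cl").mkdir()
        (root / "demo_pkg" / "cl" / "kernel.cl").write_text("// OpenCL source\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module("demo_pkg.cl")
            # Namespace packages have no __file__ pointing at an __init__.py.
            return repr(module), getattr(module, "__file__", None)
        finally:
            # Undo the changes made to the interpreter state.
            sys.path.remove(tmp)
            for name in ("demo_pkg.cl", "demo_pkg"):
                sys.modules.pop(name, None)


module_repr, module_file = import_data_dir_as_module()
print(module_repr)  # contains "(namespace)"
print(module_file)
```

The import succeeds even though demo_pkg/cl holds no Python files at all, which is the behaviour the warning is reacting to.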


The warning message discussed in this issue has to deal with 2 situations:

  1. When the developer does want such directories to be included in the installation and is trying to rely on include_package_data to do so, but is forgetting to list such directories in packages (and is therefore providing an incorrect configuration that fails to match the file structure and intent).
  2. When the developer does not want certain directories to be included and is specifying the correct configuration to do so (either by manually listing packages, by using find_namespace_packages(...) with the include/exclude options, or by consciously relying on the fact that find_packages(...) skips directories without __init__.py). At the same time, the developer is correctly specifying include_package_data such that files in other directories are included.
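To make the difference between the two discovery modes concrete, here is a rough stand-in written with the standard library only. It is not setuptools' code: the discover function, its namespaces flag, and the demo_pkg layout are hypothetical and only mimic the rule mentioned above, namely that regular discovery skips directories without __init__.py while namespace discovery accepts every directory.

```python
import os
import tempfile
from pathlib import Path


def discover(root, namespaces):
    """Return dotted package names found under root (simplified stand-in).

    With namespaces=False a directory counts only if it contains an
    __init__.py, and its whole subtree is skipped otherwise.
    With namespaces=True every directory counts.
    """
    root = Path(root)
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()
        keep = []
        for name in dirnames:
            full = Path(dirpath) / name
            if not namespaces and not (full / "__init__.py").is_file():
                continue  # not a regular package: skip it and everything below
            found.append(".".join(full.relative_to(root).parts))
            keep.append(name)
        dirnames[:] = keep  # prune the directories os.walk descends into
    return found


def run_demo():
    """Create demo_pkg/ with a data-only cl/ subdirectory and scan it twice."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "demo_pkg" / "cl").mkdir(parents=True)
        (root / "demo_pkg" / "__init__.py").write_text("")
        (root / "demo_pkg" / "cl" / "kernel.cl").write_text("// OpenCL source\n")
        return discover(root, namespaces=False), discover(root, namespaces=True)


regular, with_namespaces = run_demo()
print(regular)          # ['demo_pkg']
print(with_namespaces)  # ['demo_pkg', 'demo_pkg.cl']
```

In situation 1 the developer would want the second result; in situation 2 the first result is the intended one.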

Currently, a bug in setuptools causes both situations to be handled incorrectly.

For situation 1, setuptools will include packages that are not listed in packages, and therefore fail to fulfil the configuration that has been passed.

For situation 2, setuptools will not skip certain directories that are intentionally omitted from packages, and therefore also fail to fulfil the configuration that has been passed.

To fix this bug, we need the warning message to raise awareness among users in category 1 that their configuration does not match their intent, and among users in category 2 that they will need a workaround until the fix has been implemented.

merwok commented 8 months ago

the project is, effectively, using namespace packages

I understand that, but my viewpoint is about author intent. In the examples here the developers have regular packages, source files and data files. They need a clean way to have their packages packaged up and installed, and the data files also packaged up.

abravalheri commented 8 months ago

They need a clean way to have their packages packaged up and installed, and the data files also packaged up.

The clean way is to list any package they are using to host data files in the packages directive. This mental model is aligned with the Python implementation and docs, and works great with importlib.resources. There is no need to differentiate between types of directories; that is only conceptual overhead that does not really match the way Python works.