pypa / setuptools

Official project repository for the Setuptools build system
https://pypi.org/project/setuptools/
MIT License
2.38k stars 1.16k forks source link

package data in subdirectory causes warning #3340

Open isuruf opened 2 years ago

isuruf commented 2 years ago

setuptools version

62.3.2

Python version

3.10

OS

Debian with conda

Additional environment information

No response

Description

pyopencl has OpenCL files and some headers in a subdirectory pyopencl/cl and they are included as package_data so that the python module can find them.

package_data={
                    "pyopencl": [
                        "cl/*.cl",
                        "cl/*.h",
                        "cl/pyopencl-random123/*.cl",
                        "cl/pyopencl-random123/*.h",
                        ]
                    },

With new setuptools, there is a warning saying


    ############################
    # Package would be ignored #
    ############################
    Python recognizes 'pyopencl.cl' as an importable package, however it is
    included in the distribution as "data".
    This behavior is likely to change in future versions of setuptools (and
    therefore is considered deprecated).

    Please make sure that 'pyopencl.cl' is included as a package by using
    setuptools' `packages` configuration field or the proper discovery methods
    (for example by using `find_namespace_packages(...)`/`find_namespace:`
    instead of `find_packages(...)`/`find:`).

    You can read more about "package discovery" and "data files" on setuptools
    documentation page.

cc @inducer

Expected behavior

No warning

How to Reproduce

  1. clone https://github.com/inducer/pyopencl
  2. install numpy
  3. Run python setup.py install

Output

$ python setup.py install
running install
/home/idf2/miniforge3/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/home/idf2/miniforge3/lib/python3.10/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing pyopencl.egg-info/PKG-INFO
writing dependency_links to pyopencl.egg-info/dependency_links.txt
writing requirements to pyopencl.egg-info/requires.txt
writing top-level names to pyopencl.egg-info/top_level.txt
reading manifest file 'pyopencl.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'pyopencl.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
/home/idf2/miniforge3/lib/python3.10/site-packages/setuptools/command/build_py.py:153: SetuptoolsDeprecationWarning:     Installing 'pyopencl.cl' as data is deprecated, please list it in `packages`.
    !!

    ############################
    # Package would be ignored #
    ############################
    Python recognizes 'pyopencl.cl' as an importable package, however it is
    included in the distribution as "data".
    This behavior is likely to change in future versions of setuptools (and
    therefore is considered deprecated).

    Please make sure that 'pyopencl.cl' is included as a package by using
    setuptools' `packages` configuration field or the proper discovery methods
    (for example by using `find_namespace_packages(...)`/`find_namespace:`
    instead of `find_packages(...)`/`find:`).

    You can read more about "package discovery" and "data files" on setuptools
    documentation page.

!!

  check.warn(importable)
running build_ext
abravalheri commented 2 years ago

Hi @isuruf thank you for bringing this subject for discussion.

This change (and the warning) was intentional, so in the future we will be able to solve a related problem with find_packages(..., exclude=...) and include_package_data.

Since PEP 420, there is no way for really differentiating a folder from a package in Python... People that install pyopencl will be able to successfully run import pyopencl.cl, which means that the interpreter does consider properly.cl a package. Nevertheless configuration file seems to want to consider cl a "data directory", which is not currently a well stablished concept in the ecosystem...

Would it possible for you guys to use find_namespace_packages() as indicated in the warning message?

jwodder commented 2 years ago

I'm facing the same problem, and I'm not clear on the nature of the change suggested by @abravalheri. Are you saying that directories in the package hierarchy that contain only data files and no Python code should be included in the project's list of packages despite not actually being Python packages?

abravalheri commented 2 years ago

Hi @jwodder, thank you very much for the input. Please find my comments bellow:

directories in the package hierarchy that contain only data files and no Python ... not actually being Python packages

I think that this assumption[^1] no longer holds.

According to PEP 420, package is a term that "refers to Python packages as defined by Python’s import statement". An evidence that a directory without Python code is treated by the Python import statement as a package is given by the following snippets:

rm -rf /tmp/example
mkdir -p /tmp/example/mypkg1
touch /tmp/example/mypkg1/file.txt
mkdir -p /tmp/example/mypkg2/subpkg
touch /tmp/example/mypkg2/__init__.py
touch /tmp/example/mypkg2/subpkg/file.txt
python3.10
>>> import sys
>>> sys.path.append("/tmp/example")
>>> import mypkg1
>>> mypkg1
<module 'mypkg1' (<_frozen_importlib_external._NamespaceLoader object at 0x7f3683fac2e0>)>
>>> mypkg1.__path__
_NamespacePath(['/tmp/example/mypkg1'])
>>> mypkg1.__loader__.is_package('mypkg1')
>>> import mypkg2.subpkg
>>> mypkg2.subpkg.__path__
_NamespacePath(['/tmp/example/mypkg2/subpkg'])
>>> mypkg2.subpkg.__loader__.is_package('mypkg2.subpkg')
True

Note in this example that both mypkg1 and mypkg2.subpkg can be imported, and are treated as regular packages (both are imported as module objects with the attribute __path__ set), despite not containing any Python file. I added a mypkg.subpkg to the example to demonstrate that it does not matter whether or not the directory is nested inside another "traditional package".

Moreover, the ecosystem also considers those folders, as demonstrated by the implementation of importlib.resources:

>>> import importlib.resources
>>> importlib.resources.contents('mypkg1')
['file.txt']
>>> importlib.resources.contents('mypkg2.subpkg')
['file.txt']

I understand that not all tools in the ecosystem embrace PEP 420 by default and that sometimes it might be frustrating to see tools evolving some interfaces evolving. That is the reason why I decided to add a deprecation warning first and a transition period, instead of simply changing things in a major version bump.

This change in setuptools was implemented due to a single, very pragmatic, objective: to fix internal inconsistencies (as presented in #3260), which are caused by this division between packages and directories that don't contain Python code (which is not a division recognized by Python's import system).

For the time being, I don't plan on removing the deprecation warning, since the only way I know how to fix the internal inconsistencies requires us to eliminate this arbitrary division. However if anyone in the community finds a different way of solving the problem and are willing to provide a backward-compatible PR, I am more than happy to consider an alternative.

[^1]: Namely the assumption that a directory with no Python code is not a package.

jwodder commented 2 years ago

@abravalheri So you are saying that "data-only packages" should be included in the project's list of packages? Is that the only change you're currently recommending to authors of such projects? Does it matter whether the data files are included via package_data or MANIFEST.in+include_package_data?

abravalheri commented 2 years ago

Yes, I recommend adding all the sub-directories to the list of packages, even if they only include data files. On the bright side that can be done with find_namespace_packages (setup.py) or find_namespace: (setup.cfg).

It should make no difference if you are using package_data or MANIFEST.in (although I am not sure if users are going to see the warning if they don't use include_package_data=True).

abravalheri commented 2 years ago

Do you have a suggestion about how to improve the warning message to make it more clear?

jwodder commented 2 years ago

@abravalheri I find the first paragraph of the warning to be a bit confusing. I would rewrite it from:

Python recognizes '...' as an importable package, however it is included in the distribution as "data". This behavior is likely to change in future versions of setuptools (and therefore is considered deprecated).

to something more like:

Python recognizes '...' as an importable package, but it is not listed in "packages". It is included in the distribution because it contains package data, but this behavior is likely to change in future versions of setuptools (and therefore is considered deprecated).

cdfarrow commented 2 years ago

I was following this issue because I was very confused (as a newbie) by the warning that came up. Can I suggest a further edit to the warning message: Currently {importable!r} is only added to the distribution because it may contain data files... --> Currently {importable!r} has been automatically added to the distribution because it may contain data files... This is my understanding of what is actually happening (i.e. automatic inclusion). If this suggested change is wrong, then I'm still not following how this works...

abravalheri commented 2 years ago

Thank you very much @cdfarrow, this suggestion looks good to me. I will implement it probably later this week.

mhkline commented 2 years ago

How about the case where we want *.py files in a subdirectory included as data files? This occurs in a couple projects for me that define a plug-in interface. Basically there are a couple python files in a package subdirectory that I provide as examples of plug-in implementation. These are also loaded through the common plug-in interface via importlib.

Will the inclusion as a namespace package still work in this case? Anything special to be aware of?

If it makes a difference, I'm using the MANIFEST.in method with include_package_data=True.

Also, I created this SO question about this issue when I first encountered it. I guess I should have come here first! :)

abravalheri commented 2 years ago

Hi @mhkline. The warning is not about the files themselves, but about the directory. Right now there is no concept of a "data directory" for the package ecosystem.

Since PEP 420, effectively all directories are packages regardless of containing a __init__.py file or not. With this warning, my intention is to align the expectations of the users with the behaviour we observe in Python.

If you want the directory to be included in the distribution, you can include it via the packages= configuration. find_namespace_packages() in setup.py or find_namespace: in setup.cfg will do that for you, and probably make the warning go away.

kandersolar commented 2 years ago

The above discussion covers how to go about building a correct distribution without using deprecated setuptools functionality. Thanks all for that!

My question here is more philosophical than practical. It seems like setup(packages=...) and MANIFEST.in are at least partially overlapping in functionality. In our MANIFEST.in we specify exactly what files and directories to include in distribution bundles. It seems redundant to have to specify similar information (the directories part) via the packages= configuration. Couldn't that list of packages theoretically be inferred from the MANIFEST.in contents?

abravalheri commented 2 years ago

Hi @kanderso-nrel , you can remove the packages= configuration and setuptools will try to infer which packages are provided.

There are a few limitations:

merwok commented 1 year ago

Yes, I recommend adding all the sub-directories to the list of packages, even if they only include data files.

I find this unexpected and unintuitive. Why make a namespace package when you have no need or intent to use it as a namespace package? IMO cleaner recommendations could be:

This topic may be worth discussing with other tools maintainers on the packaging forum.

abravalheri commented 1 year ago

Why make a namespace package when you have no need or intent to use it as a namespace package?

Hi @merwok , since the Python runtime will effectively consider any directory without a __init__.py a namespace package, if a dev organizes files in such a directory, the dev is effectivelly making use of a namespace package... If a dev have a use for directories without a __init__.py, the dev will automatically have an use for a namespace package...

Am I understanding corectly here? Do you object about listing existing namespace packages that already exist in a project in the packages configuration?

My suggestion was to simply align the configuration to the way Python effectively interprets things...

(Of course if the community wants to change how this works a discussion under the Packaging topic in the Python discourse is always a good approach).

merwok commented 1 year ago

My point is that the dev is not making use of a namespace package because it is not intended that pyopencl.cl can be extended by other distributions providing modules in that namespace. It really is a directory containing data files.

olemoign commented 1 year ago

I'm faced with the inverse situation: my packages= and MANIFEST.in are well defined, with exactly the files I want included in my wheel. There are files within the package that I don't want to see included (e.g. tests, sass files, non minified js, etc.). Now, setuptools adds them to the wheel, with no "Stop doing this, I know what I'm doing" flag that I can see. Am I missing something?

olemoign commented 1 year ago

I get the same behavior if, instead of specifying packages, I try to find then exclude the unwanted packages. :cry:

abravalheri commented 1 year ago

Hi @olemoign, can you share a link to your project source code (or a similar reproduction)? I would need to have a look on what is the configuration/file structure to understand what is going on. In theory if well specified MANIFEST.in and [options] packages= should prevent files from being added...

You can open a new discussion topic with the extra information to avoid "overloading/hijacking" this issue thread.

pombredanne commented 1 year ago

@abravalheri I stumbled on a bunch of build warning burried in CI logs that I was reviewing randomly and I am puzzled..... can you articulate what end-user benefit do you expect with this change? (e.g. package maintainers that rely on setuptools) ?

Personally I do not think such as warning can be easily seen. My wheels contains thousands of files and the warnings are just drowned in CI log files never looked at unless the build fails. So I am not convinced that this warning would have much effect.

You wrote:

Right now there is no concept of a "data directory" for the package ecosystem.

IMHO the current behaviour is the de-facto way that package maintainers understand and have grown to rely on. e.g. when you "include_package_data" anything (file or dir) in the tree of included packages is included.

Since PEP 420, effectively all directories are packages regardless of containing a init.py file or not. With this warning, my intention is to align the expectations of the users with the behaviour we observe in Python.

What is the Python behaviour there beyond the fact that files in the package tree are accessible? I could not find anything about data files or data directories mentioned in PEP 420.

Now, the proposed future behaviour does not seem entirely consistent: when there are data files in a directory with Python code (either a legacy init-style or "namespace" package) these are included but data files in a subdirectory of the same would be not included, e.g., some data files would need an intervention and some data files would not? Unless a subdir of a package dir is not a Python identifier (e.g. with a dash as in "foo-bar"), and then this is included without warning.

So if I understand the to-be behaviour correctly based on deprecation messages this would mean this (assuming in all cases that include_package_data is True):

I am not sure that this would contribute to a better and consistent user experience.

abravalheri commented 1 year ago

can you articulate what end-user benefit do you expect with this change? (e.g. package maintainers that rely on setuptools)?

Right now the configurations debated here work in a "grey area" and mess with the core Python concept of a package. By changing it, I hope to make things easier to understand, obvious and more well-defined. I also plan to make it possible for users to do things like packages=find_packages(exclude=["mypkg.tests*"]) while introducing a more clear behaviour of include_package_data=True.

IMHO the current behaviour is the de-facto way that package maintainers understand and have grown to rely on. e.g. when you "include_package_data" anything (file or dir) in the tree of included packages is included.

I am afraid that is not a given. Some package maintainers may understand things in this way, but the reality is that it is confusing and that there is not such a concept in the Python ecosystem of a "data directory". All directories are packages, and setuptools does have a configuration that captures this concept: packages. If a directory does not have a corresponding entry in packages, this should mean that the developer does not want that package to be included in the distribution.

What is the Python behaviour there beyond the fact that files in the package tree are accessible?

The Python behaviour is to consider such directories packages. You can even import them.

Now, the proposed future behaviour does not seem entirely consistent: when there are data files in a directory with Python code (either a legacy init-style or "namespace" package) these are included but data files in a subdirectory of the same would be not included, e.g., some data files would need an intervention and some data files would not?

I am failing to see the inconsistency. If we stop thinking for a moment in terms of "sub-directories", we can see that when include_package_data is True, all non-Python files that inhabit a package listed in the packages configuration will be included. If your "sub-directory" has a corresponding entry in packages, its files should be included also. Sub-directories are only different packages.

Unless a subdir of a package dir is not a Python identifier (e.g. with a dash as in "foo-bar"), and then this is included without warning.

To be sincere, I wanted also to have a warning for those cases. But I cannot sustain the argument that those packages should also be listed in the packages configuration, because they are not "importable" using an from ... import <name of the directory here> statement. Maybe in the future we can also make this change based on the fact that they are importable with importlib.import_module, but this is a weaker argument.

So if I understand the to-be behaviour correctly based on deprecation messages this would mean this (assuming in all cases that include_package_data is True) ...

I think it is easier to think in the following terms:

I am not sure that this would contribute to a better and consistent user experience.

Maybe we disagree here, but my intention is to have less "grey area" and make the configuration more obvious. I argue that requiring a corresponding entry in packages for all the directories that can be successfully imported is clearer and more predictable.


The key take away is that the existing configuration mechanisms for package/package data work in non-obvious ways and mix a lot of concepts. The intention of the change is to better delineate the contours of these concepts and make things more obvious.

The only way that I can think of to make this happen without completely changing the configuration system is to improve the boundaries between the package and the include_package_data configurations.

Alternatively, we could revamp the configuration system and think about other concepts that could be easier to deal with. I am very receptive of this idea, but it is not in my priorities right now. If anyone else would like to champion this, I think it would be a positive change.

milesgranger commented 1 year ago

@abravalheri

...I also plan to make it possible for users to do things like packages=find_packages(exclude=["mypkg.tests*"]) while introducing a more clear behaviour of include_package_data=True.

This is the situation I'm in. I've made an example repo here. We have tests inside the package, and want to (conditionally) exclude it from distribution. But seems patterns in exclude are ignored anyway getting the aforementioned warning and tests package is added.

Practically, we're evaluating moving tests out next to package and conditionally including them with some extra logic in setup.py, but curious if it's worth the effort.


Edit: clarify this is a bit confusing, due to the example of package discovery, specifically of this use case

abravalheri commented 1 year ago

@milesgranger you are correct. This is a bug (https://github.com/pypa/setuptools/issues/3260).

The problem is that we cannot resolve this bug without first deprecating and removing the behaviour described in this issue (you can see that there is a lot of people depending on it yet...).

I suppose you can have a workaround by one of the following:

Sorry for the trouble, if we change things right now, several projects in the ecosystem might break (so we have to go through the deprecation period).

jorisvandenbossche commented 1 year ago

@abravalheri thanks for the response! That are useful workarounds

I just wanted to give some feedback on the current warning about what caused us some confusion (colleague of @milesgranger, so based on the same example from the above comment). So we run into the issue that find_(namespace)_packages(exclude=["mypkg.tests*"]) doesn't work if also including package data. I certainly understand the reason for this now, but so the warning we currently see:

Python recognizes 'package.tests' as an importable package, but it is not listed in the packages configuration of setuptools.

'package.tests' has been automatically added to the distribution only because it may contain data files, but this behavior is likely to change in future versions of setuptools (and therefore is considered deprecated).

Please make sure that 'package.tests' is included as a package by using the packages configuration field or the proper discovery methods (for example by using find_namespace_packages(...)/find_namespace: instead of find_packages(...)/find:).

A few confusing aspects:

(and to be clear: I am a package author myself having to deal with trying to write informative warnings regularly, I certainly know it is not always easy or even possible to provide the perfect warning for each and every use case that might trigger it .. ;))

abravalheri commented 1 year ago

Thank you very much for the feedback @jorisvandenbossche. Indeed the scenario you are describing is very trick. The deprecation was introduced to specifically solve this problem, but we have to wait the deprecation period before the problem can be 100% solved... (backward bug-compatibility).

Let me try to clarify somethings in your last post, and then maybe we can work in a better warning message togheter?

The message suggests to use find_namespace_packages instead of find_packages, but you also get the same warning if you actually use find_namespace_packages (and using that also doesn't fix the specific issue we have)

Your case is special because the package package.tests is intentionally excluded, this warning message should not apply to you.

In your case something like the following should be better:

**NOTE**: If you are already explicitly excluding `package.tests` via
`find_namespace_packages(...)/find_namespace` or `find_packages(...)/find`,
you can try instead to use `include-package-data=False` in combination with 
a more fine grained `package-data` configuration. This limitation will likely
to be solved in the future once the deprecation period is over.

Would adding this paragraph help?

this module would actually be included as well simply because it is a python module.

If you are setting exclude (and include-package-data=False), this module would not be included, right?

if the directory at hand is a python module.

All directories are always Python packages, even if they don't include a Python file.

jorisvandenbossche commented 1 year ago

In your case something like the following should be better: .. Would adding this paragraph help?

Yes, that would certainly help. Although in our case we resorted to using exclude_package_data, which might be a bit easier to do (as you just need to specify there what you also have in exclude of find_packages) than using include_package_data=False and having to update how actual data gets included.

this module would actually be included as well simply because it is a python module.

If you are setting exclude (and include-package-data=False), this module would not be included, right?

In our case, yes, because we explicitly had exclude=["package.tests*"]. But when you don't use exclude, then this submodule would be included. Now, I see that in that case, you don't get this warning, so then indeed my point is moot (since there is no warning in that case, it can't cause the confusion that I imagined to have in that case ;))

Maybe part of the confusion also comes from "but it is not listed in the packages configuration of setuptools." in the first sentence. Maybe that also could include something like "not listed in .. or is excluded"? (I know that because it is excluded, it's not listed in the packages, but that might make it more obvious that this warning can have to do with trying to exclude a package)

if the directory at hand is a python module.

All directories are always Python packages, even if they don't include a Python file.

Yes, I understand that now. But I think that's not the mental model of most people reading this warning ..

BTW, it's also not fully clear to me what will be the future behaviour or what change. On the one hand the warning is saying that the package was included because it may contain data files and that this might change in the future (so in the future it might not be included by default?), while on the other hand you clearly say that any directory is a python module, so I would expect that in the future also any python module, i.e. any directory, would be automatically included?

vkottler commented 1 year ago

I tried switching over to find_namespace_packages, seemed to work. Now in another project I'm noticing that the package build is discovering all sorts of random things in my checkout - things not even in my package-source entry point.

What tells setuptools to not have find_namespace_packages looking for an entire repository checkout for Python files?

edit: I am already explicitly excluding tests and tests.* (source) but it's not clear how to configure the right behavior here

vkottler commented 1 year ago

I have spent a few minutes trying a src layout but I can't get an editable pip install to work.

edit:

pkg_name = "my-package"

# Obviously in "flat" layout you place files under the package's slug if its name includes a dash.
pkg_slug = pkg_name.replace("-", "_")

setup(
    ...
    packages=setuptools.find_namespace_packages(
        include=[pkg_slug, pkg_slug + ".*"],
        exclude=["tests", "tests.*"],
    ),
    ...
)

Something like this appears to work for me. Where a typical project is structured like:

README.md
setup.py
my_package/
    __init__.py
    __main__.py
    data/
        data1.yaml
        data2.yaml
    ...
tests/
    __init__.py
    test_entry.py
    ...

It's not really clear to me from the documentation what setuptools actually wants. I think where this software fails is in making it clearer what the nominal package structure is. It's nice that you can kind of do whatever you want and make it work but most of us shipping software out into the wild would rather just conform to something that "just works" and move on. I'm not really seeing that solution emerge out of the discussion or the documentation.

merwok commented 1 year ago

1: your project structure does not use src 2: you are missing pyproject.toml (I forget if pip requires it for editable installs) 3: you have a real package, why use find_namespace_packages?

vkottler commented 1 year ago

3: you have a real package, why use find_namespace_packages?

@merwok Suggest reading the thread if this is not clear to you. Having data directories in a package without __init__.py files is deprecated (if you use the regular find_packages method). It's now required to use find_namespace_packages to include package data directories as packages to squelch a deprecation warning.

A setup.py file is generally the requirement for editable installs (why? no idea). I resolved my own issue but anticipate more in the future as the Python ecosystem bike-shedding continues.

tim-mitchell commented 1 year ago

From my perspective, I have specified
package_data={'my_package': ['data_folder/*.*']} the intention is clear that I consider it a folder of data files of my_package, I don't really care that python considers it a namespace package. It would be really nice if setuptools added the namespace packages for me instead of giving a warning.

lrq3000 commented 1 year ago

I have been stumped for hours by this issue. I just want to add data folders to my package, and there is no clear instructions to do this. How can we specify in a pyproject.toml which folders we want to recursively include (and especially how to specify a nested folder from where to start looking)? The documentation does not provide any example for nested structures.

EDIT: Also this issue would be a potential solution if implemented: https://github.com/pypa/setuptools/issues/3341

EDIT2: For example, here is my repository producing (lots of) warnings: https://github.com/lrq3000/pyFileFixity/tree/ea447c548c9b736ea3c2c76bafa61bf1b51af4ca

And YES I can suppress the warnings with auto discovery by removing all the content of tool.setuptools.packages.find, but I do not want to rely on a beta feature! I want to manually specify my project's structure, I prefer to know what I am doing and to be explicit, especially for something as crucial as packaging, it needs to be very deterministic and future-proof.

lrq3000 commented 1 year ago

I found a solution, I don't even know if it's the correct one and I don't understand it:

Here was how my pyproject.toml package section was configured:

[tool.setuptools.packages.find]
where = [""]
include = ["mypkg"]

Here is what fixed the horde of warnings:

[tool.setuptools.packages.find]
where = [""]
include = ["mypkg*"]

Why is a wildcard necessary to include mypkg, when mypkg is the folder name of my package? If I had mypkg1 and mypkg2, I would understand, it would be like the doc example, but here it's just mypkg, so a wildcard should not be necessary?

Dmitri-Sintsov commented 1 year ago

I found a solution, I don't even know if it's the correct one and I don't understand it:

Here was how my pyproject.toml package section was configured:

[tool.setuptools.packages.find]
where = [""]
include = ["mypkg"]

Here is what fixed the horde of warnings:

[tool.setuptools.packages.find]
where = [""]
include = ["mypkg*"]

Why is a wildcard necessary to include mypkg, when mypkg is the folder name of my package? If I had mypkg1 and mypkg2, I would understand, it would be like the doc example, but here it's just mypkg, so a wildcard should not be necessary?

Maybe it should be include = ["mypkg", "mypkg.*"]? I also do not like non-Python data to be treated as packages with deprecations.

zzzeek commented 1 year ago

hey there -

why does this warning refer to find_namespace when the current documentation indicates that find_namespace is only for namespace packages ? can this documentation please be updated to indicate it now has another use case for packages that are explicitly not namespace packages also ?

setuptools provides find_namespace: (find_namespace_packages()) which behaves similarly to find: but works with namespace packages.

should read something like:

setuptools provides find_namespace: (find_namespace_packages()) which behaves similarly to find: but works with namespace packages; additionally, it allows one to indicate specific file paths to be included such as XYZ when including datafiles etc etc

or otherwise can some document please be added that exactly explains the situation the warning is detecting and how we are to treat this situation as a "namespace package". Projects with datafiles are ubiquitous and it's not reasonable to launch a new warning that refers to off-label use of some obscure feature of setuptools in a vague way as how to resolve.

lrq3000 commented 1 year ago

Maybe it should be include = ["mypkg", "mypkg.*"]? I also do not like non-Python data to be treated as packages with deprecations.

This allows the creation of the sdist with all the non-Python data files, but the wheel building fails with a ModuleNotFoundError: mypkg (using the pypa/build tool). Only include = ["mypkg*"] works for both the sdist and wheel building (and also non-Python data files are correctly included in both).

leycec commented 1 year ago

Insane lunatics who are publicly disembowelling setuptools while distraught children are franticly screaming, please stop publicly disembowelling setuptools while distraught children are franticly screaming.

This means you, @abravalheri – and everyone else with setuptools push authority who mistakenly believed that Masaki Kobayashi's seminal masterpiece "Harakiri" was not, in fact, a brutal dissection of authoritarian militancy under entrenched hierarchy but instead an exemplary paragon of software best practices in the enlightened post-Python 2.7 era.

So, @leycec. Bro. What's The Big Deal, Yo?

Saliently, replacing find_packages() with find_namespace_packages() is not a valid solution for most projects. Why? Because find_namespace_packages() erroneously matches all repository subdirectories1 in the root repository as importable packages.

1. ...possibly excluding dot directories, not like it particularly matters. </shrug>

Clearly, most repository subdirectories in the root repository are not importable packages; they're superfluous workflow subdirectories like {package_name}.egg-info/, build/, dist/, docs/, pip-wheel-metadata/, and the list just goes on and on. Moreover, the set of these subdirectories significantly changes over time and is almost entirely outside the control of downstream developers. Explicitly listing these subdirectories in the exclude parameter to find_namespace_packages() is thus infeasible. Therefore, replacing find_packages() with find_namespace_packages() is not a valid solution for most projects.

So, @leycec. Bro. How Did You Fix This?

Thanks. I'm so glad you asked. The obvious solution is just to abandon setuptools.

That's what everybody else has done. But I'm curmudgeonly. I have a grubby beard and live in a mildew-infested cabin in the Canadian wilderness. People like me are disinclined to do what we should. Instead, I did what I shouldn't.

I continue using setuptools despite its repeated outbursts of insanity. In this case, I compelled setuptools to obey my perfidious will via ludicrous boilerplate which I will now copy-and-paste into every Python project I maintain – much to the shared agony of junior developers and my wife, who must now suffer my delusions in silence. This is that boilerplate:

# In "setup.py":
PACKAGE_NAME = 'your_package_name_here'  # <-- edit this, you who are sadly reading this and now contemplating watching "Harakiri" on a Friday at 2:32AM despite knowing that to be a very bad idea

# Ludicrous boilerplate: I summon you!
import os, setuptools

# Do you remember when setuptools just worked? Because @leycec remembers.
_PACKAGE_NONDATA_NAMES = setuptools.find_packages(exclude=(
    'test',
    'test.*',
))
'''
List of the fully-qualified names of all **non-data Python packages** (i.e.,
directories containing the standard ``"__init__.py"`` file and zero or more
Python modules) to be installed, including the top-level application package and
all subpackages of this package but excluding the top-level test package and
all subpackages of that package.
'''

# This assumes your package data lives in a "data/" subdirectory of your package.
# If your package data lives elsewhere, it probably shouldn't.
_PACKAGE_DATA_NAMES = [
    f'{PACKAGE_NAME}.data.{package_data_name}'
    for package_data_name in setuptools.find_namespace_packages(
        where=os.path.join(PACKAGE_NAME, 'data'))
]
'''
List of the fully-qualified names of all **data Python pseudo-packages** (i.e.,
directories containing *no* standard ``"__init__.py"`` file and zero or more
data paths) to be installed.

Note that this is largely nonsensical. Ideally, a project subdirectory
containing *no* standard ``"__init__.py"`` file would be transparently treated
by both Python itself and :mod:`setuptools` as an unimportable non-package
directory rather than an importable package. Ideally, *only* directories
containing ``"__init__.py"`` files would be treated as importable packages.
Sadly, :pep:`420` (i.e., "Implicit Namespace Packages") fundamentally broke this
reasonable expectation by unconditionally forcing *all* project subdirectories
to be importable packages regardless of developer wants, needs, or expectations.

:mod:`setuptools` now "complies" with this nonsense by requiring that data
directories by explicitly listed as namespace packages. Of course, data
directories are *not* namespace packages -- but nobody in either the official
PyPA or CPython communities appears to care. If this is *not* done,
:mod:`setuptools` now emits one deprecation warning for each data subdirectory
and file resembling:

    Installing '{data_path}' as data is deprecated, please list it in `packages`.

Lastly, note that we could also avoid this unctuous list comprehension
altogether by simply replacing the above call to
:func:`setuptools.find_packages` with
:func:`setuptools.find_namespace_packages`. Then why do we not do so? Because
doing so would make things even worse. Why? Because then :mod:`setuptools` would
erroneously match *all* subdirectories of this root repository directory as
importable packages to be installed -- including obviously irrelevant root
subdirectories like ``"{package_name}.egg-info"``, ``".github"``, ``".github"``,
``"doc"``, and ``"pip-wheel-metadata"``. Since the set of all such
subdirectories frequently changes with upstream revisions beyond our control,
explicitly specifying this set by listing these ignorable subdirectories in an
``exclude`` parameter is infeasible. In short, this is the least bad thing.

See Also
----------
https://github.com/pypa/setuptools/issues/3340
    Upstream :mod:`setuptools` issue where :mod:`setuptools` casually admit to
    breaking their entire toolchain for no demonstrably good reason.
'''

# You're welcome.
setup(
    ...
    packages=_PACKAGE_DATA_NAMES + _PACKAGE_NONDATA_NAMES,
)

I didn't make insanity. I only break it over my arthritic knee.

So, @leycec. Bro. Could You Like Stop Talking?

The party ends abruptly when @leycec walks through the door. The silence is deafening. I'm pretty sure the silence gave me tinnitus. Since everyone fled, I'll say one last thing to the empty room:

Continually waving your hands about while screeching "PEP 420 made us stab ourselves in our eyeballs!!!!!" is no valid justification for stabbing everyone else in their eyeballs, too.

abravalheri commented 1 year ago

For those who have a long (potentially evolving) list of exclude in find_namespace_packages, using include is probably an easier solution.

di commented 1 year ago

Hi @leycec, I understand you're frustrated with the changes here, but please try be more more respectful and considerate in your communication in the future.

As a project of the PyPA, everyone in this issue tracker is expected to follow the Python Community Code of Conduct. The expectation is that everyone interacting here should be courteous when raising issues and disagreements.

Specifically, calling the setuptools maintainers "insane lunatics" is unacceptable: it's not constructive, it's not welcoming or inclusive, and it easily qualifies as harassment.

Additionally, your previous comments in this issue tracker ("Heads need rolling (especially those currently attached to the still-functioning torsos of managerial project leads)") is a clear example of violent language directed against another person, and is also unacceptable.

In short: we don't do this here. You are welcome to continue participating in this project, but if you continue to violate the code of conduct here, you will no longer be permitted to participate.

webknjaz commented 6 months ago

@abravalheri hey, I skimmed through the comments above and I think I may have a case that hasn't been brought up explicitly (apologies if I missed something!)

It's about directories that aren't supposed to end up in wheels and be installed, but should be in sdists. Specifically, multidict has C header files in a subdirectory of a Python importable package. And setuptools complains about it. These files are obviously included from C extension sources and are used by the compiler when building wheels from Git checkout or sdist. However, I wouldn't expect them to end up in the end-users' site-packages/ since they aren't technically data files in the sense that there's no need for them in runtime whatsoever — they are part of the sources for sdist and that's it.

In https://github.com/aio-libs/multidict/pull/829 somebody suggested listing that as a package, which doesn't appear to be a correct solution due to the reasons I mentioned above.

Do you have any go-to suggestion for suppressing the warning in such cases?

webknjaz commented 6 months ago

Do you have any go-to suggestion for suppressing the warning in such cases?

UPD: turned out that it was include_package_data=True that confused setuptools. There was really no need for it and removing the setting fixed the problem.

@abravalheri I think that adding a hint suggesting to delete include_package_data=True and replace it with a fine-grained package-data mapping would be useful.

In my case, a more straightforward suggestion would've made it clearer what to do, because I wasn't the one to set up initial packaging in the project and had to attempt figuring out how and why things were set up that way.

Another source of confusion was the fact that the error message was unhappy with multidict._multilib, suggesting to use helpers like find_namespace_packages() while it did list the packages explicitly with no discovery needed, via a static list: packages=["multidict"]. This also confused a contributor who attempted adding "multidict._multilib" to that list, not understanding immediately that C-headers aren't to be installed in site-packages/ (or distributed in wheels, for that matter).

shakfu commented 6 months ago

@abravalheri

I appreciate your efforts to solve what is a quite challenging situation due to recent changes (PEP 420). You said very clearly early in this issue: "Since PEP 420, there is no way for really differentiating a folder from a package in Python."

But this problematic new state is not one that developers should learn to live with, they should not be compelled to adopt insecure quick fixes. The packaging distinction between code and data should be preserved. Because to do otherwise, to conflate between the two, and to make data executable by default, is to introduce additional security risks and to basically undermine the concept of having an "api".

The proposed fix which resolves the deprecation is apparently to use find_namespace_packages. But this is just wrong. Since namespace packages are meant to contain importable executable code.

In my current project which wraps a third party library. The directory structure is slightly more complex to accommodate two build variants: a default dynamically-linked build and an optional statically-linked build variant.

The default build's code is in src/cyfaust and the static build code is in src/static/cyfaust. The project consists of cython extensions and a significant amount of data which is placed in a resources directory hierarchy in the project root and softlinked into each build variant's folder so it can be picked up as below:

.
├── docs
├── include
├── resources
│    ├── architecture
│    │  ├── AU
│    │  ├── VST
│    │  ├── api
│    │  ├── daisy
│    │  ├── faust
│    │  ├── juce
│    │  ├── max-msp
│    │  ├── sam
│    │  ├── teensy
│    │  └── vcvrack
│    └── libraries
│        └── examples
├── scripts
├── src
│    ├── cyfaust
│    │  └── resources -> ../../resources
│    └── static
│        └── cyfaust
│               └── resources -> ../../../resources
└── tests

Within the resources folder structure (which is specified by the library I'm wrapping) there are 775 files, 10.4 MB in total. Most of these are templated source files and all are part of the thirdparty library itself. But all are precisely data, not to be imported in the project package, only to be used for the purposes of code generation.

The relevant part of setup.py captures this package code, data directory distinction simply and effectively:

# ...

if STATIC:

    # forces cythonize in this case
    subprocess.call("cythonize cyfaust.pyx", cwd="src/static/cyfaust", shell=True)

    with open("MANIFEST.in", "w") as f:
        f.write("graft src/static/cyfaust/resources\n")
        f.write("exclude src/static/cyfaust/*.cpp\n")

    extensions = [
        mk_extension("cyfaust.cyfaust", 
            sources=["src/static/cyfaust/cyfaust.pyx"] + RTAUDIO_SRC,
            define_macros=DEFINE_MACROS,
        ),
    ]

    setup(
        name='cyfaust',
        version=VERSION,
        ext_modules=cythonize(
            extensions,
            language_level="3",
        ),
        package_dir = {"cyfaust": "src/static/cyfaust"},
        packages=['cyfaust'],
        include_package_data=True
    )

# ----------------------------------------------------------------------------

else:

    with open("MANIFEST.in", "w") as f:
        f.write("graft src/cyfaust/resources\n")
        f.write("exclude src/cyfaust/*.cpp\n")

    extensions = [
        mk_extension("cyfaust.interp", 
            sources=["src/cyfaust/interp.pyx"] + RTAUDIO_SRC,
            define_macros=DEFINE_MACROS,
        ),
        mk_extension("cyfaust.common", ["src/cyfaust/common.pyx"]),
        mk_extension("cyfaust.signal",["src/cyfaust/signal.pyx"]),
        mk_extension("cyfaust.box", ["src/cyfaust/box.pyx"]),
    ]

    setup(
        name='cyfaust',
        version=VERSION,
        ext_modules=cythonize(
            extensions,
            language_level="3",
        ),
        package_dir = {"cyfaust": "src/cyfaust"},
        packages=['cyfaust'],
        include_package_data=True
    )

Right now, and everything just works as expected. The only problem I have is the deprecation warning and the fact that I'm put off by the proposed solution to use find_namespace_packages for the reasons given above.

May I suggest that the deprecation warning be dropped or even abbreviated somewhat until the data / code packaging distinction can be restored and preserved by some new PEP or other. It makes no sense to promote solutions, in the interim, which incorrectly conflate the two.

S

abravalheri commented 6 months ago

Hi @shakfu, thank you very much for sharing your thoughts. Please see my comments below:

[...] quite challenging situation due to recent changes (PEP 420) [...] But this problematic new state is not one that developers should learn to live with [...]

Please note that the "directory is a package" behaviour introduced in PEP 420 is actually quite old. The PEP has been approved in 19/Apr/2012 and the specified behaviour implemented in Python 3.3, over 10 years ago which is a lot in "software development years". It is safe to assume that this behaviour is stable and that at some point in their carreers Python developers will indeed learn what a Python package means and how adding a directory nested somewhere under one of the entries of sys.path corresponds effectively to create a Python package that can be regularly imported as any other package via an import statement.

[...] they should not be compelled to adopt insecure quick fixes.

Could you please clarify in what sense the solutions presented in the warning message are insecure? The error message presents to the user a couple of suggestions (to be chosen accordingly to what fits better their use case) which include to manually add the missing entries to the packages configuration option, or to use a convenience function provided by setuptools. In both cases, the procedure is very mature and stable.

The proposed fix which resolves the deprecation is apparently to use find_namespace_packages. But this is just wrong. Since namespace packages are meant to contain importable executable code.

I understand that this is a popular interpretation of what the concept of packages (and/or namespace packages) might mean for Python and I see where it comes from. But I don't think this interpretation is backed by the Python implementation and the way it works...

You can import directories that don't contain .py files, and having packages for holding non-Python files is actually a very useful feature[^2]! I never found official documentation saying that packages/namespace packages are meant to contain importable executable code and cannot be used to contain only non-Python files, I don't think there is an official stance on that.

[^2]: They make it really easy to find non-Python files them runtime using importlib.resources. You can also implement extensions/plugin systems on top of it, and etc...

Within the resources folder structure (which is specified by the library I'm wrapping) there are 775 files, 10.4 MB in total. Most of these are templated source files and all are part of the thirdparty library itself. But all are precisely data, not to be imported in the project package, only to be used for the purposes of code generation.

If these non-Python files are not meant to be installed in the end-user's machine, I believe it is a matter of properly configuring packages/package_data/include_package_data/exclude_package_data/MANIFEST.in so that they are part of the sdist but not part of the wheel. Otherwise, if they end up nested somewhere under an entry of sys.path, they will be import packages, effectively.

If you really don't like the idea of having these directories as importable packages, then the alternative is to use data-files, which will translate into a special directory in the wheel file ({name}-{version}.data/data/). In turn, pip will stall them in a different location that will not be nested somewhere in sys.path. It is a lot of effort to align expectation and implementation, for most people it might just be worth to adapt their expectations.

May I suggest that the deprecation warning be dropped or even abbreviated somewhat until the data / code packaging distinction can be restored and preserved by some new PEP or other. It makes no sense to promote solutions, in the interim, which incorrectly conflate the two.

I think that simply dropping the warning would be unwise. The purpose of the warning is for developers to align their configuration to their expectations. If they want certain directories to be installed somewhere under sys.path, this effectively mean that they are asking setupotools to include certains packages/subpackages into the wheel. Within setuptools, that desire is captured by the packages configuration option.

We need users to start clarifying their configuration, because the next step is to fix other related bugs (see https://github.com/pypa/setuptools/issues/3340#issuecomment-1219321087), and it would be bad if suddenly some folders are missing from packages.

I don't know of anyone currently attempting to introduce a data / code distinction (and what that will mean for Python packages and import system) via a new PEP. Personally, I like the status quo and I think it works quite well. Since there is no concrete plans for such change in the ecosystem, there is no foreseeable risk of conflation, and we don't need to treat this situation as "interim". As far as we know this is the stable behaviour that we should be targetting to achieve after 10 years of transition.


There is another approach[^4] that I have absolute no problems in considering and actually would welcome with open arms: if a member of the community is willing to contribute (i.e. design, discuss, find consensus, implement, document, fix, support ...) a different way of configuring setuptools that is more conceptually self-evident and less prone to ambiguity than packages/package_data/include_package_data/exclude_package_data/MANIFEST.in[^3]. Extra requirements for such solution are: backward compatibility and easy maintenance.

It is a tough challenge which I don't have the resources to tackle myself, but I would be very grateful if someone else can.

[^3]: In some sense, the automatic discovery (when the user does not specify packages) that was introduced a couple of years ago is meant to be easier and less confusing. But automatic discovery is not a fit for all and edge cases still require playing with packages/package_data/include_package_data/exclude_package_data.

[^4]: But that is orthogonal to the warning and next steps discussed here.

abravalheri commented 6 months ago

@abravalheri I think that adding a hint suggesting to delete include_package_data=True and replace it with a fine-grained package-data mapping would be useful.

That is nice idea. Depending on what the developer is trying to achieve that will indeed solve the problem. Currently the message does mention include-package-data=False:

[...]
If you don't want {importable!r} to be distributed and are
already explicitly excluding {importable!r} via
`find_namespace_packages(...)/find_namespace` or `find_packages(...)/find`,
you can try to use `exclude_package_data`, or `include-package-data=False` in
combination with a more fine grained `package-data` configuration.

But probably that can be rephrased to be more clear. Would you like to give it a shot? (P.S.: I consider include-package-data=False a better text for the warning message than omitting, because in pyproject.toml the default value of this config is True, so it just saves the time to explain the nuances).

Another source of confusion was the fact that the error message was unhappy with multidict._multilib, suggesting to use helpers like find_namespace_packages() while it did list the packages explicitly with no discovery needed, via a static list: packages=["multidict"]. This also confused a contributor who attempted adding "multidict._multilib" to that list, not understanding immediately that C-headers aren't to be installed in site-packages/ (or distributed in wheels, for that matter).

I understand that, but unfortunately there is no single fix for all scenarios. For some people include_package_data=False is the fix, for others find_namespace_packages() will do what they are trying to achieve...

And that is the problem with the ambiguous[^2] configuration, setuptools should not have to guess what the user wants to do, that is why we are asking the user to tell us.

[^2]: "Ambiguous" while setuptools does not take a final action on the "directories with non-Python files are not packages" controversy... Once the warning manage to compel devs to clarify their config, we can move past that problem and actually allow people that are intentionally omitting entries in packages to use include-package-data.

shakfu commented 6 months ago

Hi @abravalheri, appreciate your extended and thoughtful reply to my early post (please see my comments below):

Please note that the "directory is a package" behaviour introduced in PEP 420 is actually quite old. The PEP has been approved in 19/Apr/2012 and the specified behaviour implemented in Python 3.3, over 10 years ago which is a lot in "software development years".

It is safe to assume that this behaviour is stable and that at some point in their carreers Python developers will indeed learn what a Python package means and how adding a directory nested somewhere under one of the entries of sys.path corresponds effectively to create a Python package that can be regularly imported as any other package via an import statement.

I've been working in Python since a little bit before version 1.5.2, so this is certainly not new to me. Nonetheless, I admittedly did not express myself clearly: I have no issue with the principle of "directory is a package" or that packages are importable. This is fundamental to Python since way back.

I'm also happy with having a data folder within a python package such that a module within that package can programmatically always find this data folder by retrieving its path relative to __file__. Indeed this is precisely what I'm doing in the project I described earlier: in the cyfaust.common module I have the following code:

def get_package_resources() -> tuple[str, str, str, str]:
    """provides the paths of package architecture and library folders."""
    resources = os.path.join(os.path.dirname(__file__), "resources")
    archs = os.path.join(resources, "architecture")
    libs = os.path.join(resources, "libraries")
    return ("-A", archs, "-I", libs)

PACKAGE_RESOURCES = get_package_resources()

After reading your prior reply, I appreciate that the current situation (and your understanding of it) is a lot more nuanced than I previously considered, and that the confusion is probably stemming from a couple of things here:

1. My failure to understand the full implications of PEP 420 Namespace Packages:

This PEP makes everything in a package potentially importable, even when that is not the intent!

For a hypothetical example in my project, a user of my package can (currently) import a python file that is only intended to be used as a template.

But due to PEP 420, this is possible:

from cyfaust.resources.architecture.templates import abctemplate 

Now, this can be prevented by renaming resources to a non-python friendly name such as pkg-resources.

So now this is impossible:

from cyfaust.pkg-resources.architecture.templates import abctemplate 

I did not realize that this was the default situation now... and it has nothing to do with setuptools. That's my bad.

2. The Documentation + the Deprecation + the Multiplicity of Configuration options + Different ways of doing the same thing are confusing:

In my case, this is how I tried to resolve the deprecation:

  1. My initial working setup.py:
setup(
    name='cyfaust',
    version=VERSION,
    ext_modules=cythonize(
        extensions,
        language_level="3",
    ),
    packages=['cyfaust'],
    package_dir = {"cyfaust": "src/cyfaust"},
    include_package_data=True
)

Then I saw the deprecation notice, and to figure out how to resolve it I read the relevant datafiles section of the setuptools documentation.

I chose the initial recommended include_package_data method.

This led to the following:

setup(
    name='cyfaust',
    version=VERSION,
    ext_modules=cythonize(
        extensions,
        language_level="3",
    ),
    packages=find_packages(where="src"),
    package_dir = {"cyfaust": "src/cyfaust"},
    include_package_data=True
)

And it works, it produces the same outcomes as my prior working case, but... the deprecation notice is still there.

So after some further reading and research on the web, the advice is use find_namespace_packages so I choose the "Subdirectory for Data Files" method

I naively just changed find_packages to find_namespace_packages since it is seemingly the simplest solution

setup(
    name='cyfaust',
    version=VERSION,
    ext_modules=cythonize(
        extensions,
        language_level="3",
    ),
    packages=find_namespace_packages(where="src"),
    package_dir = {"cyfaust": "src/cyfaust"},
    include_package_data=True
)

and then I get an error:

running build
running build_py
error: package directory 'static' does not exist

At this point the frustration of palpable. All I want to do is specify a subdirectory of my package as a data directory to be included in whole.

The more complex way in this method is to add a package_data entry, but filling it out seems laborious given that I have 775 files data files accross many directories with different types etc..

At this stage, I decide to just live with the deprecation!

You asked for suggestions on how to improve this api. Here are my two cents:

Ideally it should be as simple as something like this:

setup(
    name='cyfaust',
    version=VERSION,
    ext_modules=cythonize(
        extensions,
        language_level="3",
    ),
    packages = {
        'cyfaust': {
            'path': "src/cyfaust",
            'data_dirs': ["cyfaust/pkg-resources"],
        }
    }
 )

or just:

setup(
    name='cyfaust',
    version=VERSION,
    ext_modules=cythonize(
        extensions,
        language_level="3",
    ),
    packages=['cyfaust'],
    package_dir = {"cyfaust": "src/cyfaust"},
    include_package_data=True
)

In the above ideal cases, MANIFEST.in or similar is only configured if you need more granular control over what is included or excluded. It is not required in the simplest case where you want everything included.

shakfu commented 5 months ago

@abravalheri

As an addendum to my last post I finally found a way to remove the deprecation warning. I could not get it done with find_namespace_packages and the only way was to run os.walk in my resources folder and then manually add folders which had python friendly names to the packages list as per the below code. Note that this was the only solution I could arrive at to remove the deprecation warning. If there is a better way, please let me know otherwise, my setup.py file looks a little ridiculous. :worried:

setup(
    name='cyfaust',
    version=VERSION,
    ext_modules=cythonize(
        extensions,
        language_level="3",
    ),
    package_dir = {"cyfaust": "src/cyfaust"},
    packages = [
        "cyfaust",
            # NOTE: The entries below are not really packages but data folders
            # They have been added to this list to remove a recent deprecation
            # warning caused by the deprecatation of the category of data files
            # within packages. These are now just called packages 
            # (even if that is not the intention).
            # DO NOT IMPORT, even though they are importable!!
        "cyfaust.resources.libraries",
        "cyfaust.resources.libraries.examples",
        "cyfaust.resources.libraries.examples.ambisonics",
        "cyfaust.resources.libraries.examples.analysis",
        "cyfaust.resources.libraries.examples.autodiff",
        "cyfaust.resources.libraries.examples.autodiff.delay",
        "cyfaust.resources.libraries.examples.autodiff.gain",
        "cyfaust.resources.libraries.examples.autodiff.gain_dc",
        "cyfaust.resources.libraries.examples.autodiff.gain_exp",
        "cyfaust.resources.libraries.examples.autodiff.gain_pow",
        "cyfaust.resources.libraries.examples.autodiff.gain_pow_trig",
        "cyfaust.resources.libraries.examples.autodiff.gain_sq",
        "cyfaust.resources.libraries.examples.autodiff.mem",
        "cyfaust.resources.libraries.examples.autodiff.one_zero",
        "cyfaust.resources.libraries.examples.autodiff.recursion",
        "cyfaust.resources.libraries.examples.autodiff.tremolo",
        "cyfaust.resources.libraries.examples.delayEcho",
        "cyfaust.resources.libraries.examples.dynamic",
        "cyfaust.resources.libraries.examples.filtering",
        "cyfaust.resources.libraries.examples.gameaudio",
        "cyfaust.resources.libraries.examples.generator",
        "cyfaust.resources.libraries.examples.misc",
        "cyfaust.resources.libraries.examples.old",
        "cyfaust.resources.libraries.examples.old.rewriting",
        "cyfaust.resources.libraries.examples.phasing",
        "cyfaust.resources.libraries.examples.physicalModeling",
        "cyfaust.resources.libraries.examples.physicalModeling.fds",
        "cyfaust.resources.libraries.examples.physicalModeling.old",
        "cyfaust.resources.libraries.examples.pitchShifting",
        "cyfaust.resources.libraries.examples.psychoacoustic",
        "cyfaust.resources.libraries.examples.quantizing",
        "cyfaust.resources.libraries.examples.reverb",
        "cyfaust.resources.libraries.examples.smartKeyboard",
        "cyfaust.resources.libraries.examples.smartKeyboard.associatedEffects",
        "cyfaust.resources.libraries.examples.spat",
        "cyfaust.resources.architecture",
        "cyfaust.resources.architecture.api",
        "cyfaust.resources.architecture.api.doc",
        "cyfaust.resources.architecture.AU",
        "cyfaust.resources.architecture.AU.AUPublic",
        "cyfaust.resources.architecture.AU.AUPublic.AUBase",
        "cyfaust.resources.architecture.AU.AUPublic.AUEffectBase",
        "cyfaust.resources.architecture.AU.AUPublic.AUInstrumentBase",
        "cyfaust.resources.architecture.AU.AUPublic.Utility",
        "cyfaust.resources.architecture.AU.PublicUtility",
        "cyfaust.resources.architecture.AU.Source",
        "cyfaust.resources.architecture.AU.Source.AUSource",
        "cyfaust.resources.architecture.AU.Source.CocoaUI",
        "cyfaust.resources.architecture.daisy",
        "cyfaust.resources.architecture.faust",
        "cyfaust.resources.architecture.faust.au",
        "cyfaust.resources.architecture.faust.audio",
        "cyfaust.resources.architecture.faust.dsp",
        "cyfaust.resources.architecture.faust.gui",
        "cyfaust.resources.architecture.faust.gui.Styles",
        "cyfaust.resources.architecture.faust.midi",
        "cyfaust.resources.architecture.faust.unity",
        "cyfaust.resources.architecture.faust.vst",
        "cyfaust.resources.architecture.juce",
        "cyfaust.resources.architecture.juce.plugin",
        "cyfaust.resources.architecture.juce.standalone",
        "cyfaust.resources.architecture.sam",
        "cyfaust.resources.architecture.teensy",
        "cyfaust.resources.architecture.vcvrack",
        "cyfaust.resources.architecture.vcvrack.template",
        "cyfaust.resources.architecture.vcvrack.template.res",
        "cyfaust.resources.architecture.vcvrack.template.src",
        "cyfaust.resources.architecture.VST",
    ],
    include_package_data=True
)
abravalheri commented 5 months ago

@shakfu, have you tried something like the following?

common = {
    ...,
    "include_package_data": True,
    "exclude_package_data": {"": "*.cpp"},
}

if STATIC:
    setup(
        **common,
        ext_modules=[...],
        package_dir={"": "src/static"},
        packages=find_namespace_packages(where="src/static"),
    )
else:
    setup(
        **common,
        ext_modules=[...],
        package_dir={"": "src"},
        packages=find_namespace_packages(where="src", include=["cyfaust*"]),
    )

The example above requires MANIFEST.in to be configured so that all the files necessary for the build process are present in the sdist. Alternatively if you are OK with all the files in your VCS to also be present in your sdist you can use a plugin like setuptools-scm (which potentially simplifies a lot the configuration).

shakfu commented 5 months ago

@abravalheri Thanks for the tip, I'll try it out to see if I can improve it as per your suggestions.

Incidentally, because I have two build variant in my setup.py file, I generate the MANIFEST.in file in each case:

    with open("MANIFEST.in", "w") as f:
        f.write("graft src/cyfaust/resources\n")
        f.write("exclude src/cyfaust/*.cpp\n")
shakfu commented 5 months ago

@abravalheri

Incidentally, I deleted my prior post as it included an error that should not have been flagged here. In any case, the setup.py file is working, checked-in and the deprecation is gone. Thanks very much for your help!

merwok commented 5 months ago
  packages=find_namespace_packages(where="src", include=["cyfaust*"]),

It does not seem good if docs have to recommend the definition of namespace packages for projects that are not using namespace packages.