Closed DanielYang59 closed 1 month ago
Hi @DanielYang59 please note that if you want json
and other files included in the sdist
via include-package-data
you do need to add them either via a plugin like setuptools-scm
or MANIFEST.in
.
This condition is documented in https://setuptools.pypa.io/en/latest/userguide/datafiles.html#include-package-data:
First, you can use the
include_package_data
keyword ... ... files will be automatically installed with your package, provided:
- These files are included via the MANIFEST.in file, ...
- OR, they are being tracked by a revision control system such as Git, Mercurial or SVN, and you have configured an appropriate plugin ...
What include-package-data = true
does is to copy all data files in the sdist to the wheel, but these files do need to be included in the sdist
first.
The likely reason why the sdist on PyPI have the files is because you are using tool.setuptools.package-data
. This behaviour is also documented in https://setuptools.pypa.io/en/latest/userguide/datafiles.html#package-data:
Note that the data files specified using the
package_data
option neither require to be included within a MANIFEST.in file, nor require to be added by a revision control system plugin.
I believe include-package-data
behaviour to be consistent in setuptools since v58.5.3.
Hi @abravalheri thanks a loooooooooot for your quick response and the information (I should have read the documentation more carefully).
I could confirm using the following includes everything (seemingly, and I would need a closer inspection later) inside the src
into the package data:
[tool.setuptools]
include-package-data = true
[tool.setuptools.package-data]
"*" = ["*", ]
[tool.setuptools.packages.find]
where = ["src"]
include = ["pymatgen", "pymatgen.*"]
I believe include-package-data behaviour to be consistent in setuptools since v58.5.3.
For some reason sometimes changes in pyproject.toml
don't alter the built sdist/wheels, I guess I would need to completely clear the cache (I guess it would be a reasonable feature request to submit?).
Thanks again!
Hi @abravalheri thanks again for the comprehensive reply. I went through the Data Files Support page again and there're several points that are still not fully clear to me (bear with me for the newbie questions).
package-data
section and include-package-data
keyword in pyproject.tomlpackage-data
section is not declared, all non-.py
files would be considered package data if they're either listed in "MANIFEST.in" or tracked by version control system (with correct plugin).By default, include_package_data considers all non .py files found inside the package directory (src/mypkg in this case) as data files, and includes those that satisfy (at least) one of the above two conditions into the source distribution, and consequently in the installation of your package.
Or, we could explicitly declare "package-data" through the package-data
section regardless of they are inside the MANIFEST.in file or tracked by VCS.
What include-package-data = true does is to copy all data files in the sdist to the wheel, but these files do need to be included in the sdist first.
From the above docs, it seems to me the [tool.setuptools.package-data]
section is used to declare non-default files to be counted as "package data", and with include-package-data = true
, these "package data" are then included into the source distribution?
But wait, why should we declare "package data" if we don't want to pack them into the package (include-package-data = false
), perhaps there are other usage that I'm not aware of?
include_package_data = true
, those "package data" would then be copied to the wheels? Therefore, the include-package-data
controls the "package data" in both sdist and wheels (either both have "package data" if true, or neither)?the most common use case for data files distributed with a package is for use by the package, usually by including the data files inside the package directory.
Or:
then all the .txt and .rst files will be automatically installed with your package, provided:
As Python is an interpreted language, .py
files would remain as is, but Cython/C codes would be compiled as binary extensions (.so
/.pyd
). And additional package data files would also remain as is in the wheels?
Update: I think I understand now.
Just created a demo package: demo_project.zip
.
├── pyproject.toml
├── setup.py
└── src
└── demo
├── pkg_1
│ ├── __init__.py
│ ├── data_1.txt
│ └── module_1.py
└── pkg_2
├── __init__.py
├── cython_from_pmg.pyx
├── data_2.csv
└── module_2.py
5 directories, 9 files
And looks like include-package-data
is not doing anything, see the following table:
include-package-data | |||
---|---|---|---|
TRUE | FALSE | ||
package-data | set | wheel and sdist have txt/csv | wheel and sdist have txt/csv |
not set | wheel and sdist have no txt/csv | wheel and sdist have no txt/csv |
In other words, if the [tool.setuptools.package-data]
section is set, both sdist and wheel would contain data_*.csv/txt
, regardless how include-package-data
is set, and vice versa.
If I understand correctly, include-package-data
is not doing anything because: 1. I don't have a MANIFEST.in
file; 2. No VCS plugin is used, and therefore it's effectively as false
no matter how it's set.
By default, include_package_data considers all non .py files found inside the package directory (src/mypkg in this case) as data files, and includes those that satisfy (at least) one of the above two conditions into the source distribution, and consequently in the installation of your package.
And I don't think the following is quite right? i.e. include-package-data
doesn't really control whether data files from sdist would be copied to wheels (it always copies).
What include-package-data = true does is to copy all data files in the sdist to the wheel, but these files do need to be included in the sdist first.
And I don't think the following is quite right? i.e. include-package-data doesn't really control whether data files from sdist would be copied to wheels (it always copies).
What include-package-data = true does is to copy all data files in the sdist to the wheel, but these files do need to be included in the sdist first.
Assuming that you put a MANIFEST.in
or use something like setuptools-scm
and not use package-data
, than you can add include-package-data = false
and that will have an effect.
This is how these conditions work togheter (in terms of Boolean expressions):
Notation:
i
- include-package-data = true
e
- exclude-package-data
is set to remove data files
p
- packag-data
is set to include data files
m
- MANIFEST.in
exists and includes data files
w
- data file included in the wheel distribution
s
- data file included in the sdist distribution
x'
: the negated form of x
, in other words x' = not(x)
x.y
: logical and
between x
and y
x+y
: logical or
between x
and y
s = m + e'.p
w = i.e'.m + e'.p -- Which can be simplified => e' . (i.m + p)
-- and is also equivalent to => e' . (i.s + p)
This behaviour is stable since v58.5.3, there is an experiment here showing those results https://github.com/abravalheri/experiment-setuptools-package-data?tab=readme-ov-file#results-for-setuptools6060.
And I don't think the following is quite right? i.e. include-package-data doesn't really control whether data files from sdist would be copied to wheels (it always copies). What include-package-data = true does is to copy all data files in the sdist to the wheel, but these files do need to be included in the sdist first.
Assuming that you put a
MANIFEST.in
or use something likesetuptools-scm
and not usepackage-data
, than you can addinclude-package-data = false
and that will have an effect.
I could confirm it's right! i.e. with no [tool.setuptools.package-data]
section and package data declared in MANIFEST.in
:
include-package-data = false
, package data in sdist but not wheelsinclude-package-data = true
, package data in both sdist and wheelsBut I have a feeling that this is not clear from the documentation? Perhaps it's best to update that?
This is how these conditions work together (in terms of Boolean expressions):
Thanks for sharing! I must have seen this while searching in setuptools
issues, this is very complicated so great to have this cheat sheet!
But I have a feeling that this is not clear from the documentation? Perhaps it's best to update that?
Definitely. Would you like to propose a PR for that? I have been through this docs passage many times and we had a couple of contributors changing it back and forth, so it is a bit tricky to get it right (at this stage I am also very "desensitised" to the text). Probably a fresh pair or eyes would help.
With pleasure, I would open a PR for that soon.
Probably a fresh pair or eyes would help.
Fully understandable, perhaps it good for a newcomer (me) to play a part such that other new users could easily understand too, because as a newcomer it's clearer to me what might be missing from the docs.
Any thanks a ton for the detailed explanation and examples again!
This is how these conditions work togheter (in terms of Boolean expressions):
Notation:
i
-include-package-data = true
e
-exclude-package-data
is set to remove data filesp
-packag-data
is set to include data filesm
-MANIFEST.in
exists and includes data filesw
- data file included in the wheel distributions
- data file included in the sdist distribution
x'
: the negated form ofx
, in other wordsx' = not(x)
x.y
: logicaland
betweenx
andy
x+y
: logicalor
betweenx
andy
s = m + e'.p w = i.e'.m + e'.p -- Which can be simplified => e' . (i.m + p) -- and is also equivalent to => e' . (i.s + p)
I really appreciate this, which makes everything very clear and should be included into the docs (I believe). One minor comment though (I could be missing some background here), as "include or not" is a binary logic, can we use the standard logical connectives instead of creating definition ourselves? It was a bit hard to read, and defining +
as OR
is a bit counter-intuitive to me (I would read +
as AND in my brain ).
¬ for negation (NOT)
∧ for conjunction (AND)
∨ for disjunction (OR)
There are many popular notations for Boolean Algebra. I found this article that seems to summarise the main ones: https://www.allaboutcircuits.com/technical-articles/boolean-basics/. The notion of a "standard" notation in this scenario may be biased by personal experience or field of study.
Thanks a lot for the input, yes I thought there could be background that I'm not aware of :)
I could be missing some background here
setuptools version
setuptools==74.1.2
Python version
Python 3.12.5 (MacOS) Python 3.10.12 (Ubuntu-WSL2)
OS
MacOS Sonoma 14.6.1 and Ubuntu-WSL2 22.04 LTS
Additional environment information
Well I haven't installed
setuptools
on my fridge (yet), maybe laterDescription
I was trying to replace explicit declare of package data with
include-package-data = true
forpymatgen
, but noticedpython3 -m build
could generate source distribution with package data missing (even after I removeddist
andsrc/pymatgen.egg-info
). I believe no other files were controlling this process (noMANIFEST.in
, no related args insetup.py
).Here's the related section in
pyproject.toml
file (this is the complete file):Expected behavior
I would expect
python3 -m build
to behave consistently and generate wheels and sdist exactly as specified inpyproject.toml
file.The generated wheel (the wheel on pypi for example) and sdist should include multiple package data files (json and more):
How to Reproduce
Inspect the source distribution:
Inspect the wheel:
Output
Output as file (too many lines): build.log
Generated sdist (all .json, .csv, *.yaml...missing): pymatgen-2024.8.9.tar.gz
The same to the wheel (cannot upload as
.whl
is not supported).