Open sorcio opened 2 years ago
The current behavior also doesn't account for global .gitignore files, i.e. core.excludesFile and/or ~/.gitconfig.
I haven't thought hard enough but perhaps a guess at a solution would be using git ls-files
to collect the list of files and let git
manage its own behavior? In Rust there's a whole crate dedicated to all the fun involved in ignore files (used by e.g. fd
and rg
).
Flit uses git ls-files and I can confirm that it more closely matches user expectations. :)
I can add more ignore file support but will never use git
or hg
directly unless behind an option. I won't spend time on an option like that myself but will accept a PR.
Please keep in mind that a VCS checkout is not guaranteed and is in fact the rarest of all situations. If a project is only published on PyPI in CI/CD then that assumption is met but then fails as soon as a user installs the sdist or attempts are made to build for other ecosystems like Conda or Fedora that use either the sdist on PyPI or the tarball release artifact on GitHub.
Note the issue is not a feature request to strictly mimic the behavior of git. It describes a misbehavior in specific conditions, i.e. when ignore files are not at the root of the project. The problem mostly appears when using sdists so using git
in Hatchling is not a general solution.
Maybe Hatch has more context and is more likely to be used from a VCS checkout. So an option could be added there.
Supporting per-directory ignore files would be an additional feature. But it could be added as part of the same solution.
FWIW, I've almost shipped a package with a .mypy_cache
, because git magically ignored it. No, I don't have a global .gitignore anymore and I've stopped using .git/info/exclude
. I have no idea why – maybe git got smart!? But I can't rely on git complaining about files lying around and that's spooky.
Would it be possible to maybe statically add certain dirs that never should be shipped like .DS_Store
, .tox
, .nox
, .mypy_cache
.pytest_cache
, __pycache__
? build
and dist
also come to mind.
I know you're opposed to magic, but this would reduce the amount of package screwups significantly.
https://github.com/pypa/hatch/pull/493
build
is a setuptools thing
Can I somehow convince you to add .mypy_cache
and .pytest_cache
too?
Because as I wrote: my Git is ignoring it by default and I suspect I'm not the only one so this can lead to a lot of .[mypy|pytest]_cache
s shipped since there's no feedback except double-checking the sdists and I suspect I'm 1 of 5 people world-wide who do that.
Can you find out how Git is ignoring those? It's not for me
Yeah I've found out.
Both directories are created with an .gitignore
file inside of them that contains *
as an entry.
For mypy you may have to delete the directory and run it again because it doesn't seem like it's adding the files if the dir has been created by an older version.
I guess the trick for both would be to evaluate .gitignore
files recursively? It seems like this is become a common pattern (and it's kinda a good idea tbh).
This was the only Hatch-related pain point / surprise I encountered while migrating a large codebase (https://github.com/aesara-devs/aesara/pull/1384). It would be great to support recursive .gitignore
evaluation as per the previous comment.
Given that git ls-files
won't be supported here, I wonder if it would make sense as an additional feature in hatch-vcs? Setuptools_scm does provide this listing to setuptools, so it seems like a good fit for that plugin. Not sure how complicated that would be, though...
I was hit by this (unintended?) behaviour. By the way I noticed because the source dist was not reproducible, due to the cached files.
For now I add
[tool.hatch.build]
exclude = [
".*_cache",
]
in every pyproject.toml
...
While the best option would be to honor .gitignore
files also in child directories, I was wondering if it make sense to allow global exclude patterns in config.toml
?
More sensible default behavior would be desirable here. This bit me too, I don't keep a .gitignore in the sourcetree unless it's excluding stuff generated specifically by that project. Excludes for .venv, pyc etc all go in global gitignore.
In https://github.com/pytest-dev/pytest-freezer/issues/9 it was pointed out that the sdist on PyPI had a whole venv inside. Oops, embarrassing.. :)
For everyone running into these problems I’d like to propose https://github.com/hynek/build-and-inspect-python-package for your consideration to build and check your packages in a clean CI with introspection – and then ideally upload from CI using the new PyPI trusted publisher support.
This bug in Hatch has finally pushed me over the edge to take this on a few months ago and it’s basically just adding a workflow to your repo and a bit of configuration on PyPI’s side: https://github.com/python-attrs/attrs/blob/main/.github/workflows/pypi-package.yml
It’s very liberating.
@hynek Thank you for that, really nice. Just tried it out and found a couple of interesting things: it didn't work on a windows runner (meh?), and the PKG-INFO inside the sdist didn't get the source timestamp like all the other files (also meh?)
As for hatch, after trying+failing multiple times in a row to configure contents in the .tar.gz and the .whl correctly, I think I'm giving up on it for the moment. I'm all for explicit configuration, but the default configuration should not do things that are obviously wrong.
@wimglenn What is not working with your explicit configuration? The default file inclusion for each target is well documented:
Based on that your project seems to have perfectly fine defaults for wheels. Therefore, you should not be using the global [tool.hatch.build]
config but rather [tool.hatch.build.targets.sdist]
specifically.
I have to disagree here. It is not a perfectly fine default to ship files to PyPI that aren't part of the repo and aren't shown in git status
– with or without a global .gitignore
/ .git/info/exclude
. To me this is Hatch's biggest footgun by far.
I'm not sure where that response came from? I was talking specifically about that user's project.
I'm sorry, maybe I've misunderstood you both.
Just because I'm noticing this now,
Given that git ls-files won't be supported here
I'm not sure where this comes from,from what I see the last @ofek opined on the matter was
I can add more ignore file support but will never use git or hg directly unless behind an option. I won't spend time on an option like that myself but will accept a PR.
so AFAICT, @ofek correct me if I'm wrong, but the status of this issue is still "PR welcome to add an option, at least for part of the problem"?
FWIW I have a PoC patch against hatch-vcs that uses setuptools_scm to determine the file listing from git/hg: https://github.com/ofek/hatch-vcs/pull/40
It seems to be held up by a setuptools_scm refactor, but it can be used now.
so AFAICT, @ofek correct me if I'm wrong, but the status of this issue is still "PR welcome to add an option, at least for part of the problem"?
that is correct
@ofek There is nothing wrong with that explicit configuration: when using [tool.hatch.build.targets.sdist]
it does the correct thing.
My complaint was that the default configuration is not clever enough to know that packing .venv
and pycs inside the sdist is obviously wrong. Then I've fumbled, trying to fix the sdist and inadvertently messed up the wheel and hatch doesn't know that putting site-packages/pyproject.toml
and site-packages/README.rst
etc in the wheel is also obviously wrong.
User error? Yes. But also perhaps a bit too easy to make mistakes without noticing.
If .gitignore is found in any of the ancestors of the project root, the paths are interpreted relative to the wrong path. The issue is also propagated to sdists, because the .gitignore file is copied as-is to the source directory.
I've been scratching my head at something related to this... I think.
I'm working in a monorepo where subpackages live under a libs/<subpackage>
directory until my repo root. At the repo root, I have a .gitignore
and one of the subpackages also has a .gitignore
. In a different subpackage, I have the following in my pyproject.toml
file:
[tool.hatch.build.targs.sdist]
exclude = [
'.gitignore',
]
but when I run hatch build
from the subpackage directory, the top level .gitignore
file always shows up in my sdist. Since there's no .gitignore
in this subpackage, I didn't understand why the root .gitignore
got added to my .tar.gz
sdist file.
FTR, I've tried various combinations of exclude patterns and also tried [tool.hatch.build]
section but that top level .gitignore
always gets added.
Maybe that's intended behavior based on the docs note that .gitignore
cannot be excluded, but it was certainly confusing that a file from outside my subpackage directory -- where I explicitly run hatch build
from -- gets copied into my sdist.
when I run
hatch build
from the subpackage directory, the top level.gitignore
file always shows up in my sdist. Since there's no.gitignore
in this subpackage, I didn't understand why the root.gitignore
got added to my.tar.gz
sdist file.
Yes, it's the same issue. It's not literally that the file is copied (so it doesn't matter if it's excluded). Hatch reads the rules, and applies them to a newly-generated .gitignore file. The effect is the same. There are at least a couple issues with this, discussed above: the behavior is surprising; and rules meant for a different path are applied without correction, so it's also wrong.
Although I originally reported this, I'm unlikely to work on this issue because I'm not using Hatch anymore.
I was exploring the content of my project's sdist, and discovered this issue as well. In my case my editor's config, which is excluded from git globally, got sucked in. It had been configured this way for so long I didn't even remember about it.
While it's true that this behavior is documented under VCS section, I unconsciously made the assumption that only tracked files are selected. It seems it's not just me who made this assumption, so I'd appreciate if you could add a note there explaining this limitation.
There's also a contradictory information given in sdist builder's documentation:
When the user has not set any file selection options, all files that are not ignored by your VCS will be included.
It links to the mentioned VCS section, but still, this statement introduces some confusion. It's not VCS who determines if the file is ignored, but Hatch itself.
What's the status here? It seems to me that respecting .gitignore, but not embedded .gitignore is a major miss-feature. I agree that requiring git at run-time is not great so:
1) If we add recursive .gitignore, that would help a lot, and seems to be not that heavy a lift -- I'll try to take a look at the code soon, and see if I can put together a PR.
2) respecting global git setting would be good, but I think less critical -- maybe that's because I don't make much use of it -- and because a project with more than one person working on it really should put everything in the project itself, and it's not that hard to add it to excludes.
would a PR for (1) be considered?
NOTE: I took a quick look at the source, and can't find where in the code the filelist for the sdist is created -- so I'll need a pointer ...
.gitignore files don't have to live at the root of the working directory:
Hatchling assumes that there is at most one .gitignore file, and that it describes paths relative to the project root. This mismatches the behavior of git in a couple of ways:
1) If .gitignore is found in any of the ancestors of the project root, the paths are interpreted relative to the wrong path. The issue is also propagated to sdists, because the .gitignore file is copied as-is to the source directory.
Example
``` my-repo-root ├── .gitignore └── foobar ├── LICENSE.txt ├── README.md ├── ignore_me ├── foobar │ ├── __about__.py │ └── __init__.py ├── pyproject.toml └── tests └── __init__.py ``` .gitignore might contain the pattern `foobar/ignore_me`, which would be effective enough for git. But Hatchling would not be able to apply the pattern to exclude ignore_me. If a file with the same name happened to be in the inner `foobar` directory, it would be accidentally excluded.2) Per-directory .gitignore files are ignored.
I don't use Mercurial anymore, but I imagine that 1 would apply there as well. Mercurial restricts .hgignore files to live at the root of the repo, so there can't be multiple ones. But you could have a project in a sub-directory of a hg repo.
It would be nice if Hatch/Hatchling could do the same thing the VCS does.
I don't know if a solution for 1 would involve, at sdist building time, to rewrite .gitignore to make relative patterns (those with
/
not at the end) relative to the project directory (or write the modified patterns into the sdist pyproject.toml instead of including the .gitignore file).Per-directory .gitignores complicate things a bit. It implies that when traversing parents you cannot stop at the first .gitignore you find, because there can be a more general one up above. Traversal should stop when you reach the root of the working tree (i.e. when you find
.git
). But when you unpack an sdist you don't have a .git.And I don't even want to start thinking about submodules.