Closed jameshilliard closed 2 weeks ago
@ofek Think you could take a look at this?
FYI tests on master are currently broken same as here so I don't think this is introducing any regressions.
Hi there,
I just spent some time on trying to fix my project not to produce empty wheels until I stumbled over this issue.
Why I am writing here is because it was surprising me that such simple condition (at least as appearing to me) like "do not cross project root when scanning for certain files" really is a concern here.
My first thought was: Wouldn't it be enough to simply stop scanning upwards the file tree when we encounter a pyproject.toml
or a hatch.toml
? As I understand, those files mark the project root. My suggestion obviously is too strict if there is any valid use case for having (relevant) .gitignore
-files outside the project root. Are there any?
Markus
Wouldn't it be enough to simply stop scanning upwards the file tree when we encounter a
pyproject.toml
or ahatch.toml
?
Technically that would also prevent the issue I was seeing...however that would differ from git's normal gitignore behavior so it may not be desirable.
My suggestion obviously is too strict if there is any valid use case for having (relevant) .gitignore-files outside the project root. Are there any?
If the python project root is in a subdirectory of a git project root directory it's possible that someone may want to use the gitignore above the python project root subdirectory but within the git project root directory to ignore specific files, so there may be a use case here for wanting to try and replicate git's behavior more closely.
The logic of this check as I implemented it should only filter out the use of gitignore in cases where the python project root directory is itself entirely ignored(a case where it is effectively guaranteed that the python project is not tracked by git at all), while still allowing for files within the project root directory to be ignored if needed using gitignores above the project root.
I see that there are valid use cases for having a relevant .gitignore
outside the project root.
In my case, the situation was a bit special, since the python project I was working on was completely unrelated to any VCS. It was just a random project sitting in my user's home.
Ages ago, I decided (and almost forgot about) to put some of the config files in my home under version control with git (literally making my whole home a git repo). Now, to not have random stuff under version control, I used kind of inverse git-ignore logic with a .gitignore
like this:
*
!.gitignore
!.bashrc
!.profile
!.bash_aliases
!.bash_profile
!.gitconfig
!.gitpwd.sh
!.Xmodmap
!.vimrc
!.pinerc
!.Xdefaults
[...]
I absolutely admit that this is a very rare case, and I also know that there are better/other ways today to manage user configuration. I just wanted to point out that there might be use cases that involve .gitignore
files that are completely unrelated to the python project in question. It is not obvious and at least to me really unexpected that hatch
even looks at files outside the project root. That was the reason for me having hard times to figure out what's going on here, and still it feels somewhat strange that "some entity" like hatch crawls up the whole file system and considers any .gitignore
it finds.
What if in some completely unrelated .gitignore
is a pattern like this?
*.py
I assume this still would break things.
What would have helped me a lot to spot the issue in a straighter fashion:
hatch
that shows me what files are being consideredIn my case, the situation was a bit special, since the python project I was working on was completely unrelated to any VCS. It was just a random project sitting in my user's home.
This is designed to handle cases like that, it's designed for really any case where the project root is itself ignored. Does it not work for you?
I just wanted to point out that there might be use cases that involve
.gitignore
files that are completely unrelated to the python project in question.
I mean, that's effectively what this was designed for, in my case I have a fully ignored python project root build directory inside a meta build system.
It is not obvious and at least to me really unexpected that
hatch
even looks at files outside the project root.
Yeah, it's not very obvious, so much so that this sort of behavior of matching bogus parent gitignore files was even independently re-implemented in virtually every newer pep517 build backend in some form or another. I managed to fix most of them already but hatch and flit(but not flit-core so there's much less real world issues for downstream integrators) AFAIU are the only ones remaining with fixes pending for this type of bug.
What if in some completely unrelated
.gitignore
is a pattern like this?*.py
I assume this still would break things.
If the project root is ignored then this change should disable the use of .gitignore
files entirely so having additional more specific ignores like *.py
shouldn't be an issue, if you have a parent gitignore which does not ignore the project root then it's assumed that the project is actually tracked by git(which should replicate git's behavior) and thus the ignores will be used. I think this results in the least surprising and expected behavior.
This is designed to handle cases like that, it's designed for really any case where the project root is itself ignored. Does it not work for you?
Up to know, I just assumed that it would work :slightly_smiling_face:. But to confirm that, I ran a quick test by installing your version directly from git inside a virtualenv.
$ pip freeze | grep hatch
hatch @ git+https://github.com/jameshilliard/hatch.git@10d23087d03c7ba51a5564402701d01de0f6f6f6
hatchling==1.25.0
$ hatch --version
Hatch, version 1.12.1.dev18
$ hatch build -t wheel
dist/photo_importer-0.0.1-py3-none-any.whl
$ 7za l dist/photo_importer-0.0.1-py3-none-any.whl
Listing archive: dist/photo_importer-0.0.1-py3-none-any.whl
[...]
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2020-02-02 00:00:00 ..... 1559 593 photo_importer-0.0.1.dist-info/METADATA
2020-02-02 00:00:00 ..... 87 86 photo_importer-0.0.1.dist-info/WHEEL
2020-02-02 00:00:00 ..... 48 49 photo_importer-0.0.1.dist-info/entry_points.txt
2020-02-02 00:00:00 ..... 1097 642 photo_importer-0.0.1.dist-info/licenses/LICENSE.txt
2020-02-02 00:00:00 ..... 437 267 photo_importer-0.0.1.dist-info/RECORD
------------------- ----- ------------ ------------ ------------------------
2020-02-02 00:00:00 3228 1637 5 files
As you can see, there are no package files contained in the wheel (there should be a package named photo_importer
).
If I either remove my .gitignore
file from my home or set the option
[tool.hatch.build]
ignore-vcs = true
in pyproject.toml
I get a wheel with the expected contents:
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2020-02-02 00:00:00 ..... 127 121 photo_importer/__about__.py
2020-02-02 00:00:00 ..... 105 99 photo_importer/__init__.py
2020-02-02 00:00:00 ..... 8090 2592 photo_importer/phi.py
2020-02-02 00:00:00 ..... 1559 593 photo_importer-0.0.1.dist-info/METADATA
2020-02-02 00:00:00 ..... 87 86 photo_importer-0.0.1.dist-info/WHEEL
2020-02-02 00:00:00 ..... 48 49 photo_importer-0.0.1.dist-info/entry_points.txt
2020-02-02 00:00:00 ..... 1097 642 photo_importer-0.0.1.dist-info/licenses/LICENSE.txt
2020-02-02 00:00:00 ..... 680 405 photo_importer-0.0.1.dist-info/RECORD
------------------- ----- ------------ ------------ ------------------------
2020-02-02 00:00:00 11793 4587 8 files
It looks like your fix indeed does not work for my case. Did I miss something?
It looks like your fix indeed does not work for my case. Did I miss something?
Hmm, try with print(patterns)
above this line and see what it's showing.
$ pip freeze | grep hatch hatch @ git+https://github.com/jameshilliard/hatch.git@10d23087d03c7ba51a5564402701d01de0f6f6f6 hatchling==1.25.0
I think this is your issue, you forgot to actually install my change from this PR, the change is part of the hatchling
package not the hatch
package, this git repo contains both projects but I only made changes to hatchling.
I tried replicating your environment and it seems that was the issue, also I'm not sure which hatchling
package the hatch build -t wheel
command uses for the build backend but it seems to not be picking up the one installed in the active virtualenv for some reason, I testing wheel builds with the standard build frontend which seems to work as expected. Presumably if the pypi hatchling version contains this change it would also work.
Currently doesn't work:
hatch build -t wheel
Works:
python -m build --no-isolation .
Thanks for testing with respect to my setup.
I fixed my test environment and now have:
$ pip freeze | grep hatch
-e git+https://github.com/jameshilliard/hatch.git@10d23087d03c7ba51a5564402701d01de0f6f6f6#egg=hatch
-e git+https://github.com/jameshilliard/hatch.git@10d23087d03c7ba51a5564402701d01de0f6f6f6#egg=hatchling&subdirectory=backend
I installed in editable mode from a local working copy in order to be able to quickly switch branches.
Hereby I can confirm:
python -m build --no-isolation
workshatch build
does notI verified that none of the both calls above works when running from master
, so your fix is effective.
I investigated a bit more what code actually runs when using hatch build
.
Uninstalling hatchling
leads to import errors, so python seems to pick up the package from inside the virtualenv after all.
$ hatch build
╭───────────────────────── Traceback (most recent call last) ─────────────────────────╮
│ [...]/jameshilliard/hatch/src/hatch/cli/__init__.py:229 │
│ in main │
│ │
[...]
│ │
│ [...]/jameshilliard/hatch/src/hatch/project/core.py:334 │
│ in metadata │
│ │
│ 331 │ @property │
│ 332 │ def metadata(self): │
│ 333 │ │ if self._metadata is None: │
│ ❱ 334 │ │ │ from hatchling.metadata.core import ProjectMetadata │
│ 335 │ │ │ │
│ 336 │ │ │ self._metadata = ProjectMetadata(self.location, self.plugin_manag │
│ 337 │
╰─────────────────────────────────────────────────────────────────────────────────────╯
ModuleNotFoundError: No module named 'hatchling'
print(patterns)
, in contrast to when running python -m build
)In my estimation, using hatch build
is a very common use case, so we should make it work somehow too. I probably could dig deeper into the code next week to find out the differences between the actually executed code paths of the two build command flavors.
Ok, it took me a while, but here are my findings.
The reason why hatch build
does not pick up local changes you eventually applied to the hatchling backend simply is because it installs an isolated fresh version of hatchling
from upstream when setting up it's internal build environment.
I did not dig deeper into the code of hatch
or the build config options if there is any way to override this behavior.
As a test, I forcibly installed your patched version of hatchling
into the isolated build environment that hatch has created for my project. After doing so, hatch build
generates a wheel containing all files as expected :tada:.
As a conclusion:
hatchling
locally using hatch build
is a bit tricky and not straight forward. The best way is to manually invoke python -m hatchling build ...
, since this is what hatch build
does internally in case the build backend is set to hatchling.build
.hatch build
as well.Update:
Just after writing this, I found the following way to make hatch
automatically use your local version of the build backend.
Set the following in your pyproject.toml
:
[build-system]
requires = ["hatchling @ file:///path/to/your/local/hatch/backend"]
build-backend = "hatchling.build"
rebased
Do you have an example so that I could reproduce?
Do you have an example so that I could reproduce?
Just build any package using hatchling from an extracted sdist inside a gitignored directory, for example:
If our top level working git directory is:
/home/buildroot/buildroot/.git
With *
in this gitignore file:
/home/buildroot/buildroot/output/.gitignore
Then when building a sdist python package from the following build directory which is extracted from the package sdist:
/home/buildroot/buildroot/output/build/packagename
We will end up with an empty wheel file as all paths underneath this are ignored and incorrectly excluded by hatchling:
/home/buildroot/buildroot/output
rebased again
I think there is an edge case lurking here regarding matching. I feel like rather than checking if the absolute path matches the logic should check the relative path.
Hmm, so this may be a pre-existing bug, in buildroot we are in fact currently relying on hatchling not parsing a parent .gitignore
with an absolute path correctly in order to avoid triggering the bug this PR was intended to fix. Weirdly maturin appears to have completely independently implemented this exact same bug.
For example hatchling doesn't pick up a /output
exclude path in:
/home/buildroot/buildroot/.gitignore
But does pick up a *
exclude path in:
/home/buildroot/buildroot/output/.gitignore
This is despite to my understanding these having identical meaning by git itself.
What do you think I should do?
What do you think I should do?
Probably you would initially want to go through and root cause the reason why hatchling's gitignore logic is not following normal git rules(i.e. not picking up paths like /output
in the above example).
Maybe also try and make sure gitignore path handling is consistent in regards to relative/absolute paths everywhere in the codebase. The relative/absolute path handling in hatchling is at least for me somewhat tricky to reason about due to the complexity of the config.py code. Possibly this complexity is largely unavoidable but I'd assume there would be at least some opportunities there for refactoring in order to simplify things somewhat.
I think there is an edge case lurking here regarding matching.
BTW I'm quite confident the general approach I'm using to fixing the bogus parent matching bug in this PR is overall fairly reliable(since it's worked reliably for a while in other projects like poetry), although I'm likewise not entirely sure about all the edge case in regards to absolute/relative paths are being handled correctly here.
If our project root directory is matched by an exclude pattern we should assume the pattern is invalid as our project root is likely in an ignored build directory of another project.
Should fix: #1641
This is essentially a port of the same technique I used to fix a similar poetry core bug.