radiasoft / pykern

Apache License 2.0

Convert setup.py to pyproject.toml #448

Closed robnagler closed 7 months ago

robnagler commented 7 months ago

I have spent a good amount of time working with hatch.

A lot of it was learning about pyproject.toml and figuring out how to keep our chronological versioning system. A lot of pksetup.py can be gotten rid of, because we do testing externally to the Python packaging ecosystem. Much of what people seem to want is being able to test packages in a "pure python" way using virtual environments.
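As a rough illustration of the chronological versioning mentioned above, the sketch below formats a timestamp as `YYYYMMDD.HHMMSS`. The format is inferred from the `rschronver-20240308.225804` name that appears further down in this comment. The function name and the choice of UTC are assumptions, not necessarily how PyKern computes its versions.

```python
import datetime


def chronological_version(when=None):
    """Return a version string like 20240308.225804 (assumed format)."""
    if when is None:
        # Assumption: UTC, so versions sort the same on every machine.
        when = datetime.datetime.now(datetime.timezone.utc)
    return when.strftime("%Y%m%d.%H%M%S")


print(chronological_version(datetime.datetime(2024, 3, 8, 22, 58, 4)))
```

A build backend would need to call something like this at build time, which is why the version has to be declared as dynamic rather than written into pyproject.toml.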

This is the first problem area I found. You can't easily test a build plugin with the way that hatch works. edd816d shows what you have to do to identify the plugin in another package. This is probably a problem with the way all build backends work. It's not a huge problem, but it was hard to figure out why things are not visible. The build environment is in a separate virtual environment so that it can be pure, and this makes sense to a degree, but it makes it hard to install dependencies. pkg_resources was not being found while testing the plugin, and it took quite a while to see that the virtual environment was using part of the pyenv libraries, but not site-packages, which was being removed from sys.path.
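A diagnostic along the following lines might shorten that kind of search. It is only a sketch: `describe_import_env` is a hypothetical helper, and it simply reports whether a module can be located and which `sys.path` entries look like site-packages directories in the current interpreter.

```python
import importlib.util
import sys


def describe_import_env(module_name):
    """Summarize whether module_name is importable and which sys.path
    entries look like site-packages directories."""
    spec = importlib.util.find_spec(module_name)
    return {
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        # True when running inside a virtual environment
        "in_venv": sys.prefix != sys.base_prefix,
        "site_packages": [p for p in sys.path if "site-packages" in p],
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
    }


for key, value in describe_import_env("pkg_resources").items():
    print(f"{key}: {value}")
```

Run from inside the plugin, an empty `site_packages` list alongside `found: False` would point at the path being trimmed rather than at a missing install.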

One probable-showstopper is that Hatch does not support gitignore properly. See https://github.com/pypa/hatch/issues/1273#issuecomment-1973809882. I looked at the Poetry code, and it seems to do the right thing, but it needs to be tested. When you are editing a file and there is a .# emacs file, it will cause hatch build to crash. The tilde files end up in development wheels, too.

Another problematic issue (at least while writing plugins) is that the build environment is cached, and I had to run a find to figure out it was put here: ~/.local/share/hatch/env/virtual/pykern/QSZHMSgZ/pykern-build/lib/python3.9/site-packages/rschronver-20240308.225804.dist-info. This is non-obvious since there is ~/.cache/hatch. It seems like it would be better to store it locally with the Python project, instead of globally. There is no command to clear the cache or even to let you know where it is.
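The `find` I ran can be approximated in Python, as sketched below. `find_dist_info` is a hypothetical helper, and the `~/.local/share/hatch` root is just the location reported above; it may differ by platform or Hatch version.

```python
import pathlib


def find_dist_info(root, name):
    """Return sorted paths of *.dist-info directories under root whose
    names start with name."""
    root = pathlib.Path(root).expanduser()
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob(f"{name}-*.dist-info") if p.is_dir()
    )


# Assumption: the data directory reported in the comment above.
for path in find_dist_info("~/.local/share/hatch", "rschronver"):
    print(path)
```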

In general, I found the documentation for Hatch to be poor for writing plugins. This may not be a big issue, but I have been running many experiments to figure out how things work. Hatch (or pip) sucks up stderr and stdout so I ended up having to write to /dev/tty. This is particularly annoying.
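For reference, the /dev/tty workaround looks roughly like the sketch below. `debug_write` is a hypothetical helper; the fallback to stderr is an addition for environments without a controlling terminal (CI, for example) and was not part of my original hack.

```python
import sys


def debug_write(message, tty_path="/dev/tty"):
    """Write message to the controlling terminal, bypassing any
    captured stdout/stderr. Returns the name of the sink used."""
    try:
        # Mode "w" is required; open() defaults to read-only.
        with open(tty_path, "w") as tty:
            tty.write(message + "\n")
        return "tty"
    except OSError:
        # No controlling terminal available: fall back to stderr.
        sys.stderr.write(message + "\n")
        return "stderr"


debug_write("plugin hook reached")
```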

Another thing to consider is whether we should switch to a src layout for files. Here are some good arguments, some of which are based on bad policy in Python (including "." automatically in sys.path). It wouldn't be a big change, because I don't think we rely on this much.

pip install -e . uses pth files, which is probably fine. It doesn't use a single pth file like easy_install, but the pth file does get in the path. I think this is suboptimal in comparison to egg links, which are not added to sys.path.

Creating a source distribution includes everything unless you explicitly exclude it (apart from the gitignore global problem). This means .github/workflows/python-ci.yml gets included, which is annoying. There may be ways to exclude globally.

There's a lot to learn with modern Python packaging. At this point, I think it's safe to say that Hatch isn't going to work. I'm going to look at Poetry, which is non-standard, but widely used. We can continue to use a pyproject.toml. I just need to figure out how to manage versions dynamically.

robnagler commented 7 months ago

@e-carlin had a good idea: survey other projects' pyproject.toml files. setuptools as the backend is far and away the most used (ansible, arelle, django, home-assistant, invenio, jupyterhub, pytorch, scikit-learn, trio). setup.py is still used by keras and zope. jupyterlab uses hatchling, and they have their own version plugin (tool.jupyter-releaser.options.version-cmd = "jlpm bumpversion --force --skip-commit"). jupyterhub uses tool.tbump, which is 45 lines of boilerplate to bump a version number automatically.
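A survey like this could be scripted. The sketch below reads the `build-system.build-backend` key from the text of a pyproject.toml; `build_backend` is a hypothetical helper, and the fallback value is the legacy setuptools backend that PEP 517 specifies when the key is absent. It assumes Python 3.11+ for `tomllib` (or the `tomli` backport on older versions).

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    # Assumption: the tomli backport is installed on older Pythons.
    import tomli as tomllib

# PEP 517 fallback when build-system.build-backend is absent
LEGACY_BACKEND = "setuptools.build_meta:__legacy__"


def build_backend(pyproject_text):
    """Return the declared build backend, or the PEP 517 default."""
    data = tomllib.loads(pyproject_text)
    return data.get("build-system", {}).get("build-backend", LEGACY_BACKEND)


example = """
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
"""
print(build_backend(example))
```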

Looks like I'll figure out setuptools.

ofek commented 7 months ago

Hey there! I just came across this issue and wanted to note a few things to help out.

You can't easily test a build plugin with the way that hatch works.

This is possible and I wrote a how-to page about that: https://hatch.pypa.io/latest/how-to/plugins/testing-builds/

As you mentioned, any such friction would be experienced by any backend.

One probable-showstopper is that Hatch does not support gitignore properly. See https://github.com/pypa/hatch/issues/1273#issuecomment-1973809882.

This has been fixed and released https://github.com/pypa/hatch/pull/1317

Another problematic issue (at least while writing plugins) is that the build environment is cached, and I had to run a find to figure out

That is correct, the caching is for performance. You can control the location of environments but you're right that the build environments are intentionally somewhat hidden because my thought was that I didn't want to confuse people by dumping an environment in their directory that was for internal use. I am definitely open to improvements there! Would you mind opening a feature request or discussion?

In general, I found the documentation for Hatch to be poor for writing plugins. [...] Hatch (or pip) sucks up stderr and stdout so I ended up having to write to /dev/tty. This is particularly annoying.

I'm sorry about the state of the documentation. Every option and feature is documented but you are correct that there is a dearth of guides and tutorials. This is an active effort we are improving! https://github.com/pypa/hatch/issues/1245

Output shouldn't be swallowed, can you please explain what command you're running exactly?

There's a lot to learn with modern Python packaging. At this point, I think it's safe to say that Hatch isn't going to work.

Since the issue you mentioned is now fixed do you still feel that way? That is surprising to me and I want to improve Hatch/Hatchling as much as possible so that everyone's use case is satisfied. Please let me know 🙂

setuptools as the backend is far and away the most used [...] Looks like I'll figure out setuptools.

I would strongly recommend to not use download count as a heuristic in determining what is best for you. Please see this: https://hatch.pypa.io/latest/why/#build-backend

robnagler commented 7 months ago

Hey there! I just came across this issue and wanted to note a few things to help out.

Thanks for the thoughtful comments and feedback, @ofek!

You can't easily test a build plugin with the way that hatch works.

This is possible and I wrote a how-to page about that: https://hatch.pypa.io/latest/how-to/plugins/testing-builds/

Thanks for this.

Since you seem to be looking for suggestions... Create a test support module. It should not be bound to pytest, though you could add pytest fixtures on top if you like. The main test support should be simple functions (contextmanagers are fine) that get called by a client independent of their test infrastructure so that we (the plugin developers) don't have to copy-and-paste, which is error prone and not fixable with new Hatch releases.

As you mentioned, any such friction would be experienced by any backend.

"Plugin" development is worse with setuptools in my limited experience.

BTW, there's no guarantee that tmp_path is a unique name. In PyKern, pkunit recreates the same directory. This has many advantages, which I won't go into here. This is another reason why the cache should be off or local to the build dir by default.

This has been fixed and released https://github.com/pypa/hatch/pull/1317

Thanks again. Unfortunately, this is not quite right as this doesn't consider gitignore_global, which contains things in our world like *~ (emacs autosave files) and many others. That was the problem I ran into: hatch picked up a .#* file (emacs unsaved autosave) which caused the build to fail.
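To make the failure mode concrete, here is a minimal sketch of the filtering I would expect. It uses only the two patterns named above, matched with `fnmatch` against the file name; this is a simplification of real gitignore semantics and not how Hatch implements exclusion. The helper names and file names are illustrative.

```python
import fnmatch
import pathlib

# Assumption: just the two patterns mentioned above; a real global
# gitignore (core.excludesFile) usually contains many more.
EDITOR_PATTERNS = ("*~", ".#*")


def is_editor_artifact(path):
    """True if the file name matches one of the editor patterns."""
    name = pathlib.PurePath(path).name
    return any(fnmatch.fnmatchcase(name, pat) for pat in EDITOR_PATTERNS)


def filter_sources(paths):
    """Drop editor artifacts from a list of candidate source paths."""
    return [p for p in paths if not is_editor_artifact(p)]


print(filter_sources(["pykern/pkio.py", "pykern/pkio.py~", "pykern/.#pkio.py"]))
```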

That is correct, the caching is for performance. You can control the location of environments but you're right that the build environments are intentionally somewhat hidden because my thought was that I didn't want to confuse people by dumping an environment in their directory that was for internal use. I am definitely open to improvements there! Would you mind opening a feature request or discussion?

https://github.com/pypa/hatch/issues/1333

I'm sorry about the state of the documentation. Every option and feature is documented but you are correct that there is a dearth of guides and tutorials. This is an active effort we are improving! https://github.com/pypa/hatch/issues/1245

Thanks, and I totally understand.

Output shouldn't be swallowed, can you please explain what command you're running exactly?

Sorry, I don't remember. I was testing an early version of the plugin. Output from print or sys.stderr.write didn't show up, so I resorted to writing to /dev/tty directly. These were diagnostics I was putting into hatch source itself to figure out what was going on.

Since the issue you mentioned is now fixed do you still feel that way? That is surprising to me and I want to improve Hatch/Hatchling as much as possible so that everyone's use case is satisfied. Please let me know 🙂

I appreciate your effort, truly. We are moving to pyproject.toml with setuptools, however. You are not going to satisfy us, because we have constraints/history you can't know or address. One of those constraints is time in my case. I can't spend any more time on this right now. We have not closed the door. We design our pyproject.toml files to be as minimal as possible so we can put policy in packages that don't need to be edited in all projects when policies change.

This, BTW and IMHO, is probably my biggest complaint about PyPA: way too much boilerplate. A de minimis pyproject.toml file should be empty. You shouldn't even need a name, because for most projects that can be introspected easily. You might want to specify an author, but that could be part of a standard configuration which is referenced dynamically as opposed to being embedded in every project. Change happens, and boilerplate just gets in the way of that.

Also, "everyone's use case is satisfied" is a nice thought, but it's going to be a very difficult goal.

There are many things about Hatch I really like, which is why I started with it over all the other package managers currently available. Setuptools is very complicated, and it doesn't have a real plugin architecture.

setuptools as the backend is far and away the most used [...] Looks like I'll figure out setuptools.

I would strongly recommend to not use download count as a heuristic in determining what is best for you. Please see this: https://hatch.pypa.io/latest/why/#build-backend

I didn't use download count. I searched pyproject.toml files of projects we use and other large projects we know.

Thanks again for your thoughtful work, and for hatching Hatch. Keep up the great work!