pypa / hatch

Modern, extensible Python project management
https://hatch.pypa.io/latest/
MIT License

META: Hatch documentation upgrade #1245

Open lwasser opened 9 months ago

lwasser commented 9 months ago

Following the discussion here, we discussed with @ofek @dahandv upgrading the hatch docs to include more how-to and tutorial style elements to help users get started with hatch.

This is also related to this issue opened by @pfmoore about using the Diátaxis framework.

In this issue we can iterate around what the structure of tutorials vs how-tos should look like and what we wish to create / develop further to help hatch users. I'll attempt to track comments below and update the main outline here as the discussion progresses.

I'll also try to scan issues and discussions here and there to identify pain points and get users involved in the upgraded content reviews :) @dahandv

Note that we also are working on tutorials at pyOpenSci which we could link to / use as needed here. Here is a tutorial on publishing to PyPI using hatch.

I'm starting the discussion here but probably can't work out a full outline now. Please add comments about other tutorials / how to's that you'd like to see and i will update this header comment as needed. (or @ofek obviously you can always edit it too!).


Hatch How To's

Hatch Tutorials

DahanDv commented 8 months ago

@lwasser hey! Sorry for being offline for so long! I'm working on a reduced guide for Diátaxis so new contributors can get up to speed when they wish to contribute to any kind of documentation! (Instead of letting them figure this out themselves, which can be a turnoff for some; the official Diátaxis guide is wordy and repetitive IMO!) I will link this here today (I hope!). Your review and comments will be appreciated ❤️

lwasser commented 8 months ago

looking forward to seeing what you pull together @DahanDv !

polarathene commented 6 months ago

It would be good to document advice on usage within Docker (this was requested in the past).

Is just installing hatch via curl and running hatch --version expected to add over 400MB in disk usage? If python is available, one can install pipx to get hatch which is less heavy, but uv seems to be pulled with these two install methods too regardless if you'd use it? (IIRC in one case it was about 30MB while the other had about 90MB of data related to uv).

I had seen in the docs a brief note/admonition about standalone/installers not being able to detect/use an existing python install, thus pulling in a standalone version of python? (I had attempted to avoid this with a config.toml, but it didn't seem to help reduce weight)

If 150-400MB is to be expected, it might be worthwhile to raise some awareness there. At least with an endorsed approach for using hatch within a container, that expectation of disk weight would be clearer :)

lwasser commented 5 months ago

i am not sure if i can help here or not but chiming in. i just played with this quickly locally. when i created a docker container with python / pip in it, it automatically increased the container size by about 330 MB.

my question: if python is not installed on a user's system and you install hatch, will it by default now try to install python now that it supports uv?

i wonder if this should be another issue where folks chime in, but i also wonder if anyone has worked with a docker container with some version of python already installed to see if there is a difference in the size of the container when running hatch --version (as a way to potentially tease out the need for python to be installed and how it's set up most efficiently in a container vs. hatch's default behavior).

please excuse this comment if it's totally off base. it does seem like docs around this would be useful!

ofek commented 5 months ago

I will respond to a few comments at the same time:

lwasser commented 5 months ago

@ofek to clarify

the example above referred to this issue comment.

which had this docker setup:

$ docker run -it ubuntu:22.04 bash

$ apt update && apt install -y curl
$ curl -sSfL https://github.com/pypa/hatch/releases/download/hatch-v1.10.0/hatch-1.10.0-x86_64-unknown-linux-gnu.tar.gz | tar -xz
$ mv hatch-1.10.0-x86_64-unknown-linux-gnu /usr/local/bin/hatch
$ du -shx /
144M    /

$ hatch --version
Hatch, version 1.10.0

$ du -shx /
558M    /

in this case a user is

So nothing is run - yet. Then the hatch binaries are moved into a new location so hatch can be called.

To me it makes sense based on what you wrote above that in this specific case, when you run hatch --version it will first download python. And that Python download accounts for the increase in size of the container.

The alternative approach would be for someone to create a docker container that first installs python or inherits from another container on dockerhub that contains python.

Is that interpretation correct? and if it is, would it make sense to create a small how to (or add doc enhancements elsewhere)? i'm happy to help create a very basic example of this that others could enhance / build off of.

lwasser commented 5 months ago

here is a repro example. i definitely saw it install python and hatch when i ran hatch --version. NOTE: i'm on a mac so using a different release distro below compared to the example referred to above! But a small cleanup step did reduce the size.


$ docker run -it ubuntu:22.04 bash

root@bd1bf5df743c:/# apt update && apt install -y curl
root@bd1bf5df743c:/# mv hatch-1.10.0-aarch64-unknown-linux-gnu /usr/local/bin/hatch
root@bd1bf5df743c:/# du -shx
131M    .
root@bd1bf5df743c:/# hatch --version
Hatch, version 1.10.0
root@bd1bf5df743c:/# du -shx
361M    .
root@bd1bf5df743c:/# rm -rf /var/lib/apt/lists/*
root@bd1bf5df743c:/# du -shx
316M    .

ofek commented 5 months ago

Yes, that is actually expected, as I mention in my first bullet point. Hatch binaries are built with PyApp and bootstrap themselves on the first run. If you already have Python available and want to cut down on disk space then I would recommend installing manually.

ofek commented 5 months ago

I might be able to shave some MBs off given a new release of the binaries and docs on enabling the option.

lwasser commented 5 months ago

fantastic. Ofek would a small "how to" or tutorial about creating a docker environment be useful in the docs? i am not a docker expert but i could at least capture the information here for folks to use.

maybe @polarathene (if you are up for it) could review and provide input as well?

ofek commented 5 months ago

Yes that would be quite helpful! I wouldn't have time to add that new feature until after PyCon though.

polarathene commented 5 months ago

I looked into it a bit, here are my findings, hope they're helpful 👍

FWIW, keeping it simple and focused/familiar for most Docker users (that is those less experienced) is probably best. I wouldn't stress too much on size as you can see in the examples below you won't save too much with the added effort, but it's possible 👍

If you write something up and contribute a PR feel free to ping me and I'll try to provide a review if I have the time :)


TL;DR:

There's also the route of having a Dockerfile added to this repo, and optionally a GH Actions workflow that automates publishing images to DockerHub / GHCR with the release CI. Most users would likely be happy using a base image with hatch, unless they need to install system packages and have a particular preference (often this is ubuntu or debian for the familiar apt command they'll come across online on sites like StackExchange/StackOverflow).


NOTE: du -shx reports the total size of the location in MiB (1024^2 bytes), not MB (1000^2 bytes, which would be du -sx --si). So the M value in the output is MiB.
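
For comparing the M (MiB) figures from du -shx against sizes quoted in MB elsewhere in this thread, the conversion can be sketched as a small shell helper. The function name `mib_to_mb` is made up for illustration, and integer arithmetic truncates the result:

```shell
# Hypothetical helper: convert a MiB count (as shown by `du -shx`) to whole MB.
# 1 MiB = 1024^2 bytes, 1 MB = 1000^2 bytes; shell arithmetic truncates.
mib_to_mb() {
  echo $(( $1 * 1048576 / 1000000 ))
}

mib_to_mb 544   # prints 570
```

So a reading of 544M from du -shx corresponds to roughly 570 MB.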


Install approaches

Package Manager (122 MiB)

$ docker run --rm -it quay.io/fedora/fedora-minimal:41 bash
$ du -shx /
126M    /

$ dnf5 install -y --setopt=install_weak_deps=0 hatch
Transaction Summary:
 Installing:       75 packages
 Upgrading:         5 packages
 Replacing:         5 packages

Total size of inbound packages is 33 MiB. Need to download 33 MiB.
After this operation 122 MiB will be used (install 124 MiB, remove 2 MiB).

# Extra is from package manager cache:
$ du -shx /
297M    /

# Clean up package manager cache:
$ dnf5 clean all
Removed 12 files, 7 directories. 0 errors occurred.

# Thus total 122 MiB added weight:
$ du -shx /
248M    /

Standalone installer (4MiB installs to 400+ MiB)

$ docker run --rm -it quay.io/fedora/fedora-minimal:41 bash

# Fedora image already has curl, just needs tar + gzip to extract:
$ dnf5 install -y tar gzip && dnf5 clean all

# As the tar.gz contains only a single file, we can write the output to the preferred location directly:
$ curl -sSfL https://github.com/pypa/hatch/releases/download/hatch-v1.10.0/hatch-1.10.0-x86_64-unknown-linux-gnu.tar.gz \
   | tar -xzO > /usr/local/bin/hatch && chmod +x /usr/local/bin/hatch

# Before triggering install:
$ du -shx /
131M    /

# 410+ MiB added weight from install:
$ hatch --version && du -shx /
544M    /
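
The before/after du measurement used in these examples can be wrapped in a small helper. This is only a sketch: `du_delta_mib` is a made-up name, it measures in KiB (-k) and divides down to whole MiB, and it assumes nothing else is writing to the directory while the command runs:

```shell
# Hypothetical helper: report how many MiB a command adds under a directory.
# Usage: du_delta_mib DIR COMMAND [ARGS...]
du_delta_mib() {
  dir="$1"; shift
  # -s: total only, -x: stay on one filesystem, -k: count in KiB
  before=$(du -sxk "$dir" | cut -f1)
  "$@" >&2   # run the command, keeping its output off stdout
  after=$(du -sxk "$dir" | cut -f1)
  echo $(( (after - before) / 1024 ))
}
```

For example, `du_delta_mib / hatch --version` in a fresh container would print the weight added by the first-run bootstrap.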

Now as a Dockerfile: building the image gives better insight into the layer created by hatch --version, and the dive CLI tool shows where all that weight is coming from:

FROM quay.io/fedora/fedora-minimal:41
RUN dnf5 install -y tar gzip && dnf5 clean all
RUN curl -sSfL https://github.com/pypa/hatch/releases/download/hatch-v1.10.0/hatch-1.10.0-x86_64-unknown-linux-gnu.tar.gz \
   | tar -xzO > /usr/local/bin/hatch && chmod +x /usr/local/bin/hatch
RUN hatch --version

# In dir with `Dockerfile` above:
docker build --tag local/hatch .
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive local/hatch

# Overview of the biggest sources of that weight:
61MB => /root/.cache/pyapp/distributions/14656550572188801628
32MB => /root/.cache/pyapp/uv
229MB => /root/.cache/pyapp/distributions/_14656550572188801628/python/lib/
- 189 MB => libpython3.12.so.1.0
- 26 MB => python3.12
91MB => /root/.cache/uv

Considering that's all in the /root/.cache dir and nowhere else, it's not obvious what is safe to remove without breaking any assumptions from hatch?

pipx install hatch (165 MiB, 48 MiB for pipx, 117 MiB for hatch + bundled uv)

$ docker run --rm -it quay.io/fedora/fedora-minimal:41 bash
# Install pipx with python3:
$ dnf5 install -y pipx && dnf5 clean all

Transaction Summary:
 Installing:       17 packages

Total size of inbound packages is 13 MiB. Need to download 13 MiB.
After this operation 48 MiB will be used (install 48 MiB, remove 0 B).

# Pre-install weight (ignoring pipx 48 MiB):
$ du -shx /
184M    /

$ pipx install hatch uv
$ export PATH="${PATH}:/root/.local/bin"

# Post-install weight:
$ du -shx /
365M    /

$ hatch --version
Hatch, version 1.10.0
$ uv --version
uv 0.1.42

# No change, huzzah!
$ du -shx /
365M    /

You could of course make uv available a few ways:

FROM quay.io/fedora/fedora-minimal:41
RUN dnf5 install -y pipx && dnf5 clean all
# Hatch bundles uv:
RUN pipx install hatch && rm -rf /root/.cache/
# Effectively what `pipx ensurepath` accomplishes to make the hatch command available:
ENV PATH="${PATH}:/root/.local/bin"
# One of many ways to use the internal uv installed with hatch:
RUN ln -s /root/.local/share/pipx/venvs/hatch/bin/uv /usr/local/bin/uv
# Verify both commands work:
RUN hatch --version && uv --version

Advanced: FROM scratch multi-stage (roughly 210 MiB total image size)

# syntax=docker.io/docker/dockerfile:1

FROM quay.io/fedora/fedora-minimal:41 AS base-stage

# The <<EOF (start) and later EOF (end) markers are HereDoc syntax
# Allows for a RUN directive to more nicely run multiple commands in a single layer
RUN <<EOF
  dnf5 --installroot /rootfs --use-host-config --setopt=install_weak_deps=0 install -y pipx
  dnf5 --installroot /rootfs --use-host-config --setopt=install_weak_deps=0 clean all

  # This works since bash was implicitly installed into the new root fs
  # NOTE: DNF was not included, so it is not available once we switch via chroot.
  # For DNS lookups like `pipx install` needs, we'll also need to provide `/etc/resolv.conf`
  cp /etc/resolv.conf /rootfs/etc/resolv.conf
  # chroot is a bit awkward in a Dockerfile, using SHELL directive or after the COPY on scratch
  # may be more convenient?
  chroot /rootfs bash -c 'pipx install hatch && rm -rf /root/.cache/'
  chroot /rootfs ln -s /root/.local/share/pipx/venvs/hatch/bin/uv /usr/local/bin/uv
EOF

FROM scratch
ENV PATH="${PATH}:/root/.local/bin"
COPY --link --from=base-stage /rootfs /
RUN hatch --version && uv --version

Throughout my examples I've used quay.io/fedora/fedora-minimal:41, which is a beta image where dnf5 is built-in. Previously on minimal images it'd be microdnf, but once Fedora 41 is released both the minimal image and regular fedora (eg: fedora:41) will have dnf5 as the usual dnf command (finally!). fedora-minimal has a smaller base, but it does make some compromises (for example try running btop, it needs a little extra nudge on your part) and I think the UX (at least interactively?) goes down a bit. So I'd generally suggest the regular fedora images, which should make little difference with this --installroot approach.

Like Fedora, the openSUSE Tumbleweed image is still on hatch 1.9.x, thus both hatch packages are 30 MiB shy of what they'd actually be with uv involved. When that lands you'll get a more minimal/simpler scratch, but honestly the size isn't that big of a win here:

# syntax=docker.io/docker/dockerfile:1

FROM opensuse/tumbleweed AS base-stage

RUN <<EOF
  zypper --releasever tumbleweed --installroot /rootfs --gpg-auto-import-keys refresh
  zypper --releasever tumbleweed --installroot /rootfs --non-interactive install --download-in-advance --no-recommends python311-pipx

  # Cleanup doesn't make a difference in this case (zypper keeps most cache on the main root), but this is how you'd do it:
  # NOTE: If you care about this base-stage image layers you could clear the main root cache without the `--releasever --installroot` args
  # zypper --releasever tumbleweed --installroot /rootfs-h --non-interactive clean --all

  # No need to worry about the /etc/resolv.conf if you're not doing any network stuff via chroot
  # At runtime of the container Docker will replace it to manage networking itself.
EOF

FROM scratch
COPY --link --from=base-stage /rootfs /
RUN hatch --version

NOTE: If you try to do the pipx install with the opensuse image you'll find that it fails with the rm and ln commands not existing. Those are packages that weren't needed for pipx, but are required to do those extra steps so you'd need to add them. Fedora on the other hand still installs those basic utility commands.

Alpine (roughly 180 MiB total image size)

Smallest by about 30-40 MiB, fairly simple but Alpine with musl does have some caveats to be mindful of.

# syntax=docker.io/docker/dockerfile:1

FROM alpine
RUN <<EOF
  apk add --no-cache pipx
  pipx install hatch && rm -rf /root/.cache
  ln -s /root/.local/share/pipx/venvs/hatch/bin/uv /usr/local/bin/uv
EOF
ENV PATH="${PATH}:/root/.local/bin"
RUN hatch --version && uv --version

For minimizing size

ofek commented 5 months ago

Thank you for the fantastic writeup!

As of https://github.com/pypa/hatch/releases/tag/hatch-v1.11.0, the binaries pull down distributions that already have Hatch installed which is about as small as I can make that. This is what the official GitHub action to install Hatch will use when I have time to do so.

There is also a new self cache command, so after installation you would want to run hatch self cache dist --remove. After that, all that will exist is the distribution with Hatch that is tied to the binary. The following is an example:

❯ docker run --rm -it ubuntu bash
root@c8f3aacf6229:/# apt update && apt install -y --no-install-recommends curl ca-certificates
root@c8f3aacf6229:/# du -shx
127M    .
root@c8f3aacf6229:/# curl -LO https://github.com/pypa/hatch/releases/latest/download/hatch-x86_64-unknown-linux-gnu.tar.gz
root@c8f3aacf6229:/# tar xzf hatch-x86_64-unknown-linux-gnu.tar.gz
root@c8f3aacf6229:/# ./hatch self restore
root@c8f3aacf6229:/# rm hatch-x86_64-unknown-linux-gnu.tar.gz
root@c8f3aacf6229:/# ./hatch self cache dist -r
root@c8f3aacf6229:/# du -shx
470M    .

ofek commented 5 months ago

Actually forget what I said please, I'm about to reduce that substantially.

ofek commented 5 months ago

Done!

image

lwasser commented 5 months ago

amazing!! ofek, with pycon travel coming up i won't be able to start a tutorial / how to until after i'm back! but also @polarathene you've provided an INCREDIBLE amount of information above and i suspect / know :) that you know a lot more about this topic than i do. would you like to start a tutorial and i can perhaps contribute? or would you like for me to start / try my best to reflect what you have found and then you can review/ contribute / add that way?

it just seems to me that there is so much information in this thread now, that we should capture it and turn it into a documentation page for others to discover!

lwasser commented 5 months ago

ofek that is a considerable reduction in image size!! so so awesome!!

polarathene commented 5 months ago

Cheers for the improvement @ofek ! 🥳 (EDIT: It seems there are some gotchas to consider vs a pipx install hatch)

The below notes are mostly for my benefit to come back to, but sharing with others if helpful. I'll summarize with a TLDR in a follow-up comment.

Collapsed for brevity (click to view)

## Layer insights:

![image](https://github.com/pypa/hatch/assets/5098581/3010375a-a5fb-46bc-abe4-b89df6377054)

![image](https://github.com/pypa/hatch/assets/5098581/e19a5b59-2991-4a83-a1af-fdf835d3257e)

---

`hatch self restore` size is almost equivalent to `hatch --version` (near 200MB added), just 4 MB less.

`hatch self cache dist --remove` removes 47MB of that added weight from `~/.cache/pyapp`, so you can remove this dir afterwards or leave it with the empty content:

![image](https://github.com/pypa/hatch/assets/5098581/b7a1461f-22ad-42d3-94d1-c69b09cf5de2)

Actual hatch lives as a python script at `/root/.local/share/pyapp/hatch/1303662642487178586/1.11.0/python/bin`, but still relies on the binary extracted from `curl` AFAIK to run (_as even with a local python install to run that script directly it is not happy_), so move the installer binary to a location like `/usr/local/bin/hatch` 👍

### `.pyc` / pycache content

The final `RUN` layer shows that the `hatch --version` command added about 3MB, and that it's due to running python creating various `.pyc` cache files like this:

![image](https://github.com/pypa/hatch/assets/5098581/a5ad5866-02bd-49c5-a91d-d134ace7cfc4)

[`PYTHONPYCACHEPREFIX=/path/to/cache`](https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPYCACHEPREFIX) is meant to allow customizing the cache dir for this content since Python 3.8, but for some reason in my `Dockerfile` ENV it wasn't having any effect 🤷‍♂️ (_it does for a system `pipx install uv`, so presumably this is due to `hatch` using the bundled Python?_)

## `Dockerfile`

3 examples, with the first a little bit better documented and avoiding `&&`.

```Dockerfile
# syntax=docker.io/docker/dockerfile:1

FROM fedora:40
RUN <<EOF
  […] > /usr/local/bin/hatch
  chmod +x /usr/local/bin/hatch

  # Finish installing hatch, then remove the redundant PyApp cache:
  hatch self restore
  hatch self cache dist --remove
EOF
```

```Dockerfile
# syntax=docker.io/docker/dockerfile:1

FROM quay.io/fedora/fedora-minimal:41
RUN <<EOF
  […] > /usr/local/bin/hatch && chmod +x /usr/local/bin/hatch
  hatch self restore && hatch self cache dist --remove
EOF
```

```Dockerfile
# syntax=docker.io/docker/dockerfile:1

FROM ubuntu:24.04
RUN <<EOF
  […] > /usr/local/bin/hatch && chmod +x /usr/local/bin/hatch
  hatch self restore && hatch self cache dist --remove
EOF
```

**Technical details** if any of the stuff I did is unfamiliar:

- `syntax=docker.io/docker/dockerfile:1` is a good practice encouraged by docker on their docs.
- The `<` […] `> /path/filename`). This avoids needing another `$(uname -m)` or `mv`, and technically corrects permissions (UID is 1001, GID is 127), but writing the contents to a new file (`>`) lost the original executable bit (`+x`), which needs to be restored.

**Total size** via `du -sx --bytes --si /` (_Hatch adds: 140MB `/root/.local/share/pyapp/hatch` + 4MB `/usr/local/bin/hatch`_):

- `fedora-minimal:41` (263MB / `262 414 865`) + 18s image build without cache
- `fedora:40` (366MB / `365 360 517`) + 22s image build without cache
- `ubuntu:24.04` (230MB / `229 171 530`) + 45s image build without cache (_232MB + 67s for `ubuntu:22.04`, 236MB + 82s for `ubuntu:20.04`_)

I tend to prefer Fedora as a base as it's faster and has better UX with the package manager, but most users may have a better UX with Ubuntu images, especially when they need to add additional system packages (_this is sometimes inconvenient with Fedora for proprietary packages like nvidia or certain video codecs IIRC_).

Ubuntu has the better image size in this case. It's smaller than `fedora-minimal:41` (_which I show for size + build speed comparison, but I encourage regular fedora base until `fedora-minimal` shares the same `dnf` command instead of `microdnf` / `dnf5`, which might happen by the final Fedora 41 release_).

---

### GH release URLs naming convention change from `v1.11.0`

The above curl example is for the latest release on GH. If you want to pin a version, note that the release file dropped the version prefix as of `1.11.0`, so the older naming is not too relevant going forward (_at least hopefully it'll remain consistent from now on, omitting the version prefix is convenient for the `latest` approach_):

```
https://github.com/pypa/hatch/releases/latest/download/hatch-x86_64-unknown-linux-gnu.tar.gz
https://github.com/pypa/hatch/releases/download/hatch-v1.11.0/hatch-x86_64-unknown-linux-gnu.tar.gz
https://github.com/pypa/hatch/releases/download/hatch-v1.10.0/hatch-1.10.0-x86_64-unknown-linux-gnu.tar.gz
```

---

### GH release variants

You also have in addition to the glibc target (`-gnu`), a `-musl` one.

For anyone interested in the glibc linking:

```console
# These must resolve (and they usually should in a glibc focused distro):
$ ldd /usr/local/bin/hatch
        linux-vdso.so.1 (0x00007ffc2d9e0000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f74e59d3000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f74e6003000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f74e5ffe000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f74e58f0000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f74e5ff9000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f74e5703000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f74e600d000)

# Binary built with Rust 1.78 (latest) and a rather old Ubuntu which suggests `cross-rs` Docker image environment:
$ readelf -p .comment /usr/local/bin/hatch

String dump of section '.comment':
  [     0]  GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
  [    35]  rustc version 1.78.0 (9b00956e5 2024-04-29)

# Probably built this way for broader compatibility by targeting a low glibc version,
# cargo zigbuild is a more modern approach that can be used instead:
# Command from my comment here: https://github.com/rust-cross/cargo-zigbuild/issues/231#issuecomment-1987845738
$ readelf -W --version-info --dyn-syms /usr/local/bin/hatch \
  | grep 'Name: GLIBC' \
  | sed -re 's/.*GLIBC_(.+) Flags.*/\1/g' \
  | sort -t . -k1,1n -k2,2n | tail -n 1

2.18

# The equivalent for the static linked musl build (old GCC, March 2021):
$ readelf -p .comment /usr/local/bin/hatch

String dump of section '.comment':
  [     0]  GCC: (GNU) 9.4.0
  [    11]  rustc version 1.78.0 (9b00956e5 2024-04-29)
  [    3d]  GCC: (GNU) 9.2.0
```

- However the GH releases only publish `-musl` for `x86_64`, thus if you want to support ARM64 (`aarch64`), just use `-gnu`.
- This also means `-musl` via this install method will only work for Alpine with `x86_64` (_not that you should be using Alpine for python deployments anyway_ 🤔 )

Also from `1.11.0` of hatch, there are "dist" variants, which the release page doesn't add clarification to - but extracting these results in 150MiB of content: `hatch` + `uv` + `hatchling` and a bundled Python 3.12. Perhaps related to the improvement @ofek mentioned above?

## Feedback

So with the above improvement, `curl` is a great install option with about 140MB weight 🎉 (_100MB for bundled Python + 30MB for bundled `uv`_)

It'd be neat if you could opt-out of the bundled Python and `uv` options if `hatch` can instead detect and use the ones available from the system after the bootstrapping is done?

`hatch` doesn't seem to be aware of its own bundled distribution though, so I assume that isn't possible?

```console
# Running this command installed
$ hatch python find 3.12
Distribution not installed

# Hatch doesn't consider this as a managed python install, it treats the bundle like a system one?
$ hatch python show
      Available
┏━━━━━━━━━━┳━━━━━━━━━┓
┃ Name     ┃ Version ┃
┡━━━━━━━━━━╇━━━━━━━━━┩
│ 3.7      │ 3.7.9   │
├──────────┼─────────┤
│ 3.8      │ 3.8.19  │
├──────────┼─────────┤
│ 3.9      │ 3.9.19  │
├──────────┼─────────┤
│ 3.10     │ 3.10.14 │
├──────────┼─────────┤
│ 3.11     │ 3.11.9  │
├──────────┼─────────┤
│ 3.12     │ 3.12.3  │
├──────────┼─────────┤
│ pypy2.7  │ 7.3.15  │
├──────────┼─────────┤
│ pypy3.9  │ 7.3.15  │
├──────────┼─────────┤
│ pypy3.10 │ 7.3.15  │
└──────────┴─────────┘

# Installing it adds another 160MB:
$ hatch python install 3.12
Installed 3.12 @ /root/.local/share/hatch/pythons/3.12

The following directory has been added to your PATH (pending a shell restart):

/root/.local/share/hatch/pythons/3.12/python/bin

$ du -shx /
445M    /
```

Not a major concern, and I may be unfamiliar with a way to configure that, but something to be aware of as if you want to `pip install ...` something, AFAIK that requires bringing in another python install (_either via distro system package, `hatch python install `, or implicitly via `pyproject.toml` / `hatch.toml`, etc_)... so the above is perhaps not as minimal / convenient as the `pipx` approach?

## Gotchas

I assume once installing actual python packages or similar activity, another install of Python is going to add to the weight? `hatch` isn't able to use the one it's bundled? (_**EDIT:** Documented below, it's possible for virtual env to use the same Python bundled_)

[Docs for `hatch shell`](https://hatch.pypa.io/latest/cli/reference/#hatch-shell) are a bit lacking here:

```console
$ hatch shell --help
Usage: hatch shell [OPTIONS] [ENV_NAME]

  Enter a shell within a project's environment.

Options:
  --name TEXT
  --path TEXT
  -h, --help   Show this message and exit.
```

From what I've seen elsewhere `--name` refers to a version of Python as listed under the `Name` column in `hatch python show`?

- The `hatch python find` CLI help also was not that clear when referring to an expected arg of `NAME` btw. Including an example in the help output might be better UX, or just the associated web docs (_where it's also vague_). That would make it less guesswork that it's meant to be a value from `hatch python show`.
- The CLI `--help` also [doesn't show defaults](https://hatch.pypa.io/latest/cli/reference/#hatch-python-show) like the web docs do.
- The web docs could link to [this section](https://hatch.pypa.io/latest/plugins/environment/virtual/#internal-distributions) perhaps for the supported versions? While the CLI could mention they're listed in `hatch python show`?

These two sections from the web docs are a little insightful about what I was after:

- https://hatch.pypa.io/latest/plugins/environment/virtual/#options
- https://hatch.pypa.io/latest/plugins/environment/virtual/#python-resolution

There's an ENV **`HATCH_PYTHON`**, which doesn't appear to be documented elsewhere? (_I tried the docs search box_). It mentions a **value of `self`** can be used, which is not valid for `--name` or `--path` with `hatch shell`, but it is as an ENV. **This prevents installing an extra copy of Python**.

`hatch shell --name` does not appear to be a name related to a Python version however.

### Caution: Extra Python expected by default

The first virtual environment adds about 20MB, subsequent ones around 8MB. If there is no other Python detected, `hatch` downloads a new one which seems to add another 150MB? You can avoid that with the `HATCH_PYTHON=self` ENV as mentioned above.

```console
$ du -sx --bytes --si /
263M    /

# 3-4MB increase:
$ hatch --version
Hatch, version 1.11.0
$ du -sx --bytes --si /
266M    /

# Environment added, 18MB increase:
$ cd /tmp && HATCH_PYTHON=self hatch shell
$ du -sx --bytes --si /
284M    /
$ exit

# No excess when using without the ENV:
$ hatch shell
$ du -sx --bytes --si /
284M    /

# Different location creates a new environment.
# This time since ENV is omitted it's created by bringing in Python 3.12 again:
$ cd /opt && hatch shell
$ du -sx --bytes --si /
448M    /
```

### Inconsistency within virtual environment due to `PATH` ENV

The curl install approach differs from `pipx` / package install in a notable way.

- Perk: You can share `uv` command in the environment without any extra steps (like symlinking).
- Con: You can't use `hatch` command within the environment, unless you provide an absolute path to the proper command (_`/usr/local/bin/hatch`, which was already discoverable in PATH_)
- These differences apply regardless of `HATCH_PYTHON` (only affects the virtual env), the difference is due to an extra PyApp addition into the `PATH` ENV, thus `hatch` from that location has priority over your actual `hatch` binary 🤷‍♂️

```console
$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

$ cd /tmp && hatch shell --name 3.11
$ python --version
Python 3.12.3

# Fails due to modified PATH:
$ hatch --version
bash: /root/.local/share/pyapp/hatch/1303662642487178586/1.11.0/python/bin/hatch: cannot execute: required file not found
$ /usr/local/bin/hatch --version
Hatch, version 1.11.0

# UV is available however:
$ uv --version
uv 0.1.44

# hatch environment and hatch install location are given precedence for resolving binaries:
$ echo $PATH
/root/.local/share/hatch/env/virtual/opt/y8366zdl/opt/bin:/root/.local/share/pyapp/hatch/1303662642487178586/1.11.0/python/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
```

`dnf install python` adds 50 MB and can also be used by `HATCH_PYTHON` ENV instead of `self`. This ENV affects the linked python in the virtual env PATH, which adds a symlink to that location. It seems unnecessary though, as when Python is already installed on the system like this, `hatch` detects that and will use it by default.

In contrast, with a `pipx` / package install, where python is externally available to hatch, it too will create the environment using that Python by default. You'll find that the `PATH` ENV isn't altered in the same way, `hatch --version` will work in the environment while `uv` will not:

```console
$ dnf install -y pipx && pipx install hatch
$ cd /tmp && hatch shell
$ hatch --version
Hatch, version 1.11.0
$ uv --version
bash: uv: command not found

$ env | grep PATH
PATH=/root/.local/share/hatch/env/virtual/tmp/6WcazSRI/tmp/bin:/root/.local/bin:/root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
```

I assume this difference isn't intentional?

polarathene commented 5 months ago

Summary of prior message

Still a tad long, see the prior message for more details.

GH Releases:
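
The naming change covered in the prior message (release files dropped the version prefix from 1.11.0) can be sketched as a shell helper that builds the download URL. `hatch_release_url` is a hypothetical name, the 1.11 cutoff is inferred only from the three URLs quoted in this thread, and only `X.Y.Z` versions or `latest` are handled:

```shell
# Hypothetical helper: build the GH release URL for a hatch binary.
# Usage: hatch_release_url VERSION TARGET   (VERSION is X.Y.Z or "latest")
hatch_release_url() {
  version="$1"
  target="$2"
  base="https://github.com/pypa/hatch/releases"

  if [ "$version" = "latest" ]; then
    echo "${base}/latest/download/hatch-${target}.tar.gz"
    return
  fi

  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"

  # Assumption from the URLs in this thread: 1.11.0+ omits the version prefix.
  if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 11 ]; }; then
    file="hatch-${target}.tar.gz"
  else
    file="hatch-${version}-${target}.tar.gz"
  fi
  echo "${base}/download/hatch-v${version}/${file}"
}
```

For instance, `hatch_release_url 1.10.0 x86_64-unknown-linux-gnu` reproduces the older, version-prefixed filename.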

Standalone installer depends on external python despite bundling its own:

Standalone installer prepends its bin location to PATH ENV:
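
To see which `hatch` (or `uv`) wins inside an environment, it can help to list every match on `PATH` in lookup order. A minimal sketch for POSIX sh / bash, where `list_in_path` is a made-up helper name; in bash, `type -a hatch` gives similar information:

```shell
# Hypothetical helper: print every PATH entry that contains NAME, in lookup order.
list_in_path() {
  name="$1"
  old_ifs="$IFS"
  IFS=:
  for dir in $PATH; do   # unquoted on purpose so PATH splits on ':'
    if [ -e "$dir/$name" ]; then
      echo "$dir/$name"
    fi
  done
  IFS="$old_ifs"
}
```

Run as `list_in_path hatch` inside `hatch shell` from the standalone install, the first line printed would be expected to be the PyApp `python/bin/hatch` path from the error in the prior message, ahead of `/usr/local/bin/hatch`.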

Docs (Web / CLI help) need love:

polarathene commented 5 months ago

@lwasser I've got a bit to juggle elsewhere, but I'd be happy to review a PR when I can spare the time.

I am not that experienced with Python, but I know Linux and Docker very well! If you've got any questions feel free to reach out 👍

I think most of the info I've covered above doesn't really need to go into the docs. It was more about exploring what options were available and the tradeoffs 😎

Dockerfile example

Decisions made:

# syntax=docker.io/docker/dockerfile:1

FROM ubuntu:24.04
RUN <<HEREDOC
  # Install pipx, then empty the apt cache:
  apt update && apt install -y --no-install-recommends pipx
  rm -rf /var/lib/apt/lists/*

  # Updates the USER `.bashrc` and `.profile` to append `${HOME}/.local/bin` to $PATH
  pipx ensurepath

  # Install hatch, then empty the pip cache:
  pipx install hatch && rm -rf "${HOME}/.cache/pip"

  # Hatch bundles UV, symlink to it to avoid needing `pipx install uv`:
  ln -s "${HOME}/.local/share/pipx/venvs/hatch/bin/uv" /usr/local/bin/uv
HEREDOC
Old approach for `RUN`

```Dockerfile
FROM ubuntu:24.04
RUN apt-get update \
  && apt-get install -y --no-install-recommends pipx \
  && rm -rf /var/lib/apt/lists/* \
  && pipx ensurepath \
  && pipx install hatch \
  && rm -rf "${HOME}/.cache/pip" \
  && ln -s "${HOME}/.local/share/pipx/venvs/hatch/bin/uv" /usr/local/bin/uv
```
Fedora equivalent (very little difference)

```Dockerfile
# syntax=docker.io/docker/dockerfile:1
FROM fedora:40
RUN <
```
Reference: Alternative - Standalone via curl

**NOTE:** Current caveats apply:

- `hatch` command does not work in a venv due to modified `PATH`.
- `uv` is not symlinked for that same modified `PATH` reason that makes it available.

```Dockerfile
# syntax=docker.io/docker/dockerfile:1
FROM ubuntu:24.04
RUN < /usr/local/bin/hatch
  # Permit this file to run / execute:
  chmod +x /usr/local/bin/hatch
  # Installs standalone hatch, then does some cleanup (remove PyApp cache):
  hatch self restore && hatch self cache dist --remove
EOF
```

Fedora equivalent (_without the commentary_):

- Larger image size (base) than Ubuntu (over 100MB), but faster to build. If you build multiple images for projects that share the same base image layer it's less of an issue.
- This image already has curl, so there are no packages to install. Unlike `fedora-minimal`, it already has `tar` + `gzip` too.
- **TIP:** Since Hatch v1.11.0, the `tar.gz` files have normalized the compressed filename to `hatch`. You could alternatively use `tar -xz && mv hatch /usr/local/bin/hatch` instead, no `chmod +x` needed, but the original UID and GID may not be compatible for non-root customizations (_the GID changed with v1.11.0, UID remains at `1001`_).

```Dockerfile
# syntax=docker.io/docker/dockerfile:1
FROM fedora:40
RUN < /usr/local/bin/hatch
  chmod +x /usr/local/bin/hatch
  hatch self restore && hatch self cache dist --remove
EOF
```

Context

As the type of user who'd be interested in such docs when I was looking into Hatch, and as a user new to Python who wants to run some GitHub projects in Docker containers, I wanted to know which install process for hatch would work best to minimize disk space vs a plain pip install.

  • We've pretty much established pipx install is still the best choice right now (the standalone installer has some caveats remaining, while distro packages are too far behind in releases to enjoy uv support).
  • The availability of the standalone installer (and its apparent small size on GH releases) did make me wonder if I could use that without pipx or Python, so I might have tried it anyway to compare (and then gotten confused once actually using hatch due to the present issues outlined above). The docs could try to emphasize that pipx has the least amount of friction / surprises? 🤷‍♂️
  • I'll be trying Hatch at a later date with UV to run some PyTorch based projects, if I learn anything else from that worth sharing I'll chime in here 👍

An unresolved concern I have is going to be how to handle PyTorch. Deps in hatch.toml / pyproject.toml don't have a clear command to install/sync but instead require hatch shell / hatch env run to trigger that implicitly?

  • If I want to "warm" up the cache for UV in advance by installing the 4-5GB torch uses, this should be done in a separate RUN layer (or image/stage) before other deps to prevent this data being discarded when something else in the project is updated (hatch.toml, project source files) which could invalidate the layer.
  • I'm not sure how hatch (and the virtual environments it manages through UV) are involved in that, it's not something you'd really worry about outside of a container.
  • While Docker does have cache mounts which could help with builds (and allow a hatch.toml to be present without layer invalidation concerns) - this would prevent using hard links, thus incurring a copy across the mount boundary introduced. Not really a problem when the image is only being built for a single virtual environment using PyTorch, but if I want to have several that may be a concern.
  • This topic is perhaps more niche / advanced, so it doesn't need to be tackled with the initial Docker guidance, but if someone knows how to approach it that'd be good! Without the cache mount usage, I suppose I could have a separate dummy hatch.toml environment to bring these in (or directly run uv venv + uv pip install, without hatch involved?). The hardlinking feature should take care of the rest I think (if I manage a hatch.toml for each project, I think they can inherit the same PyTorch environment?). I'll try it when I can :)
# Related UV issue as below will need to handle different "local identifiers":
# https://github.com/astral-sh/uv/issues/3437#issuecomment-2102125794
[envs.default]
type = "virtual"
path = "venv-pytorch"
dependencies = [
  "torch==2.3+cu121",
  "torchvision",
  "torchaudio"
]
installer = "uv"

[envs.default.env-vars]
UV_INDEX_URL = "https://download.pytorch.org/whl/cu121"
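
The hard link point in the list above can be illustrated without Hatch or UV involved. This is a minimal sketch, assuming a POSIX shell and a single temp filesystem; the `cache`, `venv-a`, `venv-b` directories and `payload.bin` are hypothetical names and do not reflect UV's actual cache layout. A hard link is a second name for the same inode, so no data is duplicated, whereas a copy (what happens across a mount boundary, as described above) gets its own inode and its own data.

```shell
#!/bin/sh
# Sketch: hard link vs copy, using inode numbers to tell them apart.
set -eu

scratch="$(mktemp -d)"   # hypothetical stand-in for cache + environments
mkdir "$scratch/cache" "$scratch/venv-a" "$scratch/venv-b"

# Stand-in for a large cached package file.
printf 'pretend this is a multi-GB wheel payload\n' > "$scratch/cache/payload.bin"

# Hard link: a second directory entry for the same inode, no data copied.
ln "$scratch/cache/payload.bin" "$scratch/venv-a/payload.bin"

# Copy: a new inode with its own data.
cp "$scratch/cache/payload.bin" "$scratch/venv-b/payload.bin"

# Print the inode number of a file (first field of `ls -i`).
inode_of() {
  ls -i "$1" | awk '{ print $1 }'
}

cache_inode="$(inode_of "$scratch/cache/payload.bin")"
linked_inode="$(inode_of "$scratch/venv-a/payload.bin")"
copied_inode="$(inode_of "$scratch/venv-b/payload.bin")"

echo "cache:  $cache_inode"
echo "linked: $linked_inode (same inode as cache)"
echo "copied: $copied_inode (different inode)"

rm -rf "$scratch"
```

With several environments that each need PyTorch, the linked case keeps one set of data on disk while the copied case multiplies it, which is the concern raised about cache mounts.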
lwasser commented 2 months ago

I just wanted to check back in here, y'all. I've been swamped with other volunteer commitments, and I won't be able to follow through with the docker PR. I hope that someone else can hop in and work on this, as this issue contains a lot of great information. We are having good success with using and teaching Hatch over at pyOpenSci, so I hope to continue to see the use of and documentation for Hatch grow!

jesshart commented 2 weeks ago

Hi all! I wanted to introduce myself to this topic as Ofek kindly pointed me here when I mentioned I would be interested in helping out with some documentation.

I work as a data scientist at a small company in Austin, TX and we adopted hatch as our project manager earlier this year after some research. We had been using conda but I ran into some major headaches when trying to deploy using conda + docker + AWS services. Since these AWS services were going to be a big part of how we deployed our solutions, we decided to switch our project manager.

Since I don't want to write an essay here, I will try to keep it short 😁. We decided on hatch, and I have been experimenting with it ever since. I really enjoy the features, though I think the documentation could be improved, so I am here to help.

I have not read this entire thread yet but I look forward to catching up and helping as I can (I also volunteer for too many projects 😬).