ofek / pyapp

Runtime installer for Python applications
https://ofek.dev/pyapp/

A potential manual solution to Monorepos with Path Dependencies #76

Closed Tremeschin closed 4 months ago

Tremeschin commented 8 months ago

Hi @ofek, thanks for the great tool!

I've come up with a solution proposal to a known problem I'm facing, and I would like to know your thoughts on it and if it's possible to implement it on Pyapp

I've tried to be as detailed as possible, and I'm open to any questions or suggestions, thanks!

Update: It worked!

I've collapsed the R&D process in this comment down to only what matters below:



**Context, Research and Development — old comment (expand)**
## Context

I have a pretty standard [**Monorepo**](https://github.com/BrokenSource/BrokenSource/blob/Master/pyproject.toml) structure that provides a main package for all the other [**Projects**](https://github.com/BrokenSource/BrokenSource/tree/Master/Projects), without knowing about them. A project [**refers to the monorepo**](https://github.com/BrokenSource/ShaderFlow/blob/Master/pyproject.toml#L20) using path dependencies, and projects might even [**refer to other projects**](https://github.com/BrokenSource/DepthFlow/blob/Master/pyproject.toml#L29) as well.

Note: I'm doing this to separate dependencies, e.g. not all projects need `pytorch`.

It all works nicely under development mode.. until I want to build a release of any project. In the past, I implemented convoluted solutions using **Pyinstaller** or **Nuitka** which ended up _working to a certain extent_ but weren't ideal (long story), so I decided to give **Pyapp** a try.
## Problem

As I saw somewhere, **Python** wheels aren't standardized for path dependencies (yet?), so whenever building from a `pyproject.toml`, the wheel won't be installable on other machines, as the builder's local path is hardcoded.

I don't really want to upload the code to **PyPI** as it is very specific to my use case, much like other monorepo opinions; and even if that were the use case, **Poetry** can't simultaneously have a versioned dependency on the main section and a path dependency on the dev section for the same package.

Ultimately, this yields either spaghetti solutions or none at all.
## What I have tried

I've spent two days of intensive digging through the documentation and issues everywhere, trying many build backends such as **Poetry**, **Hatch**, **PDM** and proposed **Poetry plugins** or solutions, but ultimately I couldn't get it to work. **Raw Pyapp** was the closest I got (I know it's used in Hatch!).

### Attempt 1: Source distribution

I honestly don't remember much of what I tried yesterday, but I can say this wasn't ideal: including the packages as an sdist isn't _"safe"_, defining the glob imports is annoying, the monorepo package isn't on a subpath of the projects, and **Poetry** fails.

### Attempt 2: Custom distribution

Long story short, I zipped **Poetry**'s Virtual Environment, set the proper relative paths in Pyapp variables for the executables, and used the full isolated mode, skipping the install. It fails, as the **Python** included there is a symlink to the system **Python**. Setting `poetry.virtualenvs.options.always-copy` to `true` didn't do it either (consider this a bug report? To embed a proper **Python** distribution on top of a local one?).

I'm not a fan of this solution, as your fetching and installation of the **Python** distribution feels more reliable and arguably universal.

### Attempt 3: Hatch

I ported the `pyproject.toml` to **Hatch** syntax and force-included the main package `Broken` under `../../Broken` in the wheel. The embedding I'm using is `PYAPP_PROJECT_PATH` pointing at the built wheel.

This failed as I didn't _"inherit"_ the dependencies of the main package: the Virtual Environment properly contained the `ShaderFlow` and `Broken` packages, but not the dependencies of `Broken` (the monorepo root's package).

This solution feels non-ideal as I had to unset safety flags on Hatch, like allowing direct references and, well, including _some_ other package in the wheel.
## Proposed solution

After all the digging, I think this could be solved by the following:

1. **Have the path dependencies** as `dev-dependencies` in the `pyproject.toml` of the project:

```toml
[tool.poetry.dependencies]
python = ">=3.10,<3.13"
moderngl = "^5.8.2"
# ...

[tool.poetry.dev-dependencies]
broken = {path="../../", develop=true}
```

Building a wheel for this project won't include the `broken` package, but that's ok.
2. **Find all path dependencies** and build their wheels, recursively.

This isn't something you can implement in Pyapp, but a process users would need to define on their own. A pseudo implementation would be something like this (I didn't run nor test the logic):

```python
import subprocess
from pathlib import Path
from typing import Optional, Set

from dotmap import DotMap
import toml

def build_projects(path: Path, found: Optional[Set[Path]]=None) -> Set[Path]:
    path = Path(path).resolve()

    # Initialize empty set
    found = found or set()

    # Skip if already found
    if path in found:
        return found

    # Skip if no pyproject.toml exists
    if not (path/"pyproject.toml").exists():
        return found

    # Load pyproject.toml dictionary
    pyproject = DotMap(toml.loads((path/"pyproject.toml").read_text()))

    # Iterate and find all path dependencies
    for name, dependency in pyproject.tool.poetry["dev-dependencies"].items():

        # Find only path= dictionaries
        if isinstance(dependency, str):
            continue
        if not dependency.path:
            continue

        # Dependency is a path dependency, relative to this project
        dependency = (path/dependency.path).resolve()
        found.add(dependency)

        # Build the wheel
        subprocess.run(["poetry", "build", "--format", "wheel"], cwd=dependency)

        # Recursively find wheels
        build_projects(dependency, found)

    return found

# Build all wheels
projects = build_projects(Path.cwd())

# We can now get the wheels from the projects
wheels = [next(project.glob("dist/*.whl")) for project in projects]
```

Why do we need all of this? For the next step and the proposed solution.


3. **Include all the built wheels** as an installation dependency in Pyapp.

We still use the main project's wheel in the `PYAPP_PROJECT_PATH` setting when building, and we would include the other wheels in a new `PYAPP_LOCAL_DEPENDENCIES` setting, or another name of your preference:

```python
# Include other local dependencies on the building step
os.environ["PYAPP_LOCAL_DEPENDENCIES"] = ":".join(map(str, wheels))

# Compile the project
subprocess.run(["pyapp", ...])
```

When Pyapp is installing the Virtual Environment, it would install the main project's wheel and the other local wheels as well
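Conceptually, that bootstrap step would boil down to a single pip invocation. A minimal sketch of the idea; the `install_command` helper and the colon-separated format are my own illustration, not an existing Pyapp API:

```python
import sys
from typing import List

def install_command(main_wheel: str, local_dependencies: str = "") -> List[str]:
    """Build the pip command the bootstrap would run: the main project's
    wheel plus every colon-separated local wheel, in one install."""
    wheels = [wheel for wheel in local_dependencies.split(":") if wheel]
    return [sys.executable, "-m", "pip", "install", main_wheel, *wheels]
```

For example, `install_command("shaderflow.whl", "broken.whl:depthflow.whl")` yields one `pip install` over all three wheels, so pip resolves their combined dependencies together.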


Why would it work?

By using the Path Dependency projects as development dependencies, we:

  1. Have them in editable mode when developing;
  2. Don't include the hard-coded path in the wheel;
  3. Include all the standard versioned dependencies they use.

By installing the wheel of the main project and all other Path Dependencies wheels, we:

  1. Would have all the standard dependencies installed in the Virtual Environment;
  2. Plus the other local packages' code;
  3. And their metadata for importlib.

My intuition says this would work very well!

Tremeschin commented 8 months ago

• So, (...)

I had the smart-dumb-est idea to build all the path dependencies to `.whl`, then move these `.whl`s inside a `Resources` folder of the target project to compile, which gets imported and used by `importlib.resources.files`, so that when building the target project to `.whl`, all the path dependencies' `.whl`s end up inside the built `.whl`.
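A rough sketch of that build-side embedding, assuming wheels land in each project's `dist/` and get copied into a `Resources/Wheels` directory inside the target package (the `embed_wheels` helper name and layout are mine, not the actual implementation):

```python
import shutil
from pathlib import Path
from typing import List

def embed_wheels(projects: List[Path], resources: Path) -> List[Path]:
    """Copy each project's newest built wheel into the target package's
    resources directory, so it ships inside the final wheel."""
    resources.mkdir(parents=True, exist_ok=True)
    copied = []
    for project in projects:
        # Pick the most recently built wheel, if any
        wheel = max(project.glob("dist/*.whl"),
                    key=lambda whl: whl.stat().st_mtime, default=None)
        if wheel is not None:
            copied.append(Path(shutil.copy(wheel, resources/wheel.name)))
    return copied
```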

At runtime, if on a pyapp release, we iterate over all the `.whl`s in the resources and install them with pip.
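A rough sketch of that runtime step, assuming the wheels were embedded under a `Resources/Wheels` directory inside the package (names and layout are mine, not the project's actual code):

```python
import subprocess
import sys
from importlib.resources import files

def install_embedded_wheels(package: str) -> None:
    """Install every wheel embedded under the package's Resources/Wheels
    directory into the current environment (the pyapp venv at runtime)."""
    wheels_dir = files(package) / "Resources" / "Wheels"
    wheels = sorted(str(entry) for entry in wheels_dir.iterdir()
                    if entry.name.endswith(".whl"))
    if wheels:
        subprocess.run([sys.executable, "-m", "pip", "install", *wheels], check=True)
```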

that's too many wheels 🧀

• Proof of concept code

You can see the build and wheel-embedding function in this file, and the hacky code that installs all the wheels at runtime in the project's `__init__.py` file. Update: I permalinked old code that wasn't elegant; I've removed it in recent commits.

• It worked !

I've run it myself on Linux and Windows, and asked friends on both systems to test the built pyapp binaries.

I can confirm everything worked very well!

We all could run, render videos, load pytorch (even with cuda!), and directories and package metadata are nominal 🎉

Tremeschin commented 6 months ago

I've found some time to attempt a proof-of-concept implementation in PyApp itself.

I've forked the repo and fixed my release functions on the Monorepo; I might still change or tweak stuff.

I'll probably not PR this on my own, as there are some safety and panic concerns in the Rust of my implementation. I've only coded a couple of months in the language before, and there are also nuances of your code where you can do a much better job :)!

I mostly eyeballed what `PYAPP_PROJECT_PATH` was doing and tried the same embedding and bootstrap.

Tremeschin commented 4 months ago

I'm closing this, as a (very decent) workaround with hatch to build a single wheel that bundles everything is to do:

```toml
# Use a single venv for all projects, so "import alpha" works
# Note that each of those directories contains a mostly empty pyproject.toml;
# just make sure they also use hatchling, are managed, and include your package
[tool.rye.workspace]
members = [
    "projects/alpha",
    "projects/beta",
    # ...
]

# Include all project packages with their __init__.py in the wheel as-is
# Note: we aren't "building" them, just bundling the source code
[tool.hatch.build.targets.wheel]
only-include = [
    "shared_library",
    "projects/alpha/alpha",
    "projects/beta/beta",
    # ...
]

# Rewrite paths in the wheel so they are plain - "import alpha" works,
# instead of "import projects.alpha.alpha"
[tool.hatch.build.targets.wheel.sources]
"projects/alpha/alpha" = "alpha"
"projects/beta/beta"   = "beta"
# ...
```

Then, build a local wheel or upload to PyPI and compile normally with PyApp. I find that using `PYAPP_PROJECT_PATH=str(wheel)` (built by `rye build`) and setting `PYAPP_EXEC_SPEC="alpha.__main__:main"` works best for me. Also, `PYAPP_UV_ENABLED=1` is so fast 😉
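For illustration, that flow could be scripted along these lines; the `pyapp_env` helper and the local `./pyapp` checkout path are my own assumptions, while the `PYAPP_*` variables are the real settings mentioned above:

```python
import os
import subprocess
from pathlib import Path

def pyapp_env(wheel: Path, entrypoint: str) -> dict:
    """Environment that configures PyApp to embed the bundled wheel."""
    return {
        **os.environ,
        "PYAPP_PROJECT_PATH": str(wheel.resolve()),
        "PYAPP_EXEC_SPEC":    entrypoint,
        "PYAPP_UV_ENABLED":   "1",
    }

# Usage, from the workspace root with a local pyapp checkout at ./pyapp:
#   subprocess.run(["rye", "build", "--wheel"], check=True)
#   wheel = max(Path("dist").glob("*.whl"))
#   subprocess.run(["cargo", "build", "--release"], cwd="pyapp",
#                  env=pyapp_env(wheel, "alpha.__main__:main"), check=True)
```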

I have a PyPI wheel of my projects done this way as a proof of concept.

There's a side issue with this: no project library will contain its spec file, so `importlib.metadata` on any of alpha, beta, ... will fail (you can get metadata from the shared lib alone, which is what I'm doing). Plus, the venv will be the same for all projects, which isn't technically an issue, unless you depend on some custom resource file per-binary (probably solvable by hard-coding envs), or on code updates with a same-version wheel.
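The metadata caveat can be worked around with a small fallback, assuming the shared library's distribution (here named `broken-source`, a hypothetical name) is the only one that ships dist-info in the bundled wheel:

```python
from importlib.metadata import version, PackageNotFoundError

def project_version(name: str, shared: str = "broken-source") -> str:
    """Bundled subpackages ship no dist-info, so fall back to the shared
    library's metadata, the only distribution inside the wheel.
    ("broken-source" is a hypothetical distribution name.)"""
    try:
        return version(name)
    except PackageNotFoundError:
        return version(shared)
```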

I hope this helps anyone for future reference