indygreg / PyOxidizer

A modern Python application packaging and distribution tool
Mozilla Public License 2.0
5.48k stars 239 forks source link

Method for building Rust projects with a simple `cargo build` #498

Open mxk opened 2 years ago

mxk commented 2 years ago

My goal was to build a Rust project that embeds python and all dependencies via a simple cargo build. I finally figured out a method for doing this, but it's pretty much one big hack. Nevertheless, it may be helpful to others and could guide future development.

The problem with the project structure created by pyoxidizer init-rust-project is that the pyo3-build-config dependency needs PYO3_CONFIG_FILE to point to pyo3-build-config-file.txt, which is generated by pyoxidizer run-build-script. As a result, the only way to get a working build is to generate the artifacts first and then pass the relevant environment variables to cargo build, as described in #467. Running the generated build.rs in build-mode-pyoxidizer-exe mode does not work since all the pyo3 dependencies are already built by that point using the default configuration.

A way to get everything to work with a one cargo build command is to have pyo3-build-config run pyoxidizer run-build-script directly. I used a [patch] override to do that. Essentially, I created my own pyo3-build-config package that runs pyoxidizer. It depends on the real pyo3-build-config to avoid re-implementing most of the functionality.

Below are the relevant file contents. I've omitted ./py/pyoxidizer.bzl and other files generated by pyoxidizer init-rust-project because those are all standard. Use at your own risk.

Note that this still does not solve #466.

./Cargo.toml:

[workspace]
members = ["py"]
resolver = "2"

[patch.crates-io]
pyo3-build-config = { path = "py/pyo3-build-config" }
pyembed = { git = "https://github.com/indygreg/PyOxidizer.git", rev = "4cf22b8d76d6873471ba0fdf81e852fc8495955f" }

./py/Cargo.toml:

[package]
name = "py"
version = "0.1.0"
edition = "2021"
publish = false

[dependencies]
pyembed = { version = "0.20.0-pre", default-features = false }

# Required to pass DEP_PYEMBED_DEFAULT_CONFIG
pyo3-build-config = { path = "./pyo3-build-config", features = ["resolve-config"] }

[build-dependencies]
pyo3-build-config = { path = "./pyo3-build-config", features = ["resolve-config"] }

./py/build.rs:

fn main() {
    println!("cargo:rerun-if-changed=build.rs");
    println!(
        "cargo:rustc-env=DEFAULT_PYTHON_CONFIG_RS={}",
        std::env::var("DEP_PYEMBED_DEFAULT_CONFIG").expect(
            "DEP_PYEMBED_DEFAULT_CONFIG not set (misconfigured pyo3-build-config wrapper?)"
        )
    );
}

./py/pyo3-build-config/Cargo.toml:

# This package should be added to `[patch.crates-io]` section in the root
# `Crate.toml`. This will cause all `pyo3` packages to depend on it. The build
# script verifies the dependency tree to ensure that everything is configured
# correctly.
#
# The `version` below is set to ensure that this package is prefered over the
# published versions, but it needs to be updated when a new minor release is
# made.
#
# This package depends on the real `pyo3-build-config` to avoid duplicating most
# of the functionality. The dependency versions must be kept in sync.
[package]
name = "pyo3-build-config"
version = "0.15.999"
description = "PyOxidizer-based wrapper around the real pyo3-build-config"
edition = "2021"
links = "pyembed"
publish = false

[dependencies]
once_cell = "1"
pyo3-build-config = { version = "0.15.0, <0.15.999", default-features = false }

[build-dependencies]
embed-resource = "1.6"
pyo3-build-config = { version = "0.15.0, <0.15.999", default-features = false }

[features]
resolve-config = []

./py/pyo3-build-config/build.rs:

use std::env::var_os;
use std::io::{BufRead, Cursor};
use std::path::{Path, PathBuf};
use std::process::Command;

fn main() {
    if var_os("CARGO_FEATURE_RESOLVE_CONFIG").is_none() {
        println!("cargo:warning=resolve-config feature not enabled; build script in no-op mode");
        return;
    }

    ensure_dependency_tree();
    run_build_script(var_os("PYOXIDIZER_CONFIG").unwrap_or_else(|| "../pyoxidizer.bzl".into()));

    // TODO: Python source location should be configurable
    println!("cargo:rerun-if-changed=../pysrc");
}

// Runs `pyoxidizer run-build-script`.
fn run_build_script(cfg: impl AsRef<Path>) {
    let cfg = cfg.as_ref().canonicalize().unwrap();
    let root = cfg.parent().unwrap();
    let out_dir = PathBuf::from(var_os("OUT_DIR").unwrap());

    // PyOxidizer compares the mtime of the pyoxidizer binary and pyoxidizer.bzl
    // with that of default_python_config.rs to determine if the artifacts are
    // current. We want cargo to be in charge of cache invalidation, so remove
    // the config to rebuild the artifacts unconditionally.
    let out_cfg = out_dir.join("default_python_config.rs");
    let _ = std::fs::remove_file(&out_cfg);

    println!("cargo:rerun-if-env-changed=PYOXIDIZER_EXE");
    let status = Command::new(var_os("PYOXIDIZER_EXE").unwrap_or_else(|| "pyoxidizer".into()))
        .args(["run-build-script", "--system-rust", "build.rs"])
        .env("PYOXIDIZER_CONFIG", &cfg)
        .env("PYOXIDIZER_ARTIFACT_DIR", &out_dir)
        .current_dir(root)
        .status()
        .expect("failed to run pyoxidizer");
    if !status.success() {
        panic!("`pyoxidizer run-build-script` error: {}", status);
    }

    // Embed Windows resource file
    if var_os("CARGO_CFG_TARGET_FAMILY").unwrap() == "windows" {
        for f in std::fs::read_dir(root).unwrap() {
            let f = f.unwrap();
            if f.file_name().to_string_lossy().ends_with("-manifest.rc") {
                embed_resource::compile(f.path());
            }
        }
    }

    // Parse the config file to guarantee that this will not fail at run time
    pyo3_build_config::InterpreterConfig::from_path(out_dir.join("pyo3-build-config-file.txt"))
        .expect("failed to parse contents of PYO3_CONFIG_FILE");
    println!("cargo:default_config={}", out_cfg.to_str().unwrap());
}

/// Ensures that pyo3 is using the patched version of pyo3-build-config. This is
/// necessary because the `[patch]` mechanism does not provide a way of ignoring
/// package versions from crates.io, so the `version` in our `Cargo.toml` needs
/// to be set correctly in order for this package to shadow the official one.
fn ensure_dependency_tree() {
    let out = Command::new(var_os("CARGO").unwrap())
        .args([
            "tree",
            "--quiet",
            "--workspace",
            "--edges=normal",
            "--invert",
            concat!(env!("CARGO_PKG_NAME"), ":", env!("CARGO_PKG_VERSION")),
            "--depth=3",
            "--prefix=none",
            "--format={lib}",
            "--frozen",
        ])
        .output()
        .expect("failed to execute `cargo tree`")
        .stdout;
    if !Cursor::new(out).lines().any(|ln| ln.unwrap() == "pyo3") {
        panic!(
            "pyo3 is using the wrong {} package (check 'version' in {:?})",
            env!("CARGO_PKG_NAME"),
            PathBuf::from(var_os("CARGO_MANIFEST_DIR").unwrap()).join("Cargo.toml")
        );
    }
}

./py/pyo3-build-config/src/lib.rs:

//! PyOxidizer-based wrapper around the real pyo3-build-config.
//!
//! This crate forwards most of the functionality to the real pyo3-build-config,
//! but uses PyOxidizer-generated `pyo3-build-config-file.txt`.

pub use pyo3_build_config::*;

#[cfg(feature = "resolve-config")]
pub fn use_pyo3_cfgs() {
    get().emit_pyo3_cfgs();
}

#[doc(hidden)]
#[cfg(feature = "resolve-config")]
pub fn get() -> &'static InterpreterConfig {
    use once_cell::sync::OnceCell;
    static CFG: OnceCell<InterpreterConfig> = OnceCell::new();
    CFG.get_or_init(|| pyo3_build_script_impl::resolve_interpreter_config().unwrap())
}

#[doc(hidden)]
pub fn env_var(var: &str) -> Option<std::ffi::OsString> {
    if cfg!(feature = "resolve-config") {
        println!("cargo:rerun-if-env-changed={}", var);
    }
    std::env::var_os(var)
}

#[doc(hidden)]
pub mod pyo3_build_script_impl {
    pub use super::env_var;
    pub use pyo3_build_config::pyo3_build_script_impl::*;

    #[cfg(feature = "resolve-config")]
    pub fn resolve_interpreter_config() -> errors::Result<InterpreterConfig> {
        const CFG: &str = include_str!(concat!(env!("OUT_DIR"), "/pyo3-build-config-file.txt"));
        InterpreterConfig::from_reader(std::io::Cursor::new(CFG))
    }
}
indygreg commented 2 years ago

Oooh - this is really clever and the first potential solution for making cargo build just work that seems viable! Thank you for figuring this out and taking the time to write it up!

Correct me if I'm wrong, but isn't a drawback of using [patch.crates-io] that you can't upload the crate to crates.io? So if we went with this approach, projects using this approach wouldn't be publishable? (If so, this isn't a deal breaker to implementing this. But we'd probably want to make the functionality optional so people can have a publishable project.)

Also, the PyO3 maintainers have been very generous in implementing features upstream to accommodate PyOxidizer's use cases. But if I were a PyO3 maintainer, I'd be highly skeptical of teaching the canonical pyo3-build-config to invoke pyoxidizer and look for environment variables like PYOXIDIZER_CONFIG and PYOXIDIZER_EXE because it is too tightly coupled to a narrow use case. Can you think of a more generic way to achieve similar results in upstream pyo3-build-config so we don't have to patch that crate indefinitely?

Thank you again for contributing this. This looks like a great idea!

mxk commented 2 years ago

According to this, crates.io ignores [patch] sections, so this solution definitely won't work for anything that needs to be published.

I think a better solution will require pyo3-build-config to have an optional feature/dependency on something that can generate the config as part of the build process. I'm imagining a very minimal crate that either has a single public CONFIG_FILE const, or uses links = "pyo3" to pass DEP_PYO3_CONFIG_FILE to pyo3-build-config.

This dependency could be PyOxidizer-specific, which would be very similar to the build script above, and require the user to either run PYOXIDIZER_CONFIG=<path> cargo build, or add that variable to the [env] section. This solution completely eliminates the need for any patching and can be maintained as part of this project. The only real complication I see is how to trigger rebuilds when the Python source that's being packaged changes. That's why I currently have the cargo:rerun-if-changed=../pysrc line above.

Alternatively, the dependency can be very generic, with the published version just being a no-op. The user could then patch it to supply custom config file generation logic, which may or may not depend on PyOxidizer.

I'm thinking something like this being added to pyo3-build-config's manifest and some small modifications being made to its build script to use the output of these dependencies when PYO3_CONFIG_FILE is unspecified:

[build-dependencies]

# Config generator that depends on PYOXIDIZER_CONFIG and PYOXIDIZER_EXE
pyo3-pyoxidizer-gen = { version = "0.20.0", optional = true }

# Custom generator that needs to be patched to do anything useful
pyo3-config-gen = { version = "1.0.0", optional = true }
DCNick3 commented 2 years ago

I think it can be worth linking to this issue from documentation (from here and/or here), along with the gotchas that one might face when trying to make cargo drive the build.

I unfortunately spent some time debugging, not knowing that pyo3 config that was being used in my build is incorrect =(

DCNick3 commented 2 years ago

Also, what are the advantages of using pyoxidizer exe (it being a dependency that should be manually installed and updated) instead of calling pyoxidizerlib::project_building::run_from_build directly from build.rs?