Open ptr-br opened 4 months ago
Yeah, could you please put together a minimal example (in its own repository) that reproduces the problem? I'm struggling to understand whether you mean that you are trying to "embed" Python or that the shebang lines are wrong for some reason or that something else is happening. Thanks!
@junyer, I created a toy example here. My main problem is creating/selecting an interpreter for `cc_binary` like `py_binary` does...
This sounds like a problem that @rickeylev would know how to solve... What happens if you do this to pin the Python version for the `np_wrapper_lib` target?
I'm actually already pinning it here. When I change the version from my system one (3.10) to some other version (e.g. 3.11), I get the following error running `bazel run //cc:my_cc_binary`:
```
INFO: Analyzed target //cc:my_cc_binary (75 packages loaded, 1482 targets configured).
INFO: Found 1 target...
Target //cc:my_cc_binary up-to-date:
  bazel-bin/cc/my_cc_binary
INFO: Elapsed time: 0.450s, Critical Path: 0.01s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/cc/my_cc_binary
Could not find platform independent libraries <prefix>
Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = (not set)
  program name = 'python3'
  isolated = 0
  environment = 1
  user site = 1
  safe_path = 0
  import site = 1
  is in build tree = 0
  stdlib dir = '/install/lib/python3.11'
  sys._base_executable = '/usr/bin/python3'
  sys.base_prefix = '/install'
  sys.base_exec_prefix = '/usr'
  sys.platlibdir = 'lib'
  sys.executable = '/usr/bin/python3'
  sys.prefix = '/install'
  sys.exec_prefix = '/usr'
  sys.path = [
    '/install/lib/python311.zip',
    '/install/lib/python3.11',
    '/usr/lib/python3.11/lib-dynload',
  ]
terminate called after throwing an instance of 'std::runtime_error'
  what():  failed to get the Python codec of the filesystem encoding
Aborted (core dumped)
```
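For what it's worth, this failure mode is reproducible outside Bazel: when the home directory the interpreter resolves doesn't actually contain the stdlib, startup dies while importing the `encodings` module. A minimal sketch, assuming a regular CPython installation at `sys.executable`:

```python
# Reproduce the "failed to get the Python codec of the filesystem encoding"
# class of failure: point PYTHONHOME at an empty directory so the stdlib
# (and with it the 'encodings' module) cannot be found at startup.
import os
import subprocess
import sys
import tempfile

env = dict(os.environ, PYTHONHOME=tempfile.mkdtemp())
proc = subprocess.run(
    [sys.executable, "-c", "print('ok')"],
    env=env,
    capture_output=True,
    text=True,
)
print(proc.returncode != 0)  # True: startup fails before user code runs
```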
You aren't doing anything with `@python_versions`? As per the documentation, pinning the Python version would mean doing something like `load("@python_versions//3.11:defs.bzl", "py_binary")` in the BUILD file.
I played around with the version and switching actually works. E.g., the Python version used in `my_cc_binary` is 3.9 instead of my default 3.10 system Python when specifying:
```starlark
# MODULE.bazel
...
python.toolchain(
    python_version = "3.9",
    is_default = True,
)
use_repo(python, "python_versions")

pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    hub_name = "my_pip",
    python_version = "3.9",
    requirements_lock = "//cc:requirements.txt",
)
...
```

```starlark
# BUILD.bazel
load("@python_versions//3.9:defs.bzl", "py_binary")
...
```
I get the `PYTHONPATH` values:

```
/usr/lib/python3.9
/usr/lib/python3.9/lib-dynload
/usr/local/lib/python3.9/dist-packages
/usr/lib/python3/dist-packages
Python version: 3.9.18 (main, Oct 3 2023, 01:30:02)
```
Could the problem be that `py_binary` adds all the paths from the required directories to the `PYTHONPATH`, while those are not added when simply specifying the `data` attribute of `cc_binary`? What would be the best way to solve this then?
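For context on what `py_binary` provides that a plain `data` dependency doesn't: the generated stub turns each runfiles-relative entry from `PyInfo.imports` into a `sys.path` entry before handing control to your program. A rough illustration of that bookkeeping (simplified; the helper name and the paths below are made up, this is not the actual rules_python stub code):

```python
# Simplified sketch of the PYTHONPATH bookkeeping that py_binary's stub
# performs and that a cc_binary with data deps does not: each
# runfiles-relative import entry is rooted under the runfiles tree and
# joined into a search path. All names here are illustrative.
import os

def build_python_path(runfiles_root, import_entries):
    """Root each import entry under the runfiles tree; join with ':'."""
    return ":".join(os.path.join(runfiles_root, e) for e in import_entries)

path = build_python_path(
    "/tmp/my_cc_binary.runfiles",
    ["my_pip_39_numpy/site-packages", "_main/python"],
)
print(path)
# → /tmp/my_cc_binary.runfiles/my_pip_39_numpy/site-packages:/tmp/my_cc_binary.runfiles/_main/python
```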
Given that your example didn't work with `load("@rules_python//python:defs.bzl", "py_binary")` and now does work with `load("@python_versions//3.9:defs.bzl", "py_binary")`, the problem is likely to be on the rules_python side. I'm reminded of https://github.com/bazelbuild/rules_python/issues/1069#issuecomment-1942053014 in particular: this may be a subtle bug in the Starlark implementation. Could you please file a bug against rules_python?
Sorry for the confusion, but the problem was not resolved by switching to `load("@python_versions//3.9:defs.bzl", "py_binary")`. I only verified that `pybind` is using another interpreter. So switching the interpreter works, but dependencies are not installed and resolved the way they are when using `py_binary`...
Oh. :(
Also, I see now that I misspoke earlier:

> What happens if you do this to pin the Python version for the `np_wrapper_lib` target?

I should have said "the `np_wrapper` target", not "the `np_wrapper_lib` target", because the `py_binary()` rule can be pinned whereas the `py_library()` rule can't. Would `data = ["//python:np_wrapper"]` work with the `my_cc_binary` target?
I think what's happening is the embedded interpreter is trying to take settings from the local environment. I saw this when I was trying to construct a runnable test linking with the hermetic python libraries: it kept trying to "escape" and use things from the local system. Eventually I traced it back to the Py_Initialize() call trying to automatically fill in various details based on the environment settings.
The docs for how to initialize an embedded interpreter are here: https://docs.python.org/3/extending/embedding.html
I think the two key things that have to be set up are: (1) where the runtime itself lives (the stdlib etc.), and (2) the sys.path entries for the dependencies.
For (1), I think this can be derived based on the location of e.g. the header files. You basically need the runfiles-relative path to where the stdlib etc are in the runfiles.
For (2), this information comes from PyInfo.imports.
What we probably want to do is generate a cc file with those values in them. Maybe something like this:
```starlark
def _py_cc_init_info_impl(ctx):
    toolchain = ctx.toolchains["@rules_python//python/cc:toolchain_type"]
    runtime_dir = ...  # get File.short_path from toolchain.headers or .libs
    path_entries = []
    for info in [t[PyInfo] for t in ctx.attr.deps]:
        path_entries.extend(info.imports.to_list())
    sys_path = ":".join(path_entries)
    header = ctx.actions.declare_file("info.h")
    ctx.actions.write(header, 'string runtime_dir = "{}"; string sys_path = "{}";'.format(
        runtime_dir, sys_path))
    return [DefaultInfo(files = depset([header]))]

py_cc_init_info = rule(
    implementation = _py_cc_init_info_impl,
    attrs = {"deps": attr.label_list()},
    toolchains = ["@rules_python//python/cc:toolchain_type"],
)
```
At the least, it probably makes sense to add (1) to the py_cc_toolchain info as e.g. a runtime_install_location (equivalent of PYTHONHOME?) attribute or something, to avoid having to try and unpack so much.
@rickeylev I faced this issue myself and hacked together a solution almost as you describe. However, I discovered a conceptual ambiguity. In your rule you use the py_cc_info. As implemented, py_cc_info is about compile-time dependencies, since it exposes only the information needed to compile and link a cc_library that depends on libpython.

A cc_binary which embeds Python, however, also needs to express a runtime/data dependency on a certain collection of files (Lib/, DLLs/, ...) currently listed under the "files" filegroup of the instantiated Python toolchain repository (and also a data dependency on any third-party .py files). The "files" filegroup is exposed through the py_runtime rule of the @bazel_tools//tools/python:toolchain_type toolchain.

In my hacks, I made use of the @bazel_tools//tools/python:toolchain_type toolchain (py_runtime instead of py_cc_info) to prepare such metadata for embedding runtime Python dependency files. I successfully built a binary with an embedded Python interpreter.
Here is my implementation:

```starlark
# .bzl
def _py_embedded_libs_impl(ctx):
    deps = ctx.attr.deps
    toolchain = ctx.toolchains["@bazel_tools//tools/python:toolchain_type"]
    py3_runtime = toolchain.py3_runtime

    # Paths that need to be added to python sys.path.
    all_imports = []
    for lib in deps:
        all_imports.append(lib[PyInfo].imports)
    imports_txt = "\n".join(depset(transitive = all_imports).to_list())
    imports_file = ctx.actions.declare_file(ctx.attr.name + ".imports")
    ctx.actions.write(imports_file, imports_txt)

    python_home_txt = str(py3_runtime.interpreter.dirname)
    python_home_file = ctx.actions.declare_file(ctx.attr.name + ".python_home")
    ctx.actions.write(python_home_file, python_home_txt)

    py3_runfiles = ctx.runfiles(files = py3_runtime.files.to_list())
    dep_runfiles = [py3_runfiles]
    for lib in deps:
        lib_runfiles = ctx.runfiles(files = lib[PyInfo].transitive_sources.to_list())
        dep_runfiles.append(lib_runfiles)
        dep_runfiles.append(lib[DefaultInfo].default_runfiles)
    runfiles = ctx.runfiles().merge_all(dep_runfiles)

    return [DefaultInfo(
        files = depset([imports_file, python_home_file]),
        runfiles = runfiles,
    )]

# Collect paths to all files of a python library and generate a .imports
# file and a .python_home file.
py_embedded_libs = rule(
    implementation = _py_embedded_libs_impl,
    attrs = {
        "deps": attr.label_list(
            providers = [PyInfo],
        ),
    },
    toolchains = [
        str(Label("@bazel_tools//tools/python:toolchain_type")),
    ],
)
```

```starlark
# BUILD
py_embedded_libs(
    name = "embed_paths",
    deps = [
        "@pip//scipy:pkg",
    ],
)

cc_binary(
    name = "embed",
    srcs = ["embed.cpp"],
    deps = [
        "//:current_libpython_unstable",  # hacks around issue #1823, cannot use current_py_cc_libs yet
        "@bazel_tools//tools/cpp/runfiles",  # needed to resolve python sys.path additions, and python home location
    ],
    data = [":embed_paths"],
)
```
Then again I'm pretty unskilled with Bazel and you can probably figure out a better way to do this. I just thought it might help to post my learnings here.
> a cc binary also needs py_runtime.files

Ahhh yes, excellent point. This seems obvious once you said it. So really, we don't need a runtime_install_dir value, but a depset[File] (or, actually, maybe a runfiles, since they are runtime files) of what the runtime needs. Or, actually, maybe both (a locally installed runtime can just point to that directory instead). Good food for thought, thanks.
The bzl code you posted looks pretty correct. You probably want `.short_path` instead of `.dirname` (the latter isn't a runfiles path, iirc). There are some minor optimizations you could make (e.g. avoiding to_list() calls; `write()` can be passed an Args object using `Args.add_all(map_each=...)`, which can be used to defer depset flattening to the execution phase and still allow writing mostly-arbitrary lines to a file).
@axbycc-mark Could you share an example of how you use the imports_file and the python_home_file in a cc_binary/cc_test target to appropriately set the PYTHONPATH and PYTHONHOME? When depending on numpy using your suggested approach above, I am consistently getting `ModuleNotFoundError: No module named 'numpy'`. Thank you!
@ahojnnes Continuing my example from above, here is the code I had in my actual .cc file.
```cpp
#include <filesystem>
#include <fstream>
#include <iostream>
#include <memory>
#include <print>
#include <string>
#include <vector>

#include <Python.h>

#include "tools/cpp/runfiles/runfiles.h"

// Note: LOG(FATAL) and CHECK below come from a logging library (e.g. glog or
// Abseil); include whichever one your project uses.

using bazel::tools::cpp::runfiles::Runfiles;

void InitializePythonEnvironment(const std::string& pythonHome, const std::vector<std::string>& additionalPaths) {
    PyStatus status;
    PyConfig config;
    PyConfig_InitPythonConfig(&config);

    // Set PYTHONHOME.
    wchar_t* pythonHomeW = Py_DecodeLocale(pythonHome.c_str(), nullptr);
    status = PyConfig_SetString(&config, &config.home, pythonHomeW);
    if (PyStatus_Exception(status)) {
        PyConfig_Clear(&config);
        Py_ExitStatusException(status);
    }

    config.isolated = 1;

    // Initialize the interpreter with the given configuration.
    status = Py_InitializeFromConfig(&config);
    if (PyStatus_Exception(status)) {
        Py_ExitStatusException(status);
    }

    // The PyConfig structure should be released after initialization.
    PyConfig_Clear(&config);

    // Check if initialization was successful.
    if (!Py_IsInitialized()) {
        LOG(FATAL) << "Failed to initialize Python interpreter.";
    }

    // Import the sys module.
    PyObject* sysModule = PyImport_ImportModule("sys");
    if (!sysModule) {
        LOG(FATAL) << "Failed to import 'sys' module.";
    }

    // Get the sys.path list.
    PyObject* sysPath = PyObject_GetAttrString(sysModule, "path");
    if (!sysPath) {
        LOG(FATAL) << "Failed to get 'sys.path'.";
    }

    // Add each path in additionalPaths to sys.path.
    for (const auto& path : additionalPaths) {
        PyObject* pyPath = PyUnicode_FromString(path.c_str());
        if (!pyPath) {
            std::cerr << "Failed to create Python string from path." << std::endl;
            continue;
        }
        if (PyList_Append(sysPath, pyPath) != 0) {
            std::cerr << "Failed to append path to 'sys.path'." << std::endl;
        }
        Py_DECREF(pyPath);
    }

    // Clean up references.
    Py_DECREF(sysPath);
    Py_DECREF(sysModule);
    // Memory returned by Py_DecodeLocale must be freed with PyMem_RawFree,
    // not Py_DECREF (it is not a PyObject).
    PyMem_RawFree(pythonHomeW);

    // At this point, the Python interpreter is initialized, PYTHONHOME is set,
    // and additional paths have been added to sys.path. You can now proceed to
    // execute Python scripts or finalize the interpreter as needed.
}

void PrintSysPath() {
    // Import the sys module.
    PyObject* sysModule = PyImport_ImportModule("sys");
    if (!sysModule) {
        PyErr_Print();  // Print any error that occurred.
        std::cerr << "Failed to import 'sys' module." << std::endl;
        return;
    }

    // Get the sys.path list.
    PyObject* sysPath = PyObject_GetAttrString(sysModule, "path");
    if (!sysPath || !PyList_Check(sysPath)) {
        PyErr_Print();  // Print any error that occurred.
        std::cerr << "Failed to access 'sys.path'." << std::endl;
        Py_XDECREF(sysModule);  // Py_XDECREF safely decrements the ref count if the object is not NULL.
        return;
    }

    // Get the size of the sys.path list to iterate over it.
    Py_ssize_t size = PyList_Size(sysPath);
    for (Py_ssize_t i = 0; i < size; i++) {
        PyObject* path = PyList_GetItem(sysPath, i);  // Borrowed reference, no need to DECREF.
        if (path) {
            const char* pathStr = PyUnicode_AsUTF8(path);
            if (pathStr) {
                std::println("\t{}", pathStr);
            } else {
                PyErr_Print();  // Print any error that occurred.
            }
        }
    }

    // Clean up: DECREF objects created via PyImport_ImportModule and PyObject_GetAttrString.
    Py_DECREF(sysPath);
    Py_DECREF(sysModule);
}

void ImportAndPrintVersion(const std::string& python_module_name) {
    // Import the requested module.
    PyObject* pyModule = PyImport_ImportModule(python_module_name.c_str());
    if (!pyModule) {
        PyErr_Print();  // Print the error to stderr.
        std::cerr << "Failed to import module " << python_module_name << std::endl;
        return;
    }

    // Access the __version__ attribute of the module.
    PyObject* version = PyObject_GetAttrString(pyModule, "__version__");
    if (!version) {
        PyErr_Print();  // Print the error to stderr.
        std::cerr << "Failed to get '__version__'." << std::endl;
        Py_DECREF(pyModule);
        return;
    }

    // Convert the version PyObject to a C string.
    const char* versionStr = PyUnicode_AsUTF8(version);
    if (!versionStr) {
        PyErr_Print();  // Print the error to stderr.
        std::cerr << "Failed to convert '__version__' to C string." << std::endl;
    } else {
        // Print the version string.
        std::println("\t{} version: {}", python_module_name, versionStr);
    }

    // Clean up references.
    Py_DECREF(version);
    Py_DECREF(pyModule);
}

std::vector<std::string> read_lines(const std::string& path) {
    std::ifstream file(path);
    CHECK(file.is_open()) << "Could not open file " << path;
    std::string line;
    std::vector<std::string> lines;
    while (std::getline(file, line)) {
        lines.push_back(line);
    }
    return lines;
}

int main(int argc, char* argv[]) {
    std::string error;
    std::unique_ptr<Runfiles> runfiles(
        Runfiles::Create(argv[0], BAZEL_CURRENT_REPOSITORY, &error));
    CHECK(runfiles) << "Could not create runfiles";

    std::string dot_python_home_path = runfiles->Rlocation("_main/python/experimental/embed_paths.python_home");
    std::string python_home_path = read_lines(dot_python_home_path).front();
    std::string python_home_path_absolute = runfiles->Rlocation("_main/" + python_home_path);
    auto external_dir = std::filesystem::path(python_home_path_absolute).parent_path();

    std::string dot_imports_path = runfiles->Rlocation("_main/python/experimental/embed_paths.imports");
    std::vector<std::string> imports = read_lines(dot_imports_path);
    std::vector<std::string> absolute_imports;
    for (const std::string& relative_import : imports) {
        const auto absolute_import = runfiles->Rlocation(relative_import);
        absolute_imports.push_back(absolute_import);
    }

    InitializePythonEnvironment(python_home_path_absolute, absolute_imports);

    std::cout << "Initialized. Dumping sys path." << "\n";
    PrintSysPath();

    std::cout << "Testing module import" << "\n";
    ImportAndPrintVersion("numpy");
    ImportAndPrintVersion("scipy");

    Py_Finalize();
    return 0;
}
```
Can you see if this little program works for you?
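The sys.path step in the program above can be sanity-checked from pure Python. A minimal sketch (`fake_mod` is a made-up module name for illustration):

```python
# Python-level analogue of InitializePythonEnvironment's sys.path step:
# appending a directory to sys.path makes the modules inside it importable,
# which is exactly what PyList_Append(sysPath, pyPath) effects in the C++
# code. "fake_mod" is a made-up module created on the fly.
import os
import sys
import tempfile

pkg_dir = tempfile.mkdtemp()
with open(os.path.join(pkg_dir, "fake_mod.py"), "w") as f:
    f.write("__version__ = '1.2.3'\n")

sys.path.append(pkg_dir)  # same effect as the PyList_Append call
import fake_mod

print(fake_mod.__version__)  # → 1.2.3
```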
@axbycc-mark Thank you very much. This is very helpful.
After hacking at this for a bit, I came up with the following rule/macro combination that doesn't require any custom C++ code:
```starlark
def _cc_py_runtime_impl(ctx):
    toolchain = ctx.toolchains["@bazel_tools//tools/python:toolchain_type"]
    py3_runtime = toolchain.py3_runtime

    imports = []
    for dep in ctx.attr.deps:
        imports.append(dep[PyInfo].imports)
    python_path = ""
    for path in depset(transitive = imports).to_list():
        python_path += "external/" + path + ":"

    py3_runfiles = ctx.runfiles(files = py3_runtime.files.to_list())
    runfiles = [py3_runfiles]
    for dep in ctx.attr.deps:
        dep_runfiles = ctx.runfiles(files = dep[PyInfo].transitive_sources.to_list())
        runfiles.append(dep_runfiles)
        runfiles.append(dep[DefaultInfo].default_runfiles)
    runfiles = ctx.runfiles().merge_all(runfiles)

    return [
        DefaultInfo(runfiles = runfiles),
        platform_common.TemplateVariableInfo({
            "PYTHON3": str(py3_runtime.interpreter.path),
            "PYTHONPATH": python_path,
        }),
    ]

_cc_py_runtime = rule(
    implementation = _cc_py_runtime_impl,
    attrs = {
        "deps": attr.label_list(providers = [PyInfo]),
    },
    toolchains = [
        str(Label("@bazel_tools//tools/python:toolchain_type")),
    ],
)

def cc_py_test(name, py_deps = [], **kwargs):
    py_runtime_target = name + "_py_runtime"
    _cc_py_runtime(
        name = py_runtime_target,
        deps = py_deps,
    )
    kwargs.update({
        "data": kwargs.get("data", []) + [":" + py_runtime_target],
        "env": {"__PYVENV_LAUNCHER__": "$(PYTHON3)", "PYTHONPATH": "$(PYTHONPATH)"},
        "toolchains": kwargs.get("toolchains", []) + [":" + py_runtime_target],
    })
    native.cc_test(
        name = name,
        **kwargs
    )

def cc_py_binary(name, py_deps = [], **kwargs):
    py_runtime_target = name + "_py_runtime"
    _cc_py_runtime(
        name = py_runtime_target,
        deps = py_deps,
    )
    kwargs.update({
        "data": kwargs.get("data", []) + [":" + py_runtime_target],
        "env": {"__PYVENV_LAUNCHER__": "$(PYTHON3)", "PYTHONPATH": "$(PYTHONPATH)"},
        "toolchains": kwargs.get("toolchains", []) + [":" + py_runtime_target],
    })
    native.cc_binary(
        name = name,
        **kwargs
    )
```
which can be used as follows:

```starlark
cc_py_test(
    name = "pybind_embed_test",
    srcs = ["pybind_embed_test.cc"],
    py_deps = [
        "//some/py:target",
        "@pypi//numpy:pkg",
    ],
    deps = ["//some/cc:target"],
)
```
@ahojnnes - I literally was looking at this issue earlier in the week, and checked back and you had written exactly what I was trying to write! Thank you so much for posting it.
The only downside to the approach you describe is that if your `cc_py_test` brings in a transitive dependency, you have to make sure to include all the required Python packages in the `py_deps` field. What I kinda want is to be able to construct a `cc_py_library` that has cc code & declares the required py deps, and then be able to include that `cc_py_library` in the `deps` field of a `cc_py_test` or `cc_py_binary`. Thoughts?
@jared2501 I don't think that my rules above have this limitation. You only need to list the imports made in the `cc_py_{test,binary}` sources. Any transitive dependencies should be automatically added to the runfiles and imports. At least, it worked in some minimal tests for me.
@ahojnnes does your solution avoid the

```
terminate called after throwing an instance of 'std::runtime_error'
  what():  failed to get the Python codec of the filesystem encoding
```

error (with Python 3.10) as described above? I think this comes from PYTHONPATH not including the default `lib/python3.10` dir? Even for a single dependency, the PYTHONPATH comes through as `external/rules_python~~pip~pip_deps_310_numpy/site-packages`, which potentially doesn't exist relative to the cwd of the binary?
OK, actually the issue for me, I think, is bzlmod related: instead of prefixing `external/` I needed to prefix `../`.
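A sketch of why the prefix differs, under the assumption that the binary runs with its working directory inside the main repo's runfiles directory: in a runfiles tree, each external repo sits as a sibling of the main repo directory, so it is reached via `../<repo>`, whereas the legacy execroot-style layout exposes it under `external/<repo>`. The helper and flag below are made up for illustration:

```python
# Illustrative only: compute where an import entry lives relative to the
# current directory under the two layouts discussed above. The repo name
# is taken from the thread; entry_relpath is a hypothetical helper.
def entry_relpath(entry, runfiles_sibling_layout=True):
    """Prefix an import entry for the layout in use."""
    prefix = "../" if runfiles_sibling_layout else "external/"
    return prefix + entry

entry = "rules_python~~pip~pip_deps_310_numpy/site-packages"
print(entry_relpath(entry))
# → ../rules_python~~pip~pip_deps_310_numpy/site-packages
print(entry_relpath(entry, runfiles_sibling_layout=False))
# → external/rules_python~~pip~pip_deps_310_numpy/site-packages
```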
Could someone in this thread cobble together a minimal example of using pybind11's embedded functionality to plot from matplotlib? Similar to this, but using pybind11_bazel with hermetic Python? This has been a key blocker for me switching to Bazel and I can't figure it out.

I've actually started an example repo here to get two embedded examples to run, but I'm still having significant issues.
I'm trying to use pybind11_bazel to execute some python code within C++. When I build my python targets with `rules_python` I'm able to install packages that can be used by the interpreter. However, when I link these to the `data=` attribute of `cc_binary` and use `pybind11` as a dependency, the default interpreter at `/usr/bin/python3` is used instead of the one from the python targets. Is there any elegant way of telling bazel to use the same interpreter as for the python part?

If further explanation is needed I could come up with a minimal example, please let me know. I'm rather new to bazel so thanks for your help!
Thanks.