risingwavelabs / arrow-udf

A User-Defined Function Framework for Apache Arrow.
Apache License 2.0

arrow-udf-python PYO3_PYTHON #15

Open hanxuanliang opened 3 weeks ago

hanxuanliang commented 3 weeks ago

I am integrating arrow-udf-python into databend and ran into the following error.

Command run: PYO3_PYTHON=python3.12 cargo build


Env

arrow-udf-python = { package = "arrow-udf-python", git = "https://github.com/risingwavelabs/arrow-udf", rev = "23fe0dd" }
Python version: 3.12.2

BACKTRACE

error: failed to run custom build command for `arrow-udf-python v0.1.0 (https://github.com/risingwavelabs/arrow-udf?rev=23fe0dd#23fe0dd4)`

Caused by:
  process didn't exit successfully: `/Users/xxx/Desktop/code/contribute/databend/target/debug/build/arrow-udf-python-fa3a3e2a7b69d90f/build-script-build` (exit status: 101)
  --- stderr
  thread 'main' panicked at /Users/xxx/.cargo/git/checkouts/arrow-udf-7ba7f51509153980/23fe0dd/arrow-udf-python/build.rs:9:5:
  arrow-udf-python requires Python 3.12 or later, but found 3.7
  hint: you can set `PYO3_PYTHON` environment varibale, e.g. `PYO3_PYTHON=python3.12`
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
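The panic is raised by a minimum-version check at `build.rs:9` in arrow-udf-python. That file is not reproduced in this thread, so the following is only a hypothetical, self-contained sketch of what such a check might look like. In a real pyo3-based build script the detected version would typically come from `pyo3_build_config::get().version`; here it is hard-coded to the 3.7 reported above, and `PythonVersion`, `MIN_VERSION` and `check_min_version` are illustrative names, not the crate's actual code.

```rust
/// A Python version as reported by a build-time interpreter probe
/// (hypothetical stand-in for the type pyo3-build-config exposes).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct PythonVersion {
    major: u8,
    minor: u8,
}

/// Minimum version required, taken from the error message above (3.12).
const MIN_VERSION: PythonVersion = PythonVersion { major: 3, minor: 12 };

/// Hypothetical version of the check at build.rs:9. It returns an error
/// message instead of panicking so the comparison can be tested directly.
/// The derived `Ord` compares `major` first, then `minor`.
fn check_min_version(found: PythonVersion) -> Result<(), String> {
    if found < MIN_VERSION {
        return Err(format!(
            "arrow-udf-python requires Python {}.{} or later, but found {}.{}\n\
             hint: you can set `PYO3_PYTHON` environment variable, e.g. `PYO3_PYTHON=python3.12`",
            MIN_VERSION.major, MIN_VERSION.minor, found.major, found.minor
        ));
    }
    Ok(())
}

fn main() {
    // Assumption: hard-coded to the version reported in this issue.
    // A real build script would read it from the build configuration.
    let reported = PythonVersion { major: 3, minor: 7 };
    if let Err(msg) = check_min_version(reported) {
        // A build script would `panic!` here; a panicking Rust program
        // exits with status 101, as seen in the cargo output.
        eprintln!("{msg}");
    }
}
```

Keeping the comparison in a function that returns `Result` and deciding to panic only at the call site makes the check easy to exercise outside of a build script.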
hanxuanliang commented 3 weeks ago

-- PYO3_PRINT_CONFIG=1 is set, printing configuration and halting compile --
  implementation=CPython
  version=3.7
  shared=true
  abi3=true
  lib_name=python3.12
  lib_dir=/Users/xxx/.pyenv/versions/3.12.2/lib
  executable=/Users/xxx/.pyenv/versions/3.12.2/bin/python
  pointer_width=64
  build_flags=
  suppress_build_script_link_lines=false
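The printed configuration is internally inconsistent: `version=3.7`, while `lib_name=python3.12` and `executable` points at the 3.12.2 pyenv install, and `abi3=true`. The thread does not establish why. One possible explanation, offered here only as an assumption, is that another crate in the build enables one of pyo3's `abi3-py3x` features (for example `abi3-py37`): with such a feature enabled, pyo3-build-config reports the abi3 minimum version instead of the interpreter's own version, and Cargo's feature unification applies the feature to every user of pyo3 in the build. `cargo tree -e features -i pyo3` lists which crates enable which pyo3 features. The hypothetical helpers below (`parse_config`, `find_version_mismatch`) only parse the `key=value` lines and flag the mismatch; they assume the `lib_name` format shown above (`python3.12`).

```rust
use std::collections::HashMap;

/// Parses the `key=value` lines printed under `PYO3_PRINT_CONFIG=1`.
/// Lines without '=' are skipped; an empty value (e.g. `build_flags=`) is kept.
fn parse_config(text: &str) -> HashMap<String, String> {
    text.lines()
        .filter_map(|line| line.trim().split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

/// Extracts "3.12" from a lib_name such as "python3.12".
/// Assumption: Unix-style naming, as in the output above.
fn version_from_lib_name(lib_name: &str) -> Option<&str> {
    lib_name.strip_prefix("python")
}

/// Describes the mismatch when the reported `version` differs from the
/// version implied by `lib_name`; returns None when they agree or a key is missing.
fn find_version_mismatch(cfg: &HashMap<String, String>) -> Option<String> {
    let version = cfg.get("version")?;
    let lib_version = version_from_lib_name(cfg.get("lib_name")?)?;
    if version.as_str() == lib_version {
        return None;
    }
    // With an abi3-py3x feature, `version` is the abi3 floor (hypothesis above).
    let abi3 = cfg.get("abi3").map(String::as_str) == Some("true");
    Some(format!("version={version} but lib_name implies {lib_version} (abi3={abi3})"))
}

fn main() {
    // Abridged copy of the configuration printed in the comment above.
    let output = "implementation=CPython\nversion=3.7\nshared=true\nabi3=true\nlib_name=python3.12";
    let cfg = parse_config(output);
    match find_version_mismatch(&cfg) {
        Some(msg) => println!("mismatch: {msg}"),
        None => println!("reported version matches the interpreter"),
    }
}
```

Run on the abridged output, this prints `mismatch: version=3.7 but lib_name implies 3.12 (abi3=true)`.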

wangrunji0408 commented 3 weeks ago

I have no idea what is causing this. Could it be related to pyenv?

sundy-li commented 2 weeks ago

Can we use RustPython to interpret the Python scripts?

wangrunji0408 commented 2 weeks ago

> Can we use RustPython to interpret the Python scripts?

I have investigated RustPython. Its API documentation is far less complete than pyo3's, which makes it harder to integrate. Given the risks around its completeness and performance (one report says RustPython is generally 5-15x slower than CPython; ref), I did not choose RustPython as the runtime.

However, RustPython's major advantage over pyo3 (CPython) is that it natively has no GIL limitation. Currently, to use CPython in a multithreaded environment, we have to switch between sub-interpreters, which introduces significant performance overhead. I would really appreciate it if someone were willing to explore a RustPython-based solution. 😄