pybind / pybind11

Seamless operability between C++11 and Python
https://pybind11.readthedocs.io/

[BUG]: iterators are needlessly slow / need an exception-free way of finishing. #5233

Open Jannik2099 opened 2 months ago

Jannik2099 commented 2 months ago

Required prerequisites

What version (or hash if on master) of pybind11 are you using?

2.13.1

Problem description

Currently, the only way to have an iterator finish is by throwing py::stop_iteration{}. While this is "pythonic", it also incurs huge overhead, especially on short-lived iterators.
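For readers unfamiliar with the pattern, here is a minimal sketch of exception-terminated iteration in plain C++. It does not use pybind11: `stop_iteration` is a hypothetical stand-in for `py::stop_iteration`, and `ThrowingCursor` and `sum_with_exception` are names made up for illustration. The point is only that every iterator, however short-lived, ends with one throw and unwind.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for py::stop_iteration so this compiles without pybind11.
struct stop_iteration {};

// Illustrative cursor whose next() behaves like a bound __next__:
// it returns the next element, or throws once the data is exhausted.
class ThrowingCursor {
public:
    explicit ThrowingCursor(std::vector<int> items) : items_(std::move(items)) {}

    int next() {
        assert(pos_ <= items_.size());  // invariant: never past the end
        if (pos_ == items_.size()) {
            throw stop_iteration{};     // the only way to signal "done"
        }
        return items_[pos_++];
    }

private:
    std::vector<int> items_;
    std::size_t pos_ = 0;
};

// Drains the cursor the way a for-loop over the iterator would:
// keep calling next() until the terminating exception arrives.
int sum_with_exception(ThrowingCursor cursor) {
    int total = 0;
    try {
        for (;;) {
            total += cursor.next();
        }
    } catch (const stop_iteration &) {
        // Reached exactly once per cursor, even for an empty one.
    }
    return total;
}
```

With ~25k iterators, a loop like this would pay for roughly 25k throw/unwind cycles, regardless of how few elements each iterator yields.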

I was benchmarking a utility I wrote in C++ that iterates over a lot of files and parses text fields from each file. The files are organized in a specific directory structure that is reflected as three layers of directory iterators, leading to a total of ~25k iterators being created. The C++ program took 0.8s to execute, 0.47s of which were spent waiting on IO. The equivalent Python code, using the iterators exposed via pybind, took 4s to execute.

When profiling, I saw that 15% of the time was spent in exception handling (or up to 40% when using libunwind or llvm-libunwind, which bumped execution time to 6s). This seems like low-hanging fruit compared to all the other pybind-induced overhead.

Sadly, I couldn't come up with a good way to solve this just yet. Perhaps pybind could add a std::optional-esque container that wraps the iterator return type plus a tag indicating whether the iterator is at its end?
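To make the idea concrete, here is a rough sketch of what an optional-style "next" could look like in plain C++17. This is not pybind11 API, and nothing here is known to be supported by pybind11: `OptionalCursor` and `sum_with_optional` are hypothetical names, and how a binding layer would turn the empty value into the end of a Python iteration is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical cursor: next() reports exhaustion through an empty
// std::optional instead of throwing an exception.
class OptionalCursor {
public:
    explicit OptionalCursor(std::vector<int> items) : items_(std::move(items)) {}

    std::optional<int> next() {
        assert(pos_ <= items_.size());  // invariant: never past the end
        if (pos_ == items_.size()) {
            return std::nullopt;        // "at end" tag, no unwinding involved
        }
        return items_[pos_++];
    }

private:
    std::vector<int> items_;
    std::size_t pos_ = 0;
};

// Drains the cursor with an ordinary branch per element.
// The loop condition tests has_value(), so an element equal to 0 is still consumed.
int sum_with_optional(OptionalCursor cursor) {
    int total = 0;
    while (std::optional<int> item = cursor.next()) {
        total += *item;
    }
    return total;
}
```

The end of iteration becomes a normal return value, so the cost per iterator is one extra branch rather than a throw and unwind. The trade-off is that the wrapper has to distinguish "no more items" from a legitimately empty or None-like element, which is why a dedicated tag type rather than a bare `std::optional` may be needed.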

I also found no way to signal the iterator end without going through py::stop_iteration. If I missed something obvious, please yell at me.

Reproducible example code

No response

Is this a regression? Put the last known working version here if it is.

Not a regression

NvSchobi commented 1 month ago

I see the same issue for any exception and started a thread here: https://github.com/pybind/pybind11/discussions/5317