Closed: jorgensd closed this issue 12 months ago.
At least for the regression seen in DOLFINx, we have found a resolution: https://github.com/FEniCS/dolfinx/pull/2895#issuecomment-1817294687. I will test it for the case above today. Could you comment on https://github.com/FEniCS/dolfinx/pull/2895#issuecomment-1817294687?
If you are doing heavy computations in your bindings themselves, rather than the more typical setup of forwarding calls to another library that does the heavy lifting, then NOMINSIZE (an option of nanobind_add_module) is probably an option you will want to add.
The DOLFINx wrappers themselves do no heavy lifting; they call into the compiled DOLFINx C++ library.
Maybe the issue is that some of the functions we wrap from DOLFINx are templates, so they are instantiated and compiled together with the bindings, using the bindings' compiler flags.
In any case, as long as your performance regression disappears with NOMINSIZE, it seems to me that this issue can be closed.
Problem description
In DOLFINx, we are currently seeing some major performance regressions when using nanobind instead of pybind: https://github.com/FEniCS/dolfinx/issues/2891
I've tried to dig into what is wrong, but haven't been able to explain the 2x slowdown with nanobind. To isolate the issue, I compared wrapping a NumPy array with nanobind and with pybind11; the code is available at: https://github.com/jorgensd/nanobind_example/tree/dokken/nano_vs_pb
i.e. the pybind11 binding
and the nanobind binding.
Running the code in the reproducible example below, I get the timings:
for an array of 100 million entries, and
for an array of 1 billion entries.
As I stated earlier, I know this is not the cause of the regression (but could be a part of it). Do you have any insight for us? The binding that we are currently timing is: https://github.com/FEniCS/dolfinx/commit/0edd19f3cbed5d9acc911610aef4703410123dad#diff-97f6b59b7db9ef802d0a9cff4b9a3780864c3aa8c5efe78a920a7a04c428714aL147-L162
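A minimal sketch of a timing harness for this kind of pybind11 vs. nanobind comparison, assuming only the Python standard library. The helper names `time_call` and `compare` are hypothetical, as are the module names `pb_example` and `nb_example` and their `wrap_array(n)` function in the usage comment; they stand in for the two compiled extensions. The harness reports the minimum over several repeats (the least noise-sensitive estimate) alongside the median.

```python
import statistics
import time
from typing import Any, Callable, Dict


def time_call(fn: Callable[..., Any], *args: Any, repeats: int = 5) -> Dict[str, float]:
    """Call fn(*args) `repeats` times; return min/median/max wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def compare(label_a: str, fn_a: Callable[..., Any],
            label_b: str, fn_b: Callable[..., Any],
            *args: Any, repeats: int = 5) -> float:
    """Time two implementations on the same arguments; return the min-time ratio b/a."""
    a = time_call(fn_a, *args, repeats=repeats)
    b = time_call(fn_b, *args, repeats=repeats)
    # Guard against a zero minimum on very fast calls.
    ratio = b["min"] / a["min"] if a["min"] > 0 else float("inf")
    for label, t in ((label_a, a), (label_b, b)):
        print(f"{label}: min {t['min']:.4f} s, median {t['median']:.4f} s")
    print(f"{label_b}/{label_a} (min time): {ratio:.2f}x")
    return ratio


# Hypothetical usage with the two compiled extensions (names are placeholders):
# import pb_example, nb_example
# compare("pybind11", pb_example.wrap_array, "nanobind", nb_example.wrap_array, 100_000_000)
```

Whatever harness is used, it is worth checking that both extensions are built with comparable optimization flags: nanobind_add_module optimizes for binary size unless NOMINSIZE is passed.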
Reproducible example code