Open jagerman opened 7 years ago
I definitely agree with both the detail
and cast
reorganization. I think there's one more thing that might be worth considering as this reorganization is taking place: going from header-only to optionally header-only. For example, see opt-in header-only libraries (but in the case of pybind11, it would be opt-out). I bring this up here because this would influence some of the header organization.
There are more details in the linked blog post, but I'll summarize what it would look like for pybind11 (it's actually slightly simplified since pybind11 would only support header-only and static lib configurations but not dynamic lib).
This would make it possible to compile pybind11 itself as a static library which would help with compile time. This can be done so that it's 100% backward compatible -- existing users would not notice any change and everything would still function as header-only by default. Compiling as a static library would be opt-in, e.g. by defining PYBIND11_STATIC_LIB
.
The main idea is that headers would be split up into the expected .h
and .cpp
files and follow completely normal C++ development. The only departure would be that all functions would need to annotated with a macro, e.g. PYBIND11_FUNC
that looks like this:
#if !defined(PYBIND11_STATIC_LIB)
# define PYBIND11_FUNC inline
#else
# define PYBIND11_FUNC
#endif
This would essentially replace inline
which is currently required for all non-template functions.
The header and source files would look like this:
// foo.h
#pragma once
PYBIND11_FUNC void foo();
#if !defined(PYBIND11_STATIC_LIB)
# include "foo.cpp"
#endif
// foo.cpp
#include "foo.h"
PYBIND11_FUNC void foo() {
// implementation
}
The obvious benefit would be compile time. I made a very quick prototype to measure the speedup that could be expected. I only moved generic_type
and cpp_function
into the static lib and got the numbers below. Note there is a lot more non-template code that could be moved into the static lib. This is measured by compiling pybind11_tests
:
macOS/clang 8.0.0
-----------------
header-only
time make: 96.01s user 3.03s system 121% cpu 1:21.71 total
file size: 1460552
static lib (prototype -- most of the library is still header-only)
time make: 77.64s user 2.79s system 120% cpu 1:06.86 total
file size: 1468768
--> time: +23% faster compilation
--> size: +0.56% (+8216 bytes) larger binary
macOS/GCC 6.3.0
---------------
header-only
time make: 130.98s user 6.10s system 99% cpu 2:17.59 total
file size: 1931124
static lib (prototype -- most of the library is still header-only)
time make: 102.53s user 5.57s system 99% cpu 1:48.59 total
file size: 1884812
--> time: +28% faster compilation
--> size: -2.40% (-46312 bytes) smaller binary
The performance would still go up as more of the files split up into .h
/.cpp
pairs. The speedup also becomes larger as the number of translation units is increased, as would be expected. The binary size goes up with clang, but down with GCC. I'd expect that should go down slightly and the regression with clang is most likely due to the quick prototype that this was measured with.
Most of the .h
/.cpp
split-up is fairly straightforward. It would just need to be coordinated with the reorganization above to avoid moving the files twice.
Thoughts?
Interesting idea re: header-only opt-in; I recall a few libraries where this is done quite well, like fmt
.
If done properly, this will probably involve a bit more work than a mere .h/.cpp split, like -- considering which templates to extern, maybe a few commonly used specialisations could get precompiled; considering whether the guts of some heavier types could be pimpled away; moving the type-parameter-independent functionality out of template classes (luckily, pybind11 development has always been very focused on binary size, so this is already satisfied for the most part).
@dean0x7d I think that pretty much the only benefactors would be cpp_function
and the generic_type_caster
as well as some initialization code, the other big and complicated parts are all templates. 25% sounds nice, but I don't think it's enough for me to get super-excited about this (especially since the change is quite intrusive).
@wjakob firstly hello Wenzel and contributors and thank you for this awesome software.
I'm working on the system simulator gem5 which makes heavy use of pybind11.
I was feeling that the build was a bit slower than what seemed reasonable, and I started to poke a bit, and it seemed that pybind11, or at least the way we are using it, was responsible for a large part of the problem.
I then tried to simply split pybind11 into header and cpp (we currently have an in tree copy of pybind11 2.4.1), and that alone reduced our full build time by 40% from 25 minutes to 15 minutes, which is, needless to say, a game changing improvement for our project as described at: https://gem5.atlassian.net/browse/GEM5-572
Because of this I was wondering, is there any chance you would review a patch that does such a split, or is there anyone with an ongoing patch?
It would be of course done in a way along what @dean0x7d suggested in an optional opt-in way that would not affect existing users.
Or alternatively, if you see any alternative that does not require modifying pybind11 source, also do let us know. Someone proposed precompiled headers, but I'm not sure the gains would be as significant (haven't tried it yet).
To give a clearer explanation of how gem5 uses pybind11, we basically just use it as a way to pass configuration parameters from python to C++. Our full build has 1271 object files, and 365 of them are simple auto-generated C++ files that include pybind11 to contain parameters for different C++ classes, and those 365 files dominate build times.
One such sample typical file looks like:
#include "pybind11/pybind11.h"
#include "pybind11/stl.h"
#include "params/ThermalReference.hh"
#include "python/pybind11/core.hh"
#include "sim/init.hh"
#include "sim/sim_object.hh"
#include "sim/power/thermal_model.hh"
namespace py = pybind11;
static void
module_init(py::module &m_internal)
{
py::module m = m_internal.def_submodule("param_ThermalReference");
py::class_<ThermalReferenceParams, SimObjectParams, std::unique_ptr<ThermalReferenceParams, py::nodelete>>(m, "ThermalReferenceParams")
.def(py::init<>())
.def("create", &ThermalReferenceParams::create)
.def_readwrite("temperature", &ThermalReferenceParams::temperature)
;
py::class_<ThermalReference, SimObject, std::unique_ptr<ThermalReference, py::nodelete>>(m, "ThermalReference")
.def("setNode", &ThermalReference::setNode)
;
}
static EmbeddedPyBind embed_obj("ThermalReference", module_init, "SimObject");
so we can see that each file individually is pretty simple, and the problem really seems to come from redefining common pybind11 objects 365 times.
As a random fact :-) I also saw that one of the reasons why pytorch does not use pybind11 is the build time impact: https://discuss.pytorch.org/t/how-are-python-bindings-created/46453/2?u=cirosantilli so maybe other big projects would also benefit from this.
I also generate pybinds automatically, and was aware of the build time problem from using luabind in the past. I chose to generate one large file with all of my pybinds in it, instead of a bunch of smaller files. This seems to work pretty well, but since I did not generate separate files first, I don't really have anything to compare to.
Just a thought, in case it would work for someone else.
@carlsonmark yes, this is one of the possibilities we are considering if it is decided that the split should not happen yet. Hopefully we can avoid that though to prevent slow rebuilds when touching one of the generation inputs.
679 initially added a
detail/
directory to the header includes. It was withdrawn to be discussed later in a separate issue/PR; this is meant to be the start of discussion. Also read the following not as a done deal, but as a request for comments where I'm mostly thinking out loud to start the discussion.I think we should make the move for most (or even all?) of our
detail
namespace code. Basically, all the things inclass_support.h
are an obvious first start to move todetail/class.h
; but there are also many other potential additions:Detail headers
the various template meta-code in common.h (for things like satisfied_all_of, would fit nicely into a
detail/meta.h
. It's relatively centralized now (in common.h), but that hasn't always been the case and has led to me, at least, occassionally duplicating meta template functionality that was already implemented (but implemented elsewhere, i.e. immediately before it was used/needed).The same sort of moving of
detail
namespace bits intodetail
-namespace companions would be a nice cleanup to the non-detail code. E.g. pybind11.h'sgeneric_type
andinit
classes, keep alive implementation, etc. are all already in thedetail
namespace; moving them into a detail directory header would be appropriate.Type casters
Along a similar line, we could move type caster implementations into their own namespace,
pybind11::cast
, and directory,pybind11/cast
. For now, for backwards compatibility,cast
would be a namespace alias ofdetail
, but the plan would be to eventually drop the alias so that all type casters live aspybind11::cast::type_caster<T>
, thus having external type casters put themselves in thedetail
namespace.Thus you would include
pybind11/cast/stl.h
for stl casters,pybdin11/cast/eigen.h
for eigen casters, etc. (The currentpybind11/stl.h
,pybind11/eigen.h
would just one-line compatibility headers that just include the new location).Many of the built-in casters would be usefully moved as well:
cast/arithmetic.h
cast/string.h
cast/tuple.h
cast.h
.cast.h
—I think it's there mainly for historic reasons, since it was originally part of thestd::tuple
caster. Perhapsdetail/argloader.h
?