Closed arturbac closed 2 months ago
Interesting, thanks for pointing this out. Will address this.
I just used to work with -Weverything on clang and a lot of -Werror= .. so it catches all that kind of stuff , it is not me ;-)
@arturbac, can you explain a bit more of what the problem is? And, how would you recommend a fix for a header-only library?
When You declare polimorphic struct and use it it needs to instantiate its virtual table somewehre. And when threre is no single translation unit in which any of its polimorphic methods is intantiated compiler can't assume where it vatable belongs to and where in will be present during linking, so it happly instantiates it in all translation units it is included and finally linker have to remove 1 , 10, 100, 1000 etc and use only one from all translation units. When You have such classes that needs to have vtable it means they in general are not header only/shouldn't be header only. solution is one of 3 options
in .h
glz::any::storage_base { ~storage_base(); };
in hpp -> include it in only one cpp translation unit
glz::any::storage_base::~storage_base() {}
that way it's vtable will be instantiated only once
The same applies to larger inline non template functions, they will be instantiated hundreds of times in every translation unit they are used. So having header only library with large non template inlines and virtual structs instead of normal linked static/shared lib is in general causing much longer compile times.
Thanks so much for your thorough explanation and suggestions. Do you have any idea of how std::any
is implemented? How does std::any
avoid this problem?
The reason we have our own implementation and don't use
std::any
is so that we can have a.data()
method.
However, glz::any
is also only used in glz::poly
, which isn't required by anything in Glaze, so I'll remove the inclusion of glz::poly
so that these headers aren't included in the rest of Glaze and users can include the file explicitly. This will remove this linker explosion for pretty much all users of Glaze. But, I would still like to improve glz::any
.
Will have to check, later.
The same applies to larger inline non template functions, they will be instantiated hundreds of times in every translation unit they are used. So having header only library with large non template inlines and virtual structs instead of normal linked static/shared lib is in general causing much longer compile times.
True, when a project has many translation units and needs to build from scratch. And the same is true for templated functions. Precompiled headers helps a lot with Glaze. And, most of Glaze uses templates and we wouldn't want to build a static library because it has so many compile time options that the library would be way larger than necessary for users (indeed it is infeasible). We could benefit from compiling number parsing into a static library, but I am planning to add support for C++20 modules, which I believe will solve the same problem without needing to build a static library in the traditional cpp approach.
This has been addressed for most users of Glaze by just not including glaze/util/poly.hpp
within glaze/glaze.hpp
, as most users don't need it. #792
Precompiled headers helps a lot with Glaze.
When i try to use them with clang on glaze I only get errors building pch
if(${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.16.0") file(GLOB_RECURSE HEADERS "${CMAKE_SOURCE_DIR}/include/.h" "${CMAKE_SOURCE_DIR}/include/.hpp" ) target_precompile_headers( glaze_glaze INTERFACE ${HEADERS}) endif()
like that
@arturbac, Interesting. I use precompiled headers for Glaze on GCC all the time. I'll set up an issue to get them working on Clang.
@arturbac, Interesting. I use precompiled headers for Glaze on GCC all the time. I'll set up an issue to get them working on Clang.
But You dont have such code in glaze CMakeLists.txt at all, shouln't it be preset there ?
gcc doesn't have -Wweak-vtables warning at all so in stdlibc++ there are actually multiple such mistakes as it depends on knowledge of person writing the given code or if he not forgotten about it. missing override and inline implementation of virtual method instead of explictit vtable instantation in cpp such inline cannot be inlined at compile time only sometimes at linking time using de virtualization of known object types.
class bad_any_cast : public bad_cast { public: virtual const char* what() const noexcept { return "bad any_cast"; } };
@arturbac, Interesting. I use precompiled headers for Glaze on GCC all the time. I'll set up an issue to get them working on Clang.
But You dont have such code in glaze CMakeLists.txt at all, shouln't it be preset there ?
I wouldn't want to use precompiled headers by default. I could put an option in there, but it's easy for a user to add the precompiled headers that they want, and a lot of Glaze users only use some of the headers.
I use:
set(PCH_HEADERS ${glaze_SOURCE_DIR/include/glaze/glaze.hpp})
target_precompile_headers(${PROJECT_NAME} PRIVATE {PCH_HEADERS})
That is all I need for GCC
using such precomp for a whole project after few years it leads to tragic code quality related to includes. As it assumes everything is available always so people do not manage includes at all in larger project. After some time when You need to switch off precomp You are surprised that project doesn't build at all I already had such issue with large project at my company with msvc and stdafx.h it took me ~2 moths to fix the project includes and make it compile with gcc (at that time android was using gcc) for android. And benefits for large 800K lines project with gcc and clang AFIR I last tested were discussable as building precomp and project summary was longer that just building project without precomp. And when You add that modifying header causes to rebuild precomp it is a waste of time only.
But there is another opion after adding glaze to project explicitly add precomp on its interface
CPMAddPackage(
glaze
GITHUB_REPOSITORY stephenberry/glaze
GIT_TAG v2.1.5
)
find_package(glaze REQUIRED)
target_precompile_headers( glaze_glaze INTERFACE "${CMAKE_BINARY_DIR}/_deps/glaze-src/include/glaze/glaze.hpp" )
At my company precompiled headers reduce our compile time by more than half for a couple of our projects. But, we use unity builds, so we're building from scratch each time. To say they are a waste of time isn't true for all use cases. But, I tend to agree that they shouldn't be relied on in general. At my last company the project was quite large and had many compilation units and precompiled headers still had a big impact (shaving off about 30% of compilation time).
I think C++20 modules are a much better solution, but we're just at the beginning stages of compiler support.
In terms of include quality, I've used include-what-you-use to automate includes. And, you can always have your CI pipeline test building without precompiled headers on. So, there are multiple ways to make working with precompiled headers helpful and not dangerous.
To say they are a waste of time isn't true for all use cases.
U are Right.
I think C++20 modules are a much better solution, but we're just at the beginning stages of compiler support.
Yep right but there is bigger issue with them AFIR and I would not expect modules used at production in less than 5 yers or even longer. I read last time there is unsolvable problem with code completion in IDE because of methods being used to generate them. In short it went ~The code must generate some module interface to make libclang be able to use it, but to generate it it must be compiled ..
And, you can always have your CI pipeline test building without precompiled headers on.
definitely right.
Yeah, there's a lot of work to do on modules. So, finding ways to improve compilation times by reducing unnecessary includes and reducing template instantiations is where I'm primarily concerned right now.
ok, but making it static lib instead of header only is low hanging fruit so I don't understand why people tend to force projects to be header only, it doesn't simplify anything with CMake it is just a matter of INTERFACE VS PUBLIC and target_add_sources
and having cpp and static lib allows You to use explicit template instantation and cut dramatically compile time as You can expose to users only extern template decl, but this could be used only to finite instantations used internally in Your lib .. like this my code, I am exploiting that always when it is closed finite number of instantations
//.h only template declaration
template<map_load_mode_e mode, typename context_type>
[[nodiscard]]
load_map_error_e load_map(
std::string_view map_path, context_type & ctx, map_context_settings_t const & cfg, std::string_view targeo_config_path
);
extern template load_map_error_e load_map<map_load_mode_e::image>(
std::string_view, image_context_t &, map_context_settings_t const &, std::string_view
);
extern template load_map_error_e load_map<map_load_mode_e::route>(
std::string_view, image_context_t &, map_context_settings_t const &, std::string_view
);
// .cpp all implementation of template + explicit instantations
template load_map_error_e load_map<map_load_mode_e::image>(
std::string_view, image_context_t &, map_context_settings_t const &, std::string_view
);
template load_map_error_e load_map<map_load_mode_e::route>(
std::string_view, image_context_t &, map_context_settings_t const &, std::string_view
);
btw do You know about clang -ftime-trace ?
I have been considering for a while adding an optional cpp that uses a configuration macro to enable extern templates for commonly used templates. I think this would be worthwhile, but it would be limited because glaze uses glz::opts
for compile time options and there are simply too many options and combinations to support extern templates for most functions.
If I added a cpp it would be for handling basic types like integers, floating point numbers, and strings. I think this makes sense.
btw do You know about clang -ftime-trace ?
Yes, I have used -ftime-trace and that visualizer extensively to improve glaze compilation times. But, it has been a while since I've revisited it.
You can use explicit instantation using implicit instantation just wrapping all caeses by variadic template folding and explicitly instantiating only 1 template doing it job using all possible variants ..
You can use explicit instantation using implicit instantation just wrapping all caeses by variadic template folding and explicitly instantiating only 1 template doing it job using all possible variants ..
Yes, but doing all variants would cause the binary to explode. The number of combinations would be in the millions.
from users point of view it dosn't matter as long it will not occupy large amount of storage. With release linker will drop all redudant instantations from static lib.
Yeah, we could probably support a decent common subset without the binary being too ridiculous. However, I find development teams typically know how to deal with header only libraries, such as the standard template library.
Glaze also intends to be unobtrusive with including headers. For example, Glaze supports std::unordered_map
, but it doesn't #include <unordered_map>
in any of its primary headers. So, users don't need to pay for this. By using concepts, we're able to minimize our standard library includes and yet support the standard library and many other third party libraries. To build this support into a static library doesn't really make sense. And yet, these are the templates that cost the most to instantiate and implement.
We could make a static lib just for the standard template library, but I'm torn about this. I think it is best for users to manage the number of compilation units for which they compile struct reading/writing. This is something we cannot do with a static library and is the most expensive part of compilation. Glaze customizes compile time maps with various hashing algorithms based on user structs. None of these user structs can be built into a static library. So, our best performance improvements come from improving the header only codebase.
Add to this that force inlining (not for basic types, but for standard library types and custom structs) makes Glaze run up to 20% faster. Most users would prefer a faster runtime product than faster compilation times.
I think it is best for users to manage the number of compilation units for which they compile struct reading/writing.
I do that include thigs like glaze only in dedicated trlanslation unit so it cost me nothing. Also I dosplitting template implementation with template declaration and with compile time error when some ones try to include implementation header into normal header and not other implementation one or tranaslation unit.
Thanks for this dialogue and helpful suggestions. It's motivated me to spend more time improving compilation times and looking into making some optional extern templates. I'm going to start out by reducing unnecessary includes and then proceed from there.
One thing related to Opts, check if at every part of code You really need all parameters of Opts, if not split them into SubOpts with minimal set of params, then You will cut number of variations of functions specializations.
for ex opts::format is completely redundant after entering json code.
layout is also invalid for json code, and bunch of other params
so starting at top level at such place
detail::read
This is an excellent point. I'm not a fan of the added complexity, but there are a lot of places where I think it would make sense to reduce the set of options. Though it would only reduce compile times in the case of extern templates.
yes U are right with compile times, but in general improve code quality and catch all errors related if they exists to invalid Opts use, meaning out of valid scope use.
I am not a fun but maybe type erasure for parts of code that realy doesn't need to work with user provided type and changing this to limited number of template arguments related to describe user type specifics with extern template, potentially optimisation reducing heavly time of compilation but may not be feasible
Is not that reducing complexity actually as per backend You have decidcated struct with opts and no convertions from any global Opts just concept .. ?
template<typename T, typename... Args>
concept same_as_any_of = std::disjunction_v<std::is_same<T, Args>...>;
struct json_opts
{
int shared_var;
int json_spec_var;
};
struct csv_opts
{
int shared_var;
int csv_spec_var;
};
template<typename opts_type>
concept concept_opts =
requires ( opts_type const & opts ){
requires same_as_any_of<opts_type, json_opts, csv_opts>;
{ opts.shared_var } -> std::same_as<int const &>;
};
struct glz_json{ int res; };
struct glz_csv{ int res; };
template<json_opts opts>
constexpr auto backend_read() -> glz_json
{ return glz_json{ opts.json_spec_var + opts.shared_var }; }
template<csv_opts opts>
constexpr auto backend_read() -> glz_csv
{ return glz_csv{ opts.csv_spec_var + opts.shared_var }; }
template<concept_opts auto options>
constexpr auto glz_read()
{ return backend_read<options>(); }
static_assert(std::same_as<glz_json, decltype(glz_read<json_opts{ .json_spec_var=3 }>())> );
static_assert(std::same_as<glz_csv, decltype(glz_read<csv_opts{ .shared_var=2, .csv_spec_var = 4 }>())> );
That does seem like a good approach. But, I also want to look into compile time inheritance for handling shared options. And, I don't think this is the most effective way to gain compilation performance (your example actually adds template instantiations).
I'm looking at approaches of optionally including extern templates.
But, I've also seen huge performance benefits from removing static from constexpr where appropriate, as it reduces the memory consumption at compile time.
Also, I went through implementations of std::any and other third party versions and I'm now cleaning up a version to replace in Glaze that avoids virtual
and should fix the original issue.
Removing glz::poly
, which removes glz::any
in v3.0.0
glz::poly
is being removed in favor of future C++26 reflection and more powerful static polymorphism that is faster than glz::poly
which is planned for the future of Glaze.
Please make glz::any::storage_base destructor implementation in some translation unit, because ~storage_base() =default causes that vtable for it is emitted in every translation unit it is included and used write_file_json/read_file_json.