stephenberry / glaze

Extremely fast, in memory, JSON and interface library for modern C++
MIT License
1.11k stars 113 forks source link

glz::any::storage_base weak vtable #763

Closed arturbac closed 2 months ago

arturbac commented 7 months ago

Please make glz::any::storage_base destructor implementation in some translation unit, because ~storage_base() =default causes that vtable for it is emitted in every translation unit it is included and used write_file_json/read_file_json.

stephenberry commented 7 months ago

Interesting, thanks for pointing this out. Will address this.

arturbac commented 7 months ago

I just used to work with -Weverything on clang and a lot of -Werror= .. so it catches all that kind of stuff , it is not me ;-)

stephenberry commented 6 months ago

@arturbac, can you explain a bit more of what the problem is? And, how would you recommend a fix for a header-only library?

arturbac commented 6 months ago

When You declare polimorphic struct and use it it needs to instantiate its virtual table somewehre. And when threre is no single translation unit in which any of its polimorphic methods is intantiated compiler can't assume where it vatable belongs to and where in will be present during linking, so it happly instantiates it in all translation units it is included and finally linker have to remove 1 , 10, 100, 1000 etc and use only one from all translation units. When You have such classes that needs to have vtable it means they in general are not header only/shouldn't be header only. solution is one of 3 options

in .h
glz::any::storage_base { ~storage_base(); };

in hpp -> include it in only one cpp translation unit
glz::any::storage_base::~storage_base() {}

that way it's vtable will be instantiated only once

arturbac commented 6 months ago

The same applies to larger inline non template functions, they will be instantiated hundreds of times in every translation unit they are used. So having header only library with large non template inlines and virtual structs instead of normal linked static/shared lib is in general causing much longer compile times.

stephenberry commented 6 months ago

Thanks so much for your thorough explanation and suggestions. Do you have any idea of how std::any is implemented? How does std::any avoid this problem?

The reason we have our own implementation and don't use std::any is so that we can have a .data() method.

However, glz::any is also only used in glz::poly, which isn't required by anything in Glaze, so I'll remove the inclusion of glz::poly so that these headers aren't included in the rest of Glaze and users can include the file explicitly. This will remove this linker explosion for pretty much all users of Glaze. But, I would still like to improve glz::any.

arturbac commented 6 months ago

Will have to check, later.

stephenberry commented 6 months ago

The same applies to larger inline non template functions, they will be instantiated hundreds of times in every translation unit they are used. So having header only library with large non template inlines and virtual structs instead of normal linked static/shared lib is in general causing much longer compile times.

True, when a project has many translation units and needs to build from scratch. And the same is true for templated functions. Precompiled headers helps a lot with Glaze. And, most of Glaze uses templates and we wouldn't want to build a static library because it has so many compile time options that the library would be way larger than necessary for users (indeed it is infeasible). We could benefit from compiling number parsing into a static library, but I am planning to add support for C++20 modules, which I believe will solve the same problem without needing to build a static library in the traditional cpp approach.

stephenberry commented 6 months ago

This has been addressed for most users of Glaze by just not including glaze/util/poly.hpp within glaze/glaze.hpp, as most users don't need it. #792

arturbac commented 6 months ago

Precompiled headers helps a lot with Glaze.

When i try to use them with clang on glaze I only get errors building pch

if(${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.16.0") file(GLOB_RECURSE HEADERS "${CMAKE_SOURCE_DIR}/include/.h" "${CMAKE_SOURCE_DIR}/include/.hpp" ) target_precompile_headers( glaze_glaze INTERFACE ${HEADERS}) endif()

like that obraz

stephenberry commented 6 months ago

@arturbac, Interesting. I use precompiled headers for Glaze on GCC all the time. I'll set up an issue to get them working on Clang.

arturbac commented 6 months ago

@arturbac, Interesting. I use precompiled headers for Glaze on GCC all the time. I'll set up an issue to get them working on Clang.

But You dont have such code in glaze CMakeLists.txt at all, shouln't it be preset there ?

arturbac commented 6 months ago

gcc doesn't have -Wweak-vtables warning at all so in stdlibc++ there are actually multiple such mistakes as it depends on knowledge of person writing the given code or if he not forgotten about it. missing override and inline implementation of virtual method instead of explictit vtable instantation in cpp such inline cannot be inlined at compile time only sometimes at linking time using de virtualization of known object types.

class bad_any_cast : public bad_cast { public: virtual const char* what() const noexcept { return "bad any_cast"; } };

stephenberry commented 6 months ago

@arturbac, Interesting. I use precompiled headers for Glaze on GCC all the time. I'll set up an issue to get them working on Clang.

But You dont have such code in glaze CMakeLists.txt at all, shouln't it be preset there ?

I wouldn't want to use precompiled headers by default. I could put an option in there, but it's easy for a user to add the precompiled headers that they want, and a lot of Glaze users only use some of the headers.

I use:

set(PCH_HEADERS ${glaze_SOURCE_DIR/include/glaze/glaze.hpp})
target_precompile_headers(${PROJECT_NAME} PRIVATE {PCH_HEADERS})

That is all I need for GCC

arturbac commented 6 months ago

using such precomp for a whole project after few years it leads to tragic code quality related to includes. As it assumes everything is available always so people do not manage includes at all in larger project. After some time when You need to switch off precomp You are surprised that project doesn't build at all I already had such issue with large project at my company with msvc and stdafx.h it took me ~2 moths to fix the project includes and make it compile with gcc (at that time android was using gcc) for android. And benefits for large 800K lines project with gcc and clang AFIR I last tested were discussable as building precomp and project summary was longer that just building project without precomp. And when You add that modifying header causes to rebuild precomp it is a waste of time only.

arturbac commented 6 months ago

But there is another opion after adding glaze to project explicitly add precomp on its interface

CPMAddPackage(
  glaze
  GITHUB_REPOSITORY stephenberry/glaze
  GIT_TAG        v2.1.5
  )
find_package(glaze REQUIRED)
target_precompile_headers( glaze_glaze INTERFACE "${CMAKE_BINARY_DIR}/_deps/glaze-src/include/glaze/glaze.hpp" )
stephenberry commented 6 months ago

At my company precompiled headers reduce our compile time by more than half for a couple of our projects. But, we use unity builds, so we're building from scratch each time. To say they are a waste of time isn't true for all use cases. But, I tend to agree that they shouldn't be relied on in general. At my last company the project was quite large and had many compilation units and precompiled headers still had a big impact (shaving off about 30% of compilation time).

I think C++20 modules are a much better solution, but we're just at the beginning stages of compiler support.

In terms of include quality, I've used include-what-you-use to automate includes. And, you can always have your CI pipeline test building without precompiled headers on. So, there are multiple ways to make working with precompiled headers helpful and not dangerous.

arturbac commented 6 months ago

To say they are a waste of time isn't true for all use cases.

U are Right.

I think C++20 modules are a much better solution, but we're just at the beginning stages of compiler support.

Yep right but there is bigger issue with them AFIR and I would not expect modules used at production in less than 5 yers or even longer. I read last time there is unsolvable problem with code completion in IDE because of methods being used to generate them. In short it went ~The code must generate some module interface to make libclang be able to use it, but to generate it it must be compiled ..

And, you can always have your CI pipeline test building without precompiled headers on.

definitely right.

stephenberry commented 6 months ago

Yeah, there's a lot of work to do on modules. So, finding ways to improve compilation times by reducing unnecessary includes and reducing template instantiations is where I'm primarily concerned right now.

arturbac commented 6 months ago

ok, but making it static lib instead of header only is low hanging fruit so I don't understand why people tend to force projects to be header only, it doesn't simplify anything with CMake it is just a matter of INTERFACE VS PUBLIC and target_add_sources

arturbac commented 6 months ago

and having cpp and static lib allows You to use explicit template instantation and cut dramatically compile time as You can expose to users only extern template decl, but this could be used only to finite instantations used internally in Your lib .. like this my code, I am exploiting that always when it is closed finite number of instantations

//.h only template declaration
template<map_load_mode_e mode, typename context_type>
[[nodiscard]]
load_map_error_e load_map(
  std::string_view map_path, context_type & ctx, map_context_settings_t const & cfg, std::string_view targeo_config_path
);
extern template load_map_error_e load_map<map_load_mode_e::image>(
  std::string_view, image_context_t &, map_context_settings_t const &, std::string_view
);
extern template load_map_error_e load_map<map_load_mode_e::route>(
  std::string_view, image_context_t &, map_context_settings_t const &, std::string_view
);

// .cpp all implementation of template + explicit instantations
template load_map_error_e load_map<map_load_mode_e::image>(
  std::string_view, image_context_t &, map_context_settings_t const &, std::string_view
);
template load_map_error_e load_map<map_load_mode_e::route>(
  std::string_view, image_context_t &, map_context_settings_t const &, std::string_view
);
arturbac commented 6 months ago

btw do You know about clang -ftime-trace ? obraz

stephenberry commented 6 months ago

I have been considering for a while adding an optional cpp that uses a configuration macro to enable extern templates for commonly used templates. I think this would be worthwhile, but it would be limited because glaze uses glz::opts for compile time options and there are simply too many options and combinations to support extern templates for most functions.

If I added a cpp it would be for handling basic types like integers, floating point numbers, and strings. I think this makes sense.

stephenberry commented 6 months ago

btw do You know about clang -ftime-trace ? obraz

Yes, I have used -ftime-trace and that visualizer extensively to improve glaze compilation times. But, it has been a while since I've revisited it.

arturbac commented 6 months ago

You can use explicit instantation using implicit instantation just wrapping all caeses by variadic template folding and explicitly instantiating only 1 template doing it job using all possible variants ..

stephenberry commented 6 months ago

You can use explicit instantation using implicit instantation just wrapping all caeses by variadic template folding and explicitly instantiating only 1 template doing it job using all possible variants ..

Yes, but doing all variants would cause the binary to explode. The number of combinations would be in the millions.

arturbac commented 6 months ago

from users point of view it dosn't matter as long it will not occupy large amount of storage. With release linker will drop all redudant instantations from static lib.

stephenberry commented 6 months ago

Yeah, we could probably support a decent common subset without the binary being too ridiculous. However, I find development teams typically know how to deal with header only libraries, such as the standard template library.

Glaze also intends to be unobtrusive with including headers. For example, Glaze supports std::unordered_map, but it doesn't #include <unordered_map> in any of its primary headers. So, users don't need to pay for this. By using concepts, we're able to minimize our standard library includes and yet support the standard library and many other third party libraries. To build this support into a static library doesn't really make sense. And yet, these are the templates that cost the most to instantiate and implement.

We could make a static lib just for the standard template library, but I'm torn about this. I think it is best for users to manage the number of compilation units for which they compile struct reading/writing. This is something we cannot do with a static library and is the most expensive part of compilation. Glaze customizes compile time maps with various hashing algorithms based on user structs. None of these user structs can be built into a static library. So, our best performance improvements come from improving the header only codebase.

Add to this that force inlining (not for basic types, but for standard library types and custom structs) makes Glaze run up to 20% faster. Most users would prefer a faster runtime product than faster compilation times.

arturbac commented 6 months ago

I think it is best for users to manage the number of compilation units for which they compile struct reading/writing.

I do that include thigs like glaze only in dedicated trlanslation unit so it cost me nothing. Also I dosplitting template implementation with template declaration and with compile time error when some ones try to include implementation header into normal header and not other implementation one or tranaslation unit.

stephenberry commented 6 months ago

Thanks for this dialogue and helpful suggestions. It's motivated me to spend more time improving compilation times and looking into making some optional extern templates. I'm going to start out by reducing unnecessary includes and then proceed from there.

arturbac commented 6 months ago

One thing related to Opts, check if at every part of code You really need all parameters of Opts, if not split them into SubOpts with minimal set of params, then You will cut number of variations of functions specializations.

arturbac commented 6 months ago

for ex opts::format is completely redundant after entering json code. layout is also invalid for json code, and bunch of other params so starting at top level at such place detail::read::template op < Opts > (value, ctx, b, e); you can stich with consteval function to format specific SubOpts and You can add then thx to that compile time error handling with invalid Opts

stephenberry commented 6 months ago

This is an excellent point. I'm not a fan of the added complexity, but there are a lot of places where I think it would make sense to reduce the set of options. Though it would only reduce compile times in the case of extern templates.

arturbac commented 6 months ago

yes U are right with compile times, but in general improve code quality and catch all errors related if they exists to invalid Opts use, meaning out of valid scope use.

arturbac commented 6 months ago

I am not a fun but maybe type erasure for parts of code that realy doesn't need to work with user provided type and changing this to limited number of template arguments related to describe user type specifics with extern template, potentially optimisation reducing heavly time of compilation but may not be feasible

arturbac commented 6 months ago

Is not that reducing complexity actually as per backend You have decidcated struct with opts and no convertions from any global Opts just concept .. ?

template<typename T, typename... Args>
concept same_as_any_of = std::disjunction_v<std::is_same<T, Args>...>;

struct json_opts 
  {
  int shared_var;
  int json_spec_var;
  };

struct csv_opts 
  {
  int shared_var;
  int csv_spec_var;
  };

template<typename opts_type>
concept concept_opts = 
requires ( opts_type const & opts ){
  requires same_as_any_of<opts_type, json_opts, csv_opts>;
  { opts.shared_var } -> std::same_as<int const &>;
};

struct glz_json{ int res; };
struct glz_csv{ int res; };

template<json_opts opts>
constexpr auto backend_read() -> glz_json
  { return glz_json{ opts.json_spec_var + opts.shared_var }; }

template<csv_opts opts>
constexpr auto backend_read() -> glz_csv
  { return glz_csv{ opts.csv_spec_var + opts.shared_var }; }

template<concept_opts auto options>
constexpr auto glz_read()
  { return backend_read<options>(); }

static_assert(std::same_as<glz_json, decltype(glz_read<json_opts{ .json_spec_var=3 }>())> );
static_assert(std::same_as<glz_csv, decltype(glz_read<csv_opts{ .shared_var=2, .csv_spec_var = 4 }>())> );
stephenberry commented 6 months ago

That does seem like a good approach. But, I also want to look into compile time inheritance for handling shared options. And, I don't think this is the most effective way to gain compilation performance (your example actually adds template instantiations).

I'm looking at approaches of optionally including extern templates.

But, I've also seen huge performance benefits from removing static from constexpr where appropriate, as it reduces the memory consumption at compile time.

stephenberry commented 6 months ago

Also, I went through implementations of std::any and other third party versions and I'm now cleaning up a version to replace in Glaze that avoids virtual and should fix the original issue.

stephenberry commented 2 months ago

Removing glz::poly, which removes glz::any in v3.0.0

glz::poly is being removed in favor of future C++26 reflection and more powerful static polymorphism that is faster than glz::poly which is planned for the future of Glaze.