kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License
1.17k stars 85 forks source link

Windows: C++ ABI stability? #2521

Open LowLevelMahn opened 7 months ago

LowLevelMahn commented 7 months ago

for example:

KUZU_API explicit Database(std::string databasePath);

the std::string-(template) parameter only works if you use the very same compiler with equal settings (debug/release,optimization) or else its more or less undefined behavior if the call works properbly

i'll switch to the C API because of this - what is the intention for having a non-100%-ABI-stable C++ interface? or should the C++ API only be used if Kuzu is build as part of my project (is that possible?)

benjaminwinger commented 7 months ago

I think part of the issue is that MSVC's builds are less compatible between debug and release builds than on other toolchains and platforms. On Linux (or GCC and clang at least) you can usually link release builds with debug builds.

If the issue is primarily with std::string, we could probably provide overloaded versions of the functions which take C-strings instead, (or maybe std::string_view, assuming that doesn't have similar problems to std::string). For the most part I think that should be straightforward, though in certain cases it might cause overload resolution issues if you're passing a value that can be implicitly converted to multiple types.

what is the intention for having a non-100%-ABI-stable C++ interface?

This is more an issue with C++ in general than with Kuzu in particular. C++'s ABI as far as I'm aware is, despite some efforts towards compatibility, not well defined and incompatible in several major ways. Clang for example, tries to be both ABI-compatible with MSVC and with GCC (citation needed), however GCC and MSVC aren't compatible at all (MinGW-built libraries on windows won't work with with MSVC-built ones), and MSVC itself, as mentioned, doesn't provide full ABI compatibility between debug and release binaries.

If you need a more ABI-stable interface, then the C API is probably a better choice. Do note that none of the APIs are stable between different versions of kuzu at the moment, so don't expect the C API to provide ABI stability with other versions of kuzu except by coincidence.

or should the C++ API only be used if Kuzu is build as part of my project (is that possible?)

Should be possible. You could include kuzu as an ExternalProject/with FetchContent in a CMake build, for example. I'm not sure if this has been tested, but it probably will work. But you could also build kuzu using the toolchain and configuration you need and link manually against those binaries. You don't have to use the pre-built binaries.

LowLevelMahn commented 7 months ago

I think part of the issue is that MSVC's builds are less compatible between debug and release builds than on other toolchains and platforms. On Linux (or GCC and clang at least) you can usually link release builds with debug builds.

but its still undefined - so more or less nothing to build on forever

If the issue is primarily with std::string, we could probably provide overloaded versions of the functions which take C-strings

yeah that could help

at least the debug/release difference can happen to nearly every template based type - so most of STL or you're own templates

what is the intention for having a non-100%-ABI-stable C++ interface?

This is more an issue with C++ in general than with Kuzu in particular.

thats clear, i only talked about why Kuzu relies on this not very well defined stuff

If you need a more ABI-stable interface, then the C API is probably a better choice. Do note that none of the APIs are stable between different versions of kuzu at the moment, so don't expect the C API to provide ABI stability with other versions of kuzu except by coincidence.

these are two different things: API/ABI stability (so the parameters/return values beeing the same over serveral version) or missing ABI-stibility due to undefined behavior when using templates in the API

or should the C++ API only be used if Kuzu is build as part of my project (is that possible?)

Should be possible. You could include kuzu as an ExternalProject in a CMake build, for example. I'm not sure if this has been tested, but it probably will work. But you could also build kuzu using the toolchain and configuration you need and link manually against those binaries. You don't have to use the pre-built binaries.

im already building Kuzu from source - having it as inlined part of my project would force the same compiler(settings) :)

benjaminwinger commented 7 months ago

thats clear, i only talked about why Kuzu relies on this not very well defined stuff

Oh, okay. It wasn't clear from your original post what you were referring to. We rely on the "not very well defined stuff" mostly because it hasn't been brought up as an issue before (again, we mostly work with compilers on other platforms where this is less of an issue), but also because I don't know of any good resource that defines exactly what isn't very well defined. I was aware that there are incompatibilities between debug and release ABIs on windows, but haven't been treating that as something that can be worked around.

LowLevelMahn commented 7 months ago

the big difference between gcc/clang and MSVC is that MSVC activates the STL-debug-features per default (in some STL containers/features)

Ubuntu 22.04.3 
  g++ 11.4
  clang 14.0
SUSE Tumbleweed
  g++ 13.2.1
  clang 17.0.5

g++ abi_test.cpp 
  sizeof(std::string): 32
  sizeof(std::vector<int>): 24

g++ -D_GLIBCXX_DEBUG abi_test.cpp 
  sizeof(std::string): 32
  sizeof(std::vector<int>): 56

clang++ abi_test.cpp
  sizeof(std::string): 32
  sizeof(std::vector<int>): 24

clang++ -stdlib=libc++ abi_test.cpp
  sizeof(std::string): 24
  sizeof(std::vector<int>): 24

clang++ -D_LIBCPP_DEBUG=1 -stdlib=libc++ abi_test.cpp
  sizeof(std::string): 24
  sizeof(std::vector<int>): 24

VS2022 x64, Debug
  sizeof(std::string): 40
  sizeof(std::vector<int>): 32

VS2022 x64, Release
  sizeof(std::string): 32
  sizeof(std::vector<int>): 24

and that could also change with different compiler versions/standard lib version - so prebuild libraries are definitly a problem what about kuzu using libstdcpp and my application is using libc++ - could maybe work, and the mangling is not abi defined

and libc++ from the llvm project uses different mangling

shared.cpp
  #include <string>
  size_t abi_test(std::string str){ return str.size(); }

linux@localhost:~/tests/shared> g++ -shared -o libshared.so shared.cpp
linux@localhost:~/tests/shared> nm libshared.so | grep abi_test
0000000000001109 T _Z8abi_testNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
linux@localhost:~/tests/shared> clang++ -shared -o libshared.so shared.cpp
linux@localhost:~/tests/shared> nm libshared.so | grep abi_test
0000000000001110 T _Z8abi_testNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
linux@localhost:~/tests/shared> clang++ -stdlib=libc++ -shared -o libshared.so shared.cpp
linux@localhost:~/tests/shared> nm libshared.so | grep abi_test
0000000000001100 T _Z8abi_testNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE <===
benjaminwinger commented 6 months ago

2553 improved this somewhat by replacing some unnecessary uses of std::string with std::string_view, which has the same size with stl debugging enabled.

To do more we'd also need to re-write the classes exposed in the API as PIMPL classes: instead of returning std::unique_ptr<T>, we return a PIMPLed public wrapper class which acts like a unique_ptr<T>, but lets us destroy the objects using code from kuzu instead of relying on the calling function's implementation of ~unique_ptr which needs to know the size of the type. We can also forward declare the actual implementations and omit them from the public headers, letting us omit as much non-public stuff as possible (primarily Database and QueryResult need to be changed; Connection may as well be changed too for consistency, but doesn't contain any types that get larger with STL debugging).

As mentioned in https://github.com/kuzudb/kuzu/pull/2553#discussion_r1427258835 (and following comment), this is a lot of work for relatively little gain, and the issue can be worked around by linking against different versions of kuzu depending on your configuration (which admittedly has a number of downsides). However a significant part of the work (the aforementioned change to fix the destructors) is a good practice which we should probably do anyway. Making sure that std::string isn't mandatory anywhere on the other hand may be more trouble than it's worth (e.g. the user defined function APIs use it a lot), but for the simpler and more often used functions we can try to make sure to at least provide overloads.

LowLevelMahn commented 4 months ago

another idea is to do it the LLVM way - they do NOT write any wrapper for C++ to make the use safe but saying "you need to build and link with the same compiler when directly using the C++ interface" - only their C-API is language/runtime/compiler-version stable

maybe thats enough for Kuzu?

LowLevelMahn commented 1 month ago

2553 improved this somewhat by replacing some unnecessary uses of std::string with std::string_view, which has the same size with stl debugging enabled.

that is also in no way defined - could change any time - would not rely anthing on it - especially not a public API

but as i said - saying that the C++ API is only stable when consumed using the very same CRT and build-options - like LLVM does is maybe ok - but that must be stated clearly - and not based on "seems to work" mentality :)