kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License
1.41k stars 99 forks source link

Windows/Shared lib/Dll: clib free can't be used on string-results with non shared CRT #2527

Closed LowLevelMahn closed 5 months ago

LowLevelMahn commented 11 months ago

all C-API routines that internaly using convertToOwnedCString can't be freed on application side using free (will crash due to different heaps in application and dll) - under linux sharing is default - on windows its mostly not thats why most other C libaries like openssl, curl, ... have such "free" functions in their API

crash on free of kuzu_value_get_string result

        kuzu_flat_tuple *tuple = kuzu_query_result_get_next(result);

        kuzu_value *value = kuzu_flat_tuple_get_value(tuple, 0);
        char *name = kuzu_value_get_string(value);
        kuzu_value_destroy(value);

        value = kuzu_flat_tuple_get_value(tuple, 1);
        int64_t since = kuzu_value_get_int64(value);
        kuzu_value_destroy(value);

        value = kuzu_flat_tuple_get_value(tuple, 2);
        char *name2 = kuzu_value_get_string(value);
        kuzu_value_destroy(value);

        printf("%s follows %s since %lld \n", name, name2, since);
        free(name); <=== this crashes - due to different Heaps in the kuzu_shared.dll and my application
        free(name2); <=== same crash
        kuzu_flat_tuple_destroy(tuple);

these seems to alloc on dll side the string:

there needs to be something like a kuzu_free(void) or kuzu_string_free(char) to make destroying safe

would be also great to have functions that do not allocate but filling a string buffer on the application side

so as alternative to:

char *name2 = kuzu_value_get_string(value);

also

int kuzu_value_set_string2(char* buf, max_size);

char name2[30];
res =kuzu_value_set_string2(name2, sizeof(name2));
res == 0, res == -1 (buffer too small)
acquamarin commented 11 months ago

May i know which compiler are you using on windows? I don't get a seg fault when freeing the kuzu allocated string using MSVC.

LowLevelMahn commented 11 months ago

VS2022 up-to-date, x64, Debug -> using a VS2022 build of Kuzu current git

  mkdir kuzu_dev
  cd kuzu_dev
  git clone https://github.com/kuzudb/kuzu
  cd kuzu
  make release NUM_THREADS=4
  make test NUM_THREADS=4
  cmake --install build/release --prefix ../../kuzu_install --strip

it should not crash if both Kuzu and the App are sharing the same CRT else it needs to crash https://learn.microsoft.com/en-us/cpp/c-runtime-library/potential-errors-passing-crt-objects-across-dll-boundaries?view=msvc-170

i got this error in the output:

HEAP[kuzu_tests.exe]: Invalid address specified to RtlValidateHeap( 00000177D6220000, 0000097E36772340 )

what more or less means that its not on this heap

this check happens only in Debug

Release silently ignores that because then the Debug/HeapValidation is not active (there is still the problem that the Kuzu buils is using the Release CRT and my application is using the Debug CRT - i think they can't share data) - so Release works due to same Release CRT

https://learn.microsoft.com/en-us/cpp/build/reference/md-mt-ld-use-run-time-library?view=msvc-170

image

but this is all unrelevant if the DLL provides a "free" functions then all cases are working

mewim commented 10 months ago

We will rework C API a bit to provide a destructor function for all the values we return, so that the user can destruct the values we return even if it is allocated by a different version of malloc.

zaddach commented 7 months ago

Actually I'd guess that this is related to #3225. You mention explicitly that you build with Debug (using msvcrtd), and Rust on Windows always uses the release version (msvcrt). If you somehow manage to link, you'll still use two different runtimes (debug vs release).

3226 should fix that..

LowLevelMahn commented 7 months ago

its not solveable by linking with other libs on kuzu side if your own application is using a different CRT (debug, release or a complete different compiler version, or language with C calling compatiblity) then kuzu => it will not work at runtime (because different CRTs are not compatible)

the differences on linux are smaller - but only due to different default behavior - mostly shared heaps and nearly no differences between Release/Debug builds (when not using different check flags as i explained)

for a C++ API its common(and needed because of missing ABI-Standard) to use the same compiler (what makes it implicitly someway (when CRTs are shared) compatible) but a C API should not depend on using the same compiler/same CRT - thats the reason for having a C API - for not beeing dependent at all

the C API is not independent if its needed that the dll/so of Kuzu NEEDS to share the same CRT with the application - and only this would allow a free of a Kuzu string in my own application

and this is not a Windows-special - its only not that super common on linux - but getting still a problem when switching stdc++ lib or using more check-flags etc. - what many developers just miss to do

and i have technically no chance of having the same CRT if i write a Java or C#, Python Wrapper on top of the Kuzu C API - thats the reason for having free-Routines in C APIs

some developers tend to think that "it seem to work" is some sort of guarantee :)

benjaminwinger commented 5 months ago

I believe this has been fixed in #2815 (with more work following that up in #3457, better described in #3133). There is now a kuzu_destroy_string function for freeing strings produced by the C API.

https://github.com/kuzudb/kuzu/blob/bc8213e6893e83ee59e6857d912e2a63be551a7d/src/include/c_api/kuzu.h#L1452-L1459

LowLevelMahn commented 5 months ago

the question is why not kuzu_destroy - anything that gets allocated inside Kuzu dll needs to be freed - not only strings

and is preferred over using the standard C free function.

its not prefered - its needed - or else your using code is dependent on how Kuzu is linked to the project and maybe you're not responsible for that in your project

mewim commented 5 months ago

the question is why not kuzu_destroy - anything that gets allocated inside Kuzu dll needs to be freed - not only strings

and is preferred over using the standard C free function.

its not prefered - its needed - or else your using code is dependent on how Kuzu is linked to the project and maybe you're not responsible for that in your project

I think we have now provide a destroy function for everything allocated by kuzu C API (if you search for destroy in this file): https://github.com/kuzudb/kuzu/blob/bc8213e6893e83ee59e6857d912e2a63be551a7d/src/include/c_api/kuzu.h

If there is anything still missing, let us know.