potassco / clingo

🤔 A grounder and solver for logic programs.
https://potassco.org/clingo
MIT License
601 stars 79 forks source link

Support custom malloc in build system #407

Closed haampie closed 1 year ago

haampie commented 1 year ago

In Spack we're using clingo, and there I noticed there's a consistent 15% speedup when switching from glibc malloc to mimalloc:

$ export LD_PRELOAD=/path/to/libmimalloc.so
$ spack solve --timers hdf5+mpi
...
    load           0.015s
    ground         2.032s
    solve          2.448s
...

vs

$ spack solve --timers hdf5+mpi
    load           0.016s
    ground         2.312s
    solve          3.061s

(Also an equal speedup when enabling profile guided optimization and link time optimization)

Would it be OK to support this directly in the build system? On Linux you can rely on LDPRELOAD, but it's cumbersome in that it affects all applications. I don't have numbers for macOS, but there it's definitely harder to take a libc symbol from a different dylibc, and DYLD* variables don't propagate to child processes.

rkaminsk commented 1 year ago

Shouldn't it be enough to set the linker flags when compiling clingo without any changes to clingo itself? If you pass

-DCMAKE_CXX_FLAGS:STRING='-include <path-to>/mimalloc.h'
-DCMAKE_SHARED_LINKER_FLAGS:STRING='-lmimalloc'
-DCMAKE_EXE_LINKER_FLAGS:STRING='-lmimalloc'
haampie commented 1 year ago

Personally I think it's better to do it in the build system, since cmake knows the relevant flags for specific compilers, and mimalloc has a cmake-config file. On top of that it's a bit finicky to get this right, since (on linux/elf platforms) when libclingo is dlopen'ed, it will resolve the malloc symbol first from its dependents, not from dependencies. So you typically end up with default malloc.

One option is to compile with the object file and use -Wl,-Bsymbolic to force the dynamic loader to look for malloc inside the shared library itself. Of course that could also be done through -DCMAKE_* flags, but I think it's cleaner to make it part of the build system under a toggle -DCLINGO_WITH_MIMALLOC:BOOL=ON and have it do find_package(mimalloc 2.0 REQUIRED) conditionally, and ensure that the shared libs are created with -Wl,-Bsymbolic on relevant platforms. Or maybe easier: have it statically link mimalloc's malloc into the relevant shared libs, and make sure this symbol is not exported, so that malloc is resolved internally.

I'm happy to open a pull request, let me know what you think.

rkaminsk commented 1 year ago

One option is to compile with the object file and use -Wl,-Bsymbolic to force the dynamic loader to look for malloc inside the shared library itself. Of course that could also be done through -DCMAKE_* flags, but I think it's cleaner to make it part of the build system under a toggle -DCLINGO_WITH_MIMALLOC:BOOL=ON and have it do find_package(mimalloc 2.0 REQUIRED) conditionally, and ensure that the shared libs are created with -Wl,-Bsymbolic on relevant platforms.

If it can be done with a few lines of cmake code, I am fine with this. It looks like an option has to be introduced and mimalloc has to be added to the target link libraries. Does it have to be added to all targets or is just the clingo library enough? Furthermore, the webpage suggests to include mimalloc-new-delete.h for best performance with C++.

I'm happy to open a pull request, let me know what you think.

Yes, please go ahead.

haampie commented 1 year ago

Scoping mimalloc's malloc just to libclingo seems difficult if not impossible to achieve in the end :( if you link mimalloc statically but keep the symbol visible, it gets overridden by glibc malloc (from python in my case); if you make the symbol hidden/private (through -Wl,--version-script with global: clingo_*; local: *;) it seems to run into issues where malloc from glibc is used but free from mimalloc, causing segfaults. Maybe LD_PRELOAD globally is the way...

rkaminsk commented 1 year ago

Have you tried just using the header mimalloc-new-delete.h? Clingo should mostly use C++'s new/delete.

haampie commented 1 year ago

Ah, nice, that can indeed be scoped to libclingo. I didn't even try, cause it's an implementation header, so it thought it would be similar to statically linking mimalloc's malloc symbol. The upside is that this can work without the malloc override. I will test a bit more, see if it's better to statically link mimalloc (and if so, the --version-script file may be nice to have, since we can't control the visibility of the symbols in the static mimalloc lib)

Edit: mimalloc-static reduces the runtime consistently by 1.3% compared to dynamic. With LTO that difference is gone.

haampie commented 1 year ago

At the end of the day, mimalloc-new-delete.h worked "by accident" for our use of the Python module. This is because including the header puts operator new/delete symbols in libclingo.so, and libclingo.so itself happens to be the first library to pull in libstdc++.so. So basically it's globally overriding operator new/delete, but its success depends on load order of shared libs; in our case that translates into the order in which Python modules are imported.

Given that it's so brittle, I think I'll drop this and stick to a small, hacky patch in Spack itself to ensure mimalloc's new/delete are used.

(You can of course still have build system support to link bin/clingo executables to mimalloc, since that's not so painful as scoping it to a library, but Spack is just using the libraries to interface with clingo)

rkaminsk commented 1 year ago

At the end of the day, mimalloc-new-delete.h worked "by accident" for our use of the Python module. This is because including the header puts operator new/delete symbols in libclingo.so, and libclingo.so itself happens to be the first library to pull in libstdc++.so. So basically it's globally overriding operator new/delete, but its success depends on load order of shared libs; in our case that translate

Then the LD_PRELOAD options looks better. Changing clingo to explicitly use mimalloc (or any other library) would be possible but this would be a rather large patch.