google / tcmalloc

Apache License 2.0
How would I build a shared object file? #27

Open SamSaffron opened 4 years ago

SamSaffron commented 4 years ago

Trying to get my head around Bazel: is there a way of building a tcmalloc.so shared object file from this project?

I know the recommendation is to compile applications with tcmalloc directly, but for my use case (https://github.com/SamSaffron/allocator_bench) I would like to do a side-by-side comparison with perftools and jemalloc, which are LD_PRELOADed.

I'm also not against experimenting with a statically compiled Ruby that includes tcmalloc; if we can prove it is faster or better, maybe the Ruby folks would be open to adding it.
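For an LD_PRELOAD side-by-side comparison like this, it may be worth confirming from inside the benchmark process that the preloaded allocator was really loaded. A minimal Linux-only sketch, assuming /proc is mounted: it scans /proc/self/maps for a mapped file whose path contains a given substring. IsLibraryMapped is a hypothetical helper, not part of any allocator's API.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <string_view>

// Hypothetical helper (not part of tcmalloc, jemalloc or gperftools): returns
// true if some file mapped into this process, as listed in /proc/self/maps,
// has a path containing `needle`. Linux-only; returns false if the file
// cannot be read.
bool IsLibraryMapped(std::string_view needle) {
  std::ifstream maps("/proc/self/maps");
  std::string line;
  while (std::getline(maps, line)) {
    // Fields: address perms offset dev inode [pathname]. Only the pathname
    // can contain '/', so everything from the first '/' on is the path.
    const std::size_t slash = line.find('/');
    if (slash == std::string::npos) continue;  // anonymous, [heap], [stack], ...
    if (std::string_view(line).substr(slash).find(needle) != std::string_view::npos)
      return true;
  }
  return false;
}
```

In a run started with LD_PRELOAD=/path/to/libtcmalloc.so, IsLibraryMapped("libtcmalloc") would be expected to return true, and false in the built-in baseline run, which gives a cheap check that the preload actually took effect.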

ckennelly commented 4 years ago

Modifying the BUILD file to no longer have the linkstatic=1 attribute should produce a libtcmalloc.so.

That said, all of TCMalloc's dependencies (in Abseil) will end up being dynamically linked in that case, so the performance overhead will be higher than necessary.

gaffneyc commented 4 years ago

@SamSaffron I had the same idea after seeing it on HackerNews yesterday. I was able to build a shared library using code in #16.

Benchmarks were run on Linux 5.6.7, AMD 2700, compiled with gcc 9.3.0

ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-linux]
built-in mem: 170800 duration: 4.936174764
built-in mem (MALLOC_ARENA_MAX=2): 139292 duration: 5.318339345
tcmalloc 2.7 mem: 172408 duration: 4.401484569
tcmalloc 2.7.90 mem: 173216 duration: 4.773377773
jemalloc 5.0.0 mem: 169692 duration: 4.470789708
jemalloc 5.0.1 mem: 135116 duration: 4.678362483
jemalloc 5.1.0 mem: 168668 duration: 4.454680265
jemalloc 5.2.1 mem: 139296 duration: 4.780960854
google/tcmalloc mem: 197184 duration: 6.906723601
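To read these numbers relative to one another, the sketch below copies the rows above into a table and divides each mem and duration value by the built-in row. The Row, kRows, Relative and PrintRelative names are illustrative; the output above does not state units, so only unitless ratios are computed.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Rows copied from the benchmark output above.
// No units are assumed for `mem` or `duration`; only ratios are computed.
struct Row {
  std::string name;
  double mem;
  double duration;
};

const std::vector<Row> kRows = {
    {"built-in", 170800, 4.936174764},
    {"built-in (MALLOC_ARENA_MAX=2)", 139292, 5.318339345},
    {"tcmalloc 2.7", 172408, 4.401484569},
    {"tcmalloc 2.7.90", 173216, 4.773377773},
    {"jemalloc 5.0.0", 169692, 4.470789708},
    {"jemalloc 5.0.1", 135116, 4.678362483},
    {"jemalloc 5.1.0", 168668, 4.454680265},
    {"jemalloc 5.2.1", 139296, 4.780960854},
    {"google/tcmalloc", 197184, 6.906723601},
};

// Ratio of `value` to `baseline`: 1.0 means "same as the baseline".
double Relative(double value, double baseline) { return value / baseline; }

// Prints mem and duration of each row relative to the first row (built-in).
void PrintRelative(const std::vector<Row>& rows) {
  if (rows.empty()) return;
  const Row& base = rows.front();
  for (const Row& r : rows) {
    std::printf("%-32s mem x%.3f  duration x%.3f\n", r.name.c_str(),
                Relative(r.mem, base.mem), Relative(r.duration, base.duration));
  }
}
```

With these rows, google/tcmalloc comes out at roughly 1.15x the built-in mem value and 1.40x its duration (about 40% slower in this run), while tcmalloc 2.7 takes roughly 0.89x as long.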
karanaggarwal1994 commented 2 years ago

How do I link Abseil into tcmalloc.so?

MaskRay commented 2 years ago

Having a documented way to build a self-contained tcmalloc.so or tcmalloc.a would be very useful.

I am playing with lld and different malloc implementations today (https://gist.github.com/MaskRay/219effe23a767b85059097f863ebc085). For many allocators, using them is simply:

ld.lld @response.txt -o lld.glibc
ld.lld ~/Dev/mimalloc/out/release/libmimalloc.a @response.txt -o lld.mi
ld.lld ~/Dev/jemalloc/out/release/lib/libjemalloc.a @response.txt -o lld.je
ld.lld ~/Dev/snmalloc/out/release/libsnmallocshim-static.a @response.txt /usr/lib/x86_64-linux-gnu/libatomic.so.1 -o lld.sn

It seems that if a project doesn't use Bazel, it's very difficult to plug tcmalloc into it. (Yes, I know llvm-project has an unofficial utils/bazel/, but it's still very unclear how to make it use tcmalloc.)

junyer commented 2 years ago

AFAIK, creating a shared library with Bazel requires a cc_binary() rule with, at minimum, linkshared = True; see this part of the Bazel documentation. Depending on the situation, wrangling symbol visibility might also be necessary; see this part of the pybind_extension() implementation. One could start off by cloning //tcmalloc:tcmalloc as //tcmalloc:tcmalloc.so and then tweaking the latter until it works well enough for one's purposes.

jesHrz commented 3 months ago

Following this instruction solves the problem.

Briefly: add a new cc_binary target called libtcmalloc.so, with linkshared = 1 and depending on the tcmalloc target, at the end of tcmalloc/BUILD; then bazel build //tcmalloc:libtcmalloc.so will generate the dynamic library.

cc_binary(
    name = "libtcmalloc.so",
    deps = [":tcmalloc"],
    linkshared = 1,
    copts = TCMALLOC_DEFAULT_COPTS,
)

A new question: How can I access the headers outside of the tcmalloc project?

lano1106 commented 2 months ago

The benchmark from @gaffneyc above is old. Is there a more recent benchmark that compares the latest TCMalloc version against other allocators? I had assumed that the latest version was at least as good as the gperftools one, and certainly better with all the new bells and whistles such as rseq, THP support and C++14 sized delete. But since I am having such a hard time working with the new version, I need to make sure the effort is worth the trouble. I am having doubts after spending a full day learning Bazel and getting results that do not work well.

I do not mind slightly slower allocation if the use of THP means fewer TLB misses and fewer page faults and makes the overall process better, but those things have to be measured. With all the rich documentation the TCMalloc team provides, I am surprised that no performance numbers are published anywhere. gperftools TCMalloc had plenty of benchmark numbers comparing it against ptmalloc2:

https://gperftools.github.io/gperftools/tcmalloc.html

xiedeacc commented 2 months ago

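To access the headers outside the tcmalloc project (@jesHrz's question above): just copy the tcmalloc directory from the tcmalloc repo into /usr/local/include, or add an -I flag pointing to the tcmalloc repo path.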

xiedeacc commented 2 months ago

Commenting out linkstatic = 1 (#linkstatic = 1) in the BUILD file still works.

lano1106 commented 2 months ago

There is something not quite right with the cc_binary recipe from @jesHrz above. I am not very fond of Bazel because it is very opaque about what it is doing. The recipe does create a shared library, but with one small but critical omission.

With the help of @fweimer-rh, I discovered that Bazel was not including the ELF SONAME field in the created shared library. This omission stops LD_PRELOAD from working, unless libtcmalloc.so also happens to be within the LD_LIBRARY_PATH search paths.

To find out whether you have the problem, you can use: $ readelf -Wd libtcmalloc.so | grep SONAME

If the output is empty, you have a half-baked shared library.

Maybe there is a way to tell Bazel to include the SONAME, but I don't know the tool well enough to know how.
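The same SONAME check can be done programmatically, for example from a small CI helper. The sketch below assumes a 64-bit ELF file in the host's byte order that still has section headers (true of normal builds); ReadSoname is a hypothetical helper. It finds the SHT_DYNAMIC section, walks its entries until DT_NULL, and resolves DT_SONAME through the linked string table, which is the information readelf -d prints.

```cpp
#include <elf.h>

#include <algorithm>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns the DT_SONAME of a 64-bit ELF file, or "" if it
// has none. Throws std::runtime_error for unreadable or unsupported files.
std::string ReadSoname(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + path);
  const std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                              std::istreambuf_iterator<char>());

  // Bounds-checked copy of a fixed-size ELF structure at byte offset `off`.
  auto read_at = [&buf](std::size_t off, auto& out) {
    if (off > buf.size() || buf.size() - off < sizeof(out))
      throw std::runtime_error("truncated ELF file");
    std::memcpy(&out, buf.data() + off, sizeof(out));
  };

  Elf64_Ehdr eh;
  read_at(0, eh);
  if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
      eh.e_ident[EI_CLASS] != ELFCLASS64)
    throw std::runtime_error("not a 64-bit ELF file: " + path);

  for (std::size_t i = 0; i < eh.e_shnum; ++i) {
    Elf64_Shdr sh;
    read_at(eh.e_shoff + i * eh.e_shentsize, sh);
    if (sh.sh_type != SHT_DYNAMIC) continue;

    // sh_link of the dynamic section is the index of its string table (.dynstr).
    Elf64_Shdr strtab;
    read_at(eh.e_shoff + std::size_t{sh.sh_link} * eh.e_shentsize, strtab);

    for (std::size_t off = sh.sh_offset;
         off + sizeof(Elf64_Dyn) <= sh.sh_offset + sh.sh_size;
         off += sizeof(Elf64_Dyn)) {
      Elf64_Dyn dyn;
      read_at(off, dyn);
      if (dyn.d_tag == DT_NULL) break;
      if (dyn.d_tag != DT_SONAME) continue;
      const std::size_t name_off = strtab.sh_offset + dyn.d_un.d_val;
      if (name_off >= buf.size()) throw std::runtime_error("bad SONAME offset");
      const char* begin = buf.data() + name_off;
      const char* end = std::find(begin, buf.data() + buf.size(), '\0');
      return std::string(begin, end);
    }
  }
  return "";  // no dynamic section, or no DT_SONAME entry in it
}
```

An empty result from ReadSoname("bazel-bin/tcmalloc/libtcmalloc.so") would correspond to the empty grep output described above; once the library is linked with -Wl,-soname,libtcmalloc.so it should return "libtcmalloc.so".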

xiedeacc commented 2 months ago

-Wl,-soname may work.

xiedeacc commented 2 months ago

cc_binary(
    name = "libtcmalloc.so",
    linkopts = ["-Wl,-soname,libtcmalloc.so"],
    linkshared = 1,
    deps = [":tcmalloc"],
)

This works.

lano1106 commented 2 months ago

I am going to give you the nickname of bazel master!

Thanks a lot! I am determined to make tcmalloc work with my project, but it may take a few days.

The other issue is the ODR violation that arises if a user of the .so also uses Abseil, which Chris Kennelly reported in some other issues. That is the case for my project.

kenshin92 commented 2 months ago

https://bazel.build/reference/be/c-cpp#cc_shared_library

lano1106 commented 2 months ago

Interesting info. I am definitely bookmarking this important Bazel reference page.

I am not sure what value the cc_shared_library rule adds over cc_library or cc_binary with the needed flag values.

Nonetheless, I gave cc_shared_library a try.

You still need user_link_flags = ["-Wl,-soname,libtcmalloc_new.so"] to have the SONAME added to the binary.

I gave the target a different name because Bazel cannot have two targets with the same name.

lano1106 commented 2 months ago

I am improving my custom bazel setup:

tcmalloc/BUILD modification:

cc_shared_library(
    name = "tcmalloc_shared",
    shared_lib_name = "libtcmalloc.so",
    user_link_flags = ["-Wl,-O1,--sort-common,--as-needed", "-Wl,-soname,libtcmalloc.so"],
    deps = [":tcmalloc"],
)

Built with this command line: bazel build --copt "-march=sapphirerapids" --copt "-O3" --copt "-flto" --linkopt "-O3" --linkopt "-flto=auto" //tcmalloc:tcmalloc_shared

The build succeeds, but I get ODR warnings:

INFO: From Linking tcmalloc/libtcmalloc.so:
tcmalloc/internal/percpu.cc:65:30: warning: 'tcmalloc_sampler' violates the C++ One Definition Rule [-Wodr]
./tcmalloc/sampler.h:89:7: note: type name 'tcmalloc::tcmalloc_internal::Sampler' should match type name 'char'
./tcmalloc/allocation_sampling.h:57:58: note: 'tcmalloc_sampler' was previously declared here

I have created a pull request to address the problem: https://github.com/google/tcmalloc/pull/257

ganwenbo commented 1 month ago

How would I modify @xiedeacc's cc_binary recipe above to get a libtcmalloc.a?

lano1106 commented 1 month ago

A prerequisite for a tcmalloc shared object file is for Abseil to offer the same thing through Bazel:

https://github.com/abseil/abseil-cpp/issues/1746

That being said, this path appears to be a difficult one. I have come to consider three alternatives:

  1. Check whether the current tcmalloc.so, which contains Abseil code, works within my app despite possible ODR issues. I use Abseil strictly for its containers; I link with -labsl_hash -labsl_raw_hash_set.
  2. Statically link tcmalloc into the executable and export the symbols of tcmalloc and its Abseil libraries from the executable, so that they are available to the loaded shared libraries.
  3. Statically link everything (IMHO this is a pain; it would be my last resort).

Option 2 is a new concept to me, but it is apparently possible: https://stackoverflow.com/questions/16354268/how-to-export-specific-symbol-from-executables-in-gnu-linux https://stackoverflow.com/questions/5685617/missing-symbols-from-static-library-in-linked-executable

This is such an out-of-the-norm idea that I will need a better understanding of how the loader works. I know that with LD_PRELOAD=libtcmalloc.so, its symbols take precedence over the libc ones, but what about symbols exported from the executable? I suppose the executable's symbols also take precedence over all others, but I need to confirm this point; I have never done this before.

My main motivation for making this work is that I want all the third-party libraries used by my program to use tcmalloc, OpenSSL being the main one.
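On the loader question above: with glibc's dynamic loader, a default symbol lookup walks the global scope, which starts with the main executable, continues with any LD_PRELOAD libraries, and then the DT_NEEDED dependencies in breadth-first order. So a malloc exported from the executable should take precedence over both libc's and a preloaded library's. One way to look at the loaded objects from inside the process is dl_iterate_phdr, sketched below with a hypothetical LoadedObjects helper. Its order follows the loader's list of loaded objects, which is a close but not exact proxy for the lookup order (the vDSO also shows up, for instance).

```cpp
#include <link.h>

#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: names of the ELF objects currently loaded into this
// process, in the order dl_iterate_phdr reports them (main program first).
std::vector<std::string> LoadedObjects() {
  std::vector<std::string> names;
  dl_iterate_phdr(
      [](struct dl_phdr_info* info, std::size_t /*size*/, void* data) -> int {
        auto* out = static_cast<std::vector<std::string>*>(data);
        // glibc reports the main program with an empty name.
        const char* name = info->dlpi_name;
        out->push_back(name && *name ? name : "<main executable>");
        return 0;  // zero means "keep iterating"
      },
      &names);
  return names;
}
```

Printing LoadedObjects() in a program started with LD_PRELOAD=libtcmalloc.so would be expected to show the preloaded library near the front of the list, right after the executable (and possibly the vDSO).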

lano1106 commented 1 month ago

Option 2 looks promising. I have replaced linking against the Abseil shared libraries with tcmalloc's private Abseil object files, and my other shared libraries that use the Abseil classes now use the symbols exported by the executable binary.

(I had to modify the WORKSPACE file to make tcmalloc use the latest Abseil LTS release.)

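# Abseil
http_archive(
    name = "com_google_absl",
    urls = ["https://github.com/abseil/abseil-cpp/releases/download/20240722.0/abseil-cpp-20240722.0.tar.gz"],
    # The release tarball unpacks into this top-level directory.
    strip_prefix = "abseil-cpp-20240722.0",
)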

Makefile:

ABSL_OBJS = /home/lano1106/dev/tcmalloc/bazel-bin/external/abseil-cpp~/absl/container/_objs/raw_hash_set/raw_hash_set.pic.o \
            /home/lano1106/dev/tcmalloc/bazel-bin/external/abseil-cpp~/absl/hash/_objs/hash/hash.pic.o \
            /home/lano1106/dev/tcmalloc/bazel-bin/external/abseil-cpp~/absl/hash/_objs/low_level_hash/low_level_hash.pic.o
LIBS := -lkraken -ltradecore -ltrillionbase -lev -lfmt #-labsl_hash -labsl_raw_hash_set
$(TARGET) : $(OBJS)
	$(CXX) $(LDFLAGS) -o $@ $(OBJS) $(ABSL_OBJS) $(LIBS)

lano1106 commented 1 month ago

This is my final update. This was a first for me: yes, a program executable can export symbols for its shared libraries, and it works. This conclusion completely removes my need to have tcmalloc in shared library form.

/*
 * test_shared.cpp
 *
 * Olivier Langlois - August 21, 2024
 *
 * g++ -g -std=c++26 -fPIC -shared -Wl,-soname,libtest_shared.so -o libtest_shared.so test_shared.cpp
 */

#include <cstdlib>
#include <iostream>

/*
 * test_malloc()
 */
void test_malloc()
{
    char *ptr{static_cast<char *>(malloc(1))};

    *ptr = 'a';
    std::cout << *ptr << '\n';
    free(ptr);
}

/*
 * test_new()
 */
void test_new()
{
    char *ptr = new char;

    *ptr = 'a';
    std::cout << *ptr << '\n';
    delete ptr;
}

/*
 * tcmalloc_test.cpp
 *
 * Olivier Langlois - August 21, 2024
 *
 * bazel build --copt "-std=c++26" tcmalloc_test
 */

void test_malloc();
void test_new();

int main(int argc, char *argv[])
{
    test_malloc();
    test_new();
    return 0;
}

BUILD file:

cc_binary(
    name = "tcmalloc_test",
    srcs = ["tcmalloc_test.cpp", "libtest_shared.so"],
    deps = [
        "@com_google_tcmalloc//tcmalloc:tcmalloc",
    ],
    copts = [
        "-std=c++26",
    ],
)
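To double-check which loaded object actually supplies malloc in a test like this one, one option is to find the PT_LOAD segment that contains the function's address, using dl_iterate_phdr. This is a sketch under assumptions: ObjectContaining is a hypothetical helper, and the answer is only meaningful for position-independent executables (the usual default), because in a non-PIE executable &malloc may resolve to a PLT stub inside the executable itself.

```cpp
#include <link.h>

#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns the name of the loaded ELF object whose PT_LOAD
// segment contains `addr`, "<main executable>" for the program itself, or ""
// if no loaded object contains the address.
std::string ObjectContaining(const void* addr) {
  struct Query {
    std::uintptr_t addr;
    std::string result;
  } query{reinterpret_cast<std::uintptr_t>(addr), ""};

  dl_iterate_phdr(
      [](struct dl_phdr_info* info, std::size_t /*size*/, void* data) -> int {
        auto* q = static_cast<Query*>(data);
        for (ElfW(Half) i = 0; i < info->dlpi_phnum; ++i) {
          const ElfW(Phdr)& ph = info->dlpi_phdr[i];
          if (ph.p_type != PT_LOAD) continue;
          // Segment addresses are relative to the object's load base.
          const std::uintptr_t start = info->dlpi_addr + ph.p_vaddr;
          if (q->addr >= start && q->addr < start + ph.p_memsz) {
            const char* name = info->dlpi_name;
            q->result = (name && *name) ? name : "<main executable>";
            return 1;  // non-zero stops the iteration
          }
        }
        return 0;
      },
      &query);
  return query.result;
}
```

In tcmalloc_test, ObjectContaining(reinterpret_cast<const void*>(&malloc)) would be expected to return "<main executable>" when tcmalloc is statically linked into the binary, and the path of libc otherwise.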
lano1106 commented 1 month ago

Regarding @ganwenbo's question about getting a libtcmalloc.a: I am not sure.

The tcmalloc rule in tcmalloc/BUILD is already a cc_library rule, yet it does not create a libtcmalloc.a file. When you include the tcmalloc cc_library in another binary rule, Bazel collects a bunch of object files from tcmalloc's dependencies under the hood and links them into the final target.

If I needed what you want, I would look into the cc_library rule arguments at https://bazel.build/reference/be/c-cpp#cc_library_args

You are right: with a libtcmalloc.a, you could use it outside Bazel. Otherwise, building the list of dependency object files manually is a pain.

It is not easily done because, AFAIK, Bazel's philosophy is to let its users build against the master branch of all their project dependencies: as soon as a commit lands, it is recompiled and picked up at the next bazel build. Producing a library file goes against that philosophy.

Update: I dug into the question a little deeper.

I assume that you just followed the tcmalloc quickstart guide.

When you run bazel build tcmalloc/testing:hello_main, the alwayslink = 1 argument on the tcmalloc rule causes hello_main to be linked against all the tcmalloc object files.

However, if your target is tcmalloc itself, a library file is created.

Run: bazel build //tcmalloc:tcmalloc

It will create bazel-bin/tcmalloc/libtcmalloc.lo

What is a .lo file? I don't know for sure.

$ file libtcmalloc.lo
libtcmalloc.lo: current ar archive

I did a quick search. It is an archive containing PIC code, but I could not put my finger on exactly how it differs from a .a file, or when you should use a .lo rather than a .a file.

Here is the best reference I have found, but it does not answer my question precisely: https://www.gnu.org/software/libtool/manual/libtool.html

I guess it can be used for linking as if it were a .a file, or you can use GNU libtool to convert it into a .a file.

Update: if you want a .a archive file, remove alwayslink = 1 from the tcmalloc rule.

Just be aware that linking against libtcmalloc.a is tricky: you must somehow make the linker take malloc/free and new/delete from libtcmalloc.a rather than from libc. I have not found how to do that (or even tried). When you link all the tcmalloc object files in, by contrast, you leave the linker no room to decide otherwise.
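Since file reports libtcmalloc.lo as a current ar archive, its member list can be read directly (ar t libtcmalloc.lo does the same from the shell). Bazel appears to use the .lo suffix for archives of alwayslink libraries, which would match the observation that removing alwayslink = 1 produces libtcmalloc.a instead. A minimal sketch, assuming the common GNU/System V layout: the 8-byte "!<arch>\n" magic, 60-byte member headers, names terminated by '/', and a "//" long-name table. BSD-style and thin archives are not handled, and ListArMembers is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: lists the member file names of a GNU/System V style ar
// archive held in memory (e.g. the bytes of a .a or Bazel .lo file).
std::vector<std::string> ListArMembers(std::string_view data) {
  constexpr std::string_view kMagic = "!<arch>\n";
  constexpr std::size_t kHeaderSize = 60;
  if (data.substr(0, kMagic.size()) != kMagic)
    throw std::runtime_error("not a GNU/System V ar archive");

  std::vector<std::string> members;
  std::string_view long_names;  // contents of the "//" member, if present
  std::size_t pos = kMagic.size();
  while (pos + kHeaderSize <= data.size()) {
    const std::string_view header = data.substr(pos, kHeaderSize);
    if (header.substr(58, 2) != "`\n") throw std::runtime_error("bad member header");
    // Header fields: name[16] date[12] uid[6] gid[6] mode[8] size[10] magic[2].
    std::string_view name = header.substr(0, 16);
    const std::size_t size =
        std::strtoull(std::string(header.substr(48, 10)).c_str(), nullptr, 10);
    const std::size_t body = pos + kHeaderSize;
    if (body + size > data.size()) throw std::runtime_error("truncated member");

    name = name.substr(0, name.find_last_not_of(' ') + 1);  // strip padding
    if (name == "//") {
      long_names = data.substr(body, size);  // GNU long-name table
    } else if (name == "/" || name == "/SYM64/") {
      // Symbol index, not a real member.
    } else if (name.size() > 1 && name[0] == '/') {
      // "/<offset>": the real name is in the long-name table, ending in "/\n".
      const std::size_t off =
          std::strtoull(std::string(name.substr(1)).c_str(), nullptr, 10);
      if (off >= long_names.size()) throw std::runtime_error("bad long-name offset");
      const std::string_view rest = long_names.substr(off);
      members.emplace_back(rest.substr(0, rest.find("/\n")));
    } else {
      if (!name.empty() && name.back() == '/') name.remove_suffix(1);
      members.emplace_back(name);
    }
    pos = body + size + (size % 2);  // member data is padded to an even length
  }
  return members;
}
```

On the linking concern above: outside Bazel, the usual way to get the all-members behavior of alwayslink = 1 from a .a file is to wrap it as -Wl,--whole-archive libtcmalloc.a -Wl,--no-whole-archive (GNU ld and lld both accept these flags), so every member is linked in rather than only those that resolve an undefined symbol.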

ganwenbo commented 1 month ago

I removed alwayslink = 1 from tcmalloc/BUILD and ran bazel build //tcmalloc:tcmalloc. I got it! Thanks a lot!