google / tcmalloc

Apache License 2.0
4.31k stars 463 forks source link

tcmalloc cannot optimize dynamic memory allocation caused by std::vector. #237

Closed NeilZhy closed 3 months ago

NeilZhy commented 3 months ago

I run my application with LD_PRELOAD=tcmalloc . My program uses the third-party library draco when running.The calling code is similar to the sample in draco.

draco::DecoderBuffer buffer;
buffer.Init(data.data(), data.size());

const draco::EncodedGeometryType geom_type =
    draco::GetEncodedGeometryType(&buffer);
if (geom_type == draco::TRIANGULAR_MESH) {
  unique_ptr<draco::Mesh> mesh = draco::DecodeMeshFromBuffer(&buffer);
} else if (geom_type == draco::POINT_CLOUD) {
  unique_ptr<draco::PointCloud> pc = draco::DecodePointCloudFromBuffer(&buffer);
}

I captured the dynamic memory allocation flame graph using the bcc tool.Combining perf, objdump, gdb and other tools, we found that a hotspot function with high memory usage is inside draco.The final draco code is as follows:

bool DataBuffer::Update(const void *data, int64_t size, int64_t offset) {
  if (data == nullptr) {
    if (size + offset < 0)
      return false;
    // If no data is provided, just resize the buffer.
    data_.resize(size + offset);
  } else {
    if (size < 0)
      return false;
    if (size + offset > static_cast<int64_t>(data_.size()))
      data_.resize(size + offset);
    const uint8_t *const byte_data = static_cast<const uint8_t *>(data);
    std::copy(byte_data, byte_data + size, data_.data() + offset);
  }
  descriptor_.buffer_update_count++;
  return true;
}

where data_ is defined as std::vector data。 Further analysis shows that every time the application calls the draco code, the entire draco will go through multiple nested calls to new to complete the definition of member variables. Therefore, every time data is used, it is a newly defined variable, which can actually be understood as a temporary variable. My question is, why can't tcmalloc optimize away this part of dynamic memory allocation?

junyer commented 3 months ago

TCMalloc isn't the C++ compiler. TCMalloc is a memory allocator.

NeilZhy commented 3 months ago

@junyer Yes, I know TCMalloc is a memory allocator. Please take a closer look at my question. My question is that after using tcmalloc to run the program, there are still a lot of dynamic memory allocations in my program caused by using std::vector.

junyer commented 3 months ago

Please clarify (a) what a memory allocator should have done, (b) how that could have been done and (c) which memory allocators would have done that. At the risk of stating the obvious, a memory allocator is a library that allocates memory; expecting it to "optimise" how it gets called is like expecting libc to "optimise" how it gets called to, say, fopen(3) and fclose(3) FILEs. There's nothing here beyond wishful thinking.

thelex0 commented 3 months ago

a) these are not tcmalloc function names. are you sure that program using tcmalloc? b) tcmalloc has limited cpu cache bounds, so it's not true that there won't be any allocations after a few minutes

junyer commented 3 months ago

Indeed, various caches are bounded in various ways. Moreover, it's conceivable that at least some allocations are larger than kMaxSize.

NeilZhy commented 3 months ago

a) these are not tcmalloc function names. are you sure that program using tcmalloc? b) tcmalloc has limited cpu cache bounds, so it's not true that there won't be any allocations after a few minutes

a. I use tcmalloc to load my program in the following way. LD_PRELOAD=path/to/gperftools/lib/libtcmalloc.so ./my_test_program
b. I noticed that github:tcmalloc has the MallocExtension::instance()->SetMaxPerCpuCacheSize interface, but these interfaces are not in github:gperftools. I did some configuration by using the github:gperftools's MallocExtension::instance()->SetNumericProperty( "tcmalloc.max_total_thread_cache_bytes", 100 * 1024 * 1024); interface, but it had no effect.

junyer commented 3 months ago

Is there some reason why this issue wasn't filed against https://github.com/gperftools/gperftools?

junyer commented 3 months ago

If you aren't using TCMalloc as provided by this project, is there some reason to seek support from this project? gperftools is a separate (older) project with a different (older) implementation of TCMalloc that, in particular, doesn't offer per-CPU caching.

junyer commented 3 months ago

I see you filed https://github.com/gperftools/gperftools/issues/1517, so I think we are done here.

junyer commented 3 months ago

Spamming unrelated projects is bad open source citizenship, IMO, but good luck with that.

junyer commented 3 months ago

It's also worth noting that even the issue title ("tcmalloc cannot optimize dynamic memory allocation caused by std::vector.") is at best a miscommunication and at worst a misrepresentation. It's unclear that there's any problem here beyond your misunderstanding of how to behave towards open source projects and how to respect people's time and attention.