microsoft / mimalloc

mimalloc is a compact general purpose allocator with excellent performance.

Calling mi_heap_collect does not decommit all unused pages #878

Closed: adiholden closed this issue 2 months ago

adiholden commented 2 months ago

Hi, I am using version v2.0.9 and I see a large discrepancy between used and RSS memory, even after calling mi_heap_collect. The discrepancy appears after freeing many blocks that reside in segments which still contain pages in use. To reproduce it I wrote two unit tests with the same number and sizes of allocations but a different allocation order (see tests below).

The first test allocates 200 blocks of size 32288, then allocates 100000 blocks of size 3584, then frees all 100000 blocks of size 3584; checking the committed memory after this free shows the expected 64 MB.

The second test performs the same allocations (200 blocks of size 32288 and 100000 blocks of size 3584) but interleaves them: one 32288-byte allocation followed by 400 allocations of 3584 bytes, and so on. In this test, after freeing all the 3584-byte blocks the committed memory is 352.0 MiB, and after calling mi_heap_collect it is still 319.9 MiB.

In the first unit test the big and small blocks do not share any segments, so when all the small blocks are freed their segments are freed and the memory is decommitted.

While debugging the second unit test I do see that some memory is decommitted while freeing the 3584-byte blocks, even when the segment is still in use (via the mi_segment_delayed_decommit flow), but it is not always decommitted because decommits are delayed (through mi_option_decommit_delay and mi_option_decommit_extend_delay). I would expect this flow to also run as part of mi_heap_collect and decommit segment->decommit_mask; does that make sense?
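
For reference, a minimal sketch (assuming the v2.0.x option names mi_option_decommit_delay and mi_option_decommit_extend_delay, applied before any allocation) of shrinking the delays so freed pages are decommitted promptly:

#include <mimalloc.h>

int main() {
  // Decommit freed memory immediately instead of after the default delay.
  mi_option_set(mi_option_decommit_delay, 0);
  // Do not extend the delay when memory in the segment is touched again.
  mi_option_set(mi_option_decommit_extend_delay, 0);
  // ... run the allocation workload ...
  return 0;
}

This trades more mmap/VirtualAlloc traffic for a committed figure that tracks actual use more tightly.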

#include <mimalloc.h>
#include <gtest/gtest.h>

// Minimal fixture so the TEST_F cases below compile standalone.
class MiMallocTest : public ::testing::Test {};

TEST_F(MiMallocTest, Test1) {
  auto* myheap = mi_heap_new();

  const int kBigAllocCount = 200;
  const int kSmallAllocCount = 100000;

  void* big_ptr[kBigAllocCount];
  for (size_t i = 0; i < kBigAllocCount; ++i) {
    big_ptr[i] = mi_heap_malloc_aligned(myheap, 32288, 8);
  }

  void* small_ptr[kSmallAllocCount];
  for (size_t i = 0; i < kSmallAllocCount; ++i) {
    small_ptr[i] = mi_heap_malloc_aligned(myheap, 3584, 8);
  }

  mi_stats_print_out(NULL, NULL);
  // free small allocations
  for (size_t i = 0; i < kSmallAllocCount; ++i) {
    mi_free_size_aligned(small_ptr[i], 0, 8);
  }

  mi_stats_print_out(NULL, NULL);

  mi_heap_collect(myheap, true);

  mi_stats_print_out(NULL, NULL);

  // free big allocations
  for (size_t i = 0; i < kBigAllocCount; ++i) {
    mi_free_size_aligned(big_ptr[i], 32288, 8);
  }

  mi_heap_destroy(myheap);
}
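
Both tests read the committed figure from the stderr report of mi_stats_print_out. As a small sketch (the helper names append_stats and collect_stats are mine, not from the issue), the same report can be captured into a string through the mi_output_fun callback that mi_stats_print_out accepts:

#include <mimalloc.h>
#include <string>

// Receives each chunk of the statistics report and appends it to the
// std::string passed through arg.
static void append_stats(const char* msg, void* arg) {
  static_cast<std::string*>(arg)->append(msg);
}

// Returns the full statistics report as a string instead of printing to stderr.
static std::string collect_stats() {
  std::string out;
  mi_stats_print_out(append_stats, &out);
  return out;
}

A test could then parse the committed value out of the returned string instead of eyeballing stderr.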

TEST_F(MiMallocTest, Test2) {
  auto* myheap = mi_heap_new();

  const int kBigAllocCount = 200;
  const int kSmallAllocCount = 100000;

  void* big_ptr[kBigAllocCount];
  void* small_ptr[kSmallAllocCount];

  int j = 0;
  for (size_t i = 0; i < kSmallAllocCount; ++i) {
    if (j < kBigAllocCount && i % 400 == 0) {
      big_ptr[j] = mi_heap_malloc_aligned(myheap, 32288, 8);
      ++j;
    }
    small_ptr[i] = mi_heap_malloc_aligned(myheap, 3584, 8);
  }
  ASSERT_EQ(j, kBigAllocCount);

  mi_stats_print_out(NULL, NULL);
  // free small allocations
  for (size_t i = 0; i < kSmallAllocCount; ++i) {
    mi_free_size_aligned(small_ptr[i], 0, 8);
  }

  mi_stats_print_out(NULL, NULL);

  mi_heap_collect(myheap, true);

  mi_stats_print_out(NULL, NULL);

  // free big allocations
  for (size_t i = 0; i < kBigAllocCount; ++i) {
    mi_free_size_aligned(big_ptr[i], 32288, 8);
  }
  mi_stats_print_out(NULL, NULL);
  mi_heap_collect(myheap, true);

  mi_stats_print_out(NULL, NULL);

  mi_heap_destroy(myheap);
}
daanx commented 2 months ago

Thanks for pointing this out -- I updated dev and dev-slice to be more aggressive when mi_collect is called with force set to true. I tested with your code and got the expected results (at 6.4 MiB, not 64 MiB). I needed to set MIMALLOC_ARENA_EAGER_COMMIT=0 to get the right statistics, as recent versions default to eagerly committing 1 GiB on systems with overcommit.
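
A hedged note on that setting: mimalloc reads MIMALLOC_ARENA_EAGER_COMMIT from the environment at initialization, so it has to be set before the process starts, e.g. MIMALLOC_ARENA_EAGER_COMMIT=0 ./mimalloc_tests (the binary name is hypothetical). Since mimalloc's environment variables map to option names, builds that include the arena option should also accept the programmatic form (an assumption on my part for the dev branches):

#include <mimalloc.h>

int main() {
  // Assumption: mi_option_arena_eager_commit is the option behind
  // MIMALLOC_ARENA_EAGER_COMMIT on the dev/dev-slice branches.
  // It must be disabled before mimalloc commits its first arena.
  mi_option_disable(mi_option_arena_eager_commit);
  // ... run the tests and print statistics ...
  return 0;
}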

adiholden commented 2 months ago

Thanks a lot @daanx! I also ran my tests with the fix and I see that the changes fixed the problem. Do you know when you plan to tag a new release version that will include these changes?