koolhazz / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Like a teeter totter,run slowly because call ThreadCache::FetchFromCentralCache and ThreadCache::ReleaseToCentralCache frequently #675

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1.create many sub threads until unclaimed_cache_space_ < 0,all threads call 
free(malloc(1)),and sleep long time expect the first subthread .
2.the first subthread alloc many times then free for max_length_ = 8192
  state1:
  ThreadCache::size_=8,max_size_=64K
  ThreadCache::freelist[1] :length_ = 1,max_length_ = 8192,lowater_ = 1
3.alloc(8) ->
  state2:
  ThreadCache::size_=0,max_size_=64K
  ThreadCache::freelist[1] :length_ = 0,max_length_ = 8192,lowater_ = 0
4:alloc(8) -->
  state3:
  ThreadCache::size_=64K-8,max_size_=64K
  ThreadCache::freelist[1] :length_ = 8191,max_length_ = 8192,lowater_ = 0
  call
  ThreadCache::FetchFromCentralCache
  CentralFreeList::RemoveRange(spin lock of CentralFreeList) 
5:free
  state4:
  ThreadCache::size_=64K,max_size_=64K
  ThreadCache::freelist[1] :length_ = 8192,max_length_ = 8192,lowater_ = 8192
  call 
  ThreadCache::Scavenge
  ThreadCache::IncreaseCacheLimit(spin lock of heap)  
6:free -->state1
  call
  ThreadCache::ListTooLong
  ThreadCache::ReleaseToCentralCache  
  ThreadCache::PopRange
  CentralFreeList::InsertRange(spin lock of CentralFreeList) 
and 3,4,5,6,3,4...

What is the expected output? What do you see instead?
Because the list of malloc and free call many functions that wastes times and 
use spin lock,once enter this scene,then run slowly although these cpu kernels 
are busy.

What version of the product are you using? On what operating system?
ver:gperftools-2.4
os:RHEL 6.5

Please provide any additional information below.
In the sence, policy of lowater and policy of unclaimed_cache_space are invalid.

void* ThreadCache::FetchFromCentralCache(size_t cl, size_t byte_size) {
  FreeList* list = &list_[cl];
  ASSERT(list->empty());
  const int batch_size = Static::sizemap()->num_objects_to_move(cl);

  const int num_to_move = min<int>(list->max_length(), batch_size);
  void *start, *end;
  int fetch_count = Static::central_cache()[cl].RemoveRange(
      &start, &end, num_to_move);

  ASSERT((start == NULL) == (fetch_count == 0));
  if (--fetch_count >= 0) {
    size_ += byte_size * fetch_count;
    list->PushRange(fetch_count, SLL_Next(start), end);
+   list->clear_lowwatermark();
  }

test code

#include "config_for_unittests.h"
#include <stdio.h>
#ifdef HAVE_UNISTD_H
#include <unistd.h>    // for sleep()
#endif
#include "base/logging.h"
#include <gperftools/malloc_extension.h>
#include "tests/testutil.h"   // for RunThread()
#include <sys/time.h>

static void AllocOnce(void);

static void AllocStuffId(int nId);

static void TestVgop(int nId);

static void FastOrSlow(int nId);

static void CycleAllocFree(int nCount, int nSize);

int main(int argc, char** argv) 
{
    AllocOnce();

    FastOrSlow(0);
    int nThCount = 600;
    RunManyThreadsWithId(AllocStuffId, nThCount, 1 << 20);
    return 0;
}

static void AllocStuffId(int nId)
{
    AllocOnce();

    if (0 != nId)
    {
        sleep(5 * 60);
    }
    else
    {
        sleep(2 * 60);
        TestVgop(nId + 1);
    }
}

static void TestVgop(int nId)
{
    int nCount = (1 + 8192) / 2 * 8192;
    int nSize = 6;
    void** objects = new void*[nCount];
    int i = 0;
    for (i = 0; i < nCount; i++) {
        objects[i] = malloc(nSize);
    }

    for (i = 0; i < nCount; i++) {
        free(objects[i]);
    }
    int nPackCount = 8039;
    for (i = 0; i < nPackCount; i++) {
        objects[i] = malloc(nSize);
    }
    FastOrSlow(nId);
    for (i = 0; i < nPackCount; i++) {
        free(objects[i]);
    }

    delete[] objects;
}

void FastOrSlow(int nId)
{
    timeval starttime, endtime;
    gettimeofday(&starttime, 0);
    int nCount = 30 * 64 * 1024;
    int nSize = 6;
    CycleAllocFree(nCount, nSize);

    gettimeofday(&endtime, 0);
    double timeuse = 1000000 * (endtime.tv_sec - starttime.tv_sec) + endtime.tv_usec - starttime.tv_usec;
    timeuse /= 1000;
    printf("task id %d\ttime:%d ms\n",nId, int(timeuse));
}

void CycleAllocFree(int nCount, int nSize)
{
    for (int i = 0; i < nCount; i++)
    {
        void *p1 = malloc(nSize);
        void *p2 = malloc(nSize);
        free(p2);
        free(p1);
    }
}

void AllocOnce(void)
{
    free(malloc(1));
}

Original issue reported on code.google.com by fanzheny...@163.com on 26 Feb 2015 at 6:28

GoogleCodeExporter commented 9 years ago
I will need more details on what exactly is a problem, on proposed fix and on 
what exactly your test program is proving.

Original comment by alkondratenko on 1 Mar 2015 at 1:27