microsoft / mimalloc

mimalloc is a compact general purpose allocator with excellent performance.
MIT License

The performance of mimalloc is unexpectedly lower than that of the system's built-in allocator. #825

Closed: litterbug23 closed this issue 7 months ago.

litterbug23 commented 7 months ago

Test environment: Windows 10, MSVC 2017, x86. System malloc: 59000 µs; mimalloc: 92000 µs (as printed by getMicrosecondsCPU()):

```cpp
#include <mimalloc.h>
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <iostream>

typedef uint32_t u32;

// Converts clock() ticks to microseconds. Note: with MSVC, clock() reports
// elapsed wall-clock time since process start, not CPU time.
u32 getMicrosecondsCPU()
{
    clock_t newClock = clock();
    return (u32)((double)newClock * 1000000.0 / CLOCKS_PER_SEC);
}

// Build with OpenMP enabled (/openmp in MSVC); otherwise the pragmas are
// ignored and each loop runs on a single thread.
int main(int argc, char** argv) {
    {
        u32 time1 = getMicrosecondsCPU();
#pragma omp parallel for num_threads(6) // NEW ADD
        for (int i = 0; i < 1000000; i++)
        {
            //void* pppp = _aligned_malloc(1024, 8);
            //_aligned_free(pppp);
            void* pppp = std::malloc(1024);
            std::free(pppp);
        }
        u32 time2 = getMicrosecondsCPU();
        std::cout << "system malloc: " << (time2 - time1) << std::endl;
    }
    {
        u32 time1 = getMicrosecondsCPU();
#pragma omp parallel for num_threads(6) // NEW ADD
        for (int i = 0; i < 1000000; i++)
        {
            void* pppp = mi_malloc(1024);
            mi_free(pppp);
        }
        u32 time2 = getMicrosecondsCPU();
        std::cout << "mimalloc: " << (time2 - time1) << std::endl;
    }
    return 0;
}
```
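Because getMicrosecondsCPU() is built on clock(), which MSVC implements as elapsed wall-clock time rather than per-process CPU time, a portable alternative is to time the loop with std::chrono::steady_clock and spawn the threads explicitly. The sketch below is a hypothetical harness (the names time_alloc_free_pairs, AllocFn and FreeFn are illustrative, not from mimalloc); it assumes only that the allocator under test has malloc/free-shaped signatures, which mi_malloc(size_t) and mi_free(void*) from mimalloc.h do.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical helper types: any allocator with malloc/free-style signatures
// (for example mi_malloc / mi_free) can be passed in.
using AllocFn = void* (*)(std::size_t);
using FreeFn = void (*)(void*);

// Hypothetical helper: runs `iterations` allocate/free pairs of `size` bytes,
// split across `threads` worker threads, and returns the elapsed wall-clock
// time in microseconds (thread start-up and join are included).
long long time_alloc_free_pairs(AllocFn alloc, FreeFn release,
                                std::size_t size, int iterations, int threads) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        // Equal share per thread; the first thread also takes the remainder.
        int count = iterations / threads + (t == 0 ? iterations % threads : 0);
        workers.emplace_back([=] {
            for (int i = 0; i < count; ++i) {
                void* p = alloc(size);
                if (p == nullptr) std::abort();  // treat out-of-memory as fatal
                // Touch the block so the allocate/free pair cannot be elided.
                static_cast<volatile unsigned char*>(p)[0] = 1;
                release(p);
            }
        });
    }
    for (auto& w : workers) w.join();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Calling it twice with the same size, iteration count and thread count, once with wrappers around std::malloc/std::free and once with mi_malloc/mi_free, gives directly comparable numbers in one unit.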
mjp41 commented 7 months ago

You have chosen a benchmark size just above the threshold for small allocations.

https://github.com/microsoft/mimalloc/blob/4e50d6714d471b72b2285e25a3df6c92db944593/doc/mimalloc-doc.h#L253
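The threshold referred to above is MI_SMALL_SIZE_MAX, which mimalloc.h defines as 128 machine words (MI_SMALL_WSIZE_MAX * sizeof(void*)). Because it scales with pointer size, it is 512 bytes in a 32-bit build and 1024 bytes in a 64-bit build, so the 1024-byte requests in this benchmark sit exactly at the limit on x64 but above it on x86. The sketch below mirrors that definition locally, assuming it is unchanged in the mimalloc version being tested; is_mimalloc_small is a hypothetical helper, not part of the library.

```cpp
#include <cstddef>

// Local mirror of mimalloc.h (assumed unchanged in the tested version):
//   MI_SMALL_WSIZE_MAX = 128, MI_SMALL_SIZE_MAX = 128 * sizeof(void*)
constexpr std::size_t kSmallWsizeMax = 128;
constexpr std::size_t kSmallSizeMax = kSmallWsizeMax * sizeof(void*);

// Hypothetical helper: true if a request of `size` bytes is in mimalloc's
// "small" range (size <= MI_SMALL_SIZE_MAX) for the current build.
constexpr bool is_mimalloc_small(std::size_t size) {
    return size <= kSmallSizeMax;
}

// 128 words of 4 bytes on x86, 128 words of 8 bytes on x64.
static_assert(sizeof(void*) != 4 || kSmallSizeMax == 512, "x86: 512 bytes");
static_assert(sizeof(void*) != 8 || kSmallSizeMax == 1024, "x64: 1024 bytes");
```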

Do you see the same performance with 1024? It would be useful if you submitted your performance numbers with your issue, and removed the dead/commented code from the sample.

If you begin the code with ```C++ and end it with ``` then it will be easier to read.

litterbug23 commented 7 months ago

Test environment: Windows 10, MSVC 2017, x86. I found another problem: when memory is only allocated and never released, mimalloc is even slower, nearly 10x slower than the system allocator (system malloc: 59000, mimalloc: 534000).

I don't know why. It's a very simple test:

```cpp
{
    u32 time1 = getMicrosecondsCPU();
#pragma omp parallel for num_threads(6) // NEW ADD
    for (int i = 0; i < 1000000; i++)
    {
        void* pppp = mi_malloc(1024); // never freed: every block is leaked
    }
    u32 time2 = getMicrosecondsCPU();
    std::cout << (time2 - time1) << std::endl;
}
```
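An allocate-only loop like this requests 1,000,000 × 1024 = 1,024,000,000 bytes (about 977 MiB) without ever returning any of it, which is a large fraction of the address space a 32-bit process has available, so it measures growth of the heap from the OS rather than reuse of freed blocks. A variant that still allocates everything up front but frees it afterwards keeps the same access pattern without leaking, and times the two phases separately. The sketch below is single-threaded for simplicity; time_batch and BatchResult are hypothetical names, and it assumes malloc/free-shaped allocator functions as before.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper types for malloc/free-style allocators.
using AllocFn = void* (*)(std::size_t);
using FreeFn = void (*)(void*);

struct BatchResult {
    long long alloc_us;       // time spent in the allocation phase
    long long free_us;        // time spent releasing every block afterwards
    std::size_t total_bytes;  // bytes requested during the allocation phase
};

// Hypothetical helper: allocates `count` blocks of `size` bytes without
// freeing in between, then frees them all so the test does not leak.
BatchResult time_batch(AllocFn alloc, FreeFn release,
                       std::size_t size, std::size_t count) {
    using Clock = std::chrono::steady_clock;
    std::vector<void*> blocks;
    blocks.reserve(count);  // keep vector growth out of the timed region
    auto t0 = Clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        void* p = alloc(size);
        if (p == nullptr) std::abort();  // treat out-of-memory as fatal
        blocks.push_back(p);
    }
    auto t1 = Clock::now();
    for (void* p : blocks) release(p);
    auto t2 = Clock::now();
    auto us = [](Clock::duration d) {
        return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
    };
    return {us(t1 - t0), us(t2 - t1), size * count};
}
```

With size = 1024 and count = 1000000, total_bytes comes out to 1,024,000,000, matching the figure above.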
litterbug23 commented 7 months ago

Test environment: CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, Windows 10 19045.3448

With mimalloc recompiled for x64, it is faster than the system malloc:

os malloc: 68000
mimalloc: 19000

With mimalloc recompiled for Win32 (x86), it is slower than the system malloc:

os malloc: 59000 (µs)
mimalloc: 92000 (µs)

Why is the x64 build faster than the x86 build?
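Since the results differ so much between the x86 and x64 builds, it helps to have the benchmark print which target it was compiled for next to its timings. The sketch below relies only on standard predefined compiler macros (_M_IX86 and _M_X64 in MSVC, __i386__ and __x86_64__ in GCC and Clang); build_description is a hypothetical helper, not part of mimalloc.

```cpp
#include <string>

// Hypothetical helper: describes the compilation target using predefined
// compiler macros, so x86 and x64 benchmark results can be told apart.
std::string build_description() {
    std::string arch;
#if defined(_M_X64) || defined(__x86_64__)
    arch = "x64";
#elif defined(_M_IX86) || defined(__i386__)
    arch = "x86";
#else
    arch = "other";
#endif
    std::string compiler;
#if defined(__clang__)  // checked first: clang-cl also defines _MSC_VER
    compiler = "clang";
#elif defined(_MSC_VER)
    compiler = "msvc";
#elif defined(__GNUC__)
    compiler = "gcc";
#else
    compiler = "unknown";
#endif
    return arch + ", " + std::to_string(sizeof(void*) * 8) + "-bit pointers, " + compiler;
}
```

Printing this string once at the start of the program makes it unambiguous which build produced a given set of numbers.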