herumi / xbyak

A JIT assembler for x86/x64 architectures supporting MMX, SSE (1-4), AVX (1-2, 512), FPU, APX, and AVX10.2
BSD 3-Clause "New" or "Revised" License

Allocator: take optional name parameter and use it with memfd #140

Closed by cota 2 years ago

cota commented 2 years ago

By initializing an allocator with a specific name, we can allow users to define names of their memfd-allocated memory regions. This can help users obtain more meaningful profiles with tools such as perf(1).
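
The mechanism behind this can be sketched outside of xbyak. The following Linux-only example is an illustration, not xbyak's implementation: the helper name `named_region_visible` is hypothetical, and it calls the `memfd_create` system call directly. It creates a named memfd, maps it, and checks that the kernel labels the mapping `/memfd:<name>` in `/proc/self/maps`, which is the label that profilers pick up.

```cpp
// Linux-only sketch: map a named memfd and look for its label in /proc/self/maps.
#include <sys/mman.h>     // mmap, munmap
#include <sys/syscall.h>  // SYS_memfd_create
#include <unistd.h>       // syscall, ftruncate, close
#include <fstream>
#include <string>

// Hypothetical helper, not part of xbyak.
// Returns 1 if a mapping labelled "/memfd:<name>" appears in /proc/self/maps,
// 0 if it does not, and -1 if the region could not be created on this system.
int named_region_visible(const char *name)
{
    const size_t size = 4096;
    // Raw syscall so the sketch also builds against C libraries that
    // lack a memfd_create() wrapper.
    int fd = static_cast<int>(syscall(SYS_memfd_create, name, 0u));
    if (fd < 0) return -1;
    if (ftruncate(fd, size) != 0) { close(fd); return -1; }
    void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); // the mapping keeps the region alive after the fd is closed
    if (p == MAP_FAILED) return -1;

    const std::string label = std::string("/memfd:") + name;
    int found = 0;
    std::ifstream maps("/proc/self/maps");
    for (std::string line; std::getline(maps, line); ) {
        if (line.find(label) != std::string::npos) { found = 1; break; }
    }
    munmap(p, size);
    return found;
}
```

Calling `named_region_visible("abc")` on a system with memfd support is expected to find a `/memfd:abc` entry. The name is purely a label, so two regions may share the same name without conflict.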

For example, take oneDNN, which instantiates a CodeGenerator for each operation. Before this patch, perf(1) cannot tell us which oneDNN operations were executed, since every JIT region is attributed to "xbyak":

$ perf report -i /tmp/before.perf --stdio | egrep '(Overhead|xbyak)' | head -10
# Overhead  Command          Shared Objec             Symbol
     0.50%  benchmark        memfd:xbyak (deleted)    [.] 0x00000000000000cc
     0.49%  benchmark        memfd:xbyak (deleted)    [.] 0x00000000000000e0
     0.47%  benchmark        memfd:xbyak (deleted)    [.] 0x00000000000000f4
     0.38%  benchmark        memfd:xbyak (deleted)    [.] 0x00000000000000ba
     0.29%  benchmark        memfd:xbyak (deleted)    [.] 0x0000000000000139
     0.28%  benchmark        memfd:xbyak (deleted)    [.] 0x00000000000001e8
     0.27%  benchmark        memfd:xbyak (deleted)    [.] 0x00000000000001ad
     0.26%  benchmark        memfd:xbyak (deleted)    [.] 0x0000000000000172
     0.20%  benchmark        memfd:xbyak (deleted)    [.] 0x000000000000029c
     0.19%  benchmark        memfd:xbyak (deleted)    [.] 0x0000000000000234

After this patch (and after updating oneDNN to give each allocator an appropriate name, i.e. "/oneDNN:$op"), we can tell the operations apart:

$ perf report -i /tmp/after.perf --stdio | egrep '(Overhead|oneDNN)' | head -10
# Overhead  Command          Shared Objec                                               Symbol
     0.47%  benchmark        oneDNN:jit_avx_gemv_t_f32_kern (deleted)                   [.] 0x00000000000000e0
     0.47%  benchmark        oneDNN:jit_avx_gemv_t_f32_kern (deleted)                   [.] 0x00000000000000cc
     0.44%  benchmark        oneDNN:jit_avx_gemv_t_f32_kern (deleted)                   [.] 0x00000000000000f4
     0.33%  benchmark        oneDNN:jit_avx_gemv_t_f32_kern (deleted)                   [.] 0x00000000000000ba
     0.25%  benchmark        oneDNN:inner_product_utils::jit_pp_kernel_t (deleted)      [.] 0x0000000000000139
     0.22%  benchmark        oneDNN:inner_product_utils::jit_pp_kernel_t (deleted)      [.] 0x0000000000000172
     0.22%  benchmark        oneDNN:inner_product_utils::jit_pp_kernel_t (deleted)      [.] 0x00000000000001ad
     0.21%  benchmark        oneDNN:inner_product_utils::jit_pp_kernel_t (deleted)      [.] 0x00000000000001e8
     0.17%  benchmark        oneDNN:jit_avx512_common_gemm_f32_xbyak_gemm (deleted)     [.] 0x0000000000000234
     0.14%  benchmark        oneDNN:jit_avx512_core_gemm_bf16bf16f32_kern (deleted)     [.] 0x000000000000087a
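
The oneDNN-side change is not shown in this thread. As a rough sketch of how such per-operation names could be built, the hypothetical helper below (`make_region_name` is not part of xbyak or oneDNN) follows the "/oneDNN:$op" pattern and truncates the result to the 249-byte name limit documented in memfd_create(2).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of xbyak or oneDNN: build a region name
// following the "/oneDNN:$op" pattern from the description above.
std::string make_region_name(const std::string &prefix, const std::string &op)
{
    // memfd_create(2) documents a 249-byte limit on the name,
    // excluding the terminating null byte.
    const std::size_t kMaxNameLen = 249;
    std::string name = "/" + prefix + ":" + op;
    if (name.size() > kMaxNameLen) name.resize(kMaxNameLen);
    return name;
}
```

For instance, `make_region_name("oneDNN", "jit_avx_gemv_t_f32_kern")` yields "/oneDNN:jit_avx_gemv_t_f32_kern". Truncating is one possible policy; rejecting overlong names would be another.
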

herumi commented 2 years ago

Thank you for the patch. I modified it to minimize the diff and to avoid adding arguments to CodeGenerator. https://github.com/herumi/xbyak/compare/master...dev

Here is a sample that uses it.

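xbyak%> cat t.cpp
// Define before including xbyak.h so that JIT memory is backed by memfd.
#define XBYAK_USE_MEMFD
#include <cstdio>
#include <xbyak/xbyak.h>

// MmapAllocator is listed first so that it is constructed before
// CodeGenerator, which receives a pointer to it.
class Code : Xbyak::MmapAllocator, public Xbyak::CodeGenerator {
public:
    Code(const char *name, int v)
        : Xbyak::MmapAllocator(name) // name of the memfd region
        , Xbyak::CodeGenerator(4096, nullptr, this)
    {
        // Generated function: return v.
        mov(eax, v);
        ret();
    }
};

int main()
{
    Code c1("abc", 123);
    Code c2("xyz", 456);
    printf("c1 %d\n", c1.getCode<int (*)()>()());
    printf("c2 %d\n", c2.getCode<int (*)()>()());
    // Keep the process alive so that /proc/<pid>/maps can be inspected.
    getchar();
}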
xbyak%> g++ t.cpp -I ./ && ./a.out
xbyak%> cat /proc/`pidof ./a.out`/maps
7f9b0f763000-7f9b0f764000 rwxs 00000000 00:01 1669965                    /memfd:xyz (deleted)
7f9b0f764000-7f9b0f765000 r--p 00000000 08:02 47722023                   /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f9b0f765000-7f9b0f788000 r-xp 00001000 08:02 47722023                   /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f9b0f788000-7f9b0f790000 r--p 00024000 08:02 47722023                   /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f9b0f790000-7f9b0f791000 rwxs 00000000 00:01 1669964                    /memfd:abc (deleted)

How about this?

cota commented 2 years ago

Works for me, thanks! I have squashed your changes onto mine and updated the commit. I've also added your example usage to the commit log, which I like better than mine since it is self-contained.

herumi commented 2 years ago

I added sample/memfd.cpp.
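
To check region names programmatically instead of reading /proc/<pid>/maps by eye, a small parser is enough. The sketch below is illustrative only: `memfd_region_names` is a hypothetical helper, not part of xbyak, and it assumes the line layout shown in the maps output earlier in this thread.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of xbyak: collect the names of memfd-backed
// mappings from the text of a /proc/<pid>/maps file,
// e.g. "abc" from a line ending in "/memfd:abc (deleted)".
std::vector<std::string> memfd_region_names(const std::string &maps_text)
{
    const std::string marker = "/memfd:";
    const std::string suffix = " (deleted)";
    std::vector<std::string> names;
    std::istringstream in(maps_text);
    for (std::string line; std::getline(in, line); ) {
        const std::size_t pos = line.find(marker);
        if (pos == std::string::npos) continue; // not a memfd mapping
        std::string name = line.substr(pos + marker.size());
        // Drop the " (deleted)" suffix seen on memfd entries, if present.
        if (name.size() >= suffix.size() &&
            name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
            name.erase(name.size() - suffix.size());
        }
        names.push_back(name);
    }
    return names;
}
```

Fed the maps listing from the sample run, this returns the names in file order: "xyz" first, then "abc".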