ibogosavljevic / johnysswlab

119 stars 13 forks source link

array_of_pointer example behave wired #2

Open baiwfg2 opened 10 months ago

baiwfg2 commented 10 months ago

Hi. I like this article https://johnnysswlab.com/faster-hash-maps-binary-trees-etc-through-data-layout-modification/

I try it by self and find interesting results.

compile: clang++ -DLIKWID_PERFMON -O3 -std=c++17 -g array_of_pointers.cpp -I ../common -fopenmp

On fedora, clang 15 within virtualbox,

# ./a.out 32 (I tweak the size , 32 means size=32*1024*1024)
Found = 20335
Found = 20335
Found = 20335
Region std_variant
        CPU id = 0, count = 1, runtime 0.159694 s
                User time = 0.149849 s, system time = 0.004162 s
                Minor faults = 0, major faults = 0, context switches = 5
        STAT, cummulative runtime = 0.159694 s, min = 0.159694 s, avg = 0.159694 s, max = 0.159694 s
Region array_pointer_low_fragmentation
        CPU id = 0, count = 1, runtime 0.187114 s
                User time = 0.173766 s, system time = 0 s
                Minor faults = 0, major faults = 0, context switches = 9
        STAT, cummulative runtime = 0.187114 s, min = 0.187114 s, avg = 0.187114 s, max = 0.187114 s
Region array_pointer_high_fragmentation
        CPU id = 0, count = 1, runtime 2.76325 s 
                User time = 2.6361 s, system time = 0.012987 s
                Minor faults = 0, major faults = 0, context switches = 98
        STAT, cummulative runtime = 2.76325 s, min = 2.76325 s, avg = 2.76325 s, max = 2.76325 s

The results are very similar to yours in the article. But here is a.out 64:

# ./a.out 64 (I tweak the size , 64 means size=64*1024*1024)
Found = 40673
Found = 40673
Found = 40673
Region std_variant
        CPU id = 0, count = 1, runtime 0.392862 s
                User time = 0.387928 s, system time = 0.000268 s
                Minor faults = 0, major faults = 0, context switches = 12
        STAT, cummulative runtime = 0.392862 s, min = 0.392862 s, avg = 0.392862 s, max = 0.392862 s
Region array_pointer_low_fragmentation
        CPU id = 0, count = 1, runtime 10.2226 s  
                User time = 0.990382 s, system time = 6.61153 s
                Minor faults = 0, major faults = 492105, context switches = 2913
        STAT, cummulative runtime = 10.2226 s, min = 10.2226 s, avg = 10.2226 s, max = 10.2226 s
Region array_pointer_high_fragmentation
        CPU id = 0, count = 1, runtime 6.54839 s
                User time = 5.69053 s, system time = 0.432224 s
                Minor faults = 0, major faults = 33375, context switches = 408
        STAT, cummulative runtime = 6.54839 s, min = 6.54839 s, avg = 6.54839 s, max = 6.54839 s

The low fragmentation case is now much slower, even worse than the high fragmentation case. I also run it on Kali linux,clang 16, and it's the same result.

baiwfg2 commented 10 months ago

What could be the possible reason ? When size=32*1024*1024, neighboring pointers point to logically neighboring memory addresses. Isn't the case under size=64*1024*1024 ?

ibogosavljevic commented 9 months ago

The reason is weird. Look at time spent in system time. That is the kernel doing services for us. It should be very low, but in your case it's huge. My best guess is that your program has started to swap memory to disk. Use strace to understand what system calls are being done.