n0tknowing / chibicc

Personal fork of chibicc (currently fixing the preprocessor and tokenizer)
MIT License

Macro expansion consumes GiB of memory #2

n0tknowing closed this issue 1 year ago

n0tknowing commented 1 year ago

Not surprising, given the current compiler design. Here are the memusage and perf results when running it on https://github.com/swansontec/map-macro.

memusage results:

Memory usage summary: heap total: 1141447160, heap peak: 1141349037, stack peak: 4768
         total calls   total memory   failed calls
 malloc|         86          20485              0
realloc|         12             24              0  (nomove:11, dec:10, free:0)
 calloc|   69531994     1141426651              0
   free|         19          18744
Histogram for block sizes:
    0-15             72  <1% 
   16-31       69273163  99% ==================================================
   32-47           1615  <1% 
   48-63             78  <1% 
   64-79              3  <1% 
  112-127             5  <1% 
  128-143        257121  <1% 
  160-175             1  <1% 
  384-399             2  <1% 
  464-479             3  <1% 
  496-511            10  <1% 
  528-543             1  <1% 
  768-783             1  <1% 
 1024-1039            1  <1% 
 1536-1551            1  <1% 
 1920-1935            1  <1% 
 3072-3087            1  <1% 
 4096-4111            3  <1% 
 8192-8207           10  <1% 
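Dividing calloc's total bytes by its call count gives the average request size, which lines up with the histogram (69,273,163 blocks, 99%, in the 16-31 bucket):

$$
\frac{1\,141\,426\,651\ \text{bytes}}{69\,531\,994\ \text{calloc calls}} \approx 16.4\ \text{bytes per call}
$$

Assuming an LP64 (x86-64) build and hideset nodes laid out as in upstream chibicc (a `next` pointer plus a `name` pointer), each node is exactly 16 bytes, so the gigabyte is plausibly tens of millions of tiny list nodes rather than a few large buffers.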

As for the slowness, most of it comes from the allocator during hideset operations. perf reports:

Overhead  Command  Shared Object         Symbol
  45.06%  chibicc  libc.so.6             [.] _int_malloc
  25.37%  chibicc  libc.so.6             [.] __libc_calloc
   9.42%  chibicc  libc.so.6             [.] __mcount_internal
   6.68%  chibicc  libc.so.6             [.] _mcount
   5.70%  chibicc  chibicc               [.] new_hideset
   2.17%  chibicc  chibicc               [.] hideset_union
   1.17%  chibicc  [unknown]             [k] 0xffffffff986012b0
   1.06%  chibicc  libc.so.6             [.] alloc_perturb
   1.05%  chibicc  libc.so.6             [.] __strlen_sse2
   0.65%  chibicc  chibicc               [.] hideset_contains
   0.61%  chibicc  chibicc               [.] calloc@plt
   0.37%  chibicc  libc.so.6             [.] __strncmp_sse42
   0.07%  chibicc  chibicc               [.] copy_token
   0.06%  chibicc  libc.so.6             [.] sysmalloc
   0.06%  chibicc  chibicc               [.] expand_macro
   0.06%  chibicc  chibicc               [.] equal
   0.06%  chibicc  chibicc               [.] strlen@plt
   0.05%  chibicc  chibicc               [.] preprocess2
   0.04%  chibicc  chibicc               [.] add_hideset
   0.04%  chibicc  libc.so.6             [.] __memset_sse2_unaligned_erms
   0.03%  chibicc  libc.so.6             [.] __default_morecore@GLIBC_2.2.5
   0.02%  chibicc  libc.so.6             [.] __memset_sse2_unaligned
   0.02%  chibicc  chibicc               [.] fnv_hash
   0.01%  chibicc  chibicc               [.] strncmp@plt
   0.01%  chibicc  chibicc               [.] append
n0tknowing commented 1 year ago

It also produces an incomplete expansion (ignore the broken newline printing):

char const *
foo_string ="foo"; char const *bar_string ="bar"; MAP1(STRING, baz, ()()(), ()()(), ()()(), 0)void function(int foo , int bar , MAP_LIST1(PARAM, baz, ()()(), ()()(), ()()(), 0));

putchar ('a'); putchar ('b'); MAP1(CALL, ('c'), ()()(), ()()(), ()()(), 0)

putchar ('a') , putchar ('b') , MAP_LIST1(CALL_LIST, ('c'), ()()(), ()()(), ()()(), 0);
clang, GCC, and TCC all produce the same, complete expansion:

```c
char const *foo_string = "foo"; char const *bar_string = "bar"; char const *baz_string = "baz";
void function(int foo , int bar , int baz);
putchar ('a'); putchar ('b'); putchar ('c');
putchar ('a') , putchar ('b') , putchar ('c');
```
n0tknowing commented 1 year ago

Out of curiosity, I also ran memusage on clang, GCC, and TCC:

clang:

```
Memory usage summary: heap total: 2106796, heap peak: 1786124, stack peak: 105200
         total calls   total memory   failed calls
 malloc|       5592        1269959              0
realloc|         44         402684              0  (nomove:14, dec:1, free:0)
 calloc|        150         434153              0
   free|       4181         482037
Histogram for block sizes:
    0-15            132   2% ===
   16-31           1176  20% =============================
   32-47           2017  34% ==================================================
   48-63            633  10% ===============
   64-79            632  10% ===============
   80-95            425   7% ==========
   96-111            91   1% ==
  112-127            41  <1% =
  128-143            78   1% =
  144-159             6  <1%
  160-175             5  <1%
  192-207            65   1% =
  208-223             9  <1%
  224-239             2  <1%
  240-255             4  <1%
  256-271            12  <1%
  288-303             1  <1%
  304-319             1  <1%
  320-335             4  <1%
  336-351             1  <1%
  352-367            16  <1%
  384-399            46  <1% =
  400-415             1  <1%
  416-431             2  <1%
  432-447             1  <1%
  448-463             2  <1%
  464-479             1  <1%
  480-495             3  <1%
  496-511             3  <1%
  512-527             9  <1%
  528-543             3  <1%
  544-559             1  <1%
  560-575             1  <1%
  576-591             5  <1%
  592-607             1  <1%
  624-639             1  <1%
  640-655             3  <1%
  688-703             2  <1%
  720-735             4  <1%
  736-751             1  <1%
  752-767             1  <1%
  768-783            34  <1%
  784-799             8  <1%
  800-815             1  <1%
  816-831             1  <1%
  848-863             2  <1%
  864-879             5  <1%
  880-895             1  <1%
  896-911             1  <1%
  912-927             2  <1%
  944-959             1  <1%
  976-991             3  <1%
 1024-1039            8  <1%
 1040-1055           95   1% ==
 1056-1071            5  <1%
 1072-1087            1  <1%
 1088-1103            2  <1%
 1104-1119            4  <1%
 1120-1135            4  <1%
 1152-1167            1  <1%
 1168-1183            2  <1%
 1200-1215            3  <1%
 1216-1231            2  <1%
 1248-1263            1  <1%
 1296-1311            2  <1%
 1360-1375            1  <1%
 1440-1455            1  <1%
 1504-1519            1  <1%
 1536-1551           18  <1%
 1664-1679            1  <1%
 1680-1695            1  <1%
 1696-1711            1  <1%
 1792-1807            1  <1%
 1824-1839            1  <1%
 1952-1967            1  <1%
 1984-1999            1  <1%
 2048-2063            4  <1%
 2320-2335            1  <1%
 2960-2975            1  <1%
 3040-3055            1  <1%
 3072-3087            2  <1%
 3200-3215            1  <1%
 3312-3327            1  <1%
 3664-3679            1  <1%
 3984-3999            2  <1%
 4080-4095            2  <1%
 4096-4111           84   1% ==
 4608-4623            1  <1%
 6112-6127            1  <1%
 6144-6159            2  <1%
 7360-7375            1  <1%
 7952-7967            1  <1%
 8160-8175            1  <1%
 8192-8207            2  <1%
12256-12271           1  <1%
12288-12303           1  <1%
12304-12319           1  <1%
16320-16335           1  <1%
16384-16399           1  <1%
24544-24559           1  <1%
24576-24591           1  <1%
32816-32831           2  <1%
49120-49135           1  <1%
49152-49167           1  <1%
  large               7  <1%
```
GCC:

```
Memory usage summary: heap total: 36795442, heap peak: 927244, stack peak: 5248
         total calls   total memory   failed calls
 malloc|      15298       33001178              0
realloc|       1168        3298984              0  (nomove:261, dec:0, free:0)
 calloc|       4791         495280              0
   free|      17321       36118532
Histogram for block sizes:
    0-15           1409   6% ================
   16-31           3487  16% =========================================
   32-47             40  <1%
   48-63           4242  19% ==================================================
   64-79            640   3% =======
   80-95            293   1% ===
   96-111            67  <1%
  112-127            15  <1%
  128-143             9  <1%
  144-159             4  <1%
  160-175            21  <1%
  176-191             8  <1%
  192-207          1633   7% ===================
  208-223            19  <1%
  224-239             5  <1%
  240-255             6  <1%
  256-271            11  <1%
  272-287             2  <1%
  288-303             2  <1%
  304-319            12  <1%
  320-335             4  <1%
  336-351             4  <1%
  352-367             2  <1%
  368-383             2  <1%
  384-399             3  <1%
  400-415             2  <1%
  416-431             3  <1%
  432-447             2  <1%
  448-463             2  <1%
  464-479             3  <1%
  480-495             2  <1%
  496-511             2  <1%
  512-527             2  <1%
  528-543            83  <1%
  544-559             3  <1%
  560-575            83  <1%
  576-591            83  <1%
  592-607             2  <1%
  608-623            83  <1%
  624-639           164  <1% =
  640-655             4  <1%
  656-671            83  <1%
  672-687           120  <1% =
  688-703             2  <1%
  704-719            39  <1%
  720-735            83  <1%
  736-751             4  <1%
  752-767             2  <1%
  768-783            39  <1%
  784-799             4  <1%
  800-815            87  <1% =
  816-831             2  <1%
  832-847             2  <1%
  848-863           141  <1% =
  864-879             2  <1%
  880-895            77  <1%
  896-911            85  <1% =
  912-927            26  <1%
  928-943             2  <1%
  944-959           101  <1% =
  960-975             4  <1%
  976-991            24  <1%
  992-1007            2  <1%
 1008-1023           12  <1%
 1024-1039         1587   7% ==================
 1040-1055            6  <1%
 1056-1071            2  <1%
 1072-1087            2  <1%
 1088-1103            2  <1%
 1104-1119            2  <1%
 1120-1135           56  <1%
 1136-1151            2  <1%
 1152-1167           21  <1%
 1168-1183            2  <1%
 1184-1199            8  <1%
 1200-1215            3  <1%
 1216-1231            5  <1%
 1232-1247            2  <1%
 1248-1263            4  <1%
 1264-1279            2  <1%
 1280-1295            2  <1%
 1296-1311            4  <1%
 1312-1327            2  <1%
 1328-1343            3  <1%
 1344-1359            2  <1%
 1360-1375            2  <1%
 1376-1391            2  <1%
 1392-1407            3  <1%
 1408-1423            2  <1%
 1424-1439            4  <1%
 1440-1455            3  <1%
 1456-1471            2  <1%
 1472-1487            2  <1%
 1488-1503            2  <1%
 1504-1519            2  <1%
 1520-1535            3  <1%
 1536-1551            3  <1%
 1552-1567            2  <1%
 1568-1583            2  <1%
 1584-1599            2  <1%
 1600-1615            4  <1%
 1616-1631            2  <1%
 1632-1647            2  <1%
 1648-1663            2  <1%
 1664-1679            2  <1%
 1680-1695            3  <1%
 1696-1711            2  <1%
 1712-1727            2  <1%
 1728-1743            2  <1%
 1744-1759            2  <1%
 1760-1775            2  <1%
 1776-1791            2  <1%
 1792-1807            2  <1%
 1808-1823            2  <1%
 1824-1839            2  <1%
 1840-1855            2  <1%
 1856-1871            2  <1%
 1872-1887            2  <1%
 1888-1903            2  <1%
 1904-1919            2  <1%
 1920-1935            3  <1%
 1936-1951            2  <1%
 1952-1967            2  <1%
 1968-1983            2  <1%
 1984-1999            2  <1%
 2000-2015            4  <1%
 2016-2031            2  <1%
 2032-2047            2  <1%
 2048-2063         1587   7% ==================
 2064-2079            3  <1%
 2080-2095            2  <1%
 2096-2111            2  <1%
 2112-2127            2  <1%
 2128-2143            2  <1%
 2144-2159            2  <1%
 2160-2175            2  <1%
 2176-2191            2  <1%
 2192-2207            2  <1%
 2208-2223            2  <1%
 2224-2239            2  <1%
 2240-2255            2  <1%
 2256-2271            2  <1%
 2272-2287            2  <1%
 2288-2303            2  <1%
 2304-2319            2  <1%
 2320-2335            2  <1%
 2336-2351            2  <1%
 2352-2367            2  <1%
 2368-2383            2  <1%
 2384-2399            2  <1%
 2400-2415            2  <1%
 2416-2431            2  <1%
 2432-2447            2  <1%
 2448-2463            2  <1%
 2464-2479            3  <1%
 2480-2495            2  <1%
 2496-2511            2  <1%
 2512-2527            2  <1%
 2528-2543            2  <1%
 2544-2559            2  <1%
 2560-2575            4  <1%
 2576-2591            2  <1%
 2592-2607            2  <1%
 2608-2623            2  <1%
 2624-2639            2  <1%
 2640-2655            2  <1%
 2656-2671            2  <1%
 2672-2687            2  <1%
 2688-2703            2  <1%
 2704-2719            2  <1%
 2720-2735            2  <1%
 2736-2751            2  <1%
 2752-2767            2  <1%
 2768-2783            1  <1%
 3024-3039            1  <1%
 3200-3215          404   1% ====
 3840-3855            1  <1%
 4064-4079            7  <1%
 4080-4095            1  <1%
 4096-4111            3  <1%
 4192-4207          818   3% =========
 5760-5775            1  <1%
 6000-6015            3  <1%
 8032-8047         3103  14% ====================================
 8144-8159            1  <1%
 8192-8207            2  <1%
 8640-8655            1  <1%
 9008-9023            1  <1%
32768-32783           2  <1%
45056-45071           1  <1%
  large               4  <1%
```
TCC:

```
Memory usage summary: heap total: 3695181, heap peak: 3187084, stack peak: 10192
         total calls   total memory   failed calls
 malloc|       1892        3660565              0
realloc|         86          32752              0  (nomove:24, dec:0, free:0)
 calloc|          4           1864              0
   free|       1944        3694157
Histogram for block sizes:
    0-15             16  <1%
   16-31             22   1%
   32-47             10  <1%
   48-63              3  <1%
   64-79              2  <1%
  128-143             1  <1%
  160-175             2  <1%
  256-271          1869  94% ==================================================
  512-527            38   1% =
 1024-1039            3  <1%
 1104-1119            3  <1%
 1712-1727            1  <1%
 4096-4111            2  <1%
 5200-5215            1  <1%
 8176-8191            1  <1%
 8192-8207            2  <1%
 9296-9311            2  <1%
12288-12303           1  <1%
  large               3  <1%
```
n0tknowing commented 1 year ago

With commit 5976d6a32f86c7af5af8878f6dc8f9090597e283, it consumes only ~24 MiB:

Memory usage summary: heap total: 25386440, heap peak: 25288317, stack peak: 4768
         total calls   total memory   failed calls
 malloc|         86          20485              0
realloc|         12             24              0  (nomove:11, dec:10, free:0)
 calloc|     231743       25365931              0
   free|         19          18744
Histogram for block sizes:
    0-15             72  <1% 
   16-31          37704  16% =========
   32-47           1615  <1% 
   48-63             78  <1% 
   64-79              3  <1% 
  112-127             5  <1% 
  128-143        192329  82% ==================================================
  160-175             1  <1% 
  384-399             2  <1% 
  464-479             3  <1% 
  496-511            10  <1% 
  528-543             1  <1% 
  768-783             1  <1% 
 1024-1039            1  <1% 
 1536-1551            1  <1% 
 1920-1935            1  <1% 
 3072-3087            1  <1% 
 4096-4111            3  <1% 
 8192-8207           10  <1% 
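Comparing this summary with the one from the first comment:

$$
\frac{69\,531\,994}{231\,743} \approx 300\ \text{(calloc calls)}, \qquad
\frac{1\,141\,349\,037\ \text{B}}{25\,288\,317\ \text{B}} \approx 45.1\ \text{(heap peak)}
$$

In binary units the heap peak falls from about 1.06 GiB to about 24.1 MiB, and the dominant block size moves from the 16-31 bucket (69,273,163 blocks before, 37,704 after) to the 128-143 bucket.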
n0tknowing commented 1 year ago

perf results:

Overhead  Command  Shared Object         Symbol
  21.62%  chibicc  libc.so.6             [.] __strlen_sse2
  21.06%  chibicc  libc.so.6             [.] __strncmp_sse42
  13.87%  chibicc  libc.so.6             [.] _int_malloc
  11.98%  chibicc  chibicc               [.] hideset_union
   8.06%  chibicc  chibicc               [.] preprocess2
   7.09%  chibicc  libc.so.6             [.] __libc_calloc
   3.74%  chibicc  libc.so.6             [.] __memset_sse2_unaligned_erms
   2.09%  chibicc  chibicc               [.] equal
   1.87%  chibicc  chibicc               [.] read_macro_arg_one
   1.47%  chibicc  [unknown]             [k] 0xffffffff9ae012b0
   1.29%  chibicc  libc.so.6             [.] __memset_sse2_unaligned
   1.23%  chibicc  chibicc               [.] strlen@plt
   1.20%  chibicc  chibicc               [.] get_entry
   1.05%  chibicc  chibicc               [.] strncmp@plt
   0.43%  chibicc  libc.so.6             [.] __memcmp_sse2
   0.26%  chibicc  chibicc               [.] calloc@plt
   0.24%  chibicc  libc.so.6             [.] alloc_perturb
   0.23%  chibicc  libc.so.6             [.] __cxa_finalize
   0.20%  chibicc  ld-linux-x86-64.so.2  [.] strcmp
   0.16%  chibicc  chibicc               [.] is_ident2
   0.15%  chibicc  libc.so.6             [.] sysmalloc
   0.13%  chibicc  chibicc               [.] tokenize