nim-lang / Nim

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
https://nim-lang.org
Other
16.54k stars 1.47k forks source link

Potential leak with threaded tlsf allocator #20542

Open guibar64 opened 2 years ago

guibar64 commented 2 years ago

What happened?

proc work() {.thread.} =
  var x = newSeq[int8](1000_000)

for i in 1..10:
  var th: Thread[void]
  createThread(th, work)
  joinThread(th)

shows higher than expected memory usage on Linux and errors when compiled with -g -d:nimAllocPagesViaMalloc --passc:-fsanitize=address --passl:-fsanitize=address

Nim Version

Nim Compiler Version 1.7.3 [Linux: amd64] Compiled at 2022-10-11 Copyright (c) 2006-2022 by Andreas Rumpf

git hash: 5602183234f59ece4fd668915da848f0753cbbb9 active boot switches: -d:release

Current Standard Output Logs

=================================================================
==11538==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 10281020 byte(s) in 20 object(s) allocated from:
    #0 0x7fe6b6ea0a89 in __interceptor_malloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x55f66a0fd7a7 in osTryAllocPages__system_5864 /home/user/Nim/lib/system/osalloc.nim:36
    #2 0x55f66a117f3b  (/home/user/Nim/bug_threaded_tlsf+0x27f3b)

SUMMARY: AddressSanitizer: 10281020 byte(s) leaked in 20 allocation(s).

Expected Standard Output Logs

No response

Possible Solution

No response

Additional Information

No error with commit b47d5486 (pre- threaded tlsf)

Araq commented 2 years ago

No error with commit b47d5486 (pre- threaded tlsf)

Are you certain of this? The threaded TLSF should not kick in here.

Araq commented 2 years ago

Cannot reproduce.

guibar64 commented 2 years ago

Are you certain of this? The threaded TLSF should not kick in here.

Yes, I just re-checked (linux). On a Windows box, I could reproduce with drmemory (could not get sanitizers to work there). Output:

Dr. Memory version 2.5.19232 build 0 built on Sep  3 2022 04:08:29
Windows version: WinVer=105;Rel=2009;Build=19044;Edition=Professional
Dr. Memory results for pid 13392: "bug.exe"
Application cmdline: ".\bug.exe"
Recorded 124 suppression(s) from default C:\Users\user\scoop\apps\drmemory\current\bin64\suppress-default.txt

Error #1: UNADDRESSABLE ACCESS beyond top of stack: reading 0x00000091fb5ff880-0x00000091fb5ff888 8 byte(s)
# 0 ___chkstk_ms                            [C:/Users/user/Nim/lib/system/threadlocalstorage.nim:292]
# 1 _pei386_runtime_relocator               [C:/Users/user/Nim/lib/system/threadlocalstorage.nim:292]
# 2 __tmainCRTStartup
# 3 .l_start
# 4 KERNEL32.dll!BaseThreadInitThunk
Note: @0:00:00.454 in thread 8660
Note: 0x00000091fb5ff880 refers to 856 byte(s) beyond the top of the stack 0x00000091fb5ffbd8
Note: instruction: or     $0x0000000000000000 (%rcx) -> (%rcx)

Error #2: POSSIBLE LEAK 27 direct bytes 0x000002107d1b01c0-0x000002107d1b01db + 0 indirect bytes
# 0 replace_malloc                    [D:\a\drmemory\drmemory\common\alloc_replace.c:2580]
# 1 msvcrt.dll!realloc               +0x193    (0x00007ff935649f44 <msvcrt.dll+0x19f44>)
# 2 msvcrt.dll!unlock                +0x40c    (0x00007ff93566b68d <msvcrt.dll+0x3b68d>)
# 3 msvcrt.dll!_getmainargs          +0x30     (0x00007ff935637a01 <msvcrt.dll+0x7a01>)
# 4 pre_cpp_init
# 5 msvcrt.dll!initterm              +0x42     (0x00007ff93566a553 <msvcrt.dll+0x3a553>)
# 6 __tmainCRTStartup
# 7 .l_start
# 8 KERNEL32.dll!BaseThreadInitThunk

Error #3: LEAK 8195 direct bytes 0x000002107d1b4530-0x000002107d1b6533 + 0 indirect bytes
# 0 replace_malloc                              [D:\a\drmemory\drmemory\common\alloc_replace.c:2580]
# 1 osAllocPages__system_5894                   [C:/Users/user/Nim/lib/system/osalloc.nim:36]
# 2 llAlloc__system_6170                        [C:/Users/user/Nim/lib/system/alloc.nim:284]
# 3 intSetPut__system_6295                      [C:/Users/user/Nim/lib/system/alloc.nim:363]
# 4 incl__system_6346                           [C:/Users/user/Nim/lib/system/alloc.nim:377]
# 5 getBigChunk__system_6618                    [C:/Users/user/Nim/lib/system/alloc.nim:655]
# 6 rawAlloc__system_6821                       [C:/Users/user/Nim/lib/system/alloc.nim:877]
# 7 alloc__system_7024                          [C:/Users/user/Nim/lib/system/alloc.nim:1033]
# 8 allocImpl__system_1762                      [C:/Users/user/Nim/lib/system/alloc.nim:1109]
# 9 allocSharedImpl                             [C:/Users/user/Nim/lib/system/alloc.nim:1165]
#10 allocShared0Impl__system_1777               [C:/Users/user/Nim/lib/system/alloc.nim:1167]
#11 alignedAlloc0__system_1934                  [C:/Users/user/Nim/lib/system/memalloc.nim:351]

Error #4: LEAK 1019907 direct bytes 0x000002107d5b0030-0x000002107d6a9033 + 0 indirect bytes
# 0 replace_malloc                              [D:\a\drmemory\drmemory\common\alloc_replace.c:2580]
# 1 osAllocPages__system_5894                   [C:/Users/user/Nim/lib/system/osalloc.nim:36]
# 2 requestOsChunks__system_6529                [C:/Users/user/Nim/lib/system/alloc.nim:477]
# 3 getBigChunk__system_6618                    [C:/Users/user/Nim/lib/system/alloc.nim:641]
# 4 rawAlloc__system_6821                       [C:/Users/user/Nim/lib/system/alloc.nim:877]
# 5 alloc__system_7024                          [C:/Users/user/Nim/lib/system/alloc.nim:1033]
# 6 allocImpl__system_1762                      [C:/Users/user/Nim/lib/system/alloc.nim:1109]
# 7 allocSharedImpl                             [C:/Users/user/Nim/lib/system/alloc.nim:1159]
# 8 allocShared0Impl__system_1777               [C:/Users/user/Nim/lib/system/alloc.nim:1167]
# 9 alignedAlloc0__system_1934                  [C:/Users/user/Nim/lib/system/memalloc.nim:351]
#10 newSeqPayload                               [C:/Users/user/Nim/lib/system/seqs_v2.nim:38]
#11 prepareSeqAdd                               [C:/Users/user/Nim/lib/system/seqs_v2.nim:57]

===========================================================================
FINAL SUMMARY:

DUPLICATE ERROR COUNTS:
    Error #   1:      2
    Error #   3:     10
    Error #   4:     10

SUPPRESSIONS USED:

ERRORS FOUND:
      1 unique,     2 total unaddressable access(es)
      0 unique,     0 total uninitialized access(es)
      0 unique,     0 total invalid heap argument(s)
      0 unique,     0 total GDI usage error(s)
      0 unique,     0 total handle leak(s)
      0 unique,     0 total warning(s)
      2 unique,    20 total, 10281020 byte(s) of leak(s)
      1 unique,     1 total,     27 byte(s) of possible leak(s)
ERRORS IGNORED:
      1 potential error(s) (suspected false positives)
         (details: C:\Users\user\scoop\apps\drmemory\current\drmemory\logs\DrMemory-bug.exe.13392.000\potential_errors.txt)
      4 unique,     4 total,     66 byte(s) of still-reachable allocation(s)
         (re-run with "-show_reachable" for details)
Details: C:\Users\user\scoop\apps\drmemory\current\drmemory\logs\DrMemory-bug.exe.13392.000\results.txt

With b47d5486, last lines of ouput:

ERRORS FOUND:
      1 unique,     2 total unaddressable access(es)
      0 unique,     0 total uninitialized access(es)
      0 unique,     0 total invalid heap argument(s)
      0 unique,     0 total GDI usage error(s)
      0 unique,     0 total handle leak(s)
      0 unique,     0 total warning(s)
      0 unique,     0 total,      0 byte(s) of leak(s)
      4 unique,     4 total, 1556516 byte(s) of possible leak(s)
ERRORS IGNORED:
      1 potential error(s) (suspected false positives)
         (details: C:\Users\user\scoop\apps\drmemory\current\drmemory\logs\DrMemory-bug.exe.9220.000\potential_errors.txt)
      4 unique,     4 total,     66 byte(s) of still-reachable allocation(s)
         (re-run with "-show_reachable" for details)

I think the last “possible leak” is a false positive, but I guess the 10281020 byte(s) of leak(s) in the previous ouput is the same thing found on linux.

Araq commented 2 years ago

For your test program the new code paths for "foreign thread deallocs" are not triggered at all, so afaict whatever you're measuring is something that existed for a long time.

Araq commented 2 years ago

Btw the allocator does not return all memory back to the OS, so if you use -d:nimAllocPagesViaMalloc there are indeed lots of missing "free" calls. Always have been.

guibar64 commented 2 years ago

In fact, initially I observed a very large memory consumption on a program, afterward I tried -d:nimAllocPagesViaMalloc, but OK, not reliable here.

I know it is not very reliable, but this is the maxresident memory I get with /usr/bin/time varying the number of iters in the loop:

With 100 iterations

With 1000 iterations:

planetis-m commented 2 years ago

Do you have a minimal reproducible example? Because there is no leak in the one you posted.

guibar64 commented 2 years ago

@planetis-m The original example is the minimal reproducible example I came up with, still does not work for me.

FWIW, a less minimum example

const nthreads = 8
var th: array[nthreads, Thread[void]]

proc work() {.thread.} =
  var x = newSeq[int8](1000_000)

proc test() =
  for i in 0..<nthreads:
    createThread(th[i], work)
  joinThreads(th)

for i in 1..100:
  test()

Tested on: last devel nim (57574eaf), os=Linux, cpu=amd64

Maybe it is OS/arch specific ?

Edit

I found an old macos VM. Interestingly the sanitizer does not fire up, but the max RSS is growing with the number of loops. Some numbers (Unit is kB):

nb loops 57574eaf b47d5486
1 1810 1826
10 10919 1855
100 101650 1871
1000 834641 1872
2000 865850 1888

I get the same qualitative behavior on Linux.

Edit About sanitizer on MacOs, it does not work by default apparently: https://stackoverflow.com/a/69616797.

planetis-m commented 2 years ago

Can't reproduce on Linux with clang. I compiled with -d:nimAllocStats

dumpAllocstats:
  for i in 1..100:
    test()

Outputs: (allocCount: 800, deallocCount: 800) But sometimes the numbers don't match! I even got (allocCount: 799, deallocCount: 800). Sorry I don't know how to debug this except that LeakSanitizer reports no leaks with -d:useMalloc.

guibar64 commented 2 years ago

@planetis-m Yes, there is no issue with -d:useMalloc, the issue would be at the level of the allocator/threads, not with memory management. Hence the trick with -d:nimAllocPagesViaMalloc in my first post.

Anyway, I am beginning to understand what happens here. Before (thread-local world), each thread cleaned up their OS pages, now with a shared heap this is no longer an option, as memory can be moved between threads. So the os-allocated memory they own hangs around.

planetis-m commented 2 years ago

But sometimes the numbers don't match! I even got (allocCount: 799, deallocCount: 800)

The global alloc counters are incremented without any locks lib/system/memalloc.nim:217 that's why the numbers don't match.