idris-lang / Idris2

A purely functional programming language with first class types
https://idris-lang.org/
Other
2.51k stars 374 forks source link

Memory leak in idris2_readLine #2552

Closed dunhamsteve closed 2 years ago

dunhamsteve commented 2 years ago

Idris appears to be leaking memory in FFI on scheme. This was first noticed because the LSP process was growing as I was working on Idris. Repeatedly reloading a touched Desugar.idr caused the process to grow in size. Further investigation revealed that both the leaks app on macos and valgrind on linux report that memory is leaked by the delim function in iogetdelim.c. I believe the malloc'd string needs to be free'd

Steps to Reproduce

In the Idris2 source directory:

On Linux run:

valgrind --leak-check=full "$HOME/.idris2/bin/idris2_app/idris2.so" --find-ipkg src/Idris/Desugar.idr < /dev/null

On MacOS:

MallocStackLogging=1 idris2 --find-ipkg src/Idris/Desugar.idr

and while it is running, run leaks with the PID.

Expected Behavior

Expect no leaks to be reported:

Observed Behavior

MacOS reports:

leaks Report Version: 4.0, multi-line stacks
Process 86396: 79532 nodes malloced for 79325 KB
Process 86396: 79229 leaks for 81130496 total leaked bytes.

STACK OF 34906 INSTANCES OF 'ROOT LEAK: <malloc>':
7   dyld                                  0x231e3fc10 start + 2368
6   scheme                                0x1046ec088 main + 2512
5   scheme                                0x10478002c run_script + 296
4   scheme                                0x10477fc64 boot_call + 44
3   ???                                   0x107929e60 0x7fffffffffffffff + 9223372041276792417
2   libidris2_support.dylib               0x104b5a4f4 idris2_readLine + 32
1   libidris2_support.dylib               0x104b5a0b4 getdelim + 72
0   libsystem_malloc.dylib                0x1a58ef42c _malloc_zone_malloc + 272

Linux reports:

==10369== LEAK SUMMARY:
==10369==    definitely lost: 8,729,128 bytes in 72,582 blocks
==10369==    indirectly lost: 0 bytes in 0 blocks
==10369==      possibly lost: 17,880 bytes in 149 blocks
==10369==    still reachable: 1,967 bytes in 8 blocks
==10369==         suppressed: 0 bytes in 0 blocks
==10369== Reachable blocks (those to which a pointer was found) are not shown.
==10369== To see them, rerun with: --leak-check=full --show-leak-kinds=all
CodingCellist commented 2 years ago

Looking at support/c/getline.c, it seems this code isn't ours; it was copied from a 2011 version of a file included with NetBSD. I wonder if the issue could be fixed by swapping in a more recent version of the same library (for example, OpenBSD's getdelim.c and getline.c)?

dunhamsteve commented 2 years ago

The man page says that the libc version of getline mallocs a buffer if null, and expects the caller to free it. Looks like openbsd requires a buffer to be passed in and gives an error otherwise. It looks like the Idris support code expects the caller to free it:

https://github.com/idris-lang/Idris2/blob/e856569d16659b43553dd92cfff9cd064222c9f7/support/c/idris_file.c#L150

Chez (9.5.7 on mac) does not seem to be freeing it though. I tested Linux with the latest (9.5.9) above. Maybe older versions of Chez free void* pointers, but I didn't see anything in the Chez release notes. I will retest on Linux with the 9.5 chez that ubuntu distributes.

dunhamsteve commented 2 years ago

It's reproducing for me with the 9.5 chez in ubuntu 20.04.3

dunhamsteve commented 2 years ago

This quick and dirty patch fixes it, but will probably break other backends. It is just to test if freeing the pointer works.

We should check if it is also an issue on racket. That pointer has to be passed back from C to scheme with a string declaration before the underlying buffer is freed. If we could get chez to manage the void* that would be nice. Otherwise, we might want to change prim__readLine to return string and take care of business in the scheme side of the support function.

diff --git a/libs/base/System/File/ReadWrite.idr b/libs/base/System/File/ReadWrite.idr
index 4d7030e0..f741d899 100644
--- a/libs/base/System/File/ReadWrite.idr
+++ b/libs/base/System/File/ReadWrite.idr
@@ -8,6 +8,7 @@ import System.File.Handle
 import public System.File.Error
 import System.File.Support
 import public System.File.Types
+import System.FFI

 %default total

@@ -61,7 +62,10 @@ fGetLine (FHandle f)
     = do res <- primIO (prim__readLine f)
          if prim__nullPtr res /= 0
             then returnError
-            else ok (prim__getString res)
+            else do
+              let s = prim__getString res
+              free $ prim__forgetPtr res
+              ok s

 ||| Get a number of characters from the given file handle.
 |||
melted commented 2 years ago

It feels a bit weird to have those primitives implemented in C instead of calling out to the Scheme backend. I guess it would have to be fixed for the RefC backend either way, so it wouldn't be easier to move it to Scheme, I don't remember, was performance the reason or was there some other consideration?

dunhamsteve commented 2 years ago

Racket version 8.4 [cs] has the same issue. I suspect every scheme has the issue since they can't know the difference between prim__getEnv (should never be freed) and prim__readLine (should always be freed), both of which return a PrimIO (Ptr String). I think it is Ptr String rather than String to allow nulls.

Process 17101: 168717 leaks for 172776368 total leaked bytes.

STACK OF 138089 INSTANCES OF 'ROOT LEAK: <malloc>':
7   dyld                                  0x231e3fc10 start + 2368
6   idris2-boot                           0x100aeabec main + 1712
5   idris2-boot                           0x100aeb2d0 racket_boot + 1512
4   idris2-boot                           0x100b7b004 Scall2 + 120
3   ???                                   0x10acc0900 0x7fffffffffffffff + 9223372041330886913
2   libidris2_support.dylib               0x105c664f4 idris2_readLine + 32
1   libidris2_support.dylib               0x105c660b4 getdelim + 72
0   libsystem_malloc.dylib                0x1a58ef42c _malloc_zone_malloc + 272