ocaml / flexdll

a dlopen-like API for Windows
Other
97 stars 30 forks source link

Implementing a less memory intensive read function #101

Closed EmmaJaneBonestell closed 1 year ago

EmmaJaneBonestell commented 2 years ago

When building the ocaml bibindings for Z3, it attempts to allocate ~10GiB before an oom error. This exceedingly high memory usage occurs while reading the import library (libz3.dll.a), and before the actual linker is called.

I don't know if there's a more efficient way to store these strings, or if it's possible to implement it such as to save results to disk and continue reading if memory usage is becoming too high. It seems very improbable that 33 GiB of strings are needed to make ~500KiB to 1.3MiB libraries.

The most pertinent information from the ctf is likely:

1391 samples of 10.4 GB allocations

 5.9 GB (57.0%) at Stdlib__Bytes.sub (bytes.ml:68:12-22, 68:12-22)
      including:
     2.9 GB via Coff.Lib.read_lib.read_member (97% of this)

 4.4 GB (42.4%) at Stdlib__Buffer.resize (buffer.ml:87:19-40)
      including:
     4.1 GB via Coff.Lib.read_lib.obj (56% of this)
     4.1 GB via Stdlib__Printf.ksprintf.k' (93% of this)
     4.1 GB via CamlinternalFormat.strput_acc (93% of this)
     4.1 GB via Stdlib__Buffer.add_string (93% of this)

The CTF is here, if you'd like to view it, though. https://gist.github.com/EmmaJaneBonestell/98fda00c93a8d09ea09bada08972761d

Edit: I used a GCP server, and it took about 33 GiB of RAM and a total of 46 GiB of allocations to build.

Edit 2: UCRT had nothing to do with it, but rather, it seems to have the excessive memory usage with any of the MSYS2 environments' built dlls. If the dll is built with Cygwin's MinGW toolchain, rather than MSYS2's MINGW toolchain, it doesn't have a problem. E.g. libz3.dll.a built with Cygwin's MinGW, vs MSYS2's, won't consume excessive RAM.

dra27 commented 1 year ago

Sorry for taking so long to get to this - is it possible to have a complete repro case for this? flexlink is creaking somewhat with more recent Windows link libraries as well (a significant amount of the time flexlink spends appears to be parsing the CRT and Windows .a files).

NielsMommen commented 1 year ago

I'm experiencing similar issues when trying to build OCaml bindings for z3 on cygwin (64 bit) with mingw64-x86_64 or mingw64-i686. The build fails with error: ** Fatal error: Error while reading libz3.dll.a: Out of memory. It works on my personal machine (16GB RAM) without any errors, but allocates over 40GB of heap memory.

Repro case

I made a minimal repro case at z3-ocaml-build. It is best to use it on a windows virtual machine that has limited amount of RAM and swap space or on Github Actions (The repository contains a workflow). I will refer to the root of the repository as z3-ocaml-build_dir.

Build

The repository contains a build.ps1 script that automates the compilation of z3 and all dependencies. Run this script as follows in powershell:

powershell.exe -ExecutionPolicy RemoteSigned -File .\build.ps1

This script does the following:

  1. Install cygwin with the required packages in z3-ocaml-build_dir/cygwin:
    • coreutils
    • make
    • mingw64-x86_64-gcc-g++
    • mingw64-x86_64-gmp
    • curl
    • python
  2. Build following packages in z3-ocaml-build_dir/prefix using the Makefile:
    • OCaml 4.14.0
    • Flexdll 0.41
    • Findlib 1.9.1
    • Zarith 1.12
    • Z3 4.11.2 The z3 build will fail at:
      ocamlmklib -o api/ml/z3ml  -I api/ml -L. api/ml/z3native_stubs.o api/ml/z3enums.cmo api/ml/z3native.cmo api/ml/z3.cmo  -lz3 -lstdc++ -cclib -static-libgcc -cclib -static-libstdc++ -ccopt -L$(ocamlfind printconf destdir)/stublibs -dllpath $(ocamlfind printconf destdir)/stublibs
      ** Fatal error: Error while reading libz3.dll.a: Out of memory

      Inspecting the error

      We can execute the failing command in a more vebose way as follows:

  3. Go to the bin directory of cygwin at z3-ocaml-build_dir/cygwin/bin and open bash.exe
  4. Inside bash, cd to z3-ocaml-build_dir:
    cd /cygdrive/c/z3-ocaml-build_dir
  5. Inside bash, add z3-ocaml-build_dir/cygwin/bin and z3-ocaml-build_dir/prefix/bin to PATH:
    export PATH=$PWD/prefix/bin:$PWD/cygwin/bin:$PATH
  6. Inside bash, cd to prefix/z3-z3-4.11.2/build:
    cd prefix/z3-z3-4.11.2/build

We can now run the command with the verbose flag and OCAMLRUNPARAM=v4,1024 to output GC statistics :

OCAMLRUNPARAM="v=4,1024"  ocamlmklib -verbose -o api/ml/z3ml  -I api/ml -L. api/ml/z3native_stubs.o api/ml/z3enums.cmo api/ml/z3native.cmo api/ml/z3.cmo  -lz3 -lstdc++ -cclib -static-libgcc -cclib -static-libstdc++ -ccopt -L$(ocamlfind printconf destdir)/stublibs -dllpath $(ocamlfind printconf destdir)/stublibs

Which produces following output on my virtual machine with 4GB of RAM:

flexlink -chain mingw64 -stack 33554432  -o api/ml\dllz3ml.dll api/ml/z3native_stubs.o    -L. -lz3 -lstdc++ -LC:/z3-ocaml-build_dir/prefix/lib/ocaml\flexdll
allocated_words: 1242910463
minor_words: 4503709
promoted_words: 1719120
major_words: 1240125874
minor_collections: 3967
major_collections: 22
heap_words: 954046976
heap_chunks: 54
top_heap_words: 954046976
compactions: 0
forced_major_collections: 0
allocated_words: 3936
minor_words: 3936
promoted_words: 0
major_words: 0
minor_collections: 0
major_collections: 0
heap_words: 126976
heap_chunks: 1
top_heap_words: 126976
compactions: 0
forced_major_collections: 0
Growing heap to 1472k bytes
Growing heap to 1952k bytes
Growing heap to 3600k bytes
Growing heap to 5856k bytes
...
Growing heap to 4900788k bytes
Growing heap to 5635908k bytes
Growing heap to 6481296k bytes
No room for growing heap
** Fatal error: Error while reading libz3.dll.a: Out of memory

Some notes

dra27 commented 1 year ago

This is fantastic, thanks @NielsMommen!

NielsMommen commented 1 year ago

Note that recently cygwin x86_64 is only supported.

NielsMommen commented 1 year ago

After some further investigation, I found out that MinGW binutils have changed after version 2.37. Import libraries (.dll.a) generated by ld through --out-impllib seem to have a different format, causing flexdll to consume a lot of memory to allocate buffers while reading all symbols.

Working build

As mentioned before, I'm able to build OCaml bindings for z3 on my old local cygwin installation that still has mingw64-i686-binutils version 2.37. Passing the -explain flag to flexdll while reading libz3.dll.a produces following output:

Explain output ``` Symbol _Z3_add_const_interp found in libz3.dll.a(d000001.o) Symbol __imp__Z3_add_const_interp found in libz3.dll.a(d000001.o) Symbol _Z3_add_func_interp found in libz3.dll.a(d000002.o) Symbol __imp__Z3_add_func_interp found in libz3.dll.a(d000002.o) Symbol _Z3_add_rec_def found in libz3.dll.a(d000003.o) Symbol __imp__Z3_add_rec_def found in libz3.dll.a(d000003.o) Symbol _Z3_algebraic_add found in libz3.dll.a(d000004.o) Symbol __imp__Z3_algebraic_add found in libz3.dll.a(d000004.o) Symbol _Z3_algebraic_div found in libz3.dll.a(d000005.o) Symbol __imp__Z3_algebraic_div found in libz3.dll.a(d000005.o) ... ```

Failing build

ld seems to produce a different dll.a format when building z3 with a newer (>2.37) version of binutils. Passing the -explain flag to flexdll while reading libz3.dll.a produces following output (piping the output to a file results in a file > 20GB):

Explain output ``` Scanning lib libz3.dll.a Symbol _Z3_add_const_interp found in libz3.dll.a(libz3_dll_d000001.o/ ) Symbol __imp__Z3_add_const_interp found in libz3.dll.a(libz3_dll_d000001.o/ ) Symbol _Z3_add_func_interp found in libz3.dll.a(libz3_dll_d000002.o/ libz3_dll_d000001.o/ ) Symbol __imp__Z3_add_func_interp found in libz3.dll.a(libz3_dll_d000002.o/ libz3_dll_d000001.o/ ) Symbol _Z3_add_rec_def found in libz3.dll.a(libz3_dll_d000003.o/ libz3_dll_d000002.o/ libz3_dll_d000001.o/ ) Symbol __imp__Z3_add_rec_def found in libz3.dll.a(libz3_dll_d000003.o/ libz3_dll_d000002.o/ libz3_dll_d000001.o/ ) Symbol _Z3_algebraic_add found in libz3.dll.a(libz3_dll_d000004.o/ libz3_dll_d000003.o/ libz3_dll_d000002.o/ libz3_dll_d000001.o/ ) Symbol __imp__Z3_algebraic_add found in libz3.dll.a(libz3_dll_d000004.o/ libz3_dll_d000003.o/ libz3_dll_d000002.o/ libz3_dll_d000001.o/ ) Symbol _Z3_algebraic_div found in libz3.dll.a(libz3_dll_d000005.o/ libz3_dll_d000004.o/ libz3_dll_d000003.o/ libz3_dll_d000002.o/ libz3_dll_d000001.o/ ) Symbol __imp__Z3_algebraic_div found in libz3.dll.a(libz3_dll_d000005.o/ libz3_dll_d000004.o/ libz3_dll_d000003.o/ libz3_dll_d000002.o/ libz3_dll_d000001.o/ ) ... ```

The difference is that each object is prefixed by libz3dll and the objects where each symbol is found keeps on growing for each symbol. This pattern keeps repeating: one entry for the first symbol, two entries for the second, ..., n entries for the last. This explains why flexdll is using a lot of heap memory.

Debugging coff.ml

When debugging flexdll, I noticed that the branch on line 954 is taken repeatedly in coff.ml#Lib#read_member: https://github.com/ocaml/flexdll/blob/47fbd3175369af6a63dd2c1fa6d9dfcbba66bc18/coff.ml#L946-L974

objdump

objdump -t libz3.dll.a confirms that objects are prefixed with libz3dll when built with binutils >2.37:

objdump - z3 built with binutils 2.37 ``` d031590.o: file format pe-i386 SYMBOL TABLE: [ 0](sec 1)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .text [ 1](sec 2)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$7 [ 2](sec 3)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$5 [ 3](sec 4)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$4 [ 4](sec 5)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$6 [ 5](sec -1)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000001 @feat.00 [ 6](sec 1)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __ZplRK8rationali [ 7](sec 3)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __imp___ZplRK8rationali [ 8](sec 0)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __head_libz3_dll d031589.o: file format pe-i386 SYMBOL TABLE: [ 0](sec 1)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .text [ 1](sec 2)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$7 [ 2](sec 3)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$5 [ 3](sec 4)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$4 [ 4](sec 5)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$6 [ 5](sec -1)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000001 @feat.00 [ 6](sec 1)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __ZplRK8rationalS1_ [ 7](sec 3)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __imp___ZplRK8rationalS1_ [ 8](sec 0)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __head_libz3_dll d031588.o: file format pe-i386 SYMBOL TABLE: [ 0](sec 1)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .text [ 1](sec 2)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$7 [ 2](sec 3)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$5 [ 3](sec 4)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$4 [ 4](sec 5)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$6 [ 5](sec -1)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000001 @feat.00 [ 6](sec 1)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __ZplRK16inf_int_rationalS1_ [ 7](sec 3)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __imp___ZplRK16inf_int_rationalS1_ [ 8](sec 0)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __head_libz3_dll ... ```
objdump - z3 built with binutils >2.37 ``` libz3_dll_d031590.o: file format pe-i386 SYMBOL TABLE: [ 0](sec 1)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .text [ 1](sec 2)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$7 [ 2](sec 3)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$5 [ 3](sec 4)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$4 [ 4](sec 5)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$6 [ 5](sec -1)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000001 @feat.00 [ 6](sec 1)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __ZplRK8rationali [ 7](sec 3)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __imp___ZplRK8rationali [ 8](sec 0)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __head_libz3_dll libz3_dll_d031589.o: file format pe-i386 SYMBOL TABLE: [ 0](sec 1)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .text [ 1](sec 2)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$7 [ 2](sec 3)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$5 [ 3](sec 4)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$4 [ 4](sec 5)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$6 [ 5](sec -1)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000001 @feat.00 [ 6](sec 1)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __ZplRK8rationalS1_ [ 7](sec 3)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __imp___ZplRK8rationalS1_ [ 8](sec 0)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __head_libz3_dll libz3_dll_d031588.o: file format pe-i386 SYMBOL TABLE: [ 0](sec 1)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .text [ 1](sec 2)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$7 [ 2](sec 3)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$5 [ 3](sec 4)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$4 [ 4](sec 5)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000000 .idata$6 [ 5](sec -1)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x00000001 @feat.00 [ 6](sec 1)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __ZplRK16inf_int_rationalS1_ [ 7](sec 3)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __imp___ZplRK16inf_int_rationalS1_ [ 8](sec 0)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __head_libz3_dll ... ```

objdump-z3_build_binutils_2.37.txt objdump-z3_build_binutils_2.39.txt

nojb commented 1 year ago

Thanks for the detective work @NielsMommen! Could you upload the failing libz3.dll.a somewhere? It would help debugging this issue. Thanks!

nojb commented 1 year ago

Thanks for the detective work @NielsMommen! Could you upload the failing libz3.dll.a somewhere? It would help debugging this issue. Thanks!

Never mind, I was able to to obtain it. I have a tentative fix over at https://github.com/nojb/flexdll/tree/longnames_newline. Can you try it? Thanks!

NielsMommen commented 1 year ago

Never mind, I was able to to obtain it. I have a tentative fix over at https://github.com/nojb/flexdll/tree/longnames_newline. Can you try it? Thanks!

Works like a charm! thanks!

nojb commented 1 year ago

Never mind, I was able to to obtain it. I have a tentative fix over at https://github.com/nojb/flexdll/tree/longnames_newline. Can you try it? Thanks!

Works like a charm! thanks!

Thanks for the confirmation! Submitted as PR now: #117