Open orividrisk opened 1 year ago
@saudet FYI
Ran a few parser tests under valgrind with no memory leaks found on this side.
Can you write the above test program in C, compile it, and run under valgrind --leak-check=full
to see if any leaks can be reproduced without roping in any of the JNI bindings?
Not clear on how javacpp-presets interacts with the C functions (it does not use the official jpostal/JNI bindings, instead generating its own automatic JNI bindings from the C API), and whether it's using latest or a tag. If it's not using the latest version of libpostal, would recommend trying that as there were some SSE fixes recently, but none of those were related to a memory leak.
Also noting that libpostal is not thread-safe so should only be used from a single thread.
@albarrentine thank you for the quick response.
My C is quite rusty so I don't think I will be able to provide the C equivalent of my test.
I wrote a very simple program using the javacpp-presets presets, which easily reproduces the issue.
This program upon exiting, produces a leak reports using the macOS leaks util( as unfortunately valgrind
is no longer supported on more advanced versions of macos).
The program can be found in the following github repo , with instruction in the ReadMe file of how to run it.
In the memory leak report I found these two reoccurring stack trace leaks :
Leak: 0x7fcb71281b90 size=16 zone: MallocStackLoggingLiteZone_0x104f34000 malloc in cstring_array_to_strings C libpostal.1.dylib length: 4 "unit" Call stack: 0x11627653a (???) ??? | 0x104e9130c (libjnipostal.dylib) Java_org_bytedeco_libpostal_global_postal_libpostal_1parse_1address__Lorg_bytedeco_javacpp_BytePointer_2Lorg_bytedeco_libpostal_libpostal_1address_1parser_1options_1t_2 | 0x12f57815c (libpostal.1.dylib) libpostal_parse_address | 0x12f59be9b (libpostal.1.dylib) address_parser_parse | 0x12f5904c7 (libpostal.1.dylib) cstring_array_to_strings | 0x7ff816f73a43 (libsystem_c.dylib) strdup | 0x7ff816f19149 (libsystem_malloc.dylib) _malloc_zone_malloc_instrumented_or_legacy
Leak: 0x7fcb71284780 size=32 zone: MallocStackLoggingLiteZone_0x104f34000 malloc in address_parser_parse C libpostal.1.dylib Call stack: 0x7ff8170c6bd3 (libsystem_pthread.dylib) thread_start | 0x7ff8170cb1d3 (libsystem_pthread.dylib) _pthread_start | 0x104e21529 (libjli.dylib) ThreadJavaMain | 0x104e1ec4f (libjli.dylib) JavaMain | 0x1066742d7 (libjvm.dylib) jni_CallStaticVoidMethod | 0x106670e58 (libjvm.dylib) jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) | 0x10660cb26 (libjvm.dylib) JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*) | 0x116269cc9 (???) ??? | 0x10ef080dc (???) ??? | 0x1168444a0 (???) ??? | 0x104e9130c (libjnipostal.dylib) Java_org_bytedeco_libpostal_global_postal_libpostal_1parse_1address__Lorg_bytedeco_javacpp_BytePointer_2Lorg_bytedeco_libpostal_libpostal_1address_1parser_1options_1t_2 | 0x12f57815c (libpostal.1.dylib) libpostal_parse_address | 0x12f59bd25 (libpostal.1.dylib) address_parser_parse | 0x7ff816f19149 (libsystem_malloc.dylib) _malloc_zone_malloc_instrumented_or_legacy
it looks like in both cases it is related to the address_parser_parse
method in the libpostal library.
@saudet for clarification could you please shed some light as to how the JNI binding works including the used libostal version, and if the stack trace above could be related to this memory leak issue?
In addition I attached bellow a java NTM report which shows that the memory leak is not relevant to the JVM managed memory space, and also the report provided by the macOS leaks util
Please let me know if you need any more information.
@orividrisk Your small example is pretty big. If you're serious about debugging this, reduce it down to only a few lines.
@orividrisk https://github.com/orividrisk/libpostal-poc/blob/bd583351594b729dfa2bb89c6fac8f4ffb67c37d/src/main/java/com/example/test/Address.java#L66 you’re not calling libpostal_address_parser_response_destroy to free the memory allocated in that line, which explains both 1 and 2.
@albarrentine adding libpostal_address_parser_response_destroy
did the trick. the memory leak related to libpostal_parse_address
was resolved. thank you!
In our production env we are using both address normalization & parsing which both presented with memory leak issue. In order to isolate the memory issue I started with address parsing. now that it was resolved I moved on to normalization which also present with memory leak issue.
The following is a call stack example of a leak for the macOS leaks util:
Leak: 0x60000236a580 size=64 zone: DefaultMallocZone_0x107ab9000 realloc in utf8_case C libpostal.1.dylib length: 51 "1142 bellview dr wisconsin rapids wi 54494-9293 usa" Call stack: 0x7ff8170c6bd3 (libsystem_pthread.dylib) thread_start | 0x7ff8170cb1d3 (libsystem_pthread.dylib) _pthread_start | 0x107adc529 (libjli.dylib) ThreadJavaMain | 0x107ad9c4f (libjli.dylib) JavaMain | 0x1093112d7 (libjvm.dylib) jni_CallStaticVoidMethod | 0x10930de58 (libjvm.dylib) jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) | 0x1092a9b26 (libjvm.dylib) JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*) | 0x118f06cc9 (???) ??? | 0x111ac4ae4 (???) ??? | 0x1194c628f (???) ??? | 0x107b5292c (libjnipostal.dylib) Java_org_bytedeco_libpostal_global_postal_libpostal_1normalize_1string__Lorg_bytedeco_javacpp_BytePointer_2J | 0x132592d59 (libpostal.1.dylib) normalize_string_latin_languages | 0x132592b4e (libpostal.1.dylib) normalize_string_utf8_languages | 0x13258cd30 (libpostal.1.dylib) utf8_case | 0x7ff816efcde6 (libsystem_malloc.dylib) realloc
For the normalized address I didn't see a method to "destroy" the returned normalized string.
I updated the example github repo to now perform both parsing and normalization. @saudet please let me know what you think ( I updated the example to be as concise as possible)
The README instructions in the repo can be used to reproduce the issue, with the detailed leaks instances report printed out to console after the process concludes it's run.
normalized address leak report.txt
Please let me know if you need any more information.
@orividrisk there's a corresponding function libpostal_expansion_array_destroy(expansions, num_expansions)
which cleans up memory for that response. This is used in the expander cli program which builds with libpostal. I'm not seeing any memory leaks for that input either.
Hi, When using the javacpp-presets-libpostal libpostal 1.1-1.5.8 I experience a memory leak in my java process. My code is iterating on a large number of addresses, and over time the java process memory is increasing until it crashes due to OOM. I have been monitored the jvm heap and non heap memory metrics and there is no issue there. This leads me to the conclusion that the issue is with the native allocation of the returned address response, which is not deallocating the parsed addresses. I created an issue in the
javacpp-presets
repo (mentioned in the libpostal readme), where they replied that it could be a potential libpostal c library memory leak: https://github.com/bytedeco/javacpp-presets/issues/1407The following is my code (I am using the default options provided by the library):
If any more information is needed I would be happy to provide it. I would very appreciate your prompt assistance with the issue as it is having a big impact on our production env.
Thank you and have a very nice day.