openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License

Memory leak in expand_alternative_phrase_option #636

Open fangpings opened 1 year ago

fangpings commented 1 year ago

Hi, we have a service that runs the AddressExpander Java binding. The pods are restarted due to OOM once in a while, so we did some analysis to look for a possible memory leak. It turns out that there might be a memory leak in the expand_alternative_phrase_option function, as shown in the jemalloc analysis output here: app-profiling 439

Any idea how we can solve this issue?

albarrentine commented 1 year ago

Ran some expander tests under valgrind, and am not seeing any memory leaks on the C side. The code complexity in expand_address is greater than I'd prefer, so depending on the input (language, script, transliteration) there are a number of different paths it can take. Not seeing anything immediately apparent in expand_alternative_phrase_option but would need a reproducible test case.

It's best if you can reduce the problem to a single input. To determine whether the leak is in libpostal itself or somewhere else, it should be possible to run the native cli program under valgrind (no Java involved; the program builds as part of libpostal after running make), i.e. valgrind --leak-check=full ./src/libpostal "YOUR ADDRESS GOES HERE", and see a memory leak that way. If you can find an address where that's true, post it here. Otherwise, it might be in the JNI binding (in that case, again, see if you can make a reproducible cli-based Java program that demonstrates the leak).
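For the JNI case, a CLI-based Java reproducer could look roughly like the sketch below. It is only an illustration, not code from libpostal or jpostal: the class name LeakRepro, the run and rssKb helpers, and the iteration counts are all hypothetical. The expander is passed in as a plain function so the file compiles on its own; the stand-in lambda in main would need to be replaced with the real binding call. Resident memory is read from /proc/self/status, so the sampling only works on Linux.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.Function;

// Hypothetical harness: expands one address repeatedly and samples process RSS.
class LeakRepro {

    // Resident set size in kB, read from /proc/self/status (Linux only).
    // Returns -1 if the value cannot be read.
    static long rssKb() {
        try {
            for (String line : Files.readAllLines(Paths.get("/proc/self/status"))) {
                if (line.startsWith("VmRSS:")) {
                    return Long.parseLong(line.replaceAll("[^0-9]", ""));
                }
            }
        } catch (IOException | RuntimeException e) {
            // Not on Linux, or the file is unreadable: report "unknown".
        }
        return -1;
    }

    // Calls the expander `iterations` times on the same input and records
    // RSS after every `sampleEvery` calls (sampleEvery must be positive).
    static long[] run(Function<String, String[]> expander, String address,
                      int iterations, int sampleEvery) {
        long[] samples = new long[iterations / sampleEvery];
        for (int i = 1; i <= iterations; i++) {
            expander.apply(address);
            if (i % sampleEvery == 0) {
                samples[i / sampleEvery - 1] = rssKb();
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        String address = args.length > 0 ? args[0] : "YOUR ADDRESS GOES HERE";

        // Stand-in so this file compiles without the binding. Assumption: with
        // jpostal on the classpath this would become something like
        //   AddressExpander.getInstance()::expandAddress
        Function<String, String[]> expander = s -> new String[] { s.toLowerCase() };

        long[] samples = run(expander, address, 1_000_000, 100_000);
        for (int i = 0; i < samples.length; i++) {
            System.out.println("after " + (i + 1) * 100_000L + " calls: RSS " + samples[i] + " kB");
        }
    }
}
```

RSS that keeps climbing for a single fixed address is a hint rather than proof, since JVM heap growth shows up in RSS as well. Pinning the heap (for example with equal -Xms and -Xmx values) makes native growth easier to tell apart from ordinary heap growth.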