openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License

Running multiple instances of libpostal #390

Open VeridionRO opened 5 years ago

VeridionRO commented 5 years ago

Hello,

I have been testing libpostal for a few days now and I have the following use case:

I run 10 libpostal scripts simultaneously, each analysing various documents. After a few seconds, most of them fail with the following error:

ERR Averaged perceptron model could not be loaded at address_parser_load (address_parser.c:205) errno: Cannot allocate memory

I have a vague idea of how this could be solved: for example, all the script instances could share one already-loaded model instead of each loading its own copy. For further context, I am using pypostal.

I can see that this happens because I run too many libpostal scripts at the same time. My question is: has anyone run into something similar, and if so, what was your solution?

antimirov commented 5 years ago

I think in your case you need to launch a single instance of a Docker container with an exposed REST API - https://github.com/johnlonganecker/libpostal-rest-docker

Then you'll need to change your scripts so that they send REST queries, for example using the 'requests' library. This way only one instance of libpostal will be running. Typical libpostal requests take less than 20ms, so your 10 parallel scripts will be fine. If you need any additional info, ask me here.
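A minimal sketch of what such a client could look like. It uses the standard library's urllib instead of 'requests' so it runs without extra packages. The base URL, the port 8080, the /parser path and the {"query": ...} body are assumptions about how the container is set up, so check them against the README of the image you actually run. The function names are made up for this example.

```python
import json
import urllib.request

# Assumption: the REST container listens here. Adjust to your setup.
BASE_URL = "http://localhost:8080"


def build_parse_request(address, base_url=BASE_URL):
    """Build a POST request carrying the address as a JSON body."""
    # Assumption: the service expects {"query": "<address>"} on /parser.
    body = json.dumps({"query": address}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/parser",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_address(address, base_url=BASE_URL, timeout=5.0):
    """Send the address to the shared service and return the decoded JSON."""
    request = build_parse_request(address, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Each of the 10 scripts would then call parse_address("...") instead of importing pypostal, so the model is loaded only once, inside the container.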

VeridionRO commented 5 years ago

What I did was something similar to what you proposed. I created a socket server that keeps libpostal loaded and waits for requests; clients connect to it and send their queries. Thanks for your feedback.
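For anyone landing here later, this is roughly the shape such a socket server can take. It is only a sketch, not the code I actually ran: the line-based protocol (one address per line in, one JSON line out), the port 9999 and the names ParseServer, ParseHandler and query are all invented for this example. The only pypostal call used is postal.parser.parse_address. The lock is a conservative choice that serializes calls into libpostal; it may not be strictly needed.

```python
import json
import socket
import socketserver
import threading


class ParseHandler(socketserver.StreamRequestHandler):
    """Reads one address per line and answers with one JSON line each."""

    def handle(self):
        for raw in self.rfile:
            address = raw.decode("utf-8").strip()
            if not address:
                continue
            # The parse function was loaded once, when the server started.
            with self.server.lock:
                parsed = self.server.parse_fn(address)
            self.wfile.write((json.dumps(parsed) + "\n").encode("utf-8"))


class ParseServer(socketserver.ThreadingTCPServer):
    """TCP server that holds a single shared parse function."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, server_address, parse_fn):
        super().__init__(server_address, ParseHandler)
        self.parse_fn = parse_fn
        self.lock = threading.Lock()  # serialize calls into the parser


def query(host, port, address):
    """Client side: send one address and read back the parsed result."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall((address + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reader:
            return json.loads(reader.readline())


def main():
    # Importing pypostal here loads the model once, in this process only.
    from postal.parser import parse_address

    with ParseServer(("127.0.0.1", 9999), parse_address) as server:
        server.serve_forever()
```

Run main() in one long-lived process, and have each script call query("127.0.0.1", 9999, "...") instead of importing pypostal itself. The (value, label) tuples returned by parse_address arrive on the client as two-element lists because they pass through JSON.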