Hi! I was checking out libpostal and noticed something that could be improved.
My country is
Canada
Here's how I'm using libpostal
I plan to create a C++ application that uses libpostal to parse international addresses. The C++ application will be running on a fleet of Linux servers, each with 8GB of memory.
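For concreteness, here is a minimal sketch of how I plan to call libpostal from the C++ application. It follows the C usage example in the README; the Canadian address is just a placeholder:

```cpp
// Minimal sketch based on the README's C example; assumes libpostal and its
// default data files are installed (e.g. via libpostal_data).
#include <libpostal/libpostal.h>

#include <cstdio>
#include <cstdlib>

int main() {
    // Both calls read model data from disk into memory up front;
    // the address parser model is the large (~1.8GB) one.
    if (!libpostal_setup() || !libpostal_setup_parser()) {
        std::fprintf(stderr, "failed to initialize libpostal\n");
        return EXIT_FAILURE;
    }

    libpostal_address_parser_options_t options =
        libpostal_get_address_parser_default_options();

    // libpostal_parse_address takes a non-const char *, so use a mutable buffer.
    char address[] = "475 Sussex Dr Ottawa ON K1N 1G8 Canada";
    libpostal_address_parser_response_t *parsed =
        libpostal_parse_address(address, options);

    for (size_t i = 0; i < parsed->num_components; i++) {
        std::printf("%s: %s\n", parsed->labels[i], parsed->components[i]);
    }

    libpostal_address_parser_response_destroy(parsed);
    libpostal_teardown_parser();
    libpostal_teardown();
    return EXIT_SUCCESS;
}
```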
Here's what I did
Not really what I did, but I have two questions:
Q1: Why does libpostal always load the full 1.8GB trained model into memory? Could the model be split into smaller parts that are loaded on demand? In our case, 1.8GB is ~22% of each server's 8GB of RAM, and it seems odd to dedicate nearly a quarter of a machine's memory to address parsing.
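The only application-level mitigation I've found so far is to defer `libpostal_setup_parser()` until the first parse request, so processes that never parse an address stay small. A sketch of that workaround (the `parse_lazily` helper is hypothetical, not part of libpostal):

```cpp
// App-level workaround sketch, not a libpostal feature: the ~1.8GB parser
// model is loaded lazily on the first call instead of at process startup.
#include <libpostal/libpostal.h>

#include <mutex>
#include <stdexcept>

libpostal_address_parser_response_t *parse_lazily(char *address) {
    static std::once_flag init;
    std::call_once(init, [] {
        // The big allocation happens here, on first use.
        if (!libpostal_setup() || !libpostal_setup_parser()) {
            throw std::runtime_error("libpostal initialization failed");
        }
    });
    return libpostal_parse_address(
        address, libpostal_get_address_parser_default_options());
}
```

That only helps idle processes, though; any process that parses even once still ends up holding the entire model.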
Q2: In the "Why C?" section of the README, it says:
Memory-efficiency: libpostal is designed to run in a MapReduce setting where we may be limited to < 1GB of RAM per process depending on the machine configuration. As much as possible libpostal uses contiguous arrays, tries (built on contiguous arrays), bloom filters and compressed sparse matrices to keep memory usage low. It's possible to use libpostal on a mobile device with models trained on a single country or a handful of countries.
Is libpostal still considered memory-efficient if it always loads a 1.8GB model into memory regardless?
Are there instructions for training the libpostal model on a handful of countries rather than all countries in OSM?
Here's what I got
N/A
Here's what I was expecting
N/A
For parsing issues, please answer "yes" or "no" to all that apply.
N/A
Here's what I think could be improved
Avoid loading the entire 1.8GB model into memory
Provide instructions for training the libpostal model on a handful of countries rather than all countries in OSM