openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License

README instructions for Ubuntu Linux are missing the `build-essential` package, which installs `make`, and `git` #662

Open rjurney opened 6 months ago

rjurney commented 6 months ago

I want to alter the README to fix a problem in the Ubuntu install instructions. The README installation instructions for Libpostal do not work on the Ubuntu Docker Hub image ubuntu:22.04. There is


My country is

US of A!


Here's how I'm using libpostal

I used it for address parsing on billions of addresses on PySpark and to subsequently perform geospatial analyses to construct a business knowledge graph of 3 billion nodes and 2.2 billion edges. I've been an enthusiast ever since! Now I use it for entity resolution in general.


Here's what I did

I am building a Libpostal Dockerfile. I ran the ubuntu:22.04 image and tried the Libpostal install instructions:

# Start from a Jupyter Docker Stacks version
# FROM continuumio/anaconda3:2024.02-1
FROM ubuntu:22.04

# Install system dependencies
RUN apt update && \
    apt upgrade -y && \
    apt install curl build-essential autoconf automake libtool pkg-config gmake git -y && \
    rm -rf /var/lib/apt/lists/*

# Install Libpostal - a C library for parsing/normalizing street addresses around the world
RUN git clone https://github.com/openvenues/libpostal.git && \
    cd libpostal && \
    ./bootstrap.sh && \
    ./configure --datadir=/tmp --disable-sse2 MODEL=senzing && \
    make -j12 && \
    make install && \
    ldconfig

Here's what I got

First, the clone step fails because git is not installed.

Next there is a build error because make is not found. I searched and it isn't in the image.

7.594 /bin/sh: 1: make: not found
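One way to confirm which tools the bare image lacks is a throwaway Dockerfile like the sketch below. The tool list (git, make, cc) is an assumption based on the errors above, not an official list of Libpostal's build requirements.

```dockerfile
# Diagnostic sketch: report which assumed build tools are absent from the bare image.
FROM ubuntu:22.04

# `command -v` exits non-zero when a tool is not on PATH, so the echo runs
# only for missing tools. The loop itself always succeeds, so the build
# does not fail; it only prints.
RUN for tool in git make cc; do \
        command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"; \
    done
```

With BuildKit the output of RUN steps is collapsed by default, so building with `docker build --progress=plain --no-cache .` should make the "missing:" lines visible.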

Here's what I was expecting

  1. I expect to be able to clone the repository with git by following the build instructions for Ubuntu.
  2. I expect make to build Libpostal :) I tried the --disable-dependency-tracking option it suggested and it still doesn't work.


Here's what I think could be improved

I am going to patch the README in a PR to fix these problems.

albarrentine commented 6 months ago

Hm, is that new? We use ubuntu:latest for the tests in GitHub Actions, and that just installs curl autoconf automake libtool pkg-config. Maybe make and a C compiler used to be there by default and now aren't, or they're default on GitHub's runners. There seems to be a more specific package for make that doesn't install a C++ compiler and all the Debian stuff. Some folks may also want to use other compilers like llvm (may compile the scanner a bit faster, YMMV), so they wouldn't need gcc. The instructions are not necessarily meant for copy/pasting into a Dockerfile, and we assume people have a few basic things like git and a C compiler, just as a recipe lists ingredients but assumes cutlery, etc. I would tend to put make in that basics category, but that's probably not right, and would be equivalent to assuming people have docker to build a Dockerfile.

Note: these are only build requirements. Libpostal has no runtime dependencies other than standard libc (libc-dev). It's good practice security-wise to use a multi-stage build and copy the binary into a runtime image without a C compiler. Since it's just a library, there haven't been any official Dockerfiles, but I suppose we could release some.
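A minimal sketch of the multi-stage idea, for illustration only and not an official or tested Dockerfile. The paths /src/libpostal, /opt/libpostal and /opt/libpostal-data are assumptions chosen for the example, and the `--disable-sse2` and `MODEL=senzing` options from the Dockerfile in the report are left out; add them back if your platform or model choice needs them.

```dockerfile
# Build stage: the compiler and build tools exist only in this image.
FROM ubuntu:22.04 AS build

# curl, autoconf, automake, libtool and pkg-config come from the README's list;
# git, make, gcc and libc6-dev are the "basics" the README assumes.
# ca-certificates is added so HTTPS clones and downloads work without recommends.
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        ca-certificates curl git make gcc libc6-dev \
        autoconf automake libtool pkg-config && \
    rm -rf /var/lib/apt/lists/*

# Install prefix and data directory are hypothetical paths for this sketch.
RUN git clone https://github.com/openvenues/libpostal.git /src/libpostal && \
    cd /src/libpostal && \
    ./bootstrap.sh && \
    ./configure --prefix=/opt/libpostal --datadir=/opt/libpostal-data && \
    make -j"$(nproc)" && \
    make install

# Runtime stage: no compiler, only the installed library and its data files.
FROM ubuntu:22.04
COPY --from=build /opt/libpostal /opt/libpostal
COPY --from=build /opt/libpostal-data /opt/libpostal-data

# Register the library directory with the dynamic linker.
RUN echo /opt/libpostal/lib > /etc/ld.so.conf.d/libpostal.conf && ldconfig
```

The data directory is copied to the same path it had in the build stage because that path is passed to `./configure` at build time; moving it elsewhere in the runtime image would likely require pointing the library at the new location.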