pelias / docker

Run the Pelias geocoder in docker containers, including example projects.
MIT License

ARM support #264

Closed davidbarratt closed 7 months ago

davidbarratt commented 2 years ago

Use-cases

I would like to run Pelias on a Raspberry Pi, but that would require ARM support. It doesn't look like all of the docker images support arm64. :/

Attempted Solutions

Proposal

Add ARM builds for all Docker images. :)

References

dimaryaz commented 2 years ago

Just finished setting up an AWS EC2 t4g instance - one of their latest and cheapest instance types - only to run into the same problem.

missinglink commented 2 years ago

We'd be happy to accept a PR for this, I don't have an ARM Mac so I can't test this unfortunately but we should be able to set it up in the CI env.

All Pelias Docker images extend from the baseimage, so once that is either made multiarch or a second baseimage is generated for ARM, we can configure the CI to build them and begin testing the application components.

I suspect that major software vendors such as nodejs/elasticsearch with large teams will already support ARM but things like libpostal might require some work.

Tim Cook himself said it would take some time for ARM to be supported everywhere so we'll need some help from the Pelias community to get it working and tested!

orangejulius commented 2 years ago

Hey folks,

As a MacBook M1 owner, I wouldn't mind spending a little bit of my personal time making Pelias work on ARM :)

There's going to be a bunch of aspects to this work, and I'll try to list a general overview here. A big thanks to @jlowe000 for his work apparently getting all of Pelias to work on ARM, and his accompanying blog post https://redthunder.blog/2021/07/04/daysofarm-12-of-x/. Definitely points us in the right direction.

Elasticsearch version

As mentioned in https://github.com/pelias/docker/pull/263, we'll want to upgrade the default Elasticsearch image to 7.10.2 or newer to get ARM compatibility. It sounds like this alone might be enough to get Pelias to at least work on an M1 Mac. Lots of things will still use Rosetta 2 emulation and be slow and battery draining, but it's a start.

Go binaries and other hardcoding of amd64

We hardcode amd64 download links for various dependencies across the project, for example the Polylines importer Dockerfile:

https://github.com/pelias/polylines/blob/cb0b382af7c5bd8cfe3b607c6b35f6f7417b24bc/Dockerfile#L8

I'd love to hear what best practices there are for this. I think buildx (more on that below) will provide an ARCH environment variable, can we use that?

pbf2json

We'll either need to add ARM binaries to anything that uses pbf2json (interpolation, openstreetmap), or start compiling from source during Docker builds on ARM.

Docker buildx

If we want Docker images to be built for the project with arm support by default then we might want to look at Docker multi-arch builds, presumably using buildx. I haven't had a ton of luck with this yet, but I'm sure it can be done.

Let me know if I'm missing anything!

jlowe000 commented 2 years ago

Hi @orangejulius, I've been able to get a "version" of this up and going.

I wouldn't say it's been fully tested, but it's been very stable for the queries and workloads that I've been working with. The repos are checked in and forked from the pelias versions. The last time I fetched from upstream was while working on the interpolation issue.

A couple of things that I had to update.

There was a comment here that pelias on arm64 worked fine - https://github.com/isaacs/nave/pull/111#issuecomment-900422483. I'm not sure whether they were referring to their repo or pelias.

There are definitely some areas of concern that I hadn't had time to look into deeply. The main one is the Valhalla stack.

missinglink commented 2 years ago

Regarding pbf2json: we are already building pbf2json.linux-arm and bundling it in the npm module:

path.join(__dirname, 'build', util.format( 'pbf2json.%s-%s', os.platform(), os.arch() ) )
/tmp ❯ ls -lah node_modules/pbf2json/build
total 22792
drwxr-xr-x  6 peter  wheel   192B Nov  4 10:02 .
drwxr-xr-x  7 peter  wheel   224B Nov  4 10:02 ..
-rwxr-xr-x  1 peter  wheel   5.8M Oct 26  1985 pbf2json.darwin-x64
-rwxr-xr-x  1 peter  wheel   1.7M Oct 26  1985 pbf2json.linux-arm
-rwxr-xr-x  1 peter  wheel   1.9M Oct 26  1985 pbf2json.linux-x64
-rwxr-xr-x  1 peter  wheel   1.8M Oct 26  1985 pbf2json.win32-x64

Do we need pbf2json.darwin-arm? I'm assuming not, since it will run in a Linux Docker container. If we do, please open an issue on that repo and I'll have a look at how much work it is.

ref: https://github.com/pelias/pbf2json/tree/master/build

jlowe000 commented 2 years ago

I'm not 100% sure. But I think I had issues with the prebuilt arm version as I am running on 64bit arm.
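
One plausible explanation, sketched here rather than confirmed in the thread: the lookup quoted above builds the file name from os.platform() and os.arch(), and on a 64-bit ARM Linux kernel Node.js reports the architecture as arm64 rather than arm. The function below mimics that naming from `uname` values; `pbf2json_binary_name` is a hypothetical helper for illustration, not part of pbf2json.

```shell
#!/bin/sh
# Print the file name the lookup above would build for a given kernel and machine,
# mimicking the strings Node.js returns from os.platform() and os.arch().
# pbf2json_binary_name is a hypothetical helper, not part of pbf2json.
pbf2json_binary_name() {
  kernel="$1"   # as printed by `uname -s`
  machine="$2"  # as printed by `uname -m`

  case "$kernel" in
    Linux) platform="linux" ;;
    Darwin) platform="darwin" ;;
    *) platform="unknown" ;;
  esac

  case "$machine" in
    x86_64) arch="x64" ;;
    aarch64 | arm64) arch="arm64" ;;   # 64-bit ARM
    armv6l | armv7l) arch="arm" ;;     # 32-bit ARM
    *) arch="unknown" ;;
  esac

  echo "pbf2json.${platform}-${arch}"
}

# Show the name that would be looked up on the current machine.
pbf2json_binary_name "$(uname -s)" "$(uname -m)"
```

On an aarch64 Linux host this prints a name ending in linux-arm64, which does not appear in the directory listing above; only the 32-bit linux-arm build is bundled there.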

missinglink commented 2 years ago

It was simple enough https://github.com/pelias/pbf2json/pull/107

ddelange commented 2 years ago

Add ARM builds for all Docker images.

Does the scope of this issue include derivatives/auxiliaries like pelias/libpostal-service?

missinglink commented 2 years ago

Does the scope of this issue include derivatives/auxiliaries like pelias/libpostal-service?

Yes, I think it's all-or-nothing, partial support isn't particularly helpful.

We would be happy to accept contributions, I doubt this will be picked up by the core team as we don't have a need for it.

ddelange commented 1 year ago

We would be happy to accept contributions

@missinglink could you point me to the CI that builds and pushes to dockerhub? I have some experience running multi-arch builds from raw dockerfile (buildx build) or from docker compose definition (buildx bake), particularly using QEMU emulator on GitHub Actions CI

missinglink commented 1 year ago

Each repo has its own CI script such as this:

https://github.com/pelias/api/blob/master/.github/workflows/push.yml#L42

Rather than repeat everything per-repo it just executes this script:

https://github.com/pelias/ci-tools/blob/master/build-docker-images.sh

ddelange commented 1 year ago

Thanks, so each repo would need:

     steps:
       - uses: actions/checkout@v2
+      - uses: docker/setup-qemu-action@v1
+      - uses: docker/setup-buildx-action@v1
       - name: Build Docker images
       ...

and the script would need:

-  docker build -t $tag .
-  docker push $tag
+  docker buildx build --push --platform=linux/amd64,linux/arm64,linux/arm/v7 -t $tag .

Do you expect build failures in some repos? Dependencies from the official apt repositories are generally available for arm64 (not so sure about arm v7, but probably also OK), so adding this platform should work out of the box 🤔

Is there a complete list of repos somewhere that would need a PR? 36 of them? You could also move the workflow itself to pelias/ci-tools and call it from the child repos as a reusable workflow.

orangejulius commented 1 year ago

Thanks @ddelange that was super helpful.

I tested that out with https://github.com/pelias/api/tree/arm-build and it worked just fine.

It is very likely there will be some build failures on some repos, so we'll have to start sorting through that now.

ddelange commented 1 year ago

Awesome, let me know if I can provide any further support!

ddelange commented 1 year ago

I would be happy to contribute and get this through: would you prefer a core contributor to take care of it, or should I just open 36 PRs? If the latter, would the PRs use the feature branch from ci-tools until further notice, like your PoC @orangejulius? Or what is your preferred order of things?

ddelange commented 1 year ago

A second option is to pin to a specific commit of ci-tools. That way the PRs could be tested, but you'd be left having to do 36 PRs every time something changes in ci-tools, something I guess you wanted to avoid.

A third option: the PRs could also leave the link untouched (pointing to master), as my diff above is a non-breaking change. The potentially breaking change would then only come once ci-tools feature branch merges into master.

A fourth option: temporarily point to the feature-branch, test the PRs, once green, point back to master, merge it, and it only goes live once ci-tools feature branch merges into master.

I tend towards number 4 but I'm curious about your thoughts!

ddelange commented 1 year ago

Opening 34 PRs under option 1 or 4. Probably, https://github.com/pelias/docker-baseimage/pull/26 needs to merge (and release?) before most of them can be tested.

There's some repos that don't use github actions, I won't touch them for now (they'll break once ci-tools feature branch merges):

missinglink commented 1 year ago

Hi @ddelange, we are planning to discuss this in a team meeting today, can you please hold off any more PRs until we chat about what we'd like to do.

ddelange commented 1 year ago

Hi @missinglink,

These should be all (except 3, see comment above) -- feel free to close them if you want to approach it differently!

missinglink commented 1 year ago

I didn't have time to look through them yet, were there any PRs where the CI failed?

ddelange commented 1 year ago

I think the PRs need CI approval from an org member, potentially even for each commit I push depending on your org settings

ddelange commented 1 year ago

Realised the CI is not being triggered on my PRs because they're coming from a fork, so it's not technically a push to the repo.

Adding the pull_request trigger on top of the push trigger will work. It should error for me on the pushing to dockerhub phase, because PRs coming from a fork won't have access to the base repo's github secrets.

There is also the (dangerous) pull_request_target trigger where you'd need to bar the permissions carefully.

For PRs not from a fork, the push event will run in parallel to the pull_request triggered job, doing double minutes. To avoid that, a branch filter can be added to the push trigger to only run on master commits.

The triggering setup could be:

on:
  pull_request:
  push:
    branches: master

  # optional
  release:
    types: [released, prereleased]  # triggers workflow using the release tag as ref
  workflow_dispatch:  # allows running workflow manually from the Actions tab

edit: and I think it doesn't make sense for me to push this snippet, because I'm pretty sure the modified event triggers won't go into effect when coming from a fork :')

ddelange commented 1 year ago

Hi @ddelange, we are planning to discuss this in a team meeting today, can you please hold off any more PRs until we chat about what we'd like to do.

Hi @missinglink 👋

Any updates thus far?

missinglink commented 1 year ago

Agh sorry I forgot to write back, the pull requests are a pain to deal with since there's so many repos, so we'd like to avoid having to do them multiple times.

As they stand they are targeting ci-tools/buildx (a branch of ci-tools) and I'm guessing we'd have to do them all again to switch them back to master?

I think this is the right direction to go but I'm still a little concerned that maybe one or two repos might end up being more difficult (likely anything to do with libpostal).

So what I think is a good solution is to merge a ci-tools PR first which contains the change but also has the ability to disable ARM support via env var.

That would allow us to roll it out across the board and also have the ability to disable it for any repos where there are issues.

It seems to make sense to have multi-arch builds enabled by default and optionally disabled via an env var.

Does that sound like a plan?

missinglink commented 1 year ago

Looking at the code again it might make sense to have a default "platforms" string and then allow it to be overwritten by an env var as this gives a lot more granular control.

https://github.com/pelias/ci-tools/compare/master...buildx

missinglink commented 1 year ago

Question: is there any functional difference between docker build and docker buildx build --platform=linux/amd64?

ddelange commented 1 year ago

the env var idea with multi-arch enabled by default sounds like a good plan! I recently took a similar approach here

I'm guessing we'd have to do them all again to switch them back to master

yeah, the rollout here is a bit iffy. I would tend to option 4 but maybe your env var suggestion opens the door to even more possibilities? 🤔

Question: is there any functional difference between docker build and docker buildx build --platform=linux/amd64?

Short answer: no! buildx does couple the build with the push of the multi-arch manifest (so separating build and push instead of using build --push would be a pain; for that you'd need regctl I think, but I haven't tried), but here that's no problem.

missinglink commented 1 year ago

In CI there is no need for separation of build and push so we can move forward on this path, where we migrate to buildx and include the required build dependencies without any negative impact.

Of the two options, enabling multi-arch by default or enabling it via config, I'm confident that multi-arch by default is the preferred one, so I would advocate having it enabled by default and disabled or adapted via config.

missinglink commented 1 year ago

So the immediate next step is opening a PR on ci-tools which defines a variable with default --platform flags and makes that variable overridable via the :- bash convention.
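
The :- convention described here can be sketched as follows. This is a minimal illustration, not the actual ci-tools script: the variable name `PELIAS_DOCKER_PLATFORMS`, the helper `build_command` and the default platform list are assumptions made for the example.

```shell
#!/bin/sh
# Minimal sketch of a default --platform list that an environment variable can override.
# PELIAS_DOCKER_PLATFORMS and build_command are hypothetical names for illustration;
# the real pelias/ci-tools script may use different names and defaults.
build_command() {
  tag="$1"
  # ${VAR:-default} expands to the default when VAR is unset or empty.
  platforms="${PELIAS_DOCKER_PLATFORMS:-linux/amd64,linux/arm64}"
  # Print the command instead of running it, so the sketch works without Docker.
  echo "docker buildx build --push --platform=${platforms} -t ${tag} ."
}

build_command "pelias/api:master"
```

A repo with a problematic ARM build could then opt out by exporting `PELIAS_DOCKER_PLATFORMS=linux/amd64` in its workflow, with no change to the shared script.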

Once that is merged to master we can go ahead and merge these PRs once they point to the master branch and we're done for now, with the option of testing and reconfiguring in the future with minimal effort

cc/ @orangejulius

missinglink commented 1 year ago

I'm off to bed now but I can open that PR tomorrow

ddelange commented 1 year ago

Hi @missinglink 👋 was there any decision about the order in which to roll this out?

ddelange commented 1 year ago

I think it doesn't make sense for me to push this snippet, because I'm pretty sure the modified event triggers won't go into effect when coming from a fork :')

Does it make sense to keep all my PRs open?

Or should I close them and leave it up to the maintainers when and in which order to get this done?

schmidp commented 11 months ago

Hey, is there any plan to enable multiarch soon? Trying to decide if I should build the images myself, or can I help somehow to get the multiarch build merged?

amuedespacher commented 10 months ago

ARM support would be greatly appreciated!

missinglink commented 10 months ago

Hi all,

I had a look over the outstanding ARM tickets today and there's a path forward with https://github.com/pelias/ci-tools/pull/11 but also a significant amount of testing and possibly some dev work involved to fully support ARM and have it stable enough to use in a production environment.

It's unfortunately not as simple as just using buildx as the docker images contain a bunch of different tools and software out of our control which all needs to be multiarch capable and compiled accordingly.

Due to the amount of time required to set up and maintain ARM builds, I'm sorry to say Julian & I likely won't be able to take this on any time soon. I recently got a new M2 Mac so I can slowly chip away at these issues.

If you're in a position to sponsor some developer time to work on this, please email us hello@geocode.earth

missinglink commented 7 months ago

Hi all, I merged https://github.com/pelias/ci-tools/pull/11 today. This is the first step to getting multi-arch builds working.

From now on, when PRs are merged in the Pelias repos it will produce a multiarch docker image which will run on both amd64 and arm64 :tada:

The exceptions are the following repos which I believe are still not capable of running on ARM because they depend on libpostal and there are downstream issues blocking it, once those are resolved we should be able to get full ARM support.

ddelange commented 7 months ago

Nice! And what about all my closed PRs linked above? Mainly the

- uses: docker/setup-qemu-action@v1
- uses: docker/setup-buildx-action@v1

addition which makes docker buildx build --platform work?

missinglink commented 7 months ago

@ddelange those don't seem to be required, we handle it all here: https://github.com/pelias/ci-tools/blob/master/build-docker-images.sh#L61-L68

missinglink commented 7 months ago

I've updated the pelias/baseimage to be multiarch, so as of today any derivative images can also be multiarch.

As a test I've managed to produce multiarch builds of both pelias/elasticsearch and pelias/polylines.

The latter required a conditional statement in the Dockerfile to detect which file to download, an example of this can be found in https://github.com/pelias/polylines/pull/273
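
As a rough sketch of what such a conditional can look like (the actual change is in the linked PR and may differ), the function below maps either a `uname -m` value or the TARGETARCH build argument that BuildKit provides to the suffix commonly used in Go release file names. `go_arch`, the example.com URL and the file name pattern are placeholders for illustration.

```shell
#!/bin/sh
# Map a machine name (as printed by `uname -m`) or a BuildKit TARGETARCH value
# to the architecture suffix commonly used in Go release file names.
# go_arch is a hypothetical helper for illustration.
go_arch() {
  case "$1" in
    x86_64 | amd64) echo "amd64" ;;
    aarch64 | arm64) echo "arm64" ;;
    *)
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}

# Hypothetical usage in a Dockerfile RUN step; the URL and file name are placeholders:
#   arch="$(go_arch "${TARGETARCH:-$(uname -m)}")"
#   curl -fsSL "https://example.com/tool_linux_${arch}.tar.gz" | tar -xz
```

Failing loudly on an unknown architecture avoids silently downloading a binary that cannot run in the image.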

missinglink commented 7 months ago

I think the final hurdle is going to be libpostal, there was some progress in supporting the Mac M1 chipset but it caused a regression and IIRC was removed.

It might be as simple as detecting the architecture and using the --disable-sse2 build flag.

ddelange commented 7 months ago

for libpostal that flag should indeed work, we are producing (OpenBLAS accelerated) x86_64/arm64 images for libpostal here using TARGETARCH which is exposed by default by buildx.

@ddelange those don't seem to be required, we handle it all here: https://github.com/pelias/ci-tools/blob/master/build-docker-images.sh#L61-L68

that's good to know, seems github is now bundling qemu and buildx in the default runner. nice!

missinglink commented 7 months ago

I just published pelias/elasticsearch:7.17.15 which supports multiarch. We can optionally rebuild older versions for multiarch if requested in the future.

missinglink commented 7 months ago

I managed to complete a portland-metro build on my MacBook M2 this morning with a couple of small changes to the docker-compose.yml file 🎉

It seems that Docker is now able to emulate AMD64 on ARM64 using QEMU (not Rosetta unfortunately), so a bunch of the images seem to 'just work', although I bet they are much slower under emulation.

In order to fully support ARM64 (for Graviton servers, Raspberry Pi etc.) we will need native ARM64 support for all the images. Some just need to be rebuilt in CI, some need adjustments to work.

For reference, this command is handy: after pulling down the latest images (to grab the multiarch versions I've been building), you can run it to list which architecture Docker pulled down for you:

docker image inspect --format "{{.Architecture}} {{.RepoTags}}" $(docker image ls -q -f 'reference=pelias/*')

If you're on an ARM64 machine you should see some of them prefixed with arm64.
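
To spot the images that are still running under emulation, the output of that command can be filtered. A small sketch, where `non_native_images` is a hypothetical helper name:

```shell
#!/bin/sh
# Print only the images whose architecture differs from the expected one.
# Reads "ARCH [TAGS]" lines on stdin, in the format produced by the inspect command above.
# non_native_images is a hypothetical helper for illustration.
non_native_images() {
  # $1 here is the shell argument; inside the awk program $1 is the first field.
  awk -v want="$1" '$1 != want'
}

# Usage, assuming Docker is installed and pelias images have been pulled:
#   docker image inspect --format "{{.Architecture}} {{.RepoTags}}" \
#     $(docker image ls -q -f 'reference=pelias/*') | non_native_images arm64
```

An empty result on an ARM64 machine means every pulled pelias image is native.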

ddelange commented 7 months ago

I also wrote a one-liner to check out available architectures and sizes on the registry before pull: https://stackoverflow.com/a/73108928/5511061

missinglink commented 7 months ago

@ddelange I'm trying the workflow you have to disable SSE as per your Dockerfile but it results in an error, maybe a more recent commit of libpostal broke it?

Does the master branch of libpostal work for your build?

#17 140.4 gcc: error: unrecognized command-line option ‘-mfpmath=sse’
#17 140.4 gcc: error: unrecognized command-line option ‘-mfpmath=sse’
#17 140.4 gcc: error: unrecognized command-line option ‘-msse2’
#17 140.4 gcc: error: unrecognized command-line option ‘-msse2’
#17 140.4 make[2]: *** [Makefile:3956: libpostal-strndup.o] Error 1
#17 140.4 make[2]: *** Waiting for unfinished jobs....
#17 140.4 make[2]: *** [Makefile:3970: libpostal-main.o] Error 1
#17 140.4 gcc: error: unrecognized command-line option ‘-mfpmath=sse’
#17 140.4 gcc: error: unrecognized command-line option ‘-mfpmath=sse’
#17 140.4 gcc: error: unrecognized command-line option ‘-msse2’
#17 140.4 gcc: error: unrecognized command-line option ‘-msse2’
#17 140.4 make[2]: *** [Makefile:3998: libpostal-file_utils.o] Error 1
#17 140.4 make[2]: *** [Makefile:3984: libpostal-json_encode.o] Error 1
#17 140.4 make[2]: Leaving directory '/code/libpostal/src'
#17 140.4 make[1]: *** [Makefile:464: all-recursive] Error 1
#17 140.4 make[1]: Leaving directory '/code/libpostal'
#17 140.4 make: *** [Makefile:373: all] Error 2
#17 ERROR: process "/dev/.buildkit_qemu_emulator /bin/sh -c ./bootstrap.sh &&     ([ \"$TARGETARCH\" == \"arm64\" ] && ./configure --datadir=\"$DATADIR\" --disable-sse2 || ./configure --datadir=\"$DATADIR\") &&     make -j4 &&     make install &&     ldconfig" did not complete successfully: exit code: 2
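
The failing step in the log runs under /bin/sh, compares with `==` inside `[`, and chains the two configure calls as `A && B || C`. Whether that caused this failure is not confirmed in the thread, but both constructs are fragile: `==` is not part of POSIX `test` and some /bin/sh implementations (dash, for example) reject it, and in an `A && B || C` chain C runs whenever A or B fails. A more defensive sketch, with `libpostal_configure_flags` as a hypothetical helper name:

```shell
#!/bin/sh
# Print the extra ./configure flag for a given TARGETARCH value.
# Uses `=` (POSIX) instead of `==`, and an explicit `if` instead of `A && B || C`.
# libpostal_configure_flags is a hypothetical helper for illustration.
libpostal_configure_flags() {
  if [ "$1" = "arm64" ]; then
    echo "--disable-sse2"
  fi
}

# Hypothetical usage in the build step (DATADIR and TARGETARCH as in the log above):
#   ./configure --datadir="$DATADIR" $(libpostal_configure_flags "$TARGETARCH")
```

With this shape a broken comparison cannot silently fall through to the configure call that enables SSE2.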

ddelange commented 7 months ago

all green on our side... fwiw running on a ubuntu jammy image with latest build-essential installed

missinglink commented 7 months ago

OK cool, it took a couple of days of dev work but now all images are available on ARM 🎉

I'm going to close this issue, please open individual bug reports if you have any problems.

arm64 [pelias/fuzzy-tester:master]
arm64 [pelias/transit:master]
arm64 [pelias/pip-service:master]
arm64 [pelias/placeholder:master]
arm64 [pelias/csv-importer:master]
arm64 [pelias/whosonfirst:master]
arm64 [pelias/elasticsearch:7.16.1]
arm64 [pelias/openaddresses:master]
arm64 [pelias/openstreetmap:master]
arm64 [pelias/api:master]
arm64 [pelias/interpolation:master]
arm64 [pelias/libpostal-service:latest]
arm64 [pelias/schema:master]
arm64 [pelias/elasticsearch:7.17.15]
arm64 [pelias/polylines:master]

ddelange commented 7 months ago

awesome work! :boom: