wiktorn / Overpass-API

Overpass API docker image
MIT License

Running the overpass-api image in a serverless environment #34

Closed. cielo closed this issue 4 years ago

cielo commented 4 years ago

Hello,

I have been trying to see if I can run the image on Google Cloud Run - fully managed.

It seems that docker-entrypoint.sh does the heavy lifting of setting up the overpass environment during the first run.

I was curious whether this overpass-db setup could be done as part of the docker image build, so that overpass-db is ready to use in the image and the entry point script only starts the nginx server.

This is because Google Cloud Run (fully managed) does not provide a mount, and there is no concept of 'first run' versus 'non-first run', as everything is ephemeral except the image itself.

e.g. If I use the current image, it will attempt to initialize overpass-db by downloading the planet file on every request.

Do you have any guidance on how to do that (or is there some reason this cannot be done)?

wiktorn commented 4 years ago

It is possible; I've tried this approach here, and for Poland the resulting container image was around 10GB. The only problem is keeping it up to date, which requires some instance to push updated container images.

I've created a cloud function that spins up an instance with a local SSD. For updates the instance size was n1-highcpu-2 and, as far as I remember, for creation I used n1-highcpu-4 or highmem.

Then I used Cloud Scheduler to spin up the instance every day to apply updates and squash the docker image (otherwise the images keep getting bigger). In the end I found this too costly to run.

But I was positively surprised by the instance startup time.

cielo commented 4 years ago

Thanks for the reference. In my case, my osm extract is really small (<10MB), and it will rarely change, so I am fine without minute updates at least for now.

I checked your scripts and created a Dockerfile like below.

FROM wiktorn/overpass-api:0.7.55.9

ENV OVERPASS_META no
ENV OVERPASS_MODE init
ENV OVERPASS_PLANET_URL https://storage.googleapis.com/mybucket/public/small_extract.osm.bz2
ENV OVERPASS_RULES_LOAD 1

RUN "/app/docker-entrypoint.sh"

COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
COPY update_nginx_port.sh /docker-entrypoint-initdb.d/update_nginx_port.sh

EXPOSE 80
CMD ["/app/docker-entrypoint.sh"]

It does not look pretty, but the RUN line executes the entry point script and builds overpass-db at image build time. I added 2 COPY commands to overwrite 2 files, similar to what you did in your setup. When the container runs, CMD ["/app/docker-entrypoint.sh"] will execute supervisord.

Using the above Dockerfile, I can successfully run the container in a local docker environment on my desktop. On its first run the container already has access to the overpass-db (created at image build time) and no longer needs to build it. Querying /api/interpreter works as expected in the local docker environment.
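For readers who want to repeat the local check, here is a minimal sketch of querying /api/interpreter from Python. The base URL `http://localhost:8080`, the helper names, and the sample Overpass QL query are assumptions for illustration only; they are not taken from the setup described above.

```python
import json
import urllib.parse
import urllib.request


def build_interpreter_url(base_url: str, query: str) -> str:
    """Build an /api/interpreter URL with the query URL-encoded in `data`."""
    return base_url.rstrip("/") + "/api/interpreter?" + urllib.parse.urlencode({"data": query})


def run_query(base_url: str, query: str, timeout: float = 30.0) -> dict:
    """Send the query and parse the response as JSON (the query must use [out:json])."""
    with urllib.request.urlopen(build_interpreter_url(base_url, query), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical sample query: all nodes inside a small bounding box.
SAMPLE_QUERY = "[out:json];node(50.0,19.9,50.1,20.0);out;"
```

Calling `run_query("http://localhost:8080", SAMPLE_QUERY)` against a running container should return the parsed JSON response, provided the container port is mapped to 8080 on the host.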

However, I am having trouble making it run in Cloud Run. It almost works (nginx starts, for example), but when I attempt to run a query (e.g. https://.../api/interpreter?data=...), I get an error like this on the web page.

The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.

Error: runtime error: The dispatcher (i.e. the database management system) is turned off.

If I check the Cloud Run logs, the lines below show up.


```
2020-02-23 03:53:28.437 PST INFO success: overpass_dispatch entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-02-23 03:53:29.936 PST File_Error Function not implemented 38 /osm3s_v0.7.55_osm_base Dispatcher_Server::1
2020-02-23 03:53:29.937 PST INFO exited: overpass_dispatch (exit status 0; expected)
```

Do you happen to recall encountering any similar issue with the above error lines?

wiktorn commented 4 years ago

Yes, I recognize this error. It happens because Google Cloud Run doesn't support unix domain sockets.

As you may see in update_nginx_port, I changed the unix domain socket to a tcp socket. The same change is in supervisord.conf in the fcgiwrap section.

cielo commented 4 years ago

Yes, I noticed that. I have two COPY commands in the Dockerfile that apply the changes so that fcgiwrap uses tcp as well. I verified in the built image that tcp is used instead of a socket.

Other than that, I spent most of today trying to make it work, but have had no luck. My image runs perfectly fine on my desktop and on Google's compute VM. It just does not work in Google Cloud Run (managed) when deployed. The dispatcher starts but shuts down after a few seconds, and supervisord repeatedly restarts it per its configuration.

The only interesting bit I found is that if I use wiktorn/overpass-api:0.7.53 instead of wiktorn/overpass-api:0.7.55.9, the error message is a little bit different.

The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.

Error: runtime error: open64: 38 Function not implemented /osm3s_v0.7.53_osm_base Dispatcher_Client::1

However, there is no stack trace or any other error detail, so it is difficult to debug the root cause of the issue.

Do you have any suggestions in this case? Is there a flag or option to display a full stack trace, or logs for overpass?
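As a small debugging aid, the number 38 in both error messages is an operating system errno value, and it can be decoded with the Python standard library. This is only a sketch; the helper name `describe_errno` is made up for illustration, and the mapping of 38 is stated for Linux only, since errno numbers differ between platforms.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and message for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# On Linux, errno 38 is ENOSYS, i.e. the requested system call is not implemented.
print(describe_errno(38))
```

An ENOSYS result suggests that the runtime rejects a system call outright, rather than Overpass hitting a missing file or a permissions problem.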

wiktorn commented 4 years ago

I must have misremembered getting overpass running on Cloud Run. When I checked my deployment, it indeed fails the same way as yours.

Looking deeper into it, I was contradicting myself by using tcp in fcgiwrap when Overpass uses unix domain sockets internally. So unix domain sockets are not the cause.

OTOH, Overpass also uses shared memory IPC. I've created a small Python app that tries to create a shared memory segment, and it fails with:

    OSError: [Errno 38] Function not implemented
        at tester (/app/app.py:9)
        at hello_world (/app/app.py:17)

And the whole testing app is:

from flask import Flask
import os
import posix_ipc

app = Flask(__name__)

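def tester():
    # Create (or open) a POSIX shared memory segment named "tst".
    shm = posix_ipc.SharedMemory(name="tst", flags=posix_ipc.O_CREAT)
    # Wrap the segment's file descriptor so it can be read and written like a file.
    mem = os.fdopen(shm.fd, "w+")
    print("\n".join(mem.readlines()))
    mem.write("tester tester tester\n")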

@app.route('/')
def hello_world():
    tester()
    return 'Hello, World!'

So it looks like shared memory is not implemented / not allowed within Cloud Run.
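The same check can also be sketched without Flask or the third-party posix_ipc package, using `multiprocessing.shared_memory` from the standard library (Python 3.8+). This is a swapped-in variant of the test above, not the code that was deployed; the function name `probe_shared_memory` is hypothetical.

```python
import errno
from multiprocessing import shared_memory


def probe_shared_memory(size: int = 64):
    """Try to create, write, and read back a shared memory segment.

    Returns (ok, detail); ok is False if the platform refuses the request.
    """
    try:
        shm = shared_memory.SharedMemory(create=True, size=size)
    except OSError as exc:
        # On a runtime without shared memory support this is where errno 38 would surface.
        name = errno.errorcode.get(exc.errno, "UNKNOWN")
        return False, f"{name} ({exc.errno}): {exc.strerror}"
    try:
        payload = b"tester"
        shm.buf[: len(payload)] = payload
        ok = bytes(shm.buf[: len(payload)]) == payload
        return ok, ("shared memory works" if ok else "read-back mismatch")
    finally:
        # Always release and remove the segment so repeated probes start clean.
        shm.close()
        shm.unlink()
```

Running `probe_shared_memory()` at container start and logging the result would show whether a given environment can host the Overpass dispatcher before any query is attempted.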

Maybe it's possible to patch Overpass-API so that shared memory is not used at all, but you'd need to raise that in drolbr/Overpass-API.

cielo commented 4 years ago

Ah I see. Thank you for the help and for the examples.