wiktorn / Overpass-API

Overpass API docker image
MIT License

Dispatcher Client timeout #48

Closed: ghowen closed this issue 4 years ago

ghowen commented 4 years ago

I successfully created an Overpass instance with this Docker image about a month ago. While testing different parameters/configurations over the past weeks, I deleted the working container and volume and created new containers. But for the past few weeks they have all failed to work.

The creation of the Overpass instance seems to go through fine and nginx responds to queries, but all queries time out and nginx returns a 504. Creating the overpass_monaco example does work. Running this or a similar command to clone the planet reproduces the timeout issue:

docker run \
  -e OVERPASS_META=yes \
  -e OVERPASS_MODE=clone \
  -e OVERPASS_DIFF_URL=https://planet.openstreetmap.org/replication/minute/ \
  -e OVERPASS_MAX_TIMEOUT=14400s \
  -e OVERPASS_SPACE=16777216 \
  -p 12345:80 \
  -d \
  --mount source=overpass,target=/db \
  --restart unless-stopped \
  --name overpass \
  wiktorn/overpass-api

The response from nginx is always

<p>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</p>
<p><strong style="color:#FF0000">Error</strong>: runtime error: open64: 0 Success /osm3s_v0.7.55_osm_base Dispatcher_Client::request_read_and_idx::timeout. The server is probably too busy to handle your request. </p>

"Too busy" cannot be the real reason as I am the only user running simple queries to test.

http://myhost.com:12345/api/status returns

Connected as: 1559890362
Current time: 2020-07-16T15:51:08Z
Rate limit: 0
Currently running queries (pid, space limit, time limit, start time):

so all good here.

changes.log looks OK. Updates are being made. transactions.log looks a bit suspicious:

2020-07-16 15:17:20 [16] Dispatcher version 0.7.56.3 eb200aebdae27bd8635c5829a2bdfcdb06d0ab5c just started.
2020-07-16 15:17:20 [15] Dispatcher version 0.7.56.3 eb200aebdae27bd8635c5829a2bdfcdb06d0ab5c just started.
2020-07-16 15:17:20 [33] request_read_and_idx() start
2020-07-16 15:17:21 [16] hangup of process 27.
2020-07-16 15:17:21 [16] waited idle for 8 cycles.
2020-07-16 15:17:21 [16] waited idle for 8 cycles.
[...]
2020-07-16 15:17:56 [16] waited idle for 8 cycles.
2020-07-16 15:17:57 [33] Dispatcher_Client::request_read_and_idx::timeout /osm3s_v0.7.55_osm_base 0 Success 
2020-07-16 15:17:57 [16] waited idle for 8 cycles.
2020-07-16 15:17:57 [16] hangup of process 33.
2020-07-16 15:18:14 [818] write_start() start version='2020-07-16T06:23:01Z'
2020-07-16 15:18:14 [16] waited idle for 173 cycles.
2020-07-16 15:18:14 [16] write_start of process 818. Considered as reading:.
2020-07-16 15:18:14 [818] write_start() end
2020-07-16 15:34:20 [819] request_read_and_idx() start
2020-07-16 15:34:20 [16] waited idle for 9625 cycles.
[...]
2020-07-16 15:34:56 [16] waited idle for 8 cycles.
2020-07-16 15:34:56 [819] Dispatcher_Client::request_read_and_idx::timeout /osm3s_v0.7.55_osm_base 0 Success 92.250.9.186
2020-07-16 15:34:56 [16] waited idle for 8 cycles.
2020-07-16 15:34:56 [16] hangup of process 819.
2020-07-16 15:40:34 [832] request_read_and_idx() start
2020-07-16 15:40:34 [16] waited idle for 3365 cycles.
[...]
2020-07-16 15:48:28 [16] waited idle for 8 cycles.
2020-07-16 15:48:28 [833] Dispatcher_Client::request_read_and_idx::timeout /osm3s_v0.7.55_osm_base 0 Success 92.250.9.186
2020-07-16 15:48:28 [16] waited idle for 8 cycles.
2020-07-16 15:48:28 [16] hangup of process 833.
2020-07-16 15:50:14 [16] waited idle for 1057 cycles.
2020-07-16 15:50:14 [16] query_my_status by process 836.
2020-07-16 15:50:14 [16] waited idle for 1 cycles.
2020-07-16 15:50:14 [16] hangup of process 836.
2020-07-16 15:51:08 [16] waited idle for 546 cycles.
2020-07-16 15:51:08 [16] query_my_status by process 839.
2020-07-16 15:51:08 [16] waited idle for 1 cycles.
2020-07-16 15:51:08 [16] hangup of process 839.
2020-07-16 15:51:44 [818] write_commit() start 149937 1838062 420189 1134147 270038
2020-07-16 15:51:44 [16] waited idle for 357 cycles.
2020-07-16 15:51:44 [16] write_commit of process 818.
2020-07-16 15:51:44 [818] write_commit() end
2020-07-16 15:51:44 [16] hangup of process 818.
2020-07-16 15:52:22 [1267] write_start() start version='2020-07-16T09:50:01Z'
2020-07-16 15:52:22 [16] waited idle for 388 cycles.
2020-07-16 15:52:22 [16] write_start of process 1267. Considered as reading:.
2020-07-16 15:52:22 [1267] write_start() end

I have googled all over the place but cannot find a clue.

Thanks for any pointers!

ghowen commented 4 years ago

Anyone, please? @wiktorn?

This issue is now consistent. I cannot create a working container with a cloned planet file, even after deleting every old image, container, and volume. I have tried probably 20 times with varying parameters, but no luck.

The maintainer of Overpass cannot support unofficial Docker images (https://github.com/drolbr/Overpass-API/issues/587#issuecomment-664532894), so this repository seems to be the only place where this can be solved.

wiktorn commented 4 years ago

Regarding the comment that was here (and the original subject): the container doesn't run Overpass as root. The nginx processes and the FastCGI wrapper run as UID 101, while the Overpass dispatcher and update processes run as UID 1000.

So nothing to change here.
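
If you want to double-check the UIDs from the host, plain docker top (a generic Docker command, nothing specific to this image) lists the container's processes together with the UID they run under:

docker top overpass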

I really can't tell what's wrong with your setup, and the original recommendation from @mmd-osm is the one I would also give: try setting this up outside the container and check whether it works. More or less all the steps you need to take are already in the scripts within the repository.

I do not have a machine large enough to test this case and wasn't motivated enough to pay for one in the cloud.

How big is the machine where you're trying to run the planet clone? Did you try creating the Overpass instance from a planet file instead of cloning? Did that work?

ghowen commented 4 years ago

It's amazing how one can shoot oneself in the foot... I am going to take a break from work now for a week and get some rest... In the 40+ years that I have been in IT, I don't remember a case where I spent so much time "debugging" an "issue" that is just some stupid oversight on my part...

To make a long story short: The problem was in the line -e OVERPASS_SPACE=16777216 \ when creating the container.

I intended to give Overpass 16 GBytes of RAM and instead gave 16 MBytes, which explains why the CGI process timed out.
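
For the record, OVERPASS_SPACE is given in bytes here, so 16 GBytes would be 16 * 1024^3 bytes (value shown for illustration; adjust to your machine):

-e OVERPASS_SPACE=17179869184 \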

@wiktorn Thanks for your response and thanks for asking about the machine, which somehow triggered the solution. I am running my tests on a virtual cloud server with 8 CPU cores, 32 GBytes of RAM and 800 GBytes of SSD space, which should be ample for Overpass, if configured correctly...

I did try to create an instance using the planet file, unfortunately with the same settings. It ran for about a week (!), then failed somewhere after updating the database and started over from the beginning, most probably due to the same issue. Setting up an Overpass planet clone with metadata takes only about 2-3 hours on the test machine. Closing this now and going to bed :)

wiktorn commented 4 years ago

Happy to hear that you found the solution. Compared to other public instances, I'd say this is the minimal configuration I'd try. I'm using 8 GB of RAM for a single-country extract, and that is more or less the minimum needed to initialize the database. I'd assume that for the whole planet you will need more RAM. Good to know that 32 GB of RAM is enough.

ghowen commented 4 years ago

Thanks Wiktor, and also thanks again for your work on the image. I truly believe that an official Overpass Docker image would be of great benefit to the OSM community as it would facilitate setting up public and private instances. And I think your image is a good candidate. I have no clue how an image would become official, but maybe @mmd-osm would know if you are interested.

I was monitoring RAM usage during setup and cloning and did not find it to be excessive. My machine rarely showed more than 25% usage, which would be ~8 GBytes. Maybe public instances need more RAM to support more simultaneous queries?
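
For anyone who wants to watch this themselves, a simple way to follow the container's memory use during cloning is the standard docker stats command (generic Docker, nothing specific to this image), which reports per-container memory usage and limit:

docker stats overpass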