wiktorn / Overpass-API

Overpass API docker image
MIT License

Latest Europe continent database indexing taking too much time #124

Open akki401 opened 6 months ago

akki401 commented 6 months ago

Hello team, I am using the Overpass Docker image (https://hub.docker.com/r/wiktorn/overpass-api) to launch the Overpass service for the Europe continent. The planet URL I am using is https://download.geofabrik.de/europe-latest.osm.bz2, which is the latest, up-to-date extract. Indexing has taken more than 48 hours and is still running. The Docker command used to start the container and the EC2 hardware config are below. I have set the diff URL in the environment variables to update every 30 days. My question: when I start the Docker image, I expect it to start with the latest Europe continent database, but in the logs I see that it is also picking up the updates and indexing them (I see the 30-day update interval). Screenshot of the last few lines of the latest logs: image

Docker command:

    cmd = " ".join(
        [
            "docker",
            "run",
            "--restart=always",  # starts docker after system reboot
            "--log-driver json-file",
            "--log-opt max-size=10m",
            "--log-opt max-file=3",
            "-e", "OVERPASS_META=yes",
            "-e", "OVERPASS_MODE=init",
            "-e", f"OVERPASS_PLANET_URL=file:///db/{self.region}-latest.osm.bz2",
            "-e", "OVERPASS_RULES_LOAD=10",  # infinite areas update process. Ex: 0-always run, 5-run 5% of the time...
            "-e", f"OVERPASS_DIFF_URL={self._updates_url}",  # https://download.geofabrik.de/europe-updates
            "-e", f"OVERPASS_UPDATE_SLEEP={3600 * 24 * 30}",  # update every 30 days
            "-e", "OVERPASS_STOP_AFTER_INIT=false",
            "-v", f"{self._dir_overpass}/:/db",
            "-p", f"{self._overpass_port}:80",
            "-i",
            "-t",
            "-d",
            "--name", self._overpass_container_name,
            "wiktorn/overpass-api",
        ]
    )

EC2 config:

    Instance Size: i4i.2xlarge
    vCPU: 64
    Instance Storage (GB): 1 x 1,875 AWS Nitro SSD
    Network Bandwidth (Gbps): Up to 12
    EBS Bandwidth (Gbps): Up to 10
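For reference, the 30-day interval passed in OVERPASS_UPDATE_SLEEP works out as below. This is only a sketch, assuming the value is meant in seconds (as the 3600 factor in the command suggests); the helper name `update_sleep_seconds` is hypothetical and not part of the image.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def update_sleep_seconds(days: int) -> int:
    """Return the number of seconds in `days` days (hypothetical helper)."""
    return SECONDS_PER_HOUR * HOURS_PER_DAY * days


# Build the "-e" argument the same way the command above does.
env_arg = f"OVERPASS_UPDATE_SLEEP={update_sleep_seconds(30)}"
print(env_arg)  # OVERPASS_UPDATE_SLEEP=2592000
```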

After a very long time, the Docker status is: image

I tried multiple times but got the same errors.

Note: The diff directory URL I am using to update the instance is https://download.geofabrik.de/europe-updates instead of https://planet.openstreetmap.org/replication/minute/. Does that cause any issue? Other continents such as north-america and asia work fine with the diff URLs https://download.geofabrik.de/north-america-updates and https://download.geofabrik.de/asia-updates.

wiktorn commented 6 months ago

Hi,

I have a problem understanding what the issue is. From the logs you have shared, I don't see any problem with using https://download.geofabrik.de/europe-updates, and it makes applying updates easier, as you have fewer files to download.

Regarding updating every 30 days: if you restart the container, it resets the timer, so it is not super precise. The script starts with the update and then sleeps, so I don't see anything suspicious in what you have reported above.
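The "update first, then sleep" ordering described here can be sketched roughly as follows. This is an illustration under assumptions, not the image's actual script: `update_loop`, `apply_updates` and the injectable `sleep` are hypothetical names, and the loop is bounded so the example terminates.

```python
import time
from typing import Callable


def update_loop(
    apply_updates: Callable[[], None],
    sleep_seconds: float,
    iterations: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run 'update, then sleep' a fixed number of times (hypothetical sketch)."""
    for _ in range(iterations):
        apply_updates()       # runs immediately on every (re)start
        sleep(sleep_seconds)  # the timer only starts after the update finishes


# Record the order of events instead of really sleeping.
events = []
update_loop(
    apply_updates=lambda: events.append("update"),
    sleep_seconds=30 * 24 * 3600,
    iterations=2,
    sleep=lambda s: events.append(f"sleep {s}"),
)
print(events)  # ['update', 'sleep 2592000', 'update', 'sleep 2592000']
```

Because the update comes first, a restart always triggers an update run straight away, regardless of how much of the previous sleep had elapsed.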

akki401 commented 6 months ago

Thanks for the reply @wiktorn. As you mentioned, I don't see any issue in the logs either, but the Docker status becomes unhealthy after a long time. I ran the container multiple times and the status is always unhealthy. Interestingly, the same docker run command works fine for north-america and asia (with the update URLs https://download.geofabrik.de/north-america-updates and https://download.geofabrik.de/asia-updates).

Quite confused

wiktorn commented 6 months ago

What is the error reported by healthcheck?

Can you post the first 100 lines of logs after the container restart, and maybe the last 100 lines after a few minutes of running, excluding lines containing compute_geometry?
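One possible way to produce such an excerpt is sketched below. It assumes the log text has already been captured into a list of lines (for example from `docker logs`); the function name `trim_logs` and the sample lines are made up for illustration.

```python
from typing import Iterable, List, Tuple


def trim_logs(
    lines: Iterable[str],
    exclude: str = "compute_geometry",
    head: int = 100,
    tail: int = 100,
) -> Tuple[List[str], List[str]]:
    """Drop lines containing `exclude`, then return the first `head`
    and the last `tail` of the remaining lines."""
    kept = [line for line in lines if exclude not in line]
    first = kept[:head]
    last = kept[-tail:] if tail > 0 else []
    return first, last


# Synthetic sample, not real Overpass output.
sample = [
    "start",
    "compute_geometry: way 1",
    "update applied",
    "compute_geometry: way 2",
    "done",
]
first, last = trim_logs(sample, head=2, tail=2)
print(first)  # ['start', 'update applied']
print(last)   # ['update applied', 'done']
```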

akki401 commented 6 months ago

I see the log statements below repeated in the logs: image

image

image

image

image

wiktorn commented 6 months ago

If you sort your logs by time, it just looks like it is applying updates one by one, so it doesn't look like anything unusual.

Why do you think that these logs indicate an issue?

akki401 commented 6 months ago

Because I didn't set the "OVERPASS_DIFF_URL" environment variable when running the container.

wiktorn commented 6 months ago

So how does the container know that it needs to fetch updates from https://download.geofabrik.de/europe-updates? It's nowhere in the defaults.

akki401 commented 6 months ago

I assume that if I don't pass OVERPASS_DIFF_URL, the container does not look for updates. Am I wrong? What if I set OVERPASS_DIFF_URL to an empty value (OVERPASS_DIFF_URL = "")? Would it still look for updates? What would the default action be?

akki401 commented 6 months ago

It has been running for more than 2 days and is still indexing the database. Is this usual or unusual? image

Finally the docker status is "unhealthy" image

and the last logs are: image

wiktorn commented 6 months ago

I'm not sure where your update process was started from: whether it is still part of the initial update during startup, or part of the update loop running alongside the daemon itself.

If it is the latter, then the container should be healthy. If it is still updating as a part of initial update, then it should be in the "starting" state.
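To see which of these states Docker reports and what the last health probe returned, the health section of `docker inspect` can be summarized. This is a minimal sketch, assuming the JSON comes from `docker inspect --format '{{json .State.Health}}' <container>`; the function name `summarize_health` and the sample payload are invented for illustration and are not real output from this container.

```python
import json
from typing import Any, Dict


def summarize_health(inspect_json: str) -> Dict[str, Any]:
    """Extract the status and the most recent probe result from the
    JSON printed for .State.Health by `docker inspect`."""
    health = json.loads(inspect_json)
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return {
        "status": health.get("Status"),  # "starting", "healthy" or "unhealthy"
        "failing_streak": health.get("FailingStreak"),
        "last_exit_code": last.get("ExitCode"),
        "last_output": last.get("Output"),
    }


# Made-up sample payload with the same shape as Docker's health record.
sample = json.dumps(
    {
        "Status": "unhealthy",
        "FailingStreak": 3,
        "Log": [{"ExitCode": 1, "Output": "probe failed"}],
    }
)
summary = summarize_health(sample)
print(summary["status"], summary["last_exit_code"])  # unhealthy 1
```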

The other thing is that Europe should not be taking that long to update. I do not have enough hardware to test it, but I'd expect that most of the time is spent on processing the planet file and that the updates should be applied pretty quickly.

From the logs you have shared it looks like it takes 30 minutes to process 30MB of updates. If that's the case, that's very slow.
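As a rough back-of-the-envelope figure for that rate (taking 1 MB as 1000 kB; the helper name `throughput_kb_per_s` is just for illustration):

```python
def throughput_kb_per_s(megabytes: float, minutes: float) -> float:
    """Average processing rate in kB/s for `megabytes` handled in `minutes`."""
    return megabytes * 1000 / (minutes * 60)


rate = throughput_kb_per_s(30, 30)
print(f"{rate:.1f} kB/s")  # 16.7 kB/s
```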

The other thing that could be happening here is that, for some strange reason, the updates are stuck in an infinite loop, but it's hard to tell whether this is the case or not.

akki401 commented 5 months ago

I was surprised to see the database still being indexed/updated even after the Docker health status became unhealthy. The status became unhealthy on 7th April 2024, but updates are still appearing in the db and the log file is still being written (as of 10th April 2024). What does this mean? See the screenshots: image

image

image