Closed. BathoryPeter closed this issue 5 months ago.
I can see a significant drop in the S3 graphs.
Well, you seem to have narrowed in on one tiny window. If you look at the last 24 hours, it all looks normal, and all our replication feeds are running fine, so it seems to be an issue specific to your connection to AWS.
The issue is still present. With maxInterval=120, about half of the requests time out, and my replication lag (replag) keeps increasing.
My server connects from Frankfurt, but I am experiencing the same problem here from Budapest, Hungary.
As I said, we have machines in at least six locations on five different networks that are pulling from the feed with no problem. They're using osmium, not osmosis, of course, but I don't see why that would make a difference.
> Well, you seem to have narrowed in on one tiny window
The drop on the graph coincides exactly with the first error in my logs.
Would a verbose osmosis log help?
> Well, you seem to have narrowed in on one tiny window
> The drop on the graph coincides exactly with the first error in my logs.
There are brief ups and downs all the time, though; the long-term average clearly doesn't show any significant decrease.
> Would a verbose osmosis log help?
No, it would not. None of us have used osmosis for years, and in any case it's a network timeout, so what exactly do you expect a verbose log to show? There is a problem with packets from your network getting to and/or from Amazon, and there is not much we can do to help with that.
Hmm, I tried pointing baseUrl at the Amazon mirror instead, and that completely solved the problem:
#baseUrl=https://planet.openstreetmap.org/replication/minute/
baseUrl=https://osm-planet-eu-central-1.s3.dualstack.eu-central-1.amazonaws.com/planet/replication/minute
maxInterval=3600
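A quick way to reproduce this comparison without Osmosis is to fetch each mirror's replication state file directly, with a timeout. The sketch below assumes each feed exposes a state.txt next to the diffs in Java-properties format (the usual OSM replication layout, with ':' escaped as '\:'); parse_state and probe are hypothetical helper names, not part of Osmosis.

```python
import urllib.request
from datetime import datetime, timezone

# The two baseUrl values from the configuration above.
MIRRORS = [
    "https://planet.openstreetmap.org/replication/minute/",
    "https://osm-planet-eu-central-1.s3.dualstack.eu-central-1.amazonaws.com/planet/replication/minute",
]


def parse_state(text: str) -> tuple[int, datetime]:
    """Parse a properties-style state.txt into (sequenceNumber, timestamp)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the date comment header
        key, _, value = line.partition("=")
        # Properties files escape ':' as '\:', so undo that before parsing.
        props[key.strip()] = value.strip().replace("\\:", ":")
    seq = int(props["sequenceNumber"])
    ts = datetime.strptime(props["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return seq, ts.replace(tzinfo=timezone.utc)


def probe(base_url: str, timeout: float = 10.0) -> str:
    """Fetch <base_url>/state.txt once and describe the outcome."""
    url = base_url.rstrip("/") + "/state.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            seq, ts = parse_state(resp.read().decode("utf-8"))
    except OSError as exc:  # URLError, HTTPError and timeouts all derive from OSError
        return f"{url}: FAILED ({exc})"
    lag = datetime.now(timezone.utc) - ts
    return f"{url}: sequence {seq}, {lag.total_seconds():.0f} s behind"
```

Running `for m in MIRRORS: print(probe(m))` a few times in a row gives a rough per-mirror timeout rate; a mirror that repeatedly reports FAILED while the other answers points at the path to (or the server behind) that mirror rather than at Osmosis.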
So, by the sound of it, your problem is reaching he.net in Amsterdam.
@BathoryPeter Could you please run either
traceroute planet.openstreetmap.org
or
mtr --report-wide --report-cycles 10 planet.openstreetmap.org
From Frankfurt/Düsseldorf:
traceroute to planet.openstreetmap.org (184.104.179.145), 30 hops max, 60 byte packets
1 ip-161-97-128-11.static.contabo.net (161.97.128.11) 1.512 ms 1.488 ms 1.510 ms
2 et-4-0-8.edge6.Dusseldorf1.Level3.net (62.67.22.193) 1.566 ms 10.0.50.1 (10.0.50.1) 1.411 ms 1.279 ms
3 et-4-0-8.edge6.Dusseldorf1.Level3.net (62.67.22.193) 1.351 ms ae2.3210.edge4.frf1.neo.colt.net (171.75.9.147) 4.772 ms 4.751 ms
4 ae2.3210.edge4.frf1.neo.colt.net (171.75.9.147) 4.723 ms e0-5.core2.fra1.he.net (216.66.87.197) 5.201 ms ae2.3210.edge4.frf1.neo.colt.net (171.75.9.147) 4.690 ms
5 e0-5.core2.fra1.he.net (216.66.87.197) 5.677 ms 5.854 ms *
6 * port-channel1.core3.fra1.he.net (184.104.198.26) 4.823 ms *
7 port-channel2.core3.fra2.he.net (72.52.92.70) 5.476 ms * *
8 openstreetmap-foundation.port-channel7.switch2.ams2.he.net (184.104.202.70) 8.809 ms 8.792 ms *
9 openstreetmap-foundation.port-channel7.switch2.ams2.he.net (184.104.202.70) 13.858 ms * *
10 * * *
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
Start: 2024-06-10T10:51:17+0200
HOST: carto-map Loss% Snt Last Avg Best Wrst StDev
1.|-- 2a02:c206::a 0.0% 10 1.1 1.3 1.1 1.8 0.2
2.|-- ge-7-0-6.bar1.Munich1.Level3.net 0.0% 10 1.3 5.6 1.2 21.7 6.9
3.|-- lo-0-0-v6.edge4.Frankfurt1.Level3.net 0.0% 10 9.7 5.3 4.6 9.7 1.6
4.|-- e0-6.core2.fra1.he.net 10.0% 10 5.8 6.1 5.3 10.4 1.6
5.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
6.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
7.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
8.|-- openstreetmap-foundation.port-channel7.switch2.ams2.he.net 0.0% 10 13.3 11.4 8.8 13.8 1.9
9.|-- norbert.openstreetmap.org 0.0% 10 7.5 7.6 7.5 8.0 0.1
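One common rule of thumb for reading a report like this: loss at intermediate hops that does not carry through to the final hop usually means those routers are rate-limiting or deprioritising replies to the probes, not dropping forwarded traffic. Note also that mtr only tests ICMP reachability, not HTTP, so a clean final hop is still consistent with a problem on the web server itself. The sketch below applies that heuristic to mtr's report format; parse_mtr_report and classify_path are hypothetical names, and the classification is a rough hint rather than a diagnosis.

```python
import re

# Matches mtr report lines such as
#   "4.|-- e0-6.core2.fra1.he.net 10.0% 10 5.8 ..."
#   "5.|-- ??? 100.0 10 0.0 ..."   (the 100% rows above show no % sign)
HOP_RE = re.compile(r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%?\s+\d+")


def parse_mtr_report(report: str) -> list[tuple[int, str, float]]:
    """Return (hop number, host, loss percent) for every hop line."""
    hops = []
    for line in report.splitlines():
        m = HOP_RE.match(line)
        if m:  # header lines such as "Start:" and "HOST:" do not match
            hops.append((int(m.group(1)), m.group(2), float(m.group(3))))
    return hops


def classify_path(hops: list[tuple[int, str, float]]) -> str:
    """Rough heuristic: does packet loss persist all the way to the target?"""
    if not hops:
        return "empty"
    if hops[-1][2] > 0:
        return "destination_loss"   # loss carries through to the final hop
    if any(loss > 0 for _, _, loss in hops[:-1]):
        return "intermediate_only"  # typically ICMP rate-limiting on routers
    return "clean"
```

Fed the report above, this yields "intermediate_only": hops 4 to 7 show loss, but norbert.openstreetmap.org answers every probe.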
There are similar reports from @pa5cal in https://community.openstreetmap.org/t/what-is-the-preferred-way-to-download-planet-diff-files/108854/10
Thanks for linking me here, @mmd-osm!
I also get a lot of connection timeouts when using Osmosis and other tools for minutely and changeset diffs.
Today, I temporarily switched to https://download.openstreetmap.fr/replication/planet/minute
It appears there may have been an issue with Apache on the planet.openstreetmap.org web server. The log was being flooded with "AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.", but Apache and the server otherwise appeared OK.
I have restarted apache and the logged error has gone away for now.
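For context on what that log message complains about: as far as I know, AH03490 comes from Apache's event MPM, where each child process occupies one scoreboard slot (up to ServerLimit), and children that are still gracefully finishing keep their slot. If ServerLimit leaves no spare slots beyond what MaxRequestWorkers requires, Apache can fail to start a new child even though busy workers are below MaxRequestWorkers. The sketch below just does that arithmetic; scoreboard_headroom is a hypothetical helper, the 16/25/400 values are what I believe to be the stock event-MPM defaults, nothing in the thread says what this server actually uses, and the thread itself attributes the problem to the Apache build rather than its configuration.

```python
import math


def scoreboard_headroom(server_limit: int, threads_per_child: int,
                        max_request_workers: int) -> int:
    """Process slots left over once enough children exist for MaxRequestWorkers.

    Illustrative only: zero headroom means any child that is still gracefully
    finishing blocks a replacement child from starting.
    """
    children_needed = math.ceil(max_request_workers / threads_per_child)
    return server_limit - children_needed


# Assumed defaults: ServerLimit 16, ThreadsPerChild 25, MaxRequestWorkers 400.
print(scoreboard_headroom(16, 25, 400))  # no spare slots
print(scoreboard_headroom(20, 25, 400))  # four spare slots for finishing children
```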
I can confirm that the issue is gone.
Thank you very much, @Firefishy !
I have not had a single timeout in the last 15 minutes. At least my services are running normally again and the downloads are available as fast as usual.
For your information: at least on my server, the timeouts described here occur about every three months. As mentioned, they disappear after about 24 hours. I don't know whether Apache gets restarted or something similar.
I suspect this is due to a faulty version of Apache; we run a custom build to work around some other Apache bugs. We will move back to the distro release with Debian 12 and/or Ubuntu 24.04.
I don't think it's custom as such, it's just a backport of a later version.
Closing. If the issue returns, feel free to re-open the ticket.
Osmosis fails to download minute diffs from the planet server. Not all, but most update attempts run into a connection timeout.
Lowering maxInterval in configuration.txt to 60 s helps a bit, but increasing it to 1 h always results in a timeout. I checked on my production server and on my local PC, and the result was the same. I first noticed the issue on 2024-06-09 at 0:15 UTC.
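One plausible reason a larger maxInterval makes things worse: maxInterval is the number of seconds of changes fetched per run, so a 1 h interval means about 60 minutely diffs per run instead of 1 or 2. Assuming (hypothetically) one request per diff, independent timeouts, and that a single timeout aborts the run, the chance of a run finishing is (1 - p)^n. With the "about half" timeout rate reported earlier in the thread, that gives roughly 0.5 at 60 s, 0.25 at 120 s, and about 9e-19 at 3600 s. If each diff actually needs more than one request, the real odds are even worse. run_success_probability is a hypothetical helper, not part of Osmosis.

```python
def run_success_probability(max_interval_s: int, p_timeout: float,
                            diff_period_s: int = 60) -> float:
    """Chance that one replication run completes without any timed-out request.

    Assumes one request per minutely diff, independent failures with
    probability p_timeout, and that a single timeout aborts the whole run.
    """
    n_requests = max(1, max_interval_s // diff_period_s)
    return (1.0 - p_timeout) ** n_requests


for interval in (60, 120, 3600):
    print(interval, run_success_probability(interval, 0.5))
```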