osm2pgsql-dev / osm2pgsql

OpenStreetMap data to PostgreSQL converter
https://osm2pgsql.org
GNU General Public License v2.0

Very slow node parsing on updates (caused by PG's jit) #1045

Closed · cquest closed this issue 4 years ago

cquest commented 4 years ago

On 2 worldwide rendering servers, I'm facing a very slow parsing of nodes during updates. I've moved to postgresql 12 recently.

These servers are running 0.96 from the osmadmins ppa, with flatnodes and the whole PG database on SSD.

Here is a log extract:

Node stats: total(63541), max(7100611033) in 719s
Way stats: total(7330), max(760101527) in 15s
Relation stats: total(103), max(10525425) in 2s
...
Osm2pgsql took 767s overall

Node parsing takes 90% of the overall update time. Am I the only one?

On a third worldwide rendering server, I'm still running a very old 0.86 and the node parsing is way faster!

pnorman commented 4 years ago

What is your osm2pgsql command line, and does it happen with recent versions?

cquest commented 4 years ago

I've checked the update logs we keep on each server, which contain timings for each step (osmosis, osm2pgsql, tile expiration). They seem to confirm the change started with the PG upgrade from 10 to 12.

So... I've compared the postgresql.conf files and there are only minor differences, nothing that could explain the change.

arvindkartik commented 4 years ago

Slowdown potentially related to fix for #1014

I'd expect a slowdown because deletes are now done sequentially with the data insertion to avoid deadlocks. Before, the deletes were done in a different thread via a different connection. I've done a quick measurement with a daily diff. Interestingly, I see quite a significant slowdown when fsync and synchronous commits are off, but none when running postgres with fsync. I'll check nonetheless if there is room for optimization of the deletes.

cquest commented 4 years ago

The full osm2pgsql command line...

osm2pgsql diff.osc --number-processes=4 -a -C 64 -k -m -G -s -S ./osm2pgsql-choosen.style -d osm --flat-nodes /ssd/osm2pgsql/flat-nodes.raw --keep-coastlines --tag-transform-script ./config/activate-relation-type-waterway.lua --prefix=planet_osm

I'll do more tests on my workstation... the servers have enough trouble already without testing new things on them ;) A quick test made yesterday with 0.96 and PG12 gave the same behaviour on my workstation. I'll check with 1.2.1 and PG12, then with PG10...

cquest commented 4 years ago

Here are some test results, all done on the same machine (24 cores, nvme SSD) on a France extract with 1 daily diff:

osm2pgsql   PG   nodes   ways   rels
0.96        10    60s     33s    23s
1.2.1       10    11s      4s     6s
0.96        12   575s     43s    32s
1.2.1       12   276s      4s     6s

I've looked at the postgres changelog and maybe found something to look at. PG12 introduced a new parameter (plan_cache_mode), which is said to "allow control over when generic plans are used for prepared statements". Instead of the "auto" default, I set it to force_generic_plan, which was even slower, then to force_custom_plan, which gave the same result as auto/default. So... it really seems related to some internal change in PG 12 :(
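
A minimal sketch of how the setting can be toggled per session from psql (not necessarily the exact commands used for the test above):

SET plan_cache_mode = force_generic_plan;  -- was even slower in this test
SET plan_cache_mode = force_custom_plan;   -- same result as the "auto" default
RESET plan_cache_mode;                     -- back to the server default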

lonvia commented 4 years ago

How did you upgrade from 10 to 12? Did you do a reimport or a database upgrade? Is this postgres 12.0 or 12.1? Did your tests use the same command line as above? If not, do you mind running the same test against a standard import with the example lua to exclude that the issue is related to your specific database setup?

akara13 commented 4 years ago

Same problem here. It seems to have something to do with changes introduced in PG 11, since problems with slow node parsing during updates were already reported on the osm-dev mailing list after updating from PG 10 to PG 11.

I was doing a full planet import for tile rendering in a fresh PG 12.1 instance. This import ran as expected and didn't show decreased node parsing speed compared to previous full imports.

Before, we had osm2pgsql 0.9.4 and PG 9.6 and daily updates took 2-3 hours; now they take more than 10 hours due to slow node parsing. I also tried the old osm2pgsql 0.9.4 for updates: still slow node parsing. The parameters for osm2pgsql didn't change (osm-carto style and tag transforms). The new PG 12.1 instance uses the same configuration as before, so it seems related to some PG change introduced in PG 11.

cquest commented 4 years ago

I've just checked what was different in my PG config files between 10 and 12 and I've found what causes the slowdown... JIT!

jit=on (by default)

just set it to

jit=off

And I got 11s for the node parsing, like with PG10 :)
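
A minimal sketch of checking and disabling JIT server-wide without hand-editing postgresql.conf, assuming superuser access (ALTER SYSTEM writes the override to postgresql.auto.conf and pg_reload_conf() applies it without a restart):

SHOW jit;                  -- "on" by default since PG 12
ALTER SYSTEM SET jit = off;
SELECT pg_reload_conf();   -- jit is reloadable, no restart needed
SHOW jit;                  -- should now report "off"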

cquest commented 4 years ago

How did you upgrade from 10 to 12? Did you do a reimport or a database upgrade? Is this postgres 12.0 or 12.1? Did your tests use the same command line as above? If not, do you mind running the same test against a standard import with the example lua to exclude that the issue is related to your specific database setup?

On the rendering servers, I did the upgrade using pg_upgradecluster. All the tests were done with a freshly imported extract file, with exactly the same options for osm2pgsql in all cases...

But as you can see above, the cause is definitely not the upgrade, but the new jit feature of PG, which has a very bad side effect on osm2pgsql's prepared statements (I presume).

I'll check in a few minutes on the rendering servers whether jit=off solves the problem like it did on my workstation...

BINGO: confirmed :)

akara13 commented 4 years ago

Hm, I turned JIT off in our PG 12.1 instance, but couldn't see any positive impact on node parsing speed during updates, still at 0.1K/s.

cquest commented 4 years ago

Have you reloaded the PG config?

All my servers are OK since I changed and reloaded the config with jit=off. After searching the PG bugs mailing list, it looks like that's quite a common issue.

akara13 commented 4 years ago

Sure, I reloaded the config. I even restarted the server, just in case.

Anything else in the PG configuration that may affect update performance? My config is:

jit = off
shared_buffers = 4GB
effective_cache_size = 48GB
work_mem = 128MB
maintenance_work_mem = 4GB
random_page_cost = 1.0
wal_buffers = 16MB
wal_level = minimal
max_wal_senders = 0

cquest commented 4 years ago

Make sure you ANALYZE the tables after a PG upgrade; this can really help the planner.
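
A minimal sketch of that step (pg_upgrade does not carry planner statistics over to the new cluster, so a fresh ANALYZE is needed before the planner can make sensible estimates):

ANALYZE VERBOSE planet_osm_ways;  -- refresh statistics on the problem table
ANALYZE;                          -- or the whole database after an upgrade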

FYI, I've submitted a bug to the postgres dev team... here is the thread on the bugs mailing list:

https://www.postgresql.org/message-id/16183-64843a4ebc44d9d2@postgresql.org

lonvia commented 4 years ago

@cquest thanks for the follow up. I'm copying the query analysis over here:

osm=# PREPARE mark_ways_by_node(bigint) AS select id from planet_osm_ways WHERE nodes && ARRAY[$1];
PREPARE
osm=# explain analyze execute mark_ways_by_node(1836953770);
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on planet_osm_ways  (cost=2468.37..305182.32 rows=301467 width=8) (actual time=5.775..5.905 rows=2 loops=1)
   Recheck Cond: (nodes && '{1836953770}'::bigint[])
   Heap Blocks: exact=2
   ->  Bitmap Index Scan on planet_osm_ways_nodes  (cost=0.00..2393.00 rows=301467 width=0) (actual time=0.512..0.512 rows=2 loops=1)
         Index Cond: (nodes && '{1836953770}'::bigint[])
 Planning Time: 3.667 ms
 JIT:
   Functions: 4
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.466 ms, Inlining 0.000 ms, Optimization 0.354 ms, Emission 4.634 ms, Total 5.454 ms
 Execution Time: 30.393 ms
(11 rows)

My reading of this is that using prepared statements is not directly responsible for the slowdown. The true culprit is the badly balanced way-node index. Even if we get rid of those and simply execute the SQL at the appropriate place, it would still try to use JIT when it shouldn't.

Two things we should do here:

1) Reduce the number of calls to mark_ways_by_node() by querying multiple nodes at once (see the SQL sketch below). This should be fairly easy to do and lead to a speed-up of updates in general.

2) Get rid of the way-node index and use a more efficient custom implementation instead. We've had some discussions about this in the past but so far none of the ideas was really practical. So this is really far off for now.
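
A minimal SQL sketch of idea 1), with a hypothetical statement name and example node ids, just to illustrate pushing several changed nodes through a single index lookup:

PREPARE mark_ways_by_nodes(bigint[]) AS
    SELECT id FROM planet_osm_ways WHERE nodes && $1;
EXECUTE mark_ways_by_nodes(ARRAY[1836953770, 1836953771, 1836953772]);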

akara13 commented 4 years ago

As mentioned above, turning off jit doesn't improve node update performance on my system. This is a fresh PG 12.1 installation with a full planet import. When using the EXPLAIN ANALYZE above, my row estimates are even further off than in your example, so the execution even uses two parallel workers:

osm=> PREPARE mark_ways_by_node(bigint) AS SELECT id FROM
planet_osm_ways WHERE nodes && ARRAY[$1];
PREPARE
osm=> EXPLAIN ANALYZE EXECUTE mark_ways_by_node(1836953770);
QUERY PLAN
---------------------------------------------------------------------------------------------
 Gather  (cost=26807.42..5395421.99 rows=3177861 width=8)
              (actual time=0.464..7.809 rows=2 loops=1)
   Output: id
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Bitmap Heap Scan on public.planet_osm_ways
           (cost=25807.42..5076635.89 rows=1324109 width=8)
           (actual time=0.108..0.109 rows=1 loops=3)
         Output: id
         Recheck Cond: (planet_osm_ways.nodes && '{1836953770}'::bigint[])
         Heap Blocks: exact=2
         Worker 0: actual time=0.022..0.022 rows=0 loops=1
         Worker 1: actual time=0.022..0.022 rows=0 loops=1
         ->  Bitmap Index Scan on planet_osm_ways_nodes
                 (cost=0.00..25012.96 rows=3177861 width=0)
                 (actual time=0.266..0.267 rows=3 loops=1)
               Index Cond: (planet_osm_ways.nodes && '{1836953770}'::bigint[])
 Settings: effective_cache_size = '48GB', jit = 'off', random_page_cost = '1', work_mem = '128MB'
 Planning Time: 0.096 ms
 Execution Time: 7.851 ms
(15 rows)

This was with a daily update running in parallel. After the update has finished, I will ANALYZE the tables and see if it helps.

cquest commented 4 years ago

My reading of this is that using prepared statements is not directly responsible for the slowdown. The true culprit is the badly balanced way-node index. Even if we get rid of those and simply execute the SQL at the appropriate place, it would still try to use JIT when it shouldn't.

My understanding is that something is wrong at the PG level: the estimated plan cost.

In my EXPLAIN ANALYZE (with or without jit), the rows returned by the bitmap index scan are estimated at 300k whereas we only got 2. That's a factor of 150,000! If I understand the query, it returns the ids of the ways connected to a single node. When would a node be connected to 300,000 ways?

The estimate from the statistics seems plain wrong 100% of the time. This is causing PG to make bad decisions in query planning, activate jit when it should not, etc... To me it looks like bad statistics maintained (or computed) on the GIN index, or something like that, but I don't know enough about PG internals to do more than guess.

In @akara13's EXPLAIN it's even worse (3.1M, 10x more), causing the planner to launch parallel workers! Launching a worker has a cost, and in this case it may explain the slowdown despite jit=off. I think we have different statistics because I did my test on an extract (France), and he may be doing his EXPLAIN on a full planet db. My advice for @akara13: try to disable parallel workers...
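
A minimal sketch of what disabling parallel workers looks like at the session level (the same setting can of course go into postgresql.conf instead):

SET max_parallel_workers_per_gather = 0;  -- the planner can no longer choose a parallel plan
SHOW max_parallel_workers_per_gather;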

About improving the update process... I got a couple of ideas after looking at the osm2pgsql source code while trying to understand the original problem. Better to open a separate issue for that: https://github.com/openstreetmap/osm2pgsql/issues/1046 ;)

akara13 commented 4 years ago

Thanks @cquest for the suggestions. I think I will first ANALYZE the table after the currently running daily update has finished and see what EXPLAIN ANALYZE says afterwards, since planet_osm_ways was last analyzed in December. That may explain the totally wrong row estimate.

lonvia commented 4 years ago

The estimates for the way-node index have always been wildly off. The index is so huge and badly balanced that postgres cannot make sense of it. It just never mattered, because the query is simple enough that postgres always figured it should use the index as opposed to a sequential scan, which were the only two choices back then. We are only running into trouble now because postgres tries to be even more clever and use advanced features such as workers and jit.

There might be a way to influence the statistics. Newer versions of postgres have more advanced tools for that, see PostgreSQL Extended Statistics.
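
As a rough sketch of one way to influence the statistics (this uses the long-standing per-column statistics target rather than the extended statistics mentioned above, and it is not certain it improves the && estimate on an array column):

ALTER TABLE planet_osm_ways ALTER COLUMN nodes SET STATISTICS 1000;  -- default target is 100
ANALYZE planet_osm_ways;                                             -- recompute with the larger sample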

akara13 commented 4 years ago

I have analyzed planet_osm_ways; the row estimate is still ridiculously high at 3.1M. With jit = 'off' and max_parallel_workers_per_gather = '0', EXPLAIN ANALYZE now gives an execution time of 0.079 ms. I've added env PGOPTIONS="-c jit=off -c max_parallel_workers_per_gather=0" to the osm2pgsql call to make sure jit and parallel query are turned off for updates. Thanks everybody for helping.
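
A minimal sketch of an alternative that avoids the wrapper: persisting the same settings on the database osm2pgsql connects to ("osm" is the database name used earlier in this thread):

ALTER DATABASE osm SET jit = off;
ALTER DATABASE osm SET max_parallel_workers_per_gather = 0;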

cquest commented 4 years ago

The way_nodes index is definitely a problem... so slow to build, larger than the data itself, bloating more than others... but hard to replace with something more efficient.

I've not looked yet at how imposm deals with that.

rustprooflabs commented 4 years ago

The row estimates being off is definitely an issue. There was a simple patch proposed on the Postgres bugs list; I was able to test it today and found it does wonders for the estimated row counts.

https://www.postgresql-archive.org/BUG-16183-PREPARED-STATEMENT-slowed-down-by-jit-tp6117538p6118920.html

I tested on a much smaller data set of just the Colorado region; the estimates were showing >10,000 when the actual count was 1. With the patch applied it estimates 68. Assuming the improvements in estimates scale to a larger data set, this should help the planner choose the correct path regarding JIT / parallel query.

lonvia commented 4 years ago

#1256 disables JIT and parallel workers for the connections where intarray indexes are used. This should fix the slowness. We can't really require patches to Postgres for running osm2pgsql.