This seems to be in the wrong place, unless you're seriously suggesting that we turn all that into a chef recipe...
Personally I really don't want to get involved with doing something that insanely complicated and risky.
We have a third tile server coming hopefully in a matter of weeks at which point my plan was to reimport everything on all three servers anyway. I was assuming that would be with the new style but if it has to be with the old one then fine.
> Personally I really don't want to get involved with doing something that insanely complicated and risky.
Reclustering and reindexing is standard maintenance, so will we be able to do it regularly in the future?
It sounds like, from the above instructions, that this can be done "online" (apart from stopped diffs) and doesn't interrupt rendering so could it potentially be automated?
If it's possible to write a script which does this, and incorporate that into a quarterly / bi-annual / annual run, then that would be great. It sounds like otherwise it's a lot of manual CLI work interspersed with periods of waiting.
Also, doing each table like this means it only requires the overhead of duplicating the largest table, right?
> It sounds like, from the above instructions, that this can be done "online" (apart from stopped diffs) and doesn't interrupt rendering so could it potentially be automated?
Yes, although doing it offline is much simpler.
> Also, doing each table like this means it only requires the overhead of duplicating the largest table, right?
Yes, if there's enough room and IO capacity I'd do all the tables at once in parallel, but this way reduces the extra space required.
Well, there's no way that something that complicated and long-running can be run directly from chef, so the only way it could be done is by writing a script that chef installs and that runs from cron.
But it's so intricate that I'm really reluctant to run it automatically with no human supervision. I haven't read it all but the fact there are 13 steps and vast amounts of custom SQL scares the pants off me. What are the recovery measures if any step goes wrong would be my first question? What is the risk of breaking something in a way such that there is no way to back out?
In any case there are all sorts of non-automatable things in there like "check tiles are still rendering" and "wait for updates to complete".
> But it's so intricate that I'm really reluctant to run it automatically with no human supervision. I haven't read it all but the fact there are 13 steps and vast amounts of custom SQL scares the pants off me.
Most of the SQL is taken from openstreetmap-carto.
> What are the recovery measures if any step goes wrong would be my first question? What is the risk of breaking something in a way such that there is no way to back out?
Recovery steps are there, but not clear enough.
Based on feedback, I'll write a script that handles the SQL. I think I can even handle locking in case updates were not stopped.
> Based on feedback, I'll write a script that handles the SQL. I think I can even handle locking in case updates were not stopped.
Great! Thanks.
We do run many "long-running" processes on various servers; database dumps, planet file conversions, etc... in an automated, regular fashion. So we just need to try and make sure that it interferes with the regular rendering as little as possible, and doesn't wedge itself too often.
I was wondering if it's possible to wait for osm2pgsql to finish, and stop it from running, by watching and writing its lock file? That would mean there's no need to disable its cron, which is what I usually do when I'm stopping it manually.
The first time I did a reclustering, it reduced I/O by 80% :)
Then I can see I/O slowly going up again because of fragmented data in PostgreSQL pages, so I'm doing it every 6 months and usually see I/O cut in half.
I'm now stopping the updates, reclustering one table, letting updates run again, then doing another one, etc., so the whole process is done online and updates are not stopped for too long.
Regarding the slim tables, there is a huge benefit to reindexing planet_osm_ways... its GIN index gets larger than the data part!
I'm also planning to test pg_repack, which is supposed to do the same as CLUSTER but without requiring an exclusive access lock on the table.
Don't forget the GRANT on the new tables ;)
One more step is an ext4 defrag of the PostgreSQL files... it helps the kernel merge I/Os into chunks larger than the 8KB pages, which even for many SSDs are very small chunks of data that they don't handle so efficiently. This is theoretical, I've not checked the real benefit.
Thanks! That's the first time I've heard of pg_repack and it looks really useful. It sounds like it might be an excellent option for regular online re-clustering. Please let us know how the testing goes.
> I was wondering if it's possible to wait for osm2pgsql to finish, and stop it from running, by watching and writing its lock file? That would mean there's no need to disable its cron, which is what I usually do when I'm stopping it manually.
I thought updates were a daemon, not a cron job?
Writing the lock file would stop it if you write it between runs. If you write it during a run, it'll do nothing since it'll clean up at the end.
> Thanks! That's the first time I've heard of pg_repack and it looks really useful.
pg_repack won't work for the rendering tables as it requires a UNIQUE index on a NOT NULL column. It would be useful for the slim tables, and on the API database.
> I thought updates were a daemon, not a cron job?
Yup, you're right, it is. Although it's very simple so would be easy to convert to cron if we wanted. But it seems like there's no reason to at the moment.
> Writing the lock file would stop it if you write it between runs. If you write it during a run, it'll do nothing since it'll clean up at the end.
What I had in mind was something like:

1. Create `osm2pgsql.lock`. If it already exists, wait on it being deleted (with inotify or something) and retry until it exists because we created it.
2. While we hold the lock file, osm2pgsql won't run.

The only, rather major, flaw is that this isn't how we run osm2pgsql at all! So it looks like table-level exclusive locking would work better, assuming that osm2pgsql will just wait for the lock rather than error.
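If we went with table-level locking instead, something along these lines might work, assuming osm2pgsql just queues behind the lock rather than erroring (untested):

```sql
-- Hold ACCESS EXCLUSIVE locks on the rendering tables for the duration of the
-- maintenance transaction; any concurrent writer should block until COMMIT.
BEGIN;
LOCK TABLE planet_osm_point, planet_osm_line, planet_osm_roads, planet_osm_polygon
  IN ACCESS EXCLUSIVE MODE;
-- ... maintenance work on the locked tables ...
COMMIT;
```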
> pg_repack won't work for the rendering tables as it requires a UNIQUE index on a NOT NULL column.
That's a shame. I'm almost tempted to see how long an `ALTER TABLE ADD COLUMN tmp_id bigserial` would take, but the answer is probably "too long". And dropping it afterwards might just leave a bunch of bloat in its place.
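Purely for illustration (the table and constraint names are arbitrary), that workaround would look roughly like:

```sql
-- Give pg_repack a unique NOT NULL column to key on; adding a bigserial column
-- rewrites the whole table, which is why it would probably take too long.
ALTER TABLE planet_osm_polygon ADD COLUMN tmp_id bigserial;
ALTER TABLE planet_osm_polygon ADD CONSTRAINT planet_osm_polygon_tmp_id_key UNIQUE (tmp_id);
-- ... run pg_repack against the table ...
-- Dropping the column afterwards leaves its dead space behind.
ALTER TABLE planet_osm_polygon DROP COLUMN tmp_id;
```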
According to @cquest, the OSM FR servers gain 25%–50% in rendering throughput when they recluster after about a year, by reducing table and index bloat. This can be done without a full outage and only stopping updates, but we probably want to wait until #79 is done to increase capacity. The reclustering depends on database IO and CPU, which are not maxed out.
The overall plan is to create a new copy of the tables, build new indexes, then replace the old tables. Because update frequency is more important for osm.org than for other hosts, I'd recommend doing it slightly differently: instead of reclustering all the tables at once, do one table, resume updates and let them catch up, do another, and so on.
Starting with the points table and progressing by table size minimizes the disk usage. I think there's enough free space that it doesn't matter, but this is a best practice.
Process
My recommendation is that the following be done on both servers, starting with whichever has gone the longest since the initial import:
Record the results of `\dt+` and `\di+`. It would also be useful to have the results of the following SQL for future planning purposes.

Start by creating a schema to do work in.
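A minimal sketch, with `rework` and `backup` as assumed schema names carried through the sketches below:

```sql
CREATE SCHEMA rework;   -- where the reclustered copies are built
CREATE SCHEMA backup;   -- where old tables are parked before being dropped
```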
Starting with the smallest table, recluster it into the new schema.
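A minimal sketch for the points table, assuming the standard osm2pgsql layout (geometry column `way`) and using GeoHash ordering, a common choice for reclustering rendering tables:

```sql
-- Build a reclustered copy in the working schema, ordered so that features that
-- are close together on the map end up close together on disk.
CREATE TABLE rework.planet_osm_point AS
  SELECT *
  FROM public.planet_osm_point
  ORDER BY ST_GeoHash(ST_Transform(ST_Envelope(way), 4326), 10) COLLATE "C";
```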
Create indexes. The recommended set is the one for OpenStreetMap Carto; if you want to use others you can.
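The full recommended set lives in the openstreetmap-carto documentation; as a minimal sketch, the geometry index every rendering table needs would be (index name illustrative):

```sql
CREATE INDEX planet_osm_point_way_idx
  ON rework.planet_osm_point USING GIST (way);
```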
Replace the table in the public schema in a transaction, keeping the old one
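A sketch of the swap, using the `rework`/`backup` schemas assumed above; the GRANT echoes the reminder earlier in the thread (the rendering role name is a guess):

```sql
BEGIN;
-- Park the old table instead of dropping it so a rollback stays possible.
ALTER TABLE public.planet_osm_point SET SCHEMA backup;
-- Move the reclustered copy into place.
ALTER TABLE rework.planet_osm_point SET SCHEMA public;
-- Re-grant read access to the rendering role (role name is an assumption).
GRANT SELECT ON public.planet_osm_point TO "www-data";
COMMIT;
```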
Drop the old table
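For example, once the swapped-in table has been verified (schema name as above):

```sql
DROP TABLE backup.planet_osm_point;
```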
For planet_osm_roads, repeat the recluster and index steps. Test, then swap in the new table and drop the old one as above.
For planet_osm_line, resume updates, wait for updates to catch up, then stop them and do the same. Test, then swap and drop as above.
Polygons will take the longest. Resume updates and let them catch up, then stop them and recluster planet_osm_polygon the same way. Test, then run `\dt+` and `\di+` again.

Ref: http://paulnorman.ca/blog/2016/06/improving-speed-with-reclustering/
Other options
There is also the `REINDEX` statement, which is much simpler.

`maintenance_work_mem` should probably be increased.
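A sketch of that simpler path (the memory value is illustrative):

```sql
-- Give index builds more memory for this session, then rebuild every index on
-- a bloated table in place.
SET maintenance_work_mem = '2GB';
REINDEX TABLE planet_osm_polygon;
```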
Rollback

In case of a problem a rollback can be done by restoring the table from the backup schema.
If diffs are mistakenly restarted early the state file needs to be reset and diffs re-run.
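As a sketch, restoring a table from the backup schema is just the swap above in reverse (schema names as before):

```sql
BEGIN;
-- Move the suspect copy out of the way, then put the original back.
ALTER TABLE public.planet_osm_point SET SCHEMA rework;
ALTER TABLE backup.planet_osm_point SET SCHEMA public;
COMMIT;
```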
Why not wait for a reimport?
The OpenStreetMap Carto Lua branch, which will require a reimport with hstore, is still in development. We have a few open issues to resolve before we can merge, and we are lacking developer time for them. Once the Lua branch is merged we will still be releasing 2.x releases which work with the old database, to allow time to change over.
Doing a reimport with the current settings is in some ways better, but requires either a full outage of the server, a fair amount of database disk space, or the possibility of updates being down for an extended time[1], and the certainty that updates will be stopped for about a day.
[1] If the old DB slim tables are dropped this saves room, but stops any updates on the old DB
Time required
I'm running a test on the server for testing old-style multipolygons. It's got faster single-threaded performance and absurdly faster drives, but it should give an indication. I'll add times when it's done.