openstreetmap / openstreetmap-website

The Rails application that powers OpenStreetMap
https://www.openstreetmap.org/
GNU General Public License v2.0
2.21k stars 918 forks source link

Future of db/functions? #3110

Closed mmd-osm closed 1 year ago

mmd-osm commented 3 years ago

Now that we have migrated from osmosis to osmdbt for replication diffs, and quadtile calculation resides in its own Gem, we could think about removing db/functions altogether, like shown in https://github.com/mmd-osm/openstreetmap-website/commit/d5c3828db15f5aa1e0fd90baef486e8966fdfaa4

Also, we started a discussion on this topic in https://github.com/openstreetmap/openstreetmap-website/issues/3081#issuecomment-773210432, which can continue here now.

Before creating a pull request for this topic, I'd like to get some early feedback, if this really makes sense.

Some points to consider:

tomhughes commented 3 years ago

Please don't remove anything yet - the production database still has those functions for now.

gravitystorm commented 3 years ago

Before creating a pull request for this topic, I'd like to get some early feedback, if this really makes sense.

I'm always in favour of cleanup, and removing complexity in particular :smile:

* People might still want to try out osmosis replication in their dev environment. Is this in scope for this repo?

We should push the concept of setting up replication out of this repo. I think the best approach is for osmosis to ship the functions necessary for osmosis-based replication, and osmdbt to ship whatever it needs too. We should update CONFIGURE.md to indicate there are two options to consider (osmosis and osmdbt) depending on your PostgreSQL verison and point people to the right place, but otherwise remove ourselves from that.

* do we still need those old db functions to run old migrations?

This is the core of the discussion, I think. We need to keep the db functions for as long as they are used in the old migrations.

We could refactor the old migrations, to remove the functions. But this raises the question of what the old migrations are there for? Two possibilities:

I believe we should just ditch the old migrations, say every migration over 2 years old. If someone needs them (if they have a really old database dump, or they only update their code every 3 years) then they are in source control anyway, just like our old unused code is. We can keep the reasonably new migrations to help other developers and deployments who don't update every month.

The other alternative is to keep the old migrations, but continue to refactor them over the years, either for things like this, or rubocop, or to work with new database systems, or whatever. But I think this is just makework, like keeping old code in comments or like maintaining amf_controller even after we remove it from the API. The code in migrations is old, it has done its job, it's their in the git history if anyone needs it.

I think removing the old migrations will mean we need some tweaks to CI and documentation, e.g. to load empty databases from the structure.sql (and use commands like db:structure:load) instead of migrations. There is also the 'double check' that we do in CI where we test that the sum of the migrations is equal to the structure.sql, so we need an alternative (e.g. a check that old_structure.sql + migrations == structure.sql).

tordans commented 3 years ago

I believe we should just ditch the old migrations, say every migration over 2 years old.

What we do at betterplace.org is: We run https://github.com/jalkoby/squasher about once a year or so which gives us a single migration file which creates the DB in just the way that it would look like after running all single migrations. This still allows to run migrations and end up with a correct DB setup.

I did in fact try out to use squasher around last December/January for this project but ran into errors that looked like they are PostgreSQL related … or maybe had something to do with special indexes or primay keys(?) (It might be the same as the composite_primary_keys gem you guys talk about in the Rails6.1 branch).

tomhughes commented 3 years ago

We have various extra migration features monkey patched into rails (by https://github.com/openstreetmap/openstreetmap-website/blob/master/config/initializers/migrate.rb) which are probably confusing it.

mmd-osm commented 3 years ago

I believe tile_for_point is the only function that has been used for back-filling data as part of a migration. A "grep" shows 3 occurrences of that function, all of them are older than 3 years.

$ grep -rin "tile_for_point"
006_tile_nodes.rb:16:             tile_for_point(CAST(ROUND(latitude * 10000000) AS INTEGER),
005_tile_tracepoints.rb:12:      Tracepoint.update_all("latitude = latitude * 10, longitude = longitude * 10, tile = tile_for_point(latitude * 10, longitude * 10)")
20180204153242_tile_users.rb:10:      User.update_all("home_tile = tile_for_point(cast(round(home_lat * #{GeoRecord::SCALE}) as integer), cast(round(home_lon * #{GeoRecord::SCALE}) as integer))")
gravitystorm commented 3 years ago

I did in fact try out to use squasher around last December/January for this project but ran into errors that looked like they are PostgreSQL related … or maybe had something to do with special indexes or primay keys(?) (It might be the same as the composite_primary_keys gem you guys talk about in the Rails6.1 branch).

I got squasher working today - you can see the results at https://github.com/gravitystorm/openstreetmap-website/commit/9890f754c012266617000c1fca025a93bd432ac4 . The trick was to add a create_extension "btree_gist" to the first migration, since squasher creates its own database automatically, and that extension is needed for some index stuff later in the 028 migration. The primary key and enum monkey patching doesn't seem to cause any problems.

So squasher seems like a reasonable compromise between removing all the old migrations, but without having to change any workflows, since rake db:migrate will still work on an empty database. I'm interested to hear what other people think?

Additional points to consider:

tomhughes commented 3 years ago

The other customisations we have are the ability to create/alter/drop primary keys which I don't think rails can do other than as part of creating a table and then only for a single field (unless CPK changes that) and support for :columns when creating an index to specify exotic indexes with expressions.

mmd-osm commented 3 years ago

I'm also interested if we can move away from SQL format db dumps, which are quite verbose files.

I’m aware of at least 3 projects that depend on that file being available for some very occasional updates: osmdbt, cgimap and osmosis. Those would use their own copy of structure.sql for setting up a unit test database. If there’s some way to create the SQL structure dump on demand, that would still be fine, though.

mmd-osm commented 3 years ago

First bit addressed in https://github.com/openstreetmap/openstreetmap-website/commit/81c3ebe03c229c980908502f3735cdf1636dd5cc