openstreetmap / operations

OSMF Operations Working Group issue tracking
https://operations.osmfoundation.org/
99 stars 13 forks source link

API DB size growth has doubled #931

Open pnorman opened 1 year ago

pnorman commented 1 year ago

In May 2021 the API DB growth was 13% year on year. I just measured it as 25% or so[1]. This is a significant increase, and means having to allocate 3x current usage instead of 2x current usage when buying a new machine to last 5 years, so I'd like to figure out why before looking at disk costs.

Since I last checked the change, we've had a significant technical change with the upgrade to PG15, which saved several TB from index space savings. There's a few possibilities for why we're gaining in space so much faster

@tomhughes could you run reindexdb -d openstreetmap --concurrently -j 8 and both before and after the reindex, run the following SQL and save it to a file?

SELECT c.relname as "Name",
  c2.relname as "Table",
  pg_table_size(c.oid) as "Index size",
  pg_table_size(c2.oid) as "Table size"
FROM pg_catalog.pg_index i 
  JOIN pg_catalog.pg_class c ON i.indexrelid = c.oid
  JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
  JOIN pg_catalog.pg_class c2 ON i.indrelid = c2.oid
WHERE n.nspname <> 'pg_catalog' AND n.nspname <> 'information_schema' AND n.nspname !~ '^pg_toast';

Given snap-01's load of under 50%, -j8 should be safe and it won't block any transactions while it runs.

If the growth rate change is due to index bloat, it will clean it up and reveal it with the sizes. If not, well, it'll save a bit of space.

[1]: 10.9TB to 11.6TB filesystem change over 100 days for 25.5% or 10.47TB to 11.62TB DB size over 0.5 year for 23.2%

tomhughes commented 1 year ago

The -j8 might be a bit dangerous - not because of the CPU but because of the potential peak disk usage from the all the temporary data during big index rebuilds.

I can certainly do that for most tables but I may need to separate out some of the big tables and do them a bit more conservatively.

tomhughes commented 1 year ago

Here's the before data:

                              Name                              |             Table              |  Index size   |  Table size   
----------------------------------------------------------------+--------------------------------+---------------+---------------
 acls_k_idx                                                     | acls                           |        163840 |       1761280
 acls_pkey                                                      | acls                           |        466944 |       1761280
 index_acls_on_address                                          | acls                           |        884736 |       1761280
 index_acls_on_domain                                           | acls                           |        671744 |       1761280
 index_acls_on_mx                                               | acls                           |        155648 |       1761280
 active_storage_attachments_pkey                                | active_storage_attachments     |     613277696 |    2171207680
 index_active_storage_attachments_on_blob_id                    | active_storage_attachments     |     612917248 |    2171207680
 index_active_storage_attachments_uniqueness                    | active_storage_attachments     |    1750810624 |    2171207680
 active_storage_blobs_pkey                                      | active_storage_blobs           |     738148352 |    6423920640
 index_active_storage_blobs_on_key                              | active_storage_blobs           |    1667653632 |    6423920640
 active_storage_variant_records_pkey                            | active_storage_variant_records |      13033472 |      44195840
 index_active_storage_variant_records_uniqueness                | active_storage_variant_records |      33669120 |      44195840
 ar_internal_metadata_pkey                                      | ar_internal_metadata           |         16384 |         49152
 changeset_comments_pkey                                        | changeset_comments             |      24125440 |     372072448
 index_changeset_comments_on_changeset_id_and_created_at        | changeset_comments             |      34824192 |     372072448
 index_changeset_comments_on_created_at                         | changeset_comments             |      24125440 |     372072448
 changeset_tags_id_idx                                          | changeset_tags                 |    7721467904 |   43894415360
 changesets_bbox_idx                                            | changesets                     |   10194657280 |   13502210048
 changesets_closed_at_idx                                       | changesets                     |    3549102080 |   13502210048
 changesets_created_at_idx                                      | changesets                     |    3312992256 |   13502210048
 changesets_pkey                                                | changesets                     |   74331930624 |   13502210048
 changesets_user_id_created_at_idx                              | changesets                     |    4796899328 |   13502210048
 changesets_user_id_id_idx                                      | changesets                     |    4829036544 |   13502210048
 index_changesets_subscribers_on_changeset_id                   | changesets_subscribers         |    3152216064 |    6491078656
 index_changesets_subscribers_on_subscriber_id_and_changeset_id | changesets_subscribers         |    4627464192 |    6491078656
 client_applications_pkey                                       | client_applications            |        286720 |       2719744
 index_client_applications_on_key                               | client_applications            |       1236992 |       2719744
 index_client_applications_on_user_id                           | client_applications            |        270336 |       2719744
 current_node_tags_pkey                                         | current_node_tags              |   35098607616 |   54820200448
 current_nodes_pkey1                                            | current_nodes                  |  225180450816 |  833129971712
 current_nodes_tile_idx                                         | current_nodes                  |   78964768768 |  833129971712
 current_nodes_timestamp_idx                                    | current_nodes                  |  153879134208 |  833129971712
 current_relation_members_member_idx                            | current_relation_members       |    4549951488 |    9775022080
 current_relation_members_pkey                                  | current_relation_members       |   12512206848 |    9775022080
 current_relation_tags_pkey                                     | current_relation_tags          |    1879195648 |    3276980224
 current_relations_pkey                                         | current_relations              |     327172096 |    1025597440
 current_relations_timestamp_idx                                | current_relations              |     281550848 |    1025597440
 current_way_nodes_node_idx                                     | current_way_nodes              |  241501855744 |  570199367680
 current_way_nodes_pkey                                         | current_way_nodes              |  350738210816 |  570199367680
 current_way_tags_pkey                                          | current_way_tags               |   89976496128 |  140468199424
 current_ways_pkey                                              | current_ways                   |   24399601664 |   74808958976
 current_ways_timestamp_idx                                     | current_ways                   |   17183817728 |   74808958976
 delayed_jobs_pkey                                              | delayed_jobs                   |         16384 |     232292352
 delayed_jobs_priority                                          | delayed_jobs                   |         16384 |     232292352
 diary_comment_user_id_created_at_index                         | diary_comments                 |       1744896 |      25124864
 diary_comments_entry_id_idx                                    | diary_comments                 |       1744896 |      25124864
 diary_comments_pkey                                            | diary_comments                 |       1253376 |      25124864
 diary_entries_pkey                                             | diary_entries                  |       9052160 |     915988480
 diary_entry_created_at_index                                   | diary_entries                  |       9052160 |     915988480
 diary_entry_language_code_created_at_index                     | diary_entries                  |      12746752 |     915988480
 diary_entry_user_id_created_at_index                           | diary_entries                  |      12673024 |     915988480
 diary_entry_subscriptions_pkey                                 | diary_entry_subscriptions      |      13107200 |      38666240
 index_diary_entry_subscriptions_on_diary_entry_id              | diary_entry_subscriptions      |       9191424 |      38666240
 friends_pkey                                                   | friends                        |       3997696 |       9674752
 index_friends_on_user_id_and_created_at                        | friends                        |       3268608 |       9674752
 user_id_idx                                                    | friends                        |       3022848 |       9674752
 points_gpxid_idx                                               | gps_points                     |  175209054208 | 1848585453568
 points_tile_idx                                                | gps_points                     |  180966776832 | 1848585453568
 gpx_file_tags_gpxid_idx                                        | gpx_file_tags                  |     108642304 |     423780352
 gpx_file_tags_pkey                                             | gpx_file_tags                  |     130113536 |     423780352
 gpx_file_tags_tag_idx                                          | gpx_file_tags                  |      56803328 |     423780352
 gpx_files_pkey                                                 | gpx_files                      |     211378176 |    1476960256
 gpx_files_timestamp_idx                                        | gpx_files                      |     204046336 |    1476960256
 gpx_files_user_id_idx                                          | gpx_files                      |      77078528 |    1476960256
 gpx_files_visible_visibility_idx                               | gpx_files                      |      71901184 |    1476960256
 index_issue_comments_on_issue_id                               | issue_comments                 |        253952 |       1794048
 index_issue_comments_on_user_id                                | issue_comments                 |        106496 |       1794048
 issue_comments_pkey                                            | issue_comments                 |        294912 |       1794048
 index_issues_on_assigned_role                                  | issues                         |        204800 |       2908160
 index_issues_on_reportable_type_and_reportable_id              | issues                         |        860160 |       2908160
 index_issues_on_reported_user_id                               | issues                         |        540672 |       2908160
 index_issues_on_status                                         | issues                         |        204800 |       2908160
 index_issues_on_updated_by                                     | issues                         |        204800 |       2908160
 issues_pkey                                                    | issues                         |        614400 |       2908160
 languages_pkey                                                 | languages                      |         16384 |         57344
 languages_pkey_old                                             | languages                      |         16384 |         57344
 messages_from_user_id_idx                                      | messages                       |       9732096 |    1383997440
 messages_pkey                                                  | messages                       |      25280512 |    1383997440
 messages_to_user_id_idx                                        | messages                       |      13287424 |    1383997440
 node_tags_pkey                                                 | node_tags                      |  139547262976 |  211070828544
 nodes_changeset_id_idx                                         | nodes                          |   95281111040 | 1202013102080
 nodes_pkey                                                     | nodes                          |  421277581312 | 1202013102080
 nodes_tile_idx                                                 | nodes                          |  102119120896 | 1202013102080
 nodes_timestamp_idx                                            | nodes                          |  209760067584 | 1202013102080
 index_note_comments_on_author_id_and_created_at                | note_comments                  |     278839296 |    1287790592
 index_note_comments_on_body                                    | note_comments                  |     483770368 |    1287790592
 index_note_comments_on_created_at                              | note_comments                  |     187031552 |    1287790592
 note_comments_note_id_idx                                      | note_comments                  |     158883840 |    1287790592
 note_comments_pkey                                             | note_comments                  |     187031552 |    1287790592
 notes_created_at_idx                                           | notes                          |      92856320 |     395337728
 notes_pkey                                                     | notes                          |     101941248 |     395337728
 notes_tile_status_idx                                          | notes                          |     110641152 |     395337728
 notes_updated_at_idx                                           | notes                          |     100352000 |     395337728
 index_oauth_access_grants_on_application_id                    | oauth_access_grants            |       2375680 |      52961280
 index_oauth_access_grants_on_resource_owner_id                 | oauth_access_grants            |       4358144 |      52961280
 index_oauth_access_grants_on_token                             | oauth_access_grants            |      19521536 |      52961280
 oauth_access_grants_pkey                                       | oauth_access_grants            |       5685248 |      52961280
 index_oauth_access_tokens_on_application_id                    | oauth_access_tokens            |      40124416 |     975683584
 index_oauth_access_tokens_on_refresh_token                     | oauth_access_tokens            |      89407488 |     975683584
 index_oauth_access_tokens_on_resource_owner_id                 | oauth_access_tokens            |     130195456 |     975683584
 index_oauth_access_tokens_on_token                             | oauth_access_tokens            |     464429056 |     975683584
 oauth_access_tokens_pkey                                       | oauth_access_tokens            |     115081216 |     975683584
 index_oauth_applications_on_owner_type_and_owner_id            | oauth_applications             |         98304 |        704512
 index_oauth_applications_on_uid                                | oauth_applications             |        221184 |        704512
 oauth_applications_pkey                                        | oauth_applications             |         73728 |        704512
 index_oauth_nonces_on_nonce_and_timestamp                      | oauth_nonces                   |     142991360 |      93323264
 oauth_nonces_pkey                                              | oauth_nonces                   |      68960256 |      93323264
 index_oauth_tokens_on_token                                    | oauth_tokens                   |    1961107456 |    8270159872
 index_oauth_tokens_on_user_id                                  | oauth_tokens                   |     287670272 |    8270159872
 oauth_tokens_pkey                                              | oauth_tokens                   |     424263680 |    8270159872
 redactions_pkey                                                | redactions                     |         16384 |         65536
 relation_members_member_idx                                    | relation_members               |   47458426880 |  383724142592
 relation_members_pkey                                          | relation_members               |  363714404352 |  383724142592
 relation_tags_pkey                                             | relation_tags                  |   15138111488 |   22104006656
 relations_changeset_id_idx                                     | relations                      |     596942848 |    3418898432
 relations_pkey                                                 | relations                      |    1661059072 |    3418898432
 relations_timestamp_idx                                        | relations                      |     913424384 |    3418898432
 index_reports_on_issue_id                                      | reports                        |        737280 |       6168576
 index_reports_on_user_id                                       | reports                        |        425984 |       6168576
 reports_pkey                                                   | reports                        |        876544 |       6168576
 schema_migrations_pkey                                         | schema_migrations              |         16384 |         49152
 index_user_blocks_on_user_id                                   | user_blocks                    |        163840 |       7233536
 user_blocks_pkey                                               | user_blocks                    |        180224 |       7233536
 user_preferences_pkey                                          | user_preferences               |      36601856 |      81952768
 user_roles_id_role_unique                                      | user_roles                     |         16384 |         40960
 user_roles_pkey                                                | user_roles                     |         16384 |         40960
 user_tokens_pkey                                               | user_tokens                    |       6184960 |      12951552
 user_tokens_token_idx                                          | user_tokens                    |      12083200 |      12951552
 user_tokens_user_id_idx                                        | user_tokens                    |       7331840 |      12951552
 users_auth_idx                                                 | users                          |     486744064 |   27355463680
 users_display_name_idx                                         | users                          |     779476992 |   27355463680
 users_display_name_lower_idx                                   | users                          |     792592384 |   27355463680
 users_email_idx                                                | users                          |    1200881664 |   27355463680
 users_email_lower_idx                                          | users                          |    1178222592 |   27355463680
 users_home_idx                                                 | users                          |     177848320 |   27355463680
 users_pkey                                                     | users                          |     494460928 |   27355463680
 way_nodes_node_idx                                             | way_nodes                      |  405005434880 | 1727394054144
 way_nodes_pkey                                                 | way_nodes                      | 1211717287936 | 1727394054144
 way_tags_pkey                                                  | way_tags                       |  253548429312 |  353833230336
 ways_changeset_id_idx                                          | ways                           |   15595323392 |  128393117696
 ways_pkey                                                      | ways                           |   60320063488 |  128393117696
 ways_timestamp_idx                                             | ways                           |   31652945920 |  128393117696
(142 rows)
simonpoole commented 1 year ago

A quick check of the usual suspect wouldn't indicate any extraordinary growth https://taginfo.openstreetmap.org/keys/building#chronology

And element growth rate would seem to be only slightly ticking upward: https://osmstats.neis-one.org/?item=elements

But naturally API DB size growth could be effected by more versions being created, this could easily be caused by the trend in editing tools to not "batch" edits, more reverts etc. Maybe we could track growth of the 'non-current' tables for a while?

SomeoneElseOSM commented 1 year ago

But naturally API DB size growth could be effected by more versions being created

There was a major vandalism incident in the last 100 days where "previously created, previously dormant accounts" were able to remove name:ru (sometimes several times) from many objects worldwide (taginfo suggests 1.6M). The usage of that tag is small beer compared to "building" though (570M worldwide).

mmd-osm commented 1 year ago

What about GPS data? Is there any benefit of having more than 13 billion GPS points added by the top 2 contributors, which appears to be mostly automated uploads with an excessive amount of nodes (-> https://planet.openstreetmap.org/statistics/data_stats.html ). If I get the figures above right, we need about 2.2TB for GPS traces in total, or roughly 20% of the db size. This topic might be worth revisiting.

pnorman commented 1 year ago

What about GPS data? Is there any benefit of having more than 13 billion GPS points added by the top 2 contributors, which appears to be mostly automated uploads with an excessive amount of nodes (-> https://planet.openstreetmap.org/statistics/data_stats.html ). If I get the figures above right, we need about 2.2TB for GPS traces in total, or roughly 20% of the db size. This topic might be worth revisiting.

This issue is about the growth rate of size, not the current size. Have those contributors accelerated their uploads in the last year?

tomhughes commented 1 year ago

I think it's quite possible - there is sunnypilot and a couple of clones and a couple of new ones in India as well.

mmd-osm commented 1 year ago

https://web.archive.org/web/20220611051939/https://planet.openstreetmap.org/statistics/data_stats.html has captured a few stats from last year. It looks like the total number of gps points has more than doubled since then, ie. more than 50% of all gps points have been added during the last year only.

Given the amount of data involved here, it might have some impact on the overall YoY db growth rates.

pnorman commented 1 year ago

I had missed this among the other graphs, but it seems the most likely cause. image

If this the cause, the ongoing costs to the new types of usage of GPX traces are likely to be about €1200/year in deprecation from needing more or bigger drives across DB servers.

I'd let the reindex finish so we can be 100% sure, and it could help things anyways.

mmd-osm commented 1 year ago

In that case I’d propose to reach out to them and ask them to simplify their traces before uploading or do the simplification on our own. Quite frequently a 6000 point sunnypilot trace ends up with little more than 20 points, still with a maximum error of 1m.

It seems quite wasteful to me to store such overnoded data, and likely not a sustainable approach in the long run. Ideally, simplifying traces should also happen to the 13 billion nodes that have already been uploaded. If that’s not an option we should also look into deleting traces.

I’m not convinced this whole approach is adding much value for mappers. In some areas I’ve checked, JOSM had to download almost 200k trace points for an area as small as 300m x 300m. On the respective GPS tiles, a few dozen traces were completely covering a motorway, including a fair amount of the surrounding area.

https://wiki.openstreetmap.org/w/index.php?title=GPSBabel/Using_filters&mobileaction=toggle_view_desktop#Simplifying_tracks

tomhughes commented 1 year ago

There are actually two users doing the same things based on different forks of the same software:

There is a previous discussion at https://github.com/sunnyhaibin/sunnypilot/issues/133 where I dealt with sunnypilot.

tomhughes commented 1 year ago

Only about 200Gb was saved in total. Here's the after list:

                              Name                              |             Table              |  Index size   |  Table size   
----------------------------------------------------------------+--------------------------------+---------------+---------------
 acls_k_idx                                                     | acls                           |        163840 |       1761280
 acls_pkey                                                      | acls                           |        466944 |       1761280
 index_acls_on_address                                          | acls                           |        884736 |       1761280
 index_acls_on_domain                                           | acls                           |        671744 |       1761280
 index_acls_on_mx                                               | acls                           |        155648 |       1761280
 active_storage_attachments_pkey                                | active_storage_attachments     |     616505344 |    2201714688
 index_active_storage_attachments_on_blob_id                    | active_storage_attachments     |     616505344 |    2201714688
 index_active_storage_attachments_uniqueness                    | active_storage_attachments     |    1384865792 |    2201714688
 active_storage_blobs_pkey                                      | active_storage_blobs           |     617594880 |    6423920640
 index_active_storage_blobs_on_key                              | active_storage_blobs           |    1354768384 |    6423920640
 active_storage_variant_records_pkey                            | active_storage_variant_records |      12935168 |      44195840
 index_active_storage_variant_records_uniqueness                | active_storage_variant_records |      33857536 |      44195840
 ar_internal_metadata_pkey                                      | ar_internal_metadata           |         16384 |         49152
 changeset_comments_pkey                                        | changeset_comments             |      24092672 |     372277248
 index_changeset_comments_on_changeset_id_and_created_at        | changeset_comments             |      33783808 |     372277248
 index_changeset_comments_on_created_at                         | changeset_comments             |      24092672 |     372277248
 changeset_tags_id_idx                                          | changeset_tags                 |    7719813120 |   43956731904
 changesets_bbox_idx                                            | changesets                     |    9977987072 |   13502210048
 changesets_closed_at_idx                                       | changesets                     |    3152797696 |   13502210048
 changesets_created_at_idx                                      | changesets                     |    3149389824 |   13502210048
 changesets_pkey                                                | changesets                     |    3149520896 |   13502210048
 changesets_user_id_created_at_idx                              | changesets                     |    4426924032 |   13502210048
 changesets_user_id_id_idx                                      | changesets                     |    4427849728 |   13502210048
 index_changesets_subscribers_on_changeset_id                   | changesets_subscribers         |    3155730432 |    6491078656
 index_changesets_subscribers_on_subscriber_id_and_changeset_id | changesets_subscribers         |    4438130688 |    6491078656
 client_applications_pkey                                       | client_applications            |        286720 |       2727936
 index_client_applications_on_key                               | client_applications            |        827392 |       2727936
 index_client_applications_on_user_id                           | client_applications            |        278528 |       2727936
 current_node_tags_pkey                                         | current_node_tags              |   32210108416 |   55081156608
 current_nodes_pkey1                                            | current_nodes                  |  218953154560 |  833129971712
 current_nodes_tile_idx                                         | current_nodes                  |   73471311872 |  833129971712
 current_nodes_timestamp_idx                                    | current_nodes                  |  150855081984 |  833129971712
 current_relation_members_member_idx                            | current_relation_members       |    2728255488 |    9775022080
 current_relation_members_pkey                                  | current_relation_members       |    7126237184 |    9775022080
 current_relation_tags_pkey                                     | current_relation_tags          |    1605033984 |    3276980224
 current_relations_pkey                                         | current_relations              |     317603840 |    1026981888
 current_relations_timestamp_idx                                | current_relations              |     244006912 |    1026981888
 current_way_nodes_node_idx                                     | current_way_nodes              |  218715783168 |  570199367680
 current_way_nodes_pkey                                         | current_way_nodes              |  321845854208 |  570199367680
 current_way_tags_pkey                                          | current_way_tags               |   83507920896 |  140468199424
 current_ways_pkey                                              | current_ways                   |   23943176192 |   74808958976
 current_ways_timestamp_idx                                     | current_ways                   |   16342409216 |   74808958976
 delayed_jobs_pkey                                              | delayed_jobs                   |       1744896 |     232292352
 delayed_jobs_priority                                          | delayed_jobs                   |       3186688 |     232292352
 diary_comment_user_id_created_at_index                         | diary_comments                 |       1753088 |      25182208
 diary_comments_entry_id_idx                                    | diary_comments                 |       1753088 |      25182208
 diary_comments_pkey                                            | diary_comments                 |       1253376 |      25182208
 diary_entries_pkey                                             | diary_entries                  |       9052160 |     915546112
 diary_entry_created_at_index                                   | diary_entries                  |       9052160 |     915546112
 diary_entry_language_code_created_at_index                     | diary_entries                  |      12705792 |     915546112
 diary_entry_user_id_created_at_index                           | diary_entries                  |      12697600 |     915546112
 diary_entry_subscriptions_pkey                                 | diary_entry_subscriptions      |      13107200 |      38666240
 index_diary_entry_subscriptions_on_diary_entry_id              | diary_entry_subscriptions      |       9199616 |      38666240
 friends_pkey                                                   | friends                        |       3997696 |       9674752
 index_friends_on_user_id_and_created_at                        | friends                        |       3268608 |       9674752
 user_id_idx                                                    | friends                        |       3022848 |       9674752
 points_gpxid_idx                                               | gps_points                     |  170777264128 | 1861028487168
 points_tile_idx                                                | gps_points                     |  171579342848 | 1861028487168
 gpx_file_tags_gpxid_idx                                        | gpx_file_tags                  |     109568000 |     423780352
 gpx_file_tags_pkey                                             | gpx_file_tags                  |     130842624 |     423780352
 gpx_file_tags_tag_idx                                          | gpx_file_tags                  |      51511296 |     423780352
 gpx_files_pkey                                                 | gpx_files                      |     196583424 |    1476960256
 gpx_files_timestamp_idx                                        | gpx_files                      |     196567040 |    1476960256
 gpx_files_user_id_idx                                          | gpx_files                      |      65454080 |    1476960256
 gpx_files_visible_visibility_idx                               | gpx_files                      |      61128704 |    1476960256
 index_issue_comments_on_issue_id                               | issue_comments                 |        253952 |       1794048
 index_issue_comments_on_user_id                                | issue_comments                 |        106496 |       1794048
 issue_comments_pkey                                            | issue_comments                 |        294912 |       1794048
 index_issues_on_assigned_role                                  | issues                         |        204800 |       2908160
 index_issues_on_reportable_type_and_reportable_id              | issues                         |        860160 |       2908160
 index_issues_on_reported_user_id                               | issues                         |        540672 |       2908160
 index_issues_on_status                                         | issues                         |        204800 |       2908160
 index_issues_on_updated_by                                     | issues                         |        204800 |       2908160
 issues_pkey                                                    | issues                         |        614400 |       2908160
 languages_pkey                                                 | languages                      |         16384 |         57344
 languages_pkey_old                                             | languages                      |         16384 |         57344
 messages_from_user_id_idx                                      | messages                       |       9732096 |    1384652800
 messages_pkey                                                  | messages                       |      25296896 |    1384652800
 messages_to_user_id_idx                                        | messages                       |      13287424 |    1384652800
 node_tags_pkey                                                 | node_tags                      |  133508980736 |  211370172416
 nodes_changeset_id_idx                                         | nodes                          |   93823885312 | 1202013102080
 nodes_pkey                                                     | nodes                          |  412555952128 | 1202013102080
 nodes_tile_idx                                                 | nodes                          |   97073758208 | 1202013102080
 nodes_timestamp_idx                                            | nodes                          |  208730316800 | 1202013102080
 index_note_comments_on_author_id_and_created_at                | note_comments                  |     263094272 |    1292967936
 index_note_comments_on_body                                    | note_comments                  |     504258560 |    1292967936
 index_note_comments_on_created_at                              | note_comments                  |     187260928 |    1292967936
 note_comments_note_id_idx                                      | note_comments                  |     154353664 |    1292967936
 note_comments_pkey                                             | note_comments                  |     187260928 |    1292967936
 notes_created_at_idx                                           | notes                          |      86319104 |     395337728
 notes_pkey                                                     | notes                          |      86269952 |     395337728
 notes_tile_status_idx                                          | notes                          |      79847424 |     395337728
 notes_updated_at_idx                                           | notes                          |      86433792 |     395337728
 index_oauth_access_grants_on_application_id                    | oauth_access_grants            |       1777664 |      54018048
 index_oauth_access_grants_on_resource_owner_id                 | oauth_access_grants            |       3244032 |      54018048
 index_oauth_access_grants_on_token                             | oauth_access_grants            |      16719872 |      54018048
 oauth_access_grants_pkey                                       | oauth_access_grants            |       5595136 |      54018048
 index_oauth_access_tokens_on_application_id                    | oauth_access_tokens            |      35627008 |     980959232
 index_oauth_access_tokens_on_refresh_token                     | oauth_access_tokens            |     115310592 |     980959232
 index_oauth_access_tokens_on_resource_owner_id                 | oauth_access_tokens            |     112304128 |     980959232
 index_oauth_access_tokens_on_token                             | oauth_access_tokens            |     347660288 |     980959232
 oauth_access_tokens_pkey                                       | oauth_access_tokens            |     115294208 |     980959232
 index_oauth_applications_on_owner_type_and_owner_id            | oauth_applications             |         90112 |        704512
 index_oauth_applications_on_uid                                | oauth_applications             |        172032 |        704512
 oauth_applications_pkey                                        | oauth_applications             |         73728 |        704512
 index_oauth_nonces_on_nonce_and_timestamp                      | oauth_nonces                   |      37584896 |      48095232
 oauth_nonces_pkey                                              | oauth_nonces                   |      18857984 |      48095232
 index_oauth_tokens_on_token                                    | oauth_tokens                   |    1076756480 |    8270159872
 index_oauth_tokens_on_user_id                                  | oauth_tokens                   |     258129920 |    8270159872
 oauth_tokens_pkey                                              | oauth_tokens                   |     357826560 |    8270159872
 redactions_pkey                                                | redactions                     |         16384 |         65536
 relation_members_member_idx                                    | relation_members               |   42305847296 |  384032882688
 relation_members_pkey                                          | relation_members               |  343070179328 |  384032882688
 relation_tags_pkey                                             | relation_tags                  |   13664157696 |   22130196480
 relations_changeset_id_idx                                     | relations                      |     592322560 |    3422314496
 relations_pkey                                                 | relations                      |    1572716544 |    3422314496
 relations_timestamp_idx                                        | relations                      |     908288000 |    3422314496
 index_reports_on_issue_id                                      | reports                        |        737280 |       6168576
 index_reports_on_user_id                                       | reports                        |        425984 |       6168576
 reports_pkey                                                   | reports                        |        876544 |       6168576
 schema_migrations_pkey                                         | schema_migrations              |         16384 |         49152
 index_user_blocks_on_user_id                                   | user_blocks                    |        163840 |       7266304
 user_blocks_pkey                                               | user_blocks                    |        180224 |       7266304
 user_preferences_pkey                                          | user_preferences               |      33636352 |      81952768
 user_roles_id_role_unique                                      | user_roles                     |         16384 |         40960
 user_roles_pkey                                                | user_roles                     |         16384 |         40960
 user_tokens_pkey                                               | user_tokens                    |       1163264 |      12591104
 user_tokens_token_idx                                          | user_tokens                    |       3080192 |      12591104
 user_tokens_user_id_idx                                        | user_tokens                    |        950272 |      12591104
 users_auth_idx                                                 | users                          |     505208832 |   27342340096
 users_display_name_idx                                         | users                          |     680247296 |   27342340096
 users_display_name_lower_idx                                   | users                          |     680255488 |   27342340096
 users_email_idx                                                | users                          |     970579968 |   27342340096
 users_email_lower_idx                                          | users                          |     970686464 |   27342340096
 users_home_idx                                                 | users                          |     147521536 |   27342340096
 users_pkey                                                     | users                          |     451379200 |   27342340096
 way_nodes_node_idx                                             | way_nodes                      |  392711307264 | 1728639647744
 way_nodes_pkey                                                 | way_nodes                      | 1161818742784 | 1728639647744
 way_tags_pkey                                                  | way_tags                       |  237776658432 |  354138480640
 ways_changeset_id_idx                                          | ways                           |   15353298944 |  128524296192
 ways_pkey                                                      | ways                           |   59168841728 |  128524296192
 ways_timestamp_idx                                             | ways                           |   31471263744 |  128524296192
(142 rows)
mmd-osm commented 1 year ago

In https://github.com/openstreetmap/openstreetmap-website/issues/4188 @Firefishy proposed to compress existing traces. I’d say let’s combine it with the simplification and other cleanup activities being discussed in this issue.

Maybe something along the lines of:

  1. Read gpx file from S3
  2. Simplify traces (either for all users or only our two heavy ones)
  3. Update GPX using the API
  4. API pushes compressed gpx back to S3

Of course, in case we’d decide to delete some of traces, that’s a different story.

tomhughes commented 1 year ago

There's no way to "update" a GPX using the API is there?

If the underlying file was changed then it would be possible to reimport it but that would have to be done via backdoor as there's no API to request it

I'm not sure we should be changing the original GPX anyway - if we're going to simplify ourselves (rather than asking the uploaders to) then I think we should do it in the importer so the GPS points table is simplified but not the raw tracks.

mmd-osm commented 1 year ago

I was assuming that https://wiki.openstreetmap.org/wiki/API_v0.6#Update:_PUT_/api/0.6/gpx/%23id might cover the update scenario already. It’s currently limited to the trace owner, though.

I agree that our primary target for this issue is the GPS trace db table. I thought it would be a welcomed side effect, if we could achieve similar savings on the S3 side as well, since we’re not really obliged to keep the original file (at least not for every user).

tomhughes commented 1 year ago

That only lets you change the metadata doesn't it? It doesn't let you upload a new file.

mmd-osm commented 1 year ago

Ah, you're right, I should have payed a bit more attention to the actual implementation, rather than assuming the Wiki page description "Use this to update a GPX file." was somehow accurate.

In case of an update, we're not triggering the TraceImporterJob anymore. So yes, it's only updating metadata.

Edit: that doesn't mean this has to stay as is. We might as well consider the "file" parameter for the PUT endpoint, and in case of non-empty data trigger another import maybe.

simonpoole commented 1 year ago

I would suggest investigating, based on how things are currently used, if we even really need to have the track points in the database to start with.

tomhughes commented 1 year ago

How do you propose that I investigate that?

I mean I can look at whether people call the trackpoints API to fetch them but I don't need to because I know they do.

simonpoole commented 1 year ago

How do you propose that I investigate that?

I mean I can look at whether people call the trackpoints API to fetch them but I don't need to because I know they do.

Sure, I was thinking more of if we could move to an API that doesn't include API access for the anonymized trackpoints without seriously impacting what users are using the GPX tracks for, essentially reducing the data stored in the DB to the meta data.

This might/will require redoing the GPX tile layer (maybe as vector tiles) and changes in JOSM (is there any other editor that uses the trackpoints from the DB?).

'Investigating' would be the EWG and OWG fleshing out how this could work and then determining how much push back there is.

pnorman commented 1 year ago

Only about 200Gb was saved in total.

I show savings of 282GB from the gist, which brings the y/y growth to 14%-17%. So a portion of the increased growth is probably from GPS behavior changes in users, and a portion from index bloat.

Looking at the data, we should be regularly reindexing

Some of these tables should probably have their fillfactor adjusted, along with their indexes

I don't recommend regularly reindexing the tables for current or historical nodes, ways, and relations or their associated tag and member tables except for current_relation_members, or the gps_points.

We still should reindex everything periodically, maybe annually.

pnorman commented 4 months ago

I will re-check the numbers here when I get home