Open pnorman opened 1 year ago
The -j8
might be a bit dangerous - not because of the CPU but because of the potential peak disk usage from the all the temporary data during big index rebuilds.
I can certainly do that for most tables but I may need to separate out some of the big tables and do them a bit more conservatively.
Here's the before data:
Name | Table | Index size | Table size
----------------------------------------------------------------+--------------------------------+---------------+---------------
acls_k_idx | acls | 163840 | 1761280
acls_pkey | acls | 466944 | 1761280
index_acls_on_address | acls | 884736 | 1761280
index_acls_on_domain | acls | 671744 | 1761280
index_acls_on_mx | acls | 155648 | 1761280
active_storage_attachments_pkey | active_storage_attachments | 613277696 | 2171207680
index_active_storage_attachments_on_blob_id | active_storage_attachments | 612917248 | 2171207680
index_active_storage_attachments_uniqueness | active_storage_attachments | 1750810624 | 2171207680
active_storage_blobs_pkey | active_storage_blobs | 738148352 | 6423920640
index_active_storage_blobs_on_key | active_storage_blobs | 1667653632 | 6423920640
active_storage_variant_records_pkey | active_storage_variant_records | 13033472 | 44195840
index_active_storage_variant_records_uniqueness | active_storage_variant_records | 33669120 | 44195840
ar_internal_metadata_pkey | ar_internal_metadata | 16384 | 49152
changeset_comments_pkey | changeset_comments | 24125440 | 372072448
index_changeset_comments_on_changeset_id_and_created_at | changeset_comments | 34824192 | 372072448
index_changeset_comments_on_created_at | changeset_comments | 24125440 | 372072448
changeset_tags_id_idx | changeset_tags | 7721467904 | 43894415360
changesets_bbox_idx | changesets | 10194657280 | 13502210048
changesets_closed_at_idx | changesets | 3549102080 | 13502210048
changesets_created_at_idx | changesets | 3312992256 | 13502210048
changesets_pkey | changesets | 74331930624 | 13502210048
changesets_user_id_created_at_idx | changesets | 4796899328 | 13502210048
changesets_user_id_id_idx | changesets | 4829036544 | 13502210048
index_changesets_subscribers_on_changeset_id | changesets_subscribers | 3152216064 | 6491078656
index_changesets_subscribers_on_subscriber_id_and_changeset_id | changesets_subscribers | 4627464192 | 6491078656
client_applications_pkey | client_applications | 286720 | 2719744
index_client_applications_on_key | client_applications | 1236992 | 2719744
index_client_applications_on_user_id | client_applications | 270336 | 2719744
current_node_tags_pkey | current_node_tags | 35098607616 | 54820200448
current_nodes_pkey1 | current_nodes | 225180450816 | 833129971712
current_nodes_tile_idx | current_nodes | 78964768768 | 833129971712
current_nodes_timestamp_idx | current_nodes | 153879134208 | 833129971712
current_relation_members_member_idx | current_relation_members | 4549951488 | 9775022080
current_relation_members_pkey | current_relation_members | 12512206848 | 9775022080
current_relation_tags_pkey | current_relation_tags | 1879195648 | 3276980224
current_relations_pkey | current_relations | 327172096 | 1025597440
current_relations_timestamp_idx | current_relations | 281550848 | 1025597440
current_way_nodes_node_idx | current_way_nodes | 241501855744 | 570199367680
current_way_nodes_pkey | current_way_nodes | 350738210816 | 570199367680
current_way_tags_pkey | current_way_tags | 89976496128 | 140468199424
current_ways_pkey | current_ways | 24399601664 | 74808958976
current_ways_timestamp_idx | current_ways | 17183817728 | 74808958976
delayed_jobs_pkey | delayed_jobs | 16384 | 232292352
delayed_jobs_priority | delayed_jobs | 16384 | 232292352
diary_comment_user_id_created_at_index | diary_comments | 1744896 | 25124864
diary_comments_entry_id_idx | diary_comments | 1744896 | 25124864
diary_comments_pkey | diary_comments | 1253376 | 25124864
diary_entries_pkey | diary_entries | 9052160 | 915988480
diary_entry_created_at_index | diary_entries | 9052160 | 915988480
diary_entry_language_code_created_at_index | diary_entries | 12746752 | 915988480
diary_entry_user_id_created_at_index | diary_entries | 12673024 | 915988480
diary_entry_subscriptions_pkey | diary_entry_subscriptions | 13107200 | 38666240
index_diary_entry_subscriptions_on_diary_entry_id | diary_entry_subscriptions | 9191424 | 38666240
friends_pkey | friends | 3997696 | 9674752
index_friends_on_user_id_and_created_at | friends | 3268608 | 9674752
user_id_idx | friends | 3022848 | 9674752
points_gpxid_idx | gps_points | 175209054208 | 1848585453568
points_tile_idx | gps_points | 180966776832 | 1848585453568
gpx_file_tags_gpxid_idx | gpx_file_tags | 108642304 | 423780352
gpx_file_tags_pkey | gpx_file_tags | 130113536 | 423780352
gpx_file_tags_tag_idx | gpx_file_tags | 56803328 | 423780352
gpx_files_pkey | gpx_files | 211378176 | 1476960256
gpx_files_timestamp_idx | gpx_files | 204046336 | 1476960256
gpx_files_user_id_idx | gpx_files | 77078528 | 1476960256
gpx_files_visible_visibility_idx | gpx_files | 71901184 | 1476960256
index_issue_comments_on_issue_id | issue_comments | 253952 | 1794048
index_issue_comments_on_user_id | issue_comments | 106496 | 1794048
issue_comments_pkey | issue_comments | 294912 | 1794048
index_issues_on_assigned_role | issues | 204800 | 2908160
index_issues_on_reportable_type_and_reportable_id | issues | 860160 | 2908160
index_issues_on_reported_user_id | issues | 540672 | 2908160
index_issues_on_status | issues | 204800 | 2908160
index_issues_on_updated_by | issues | 204800 | 2908160
issues_pkey | issues | 614400 | 2908160
languages_pkey | languages | 16384 | 57344
languages_pkey_old | languages | 16384 | 57344
messages_from_user_id_idx | messages | 9732096 | 1383997440
messages_pkey | messages | 25280512 | 1383997440
messages_to_user_id_idx | messages | 13287424 | 1383997440
node_tags_pkey | node_tags | 139547262976 | 211070828544
nodes_changeset_id_idx | nodes | 95281111040 | 1202013102080
nodes_pkey | nodes | 421277581312 | 1202013102080
nodes_tile_idx | nodes | 102119120896 | 1202013102080
nodes_timestamp_idx | nodes | 209760067584 | 1202013102080
index_note_comments_on_author_id_and_created_at | note_comments | 278839296 | 1287790592
index_note_comments_on_body | note_comments | 483770368 | 1287790592
index_note_comments_on_created_at | note_comments | 187031552 | 1287790592
note_comments_note_id_idx | note_comments | 158883840 | 1287790592
note_comments_pkey | note_comments | 187031552 | 1287790592
notes_created_at_idx | notes | 92856320 | 395337728
notes_pkey | notes | 101941248 | 395337728
notes_tile_status_idx | notes | 110641152 | 395337728
notes_updated_at_idx | notes | 100352000 | 395337728
index_oauth_access_grants_on_application_id | oauth_access_grants | 2375680 | 52961280
index_oauth_access_grants_on_resource_owner_id | oauth_access_grants | 4358144 | 52961280
index_oauth_access_grants_on_token | oauth_access_grants | 19521536 | 52961280
oauth_access_grants_pkey | oauth_access_grants | 5685248 | 52961280
index_oauth_access_tokens_on_application_id | oauth_access_tokens | 40124416 | 975683584
index_oauth_access_tokens_on_refresh_token | oauth_access_tokens | 89407488 | 975683584
index_oauth_access_tokens_on_resource_owner_id | oauth_access_tokens | 130195456 | 975683584
index_oauth_access_tokens_on_token | oauth_access_tokens | 464429056 | 975683584
oauth_access_tokens_pkey | oauth_access_tokens | 115081216 | 975683584
index_oauth_applications_on_owner_type_and_owner_id | oauth_applications | 98304 | 704512
index_oauth_applications_on_uid | oauth_applications | 221184 | 704512
oauth_applications_pkey | oauth_applications | 73728 | 704512
index_oauth_nonces_on_nonce_and_timestamp | oauth_nonces | 142991360 | 93323264
oauth_nonces_pkey | oauth_nonces | 68960256 | 93323264
index_oauth_tokens_on_token | oauth_tokens | 1961107456 | 8270159872
index_oauth_tokens_on_user_id | oauth_tokens | 287670272 | 8270159872
oauth_tokens_pkey | oauth_tokens | 424263680 | 8270159872
redactions_pkey | redactions | 16384 | 65536
relation_members_member_idx | relation_members | 47458426880 | 383724142592
relation_members_pkey | relation_members | 363714404352 | 383724142592
relation_tags_pkey | relation_tags | 15138111488 | 22104006656
relations_changeset_id_idx | relations | 596942848 | 3418898432
relations_pkey | relations | 1661059072 | 3418898432
relations_timestamp_idx | relations | 913424384 | 3418898432
index_reports_on_issue_id | reports | 737280 | 6168576
index_reports_on_user_id | reports | 425984 | 6168576
reports_pkey | reports | 876544 | 6168576
schema_migrations_pkey | schema_migrations | 16384 | 49152
index_user_blocks_on_user_id | user_blocks | 163840 | 7233536
user_blocks_pkey | user_blocks | 180224 | 7233536
user_preferences_pkey | user_preferences | 36601856 | 81952768
user_roles_id_role_unique | user_roles | 16384 | 40960
user_roles_pkey | user_roles | 16384 | 40960
user_tokens_pkey | user_tokens | 6184960 | 12951552
user_tokens_token_idx | user_tokens | 12083200 | 12951552
user_tokens_user_id_idx | user_tokens | 7331840 | 12951552
users_auth_idx | users | 486744064 | 27355463680
users_display_name_idx | users | 779476992 | 27355463680
users_display_name_lower_idx | users | 792592384 | 27355463680
users_email_idx | users | 1200881664 | 27355463680
users_email_lower_idx | users | 1178222592 | 27355463680
users_home_idx | users | 177848320 | 27355463680
users_pkey | users | 494460928 | 27355463680
way_nodes_node_idx | way_nodes | 405005434880 | 1727394054144
way_nodes_pkey | way_nodes | 1211717287936 | 1727394054144
way_tags_pkey | way_tags | 253548429312 | 353833230336
ways_changeset_id_idx | ways | 15595323392 | 128393117696
ways_pkey | ways | 60320063488 | 128393117696
ways_timestamp_idx | ways | 31652945920 | 128393117696
(142 rows)
A quick check of the usual suspect wouldn't indicate any extraordinary growth https://taginfo.openstreetmap.org/keys/building#chronology
And element growth rate would seem to be only slightly ticking upward: https://osmstats.neis-one.org/?item=elements
But naturally API DB size growth could be effected by more versions being created, this could easily be caused by the trend in editing tools to not "batch" edits, more reverts etc. Maybe we could track growth of the 'non-current' tables for a while?
But naturally API DB size growth could be effected by more versions being created
There was a major vandalism incident in the last 100 days where "previously created, previously dormant accounts" were able to remove name:ru (sometimes several times) from many objects worldwide (taginfo suggests 1.6M). The usage of that tag is small beer compared to "building" though (570M worldwide).
What about GPS data? Is there any benefit of having more than 13 billion GPS points added by the top 2 contributors, which appears to be mostly automated uploads with an excessive amount of nodes (-> https://planet.openstreetmap.org/statistics/data_stats.html ). If I get the figures above right, we need about 2.2TB for GPS traces in total, or roughly 20% of the db size. This topic might be worth revisiting.
What about GPS data? Is there any benefit of having more than 13 billion GPS points added by the top 2 contributors, which appears to be mostly automated uploads with an excessive amount of nodes (-> https://planet.openstreetmap.org/statistics/data_stats.html ). If I get the figures above right, we need about 2.2TB for GPS traces in total, or roughly 20% of the db size. This topic might be worth revisiting.
This issue is about the growth rate of size, not the current size. Have those contributors accelerated their uploads in the last year?
I think it's quite possible - there is sunnypilot and a couple of clones and a couple of new ones in India as well.
https://web.archive.org/web/20220611051939/https://planet.openstreetmap.org/statistics/data_stats.html has captured a few stats from last year. It looks like the total number of gps points has more than doubled since then, ie. more than 50% of all gps points have been added during the last year only.
Given the amount of data involved here, it might have some impact on the overall YoY db growth rates.
I had missed this among the other graphs, but it seems the most likely cause.
If this the cause, the ongoing costs to the new types of usage of GPX traces are likely to be about €1200/year in deprecation from needing more or bigger drives across DB servers.
I'd let the reindex finish so we can be 100% sure, and it could help things anyways.
In that case I’d propose to reach out to them and ask them to simplify their traces before uploading or do the simplification on our own. Quite frequently a 6000 point sunnypilot trace ends up with little more than 20 points, still with a maximum error of 1m.
It seems quite wasteful to me to store such overnoded data, and likely not a sustainable approach in the long run. Ideally, simplifying traces should also happen to the 13 billion nodes that have already been uploaded. If that’s not an option we should also look into deleting traces.
I’m not convinced this whole approach is adding much value for mappers. In some areas I’ve checked, JOSM had to download almost 200k trace points for an area as small as 300m x 300m. On the respective GPS tiles, a few dozen traces were completely covering a motorway, including a fair amount of the surrounding area.
There are actually two users doing the same things based on different forks of the same software:
There is a previous discussion at https://github.com/sunnyhaibin/sunnypilot/issues/133 where I dealt with sunnypilot.
Only about 200Gb was saved in total. Here's the after list:
Name | Table | Index size | Table size
----------------------------------------------------------------+--------------------------------+---------------+---------------
acls_k_idx | acls | 163840 | 1761280
acls_pkey | acls | 466944 | 1761280
index_acls_on_address | acls | 884736 | 1761280
index_acls_on_domain | acls | 671744 | 1761280
index_acls_on_mx | acls | 155648 | 1761280
active_storage_attachments_pkey | active_storage_attachments | 616505344 | 2201714688
index_active_storage_attachments_on_blob_id | active_storage_attachments | 616505344 | 2201714688
index_active_storage_attachments_uniqueness | active_storage_attachments | 1384865792 | 2201714688
active_storage_blobs_pkey | active_storage_blobs | 617594880 | 6423920640
index_active_storage_blobs_on_key | active_storage_blobs | 1354768384 | 6423920640
active_storage_variant_records_pkey | active_storage_variant_records | 12935168 | 44195840
index_active_storage_variant_records_uniqueness | active_storage_variant_records | 33857536 | 44195840
ar_internal_metadata_pkey | ar_internal_metadata | 16384 | 49152
changeset_comments_pkey | changeset_comments | 24092672 | 372277248
index_changeset_comments_on_changeset_id_and_created_at | changeset_comments | 33783808 | 372277248
index_changeset_comments_on_created_at | changeset_comments | 24092672 | 372277248
changeset_tags_id_idx | changeset_tags | 7719813120 | 43956731904
changesets_bbox_idx | changesets | 9977987072 | 13502210048
changesets_closed_at_idx | changesets | 3152797696 | 13502210048
changesets_created_at_idx | changesets | 3149389824 | 13502210048
changesets_pkey | changesets | 3149520896 | 13502210048
changesets_user_id_created_at_idx | changesets | 4426924032 | 13502210048
changesets_user_id_id_idx | changesets | 4427849728 | 13502210048
index_changesets_subscribers_on_changeset_id | changesets_subscribers | 3155730432 | 6491078656
index_changesets_subscribers_on_subscriber_id_and_changeset_id | changesets_subscribers | 4438130688 | 6491078656
client_applications_pkey | client_applications | 286720 | 2727936
index_client_applications_on_key | client_applications | 827392 | 2727936
index_client_applications_on_user_id | client_applications | 278528 | 2727936
current_node_tags_pkey | current_node_tags | 32210108416 | 55081156608
current_nodes_pkey1 | current_nodes | 218953154560 | 833129971712
current_nodes_tile_idx | current_nodes | 73471311872 | 833129971712
current_nodes_timestamp_idx | current_nodes | 150855081984 | 833129971712
current_relation_members_member_idx | current_relation_members | 2728255488 | 9775022080
current_relation_members_pkey | current_relation_members | 7126237184 | 9775022080
current_relation_tags_pkey | current_relation_tags | 1605033984 | 3276980224
current_relations_pkey | current_relations | 317603840 | 1026981888
current_relations_timestamp_idx | current_relations | 244006912 | 1026981888
current_way_nodes_node_idx | current_way_nodes | 218715783168 | 570199367680
current_way_nodes_pkey | current_way_nodes | 321845854208 | 570199367680
current_way_tags_pkey | current_way_tags | 83507920896 | 140468199424
current_ways_pkey | current_ways | 23943176192 | 74808958976
current_ways_timestamp_idx | current_ways | 16342409216 | 74808958976
delayed_jobs_pkey | delayed_jobs | 1744896 | 232292352
delayed_jobs_priority | delayed_jobs | 3186688 | 232292352
diary_comment_user_id_created_at_index | diary_comments | 1753088 | 25182208
diary_comments_entry_id_idx | diary_comments | 1753088 | 25182208
diary_comments_pkey | diary_comments | 1253376 | 25182208
diary_entries_pkey | diary_entries | 9052160 | 915546112
diary_entry_created_at_index | diary_entries | 9052160 | 915546112
diary_entry_language_code_created_at_index | diary_entries | 12705792 | 915546112
diary_entry_user_id_created_at_index | diary_entries | 12697600 | 915546112
diary_entry_subscriptions_pkey | diary_entry_subscriptions | 13107200 | 38666240
index_diary_entry_subscriptions_on_diary_entry_id | diary_entry_subscriptions | 9199616 | 38666240
friends_pkey | friends | 3997696 | 9674752
index_friends_on_user_id_and_created_at | friends | 3268608 | 9674752
user_id_idx | friends | 3022848 | 9674752
points_gpxid_idx | gps_points | 170777264128 | 1861028487168
points_tile_idx | gps_points | 171579342848 | 1861028487168
gpx_file_tags_gpxid_idx | gpx_file_tags | 109568000 | 423780352
gpx_file_tags_pkey | gpx_file_tags | 130842624 | 423780352
gpx_file_tags_tag_idx | gpx_file_tags | 51511296 | 423780352
gpx_files_pkey | gpx_files | 196583424 | 1476960256
gpx_files_timestamp_idx | gpx_files | 196567040 | 1476960256
gpx_files_user_id_idx | gpx_files | 65454080 | 1476960256
gpx_files_visible_visibility_idx | gpx_files | 61128704 | 1476960256
index_issue_comments_on_issue_id | issue_comments | 253952 | 1794048
index_issue_comments_on_user_id | issue_comments | 106496 | 1794048
issue_comments_pkey | issue_comments | 294912 | 1794048
index_issues_on_assigned_role | issues | 204800 | 2908160
index_issues_on_reportable_type_and_reportable_id | issues | 860160 | 2908160
index_issues_on_reported_user_id | issues | 540672 | 2908160
index_issues_on_status | issues | 204800 | 2908160
index_issues_on_updated_by | issues | 204800 | 2908160
issues_pkey | issues | 614400 | 2908160
languages_pkey | languages | 16384 | 57344
languages_pkey_old | languages | 16384 | 57344
messages_from_user_id_idx | messages | 9732096 | 1384652800
messages_pkey | messages | 25296896 | 1384652800
messages_to_user_id_idx | messages | 13287424 | 1384652800
node_tags_pkey | node_tags | 133508980736 | 211370172416
nodes_changeset_id_idx | nodes | 93823885312 | 1202013102080
nodes_pkey | nodes | 412555952128 | 1202013102080
nodes_tile_idx | nodes | 97073758208 | 1202013102080
nodes_timestamp_idx | nodes | 208730316800 | 1202013102080
index_note_comments_on_author_id_and_created_at | note_comments | 263094272 | 1292967936
index_note_comments_on_body | note_comments | 504258560 | 1292967936
index_note_comments_on_created_at | note_comments | 187260928 | 1292967936
note_comments_note_id_idx | note_comments | 154353664 | 1292967936
note_comments_pkey | note_comments | 187260928 | 1292967936
notes_created_at_idx | notes | 86319104 | 395337728
notes_pkey | notes | 86269952 | 395337728
notes_tile_status_idx | notes | 79847424 | 395337728
notes_updated_at_idx | notes | 86433792 | 395337728
index_oauth_access_grants_on_application_id | oauth_access_grants | 1777664 | 54018048
index_oauth_access_grants_on_resource_owner_id | oauth_access_grants | 3244032 | 54018048
index_oauth_access_grants_on_token | oauth_access_grants | 16719872 | 54018048
oauth_access_grants_pkey | oauth_access_grants | 5595136 | 54018048
index_oauth_access_tokens_on_application_id | oauth_access_tokens | 35627008 | 980959232
index_oauth_access_tokens_on_refresh_token | oauth_access_tokens | 115310592 | 980959232
index_oauth_access_tokens_on_resource_owner_id | oauth_access_tokens | 112304128 | 980959232
index_oauth_access_tokens_on_token | oauth_access_tokens | 347660288 | 980959232
oauth_access_tokens_pkey | oauth_access_tokens | 115294208 | 980959232
index_oauth_applications_on_owner_type_and_owner_id | oauth_applications | 90112 | 704512
index_oauth_applications_on_uid | oauth_applications | 172032 | 704512
oauth_applications_pkey | oauth_applications | 73728 | 704512
index_oauth_nonces_on_nonce_and_timestamp | oauth_nonces | 37584896 | 48095232
oauth_nonces_pkey | oauth_nonces | 18857984 | 48095232
index_oauth_tokens_on_token | oauth_tokens | 1076756480 | 8270159872
index_oauth_tokens_on_user_id | oauth_tokens | 258129920 | 8270159872
oauth_tokens_pkey | oauth_tokens | 357826560 | 8270159872
redactions_pkey | redactions | 16384 | 65536
relation_members_member_idx | relation_members | 42305847296 | 384032882688
relation_members_pkey | relation_members | 343070179328 | 384032882688
relation_tags_pkey | relation_tags | 13664157696 | 22130196480
relations_changeset_id_idx | relations | 592322560 | 3422314496
relations_pkey | relations | 1572716544 | 3422314496
relations_timestamp_idx | relations | 908288000 | 3422314496
index_reports_on_issue_id | reports | 737280 | 6168576
index_reports_on_user_id | reports | 425984 | 6168576
reports_pkey | reports | 876544 | 6168576
schema_migrations_pkey | schema_migrations | 16384 | 49152
index_user_blocks_on_user_id | user_blocks | 163840 | 7266304
user_blocks_pkey | user_blocks | 180224 | 7266304
user_preferences_pkey | user_preferences | 33636352 | 81952768
user_roles_id_role_unique | user_roles | 16384 | 40960
user_roles_pkey | user_roles | 16384 | 40960
user_tokens_pkey | user_tokens | 1163264 | 12591104
user_tokens_token_idx | user_tokens | 3080192 | 12591104
user_tokens_user_id_idx | user_tokens | 950272 | 12591104
users_auth_idx | users | 505208832 | 27342340096
users_display_name_idx | users | 680247296 | 27342340096
users_display_name_lower_idx | users | 680255488 | 27342340096
users_email_idx | users | 970579968 | 27342340096
users_email_lower_idx | users | 970686464 | 27342340096
users_home_idx | users | 147521536 | 27342340096
users_pkey | users | 451379200 | 27342340096
way_nodes_node_idx | way_nodes | 392711307264 | 1728639647744
way_nodes_pkey | way_nodes | 1161818742784 | 1728639647744
way_tags_pkey | way_tags | 237776658432 | 354138480640
ways_changeset_id_idx | ways | 15353298944 | 128524296192
ways_pkey | ways | 59168841728 | 128524296192
ways_timestamp_idx | ways | 31471263744 | 128524296192
(142 rows)
In https://github.com/openstreetmap/openstreetmap-website/issues/4188 @Firefishy proposed to compress existing traces. I’d say let’s combine it with the simplification and other cleanup activities being discussed in this issue.
Maybe something along the lines of:
Of course, in case we’d decide to delete some of traces, that’s a different story.
There's no way to "update" a GPX using the API is there?
If the underlying file was changed then it would be possible to reimport it but that would have to be done via backdoor as there's no API to request it
I'm not sure we should be changing the original GPX anyway - if we're going to simplify ourselves (rather than asking the uploaders to) then I think we should do it in the importer so the GPS points table is simplified but not the raw tracks.
I was assuming that https://wiki.openstreetmap.org/wiki/API_v0.6#Update:_PUT_/api/0.6/gpx/%23id might cover the update scenario already. It’s currently limited to the trace owner, though.
I agree that our primary target for this issue is the GPS trace db table. I thought it would be a welcomed side effect, if we could achieve similar savings on the S3 side as well, since we’re not really obliged to keep the original file (at least not for every user).
That only lets you change the metadata doesn't it? It doesn't let you upload a new file.
Ah, you're right, I should have payed a bit more attention to the actual implementation, rather than assuming the Wiki page description "Use this to update a GPX file." was somehow accurate.
In case of an update, we're not triggering the TraceImporterJob anymore. So yes, it's only updating metadata.
Edit: that doesn't mean this has to stay as is. We might as well consider the "file" parameter for the PUT endpoint, and in case of non-empty data trigger another import maybe.
I would suggest investigating, based on how things are currently used, if we even really need to have the track points in the database to start with.
How do you propose that I investigate that?
I mean I can look at whether people call the trackpoints API to fetch them but I don't need to because I know they do.
How do you propose that I investigate that?
I mean I can look at whether people call the trackpoints API to fetch them but I don't need to because I know they do.
Sure, I was thinking more of if we could move to an API that doesn't include API access for the anonymized trackpoints without seriously impacting what users are using the GPX tracks for, essentially reducing the data stored in the DB to the meta data.
This might/will require redoing the GPX tile layer (maybe as vector tiles) and changes in JOSM (is there any other editor that uses the trackpoints from the DB?).
'Investigating' would be the EWG and OWG fleshing out how this could work and then determining how much push back there is.
Only about 200Gb was saved in total.
I show savings of 282GB from the gist, which brings the y/y growth to 14%-17%. So a portion of the increased growth is probably from GPS behavior changes in users, and a portion from index bloat.
Looking at the data, we should be regularly reindexing
Some of these tables should probably have their fillfactor adjusted, along with their indexes
I don't recommend regularly reindexing the tables for current or historical nodes, ways, and relations or their associated tag and member tables except for current_relation_members, or the gps_points.
We still should reindex everything periodically, maybe annually.
I will re-check the numbers here when I get home
In May 2021 the API DB growth was 13% year on year. I just measured it as 25% or so[1]. This is a significant increase, and means having to allocate 3x current usage instead of 2x current usage when buying a new machine to last 5 years, so I'd like to figure out why before looking at disk costs.
Since I last checked the change, we've had a significant technical change with the upgrade to PG15, which saved several TB from index space savings. There's a few possibilities for why we're gaining in space so much faster
@tomhughes could you run
reindexdb -d openstreetmap --concurrently -j 8
and both before and after the reindex, run the following SQL and save it to a file?Given snap-01's load of under 50%, -j8 should be safe and it won't block any transactions while it runs.
If the growth rate change is due to index bloat, it will clean it up and reveal it with the sizes. If not, well, it'll save a bit of space.
[1]: 10.9TB to 11.6TB filesystem change over 100 days for 25.5% or 10.47TB to 11.62TB DB size over 0.5 year for 23.2%