internetstandards / Internet.nl

Internet standards compliance test suite
https://internet.nl
178 stars 38 forks source link

Look into database size, storage and archiving term #684

Open stitch opened 2 years ago

stitch commented 2 years ago

The production instance uses 80 gig in /var and 50 gig for the database. This will continue to grow at an unknown rate, for reasons that might be nonsensical (such as excessive logging that is never pruned, or storing metrics that are not useful anymore).

A quick win could be to index the disk usage with ncdu and check if there is useless junk. Next is to inspect the database to see what can be reduced. As an ultimate approach historic reports could be deleted, but exactly this history is reported as being (very) valuable for some end users.

user@int-prod:~# df -h /var
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VG0-var 109G 80G 25G 77% /var
root@int-prod:~# du -sh /var/lib/postgresql/
50G /var/lib/postgresql/
mxsasha commented 2 years ago

For the database, it seems like we just have a lot of data in there:

    table_schema    |         table_name          | row_estimate |   total    |   index    |   toast    |   table    |   total_size_share
--------------------+-----------------------------+--------------+------------+------------+------------+------------+----------------------
 public             | checks_domaintesttls        |  5.87121e+06 | 11 GB      | 920 MB     | 53 MB      | 9811 MB    |     0.19757024975081
 public             | checks_domaintestdnssec     |  6.55164e+06 | 7461 MB    | 445 MB     | 8192 bytes | 7016 MB    |    0.136694670574313
 public             | checks_domaintestipv6       |  3.17311e+06 | 6796 MB    | 73 MB      | 64 kB      | 6724 MB    |    0.124513858155647
 public             | checks_domaintestappsecpriv |  2.92914e+06 | 5258 MB    | 132 MB     | 38 MB      | 5089 MB    |   0.0963380988068091
 public             | checks_webtesttls           |  2.90879e+06 | 3567 MB    | 66 MB      | 4848 kB    | 3497 MB    |   0.0653475402496535
 public             | checks_nsdomain             |  1.31668e+07 | 3371 MB    | 1758 MB    | 8192 bytes | 1612 MB    |   0.0617551144726996
 public             | checks_webtestappsecpriv    |  1.84553e+06 | 3324 MB    | 43 MB      | 11 MB      | 3271 MB    |   0.0609007700979183
 public             | checks_mailtestauth         |  1.36015e+06 | 2776 MB    | 31 MB      | 728 kB     | 2744 MB    |   0.0508495924509679
 public             | checks_mailtestipv6         |  1.49707e+06 | 2603 MB    | 34 MB      | 8192 bytes | 2569 MB    |   0.0476891337242888
 public             | checks_mailtestdnssec       |  1.43883e+06 | 2563 MB    | 32 MB      | 8192 bytes | 2531 MB    |   0.0469626046496716
 public             | checks_connectiontest       |       824699 | 1974 MB    | 226 MB     | 8192 bytes | 1748 MB    |   0.0361587050950356
 public             | checks_mailtesttls          |  1.35696e+06 | 1307 MB    | 30 MB      | 1200 kB    | 1275 MB    |   0.0239416807268268
 public             | checks_webdomain            |  2.99028e+06 | 611 MB     | 253 MB     | 32 kB      | 359 MB     |   0.0111963626599552

In checks_domaintesttls, presumably our oldest test, there are 9 million rows. 4.6M are from before 2021 though, so if we want to save disk space, expiring tests to an inactive history after 2+ years could help.

Outside of SQL db, it seems our host makes daily database backups for the last 3 days as gzipped sql, which accounts for another 21GB. Perhaps we can save something there. Other /var things aren't significant compared to this.

mxsasha commented 2 years ago

Discussed: we should bring up archiving terms in the steering committee.

mxsasha commented 1 year ago

For batch, table sizes are (first query from here):

    table_schema    |         table_name          | row_estimate |   total    |   index    |   toast    |  table  |   total_size_share
--------------------+-----------------------------+--------------+------------+------------+------------+---------+----------------------
 public             | checks_domaintestdnssec     |  8.23058e+06 | 12 GB      | 1307 MB    | 8192 bytes | 10 GB   |    0.144176948726993
 public             | checks_domaintesttls        |  6.01754e+06 | 10 GB      | 1131 MB    | 16 MB      | 9158 MB |     0.12387370636111
 public             | checks_mailtestauth         |  3.11716e+06 | 6909 MB    | 244 MB     | 3688 kB    | 6661 MB |   0.0830458788931248
 public             | checks_domaintestappsecpriv |  3.35727e+06 | 6450 MB    | 329 MB     | 24 MB      | 6097 MB |   0.0775309044043542
 public             | checks_domaintestipv6       |  2.72941e+06 | 6402 MB    | 228 MB     | 8192 bytes | 6174 MB |   0.0769519574511083
 public             | checks_mailtestdnssec       |  3.13744e+06 | 5948 MB    | 245 MB     | 8192 bytes | 5703 MB |   0.0715014982189282
 public             | checks_mailtestipv6         |  3.13552e+06 | 5857 MB    | 264 MB     | 8192 bytes | 5593 MB |   0.0704057718522659
 public             | checks_webtestappsecpriv    |  2.83874e+06 | 5155 MB    | 221 MB     | 10 MB      | 4924 MB |   0.0619640397406468
 public             | checks_mailtesttls          |  4.07792e+06 | 4353 MB    | 318 MB     | 2960 kB    | 4033 MB |   0.0523275076150541
 public             | checks_nsdomain             |   1.6678e+07 | 4171 MB    | 1802 MB    | 8192 bytes | 2369 MB |   0.0501343645256617
 public             | checks_webtesttls           |  2.83329e+06 | 3815 MB    | 226 MB     | 1504 kB    | 3587 MB |   0.0458550403225049
 public             | checks_batchmailtest        |  5.13604e+06 | 3496 MB    | 3014 MB    |            | 482 MB  |   0.0420172746877443
 public             | checks_batchdomain          |  8.54135e+06 | 3170 MB    | 2469 MB    |            | 701 MB  |   0.0381079506461104
 public             | checks_batchwebtest         |  3.50514e+06 | 2561 MB    | 2233 MB    |            | 328 MB  |   0.0307784728271506
 public             | checks_mailtestreport       |  3.57264e+06 | 925 MB     | 473 MB     | 8192 bytes | 452 MB  |    0.011123106378615
 public             | checks_domaintestreport     |  2.60565e+06 | 635 MB     | 320 MB     | 8192 bytes | 316 MB  |  0.00763777998400548
 public             | checks_webdomain            |  2.57425e+06 | 415 MB     | 120 MB     | 8192 bytes | 295 MB  |  0.00498495395282254
 public             | checks_mxdomain             |  2.30569e+06 | 414 MB     | 113 MB     | 8192 bytes | 301 MB  |  0.00498176105802767
 public             | checks_batchrequest         |        80158 | 206 MB     | 142 MB     |            | 64 MB   |  0.00247871935619221

The distribution by year for single TLS (largest table):

internetnl=> select extract(year from timestamp) as year, count(*) from checks_domaintesttls group by year order by year;
 year |  count
------+---------
 2015 |   32311
 2016 |  132845
 2017 |  823911
 2018 | 1287284
 2019 | 1350664
 2020 |  978265
 2021 | 1195942
 2022 | 1119097

Therefore, a rough estimate is that by only keeping the last 2 years in the active database, we can cut the records down to 30% of what they are now.

The distribution by year for batch dnssecc (largest table):

internetnl=> select extract(year from timestamp) as year, count(*) from checks_domaintestdnssec group by year order by year;
 year |  count
------+---------
 2020 |  994479
 2021 | 3714144
 2022 | 3923974

So a lot less to be gained here. Though we could set the limit for batch to one year, cutting it to 45%.

mxsasha commented 1 year ago

In December we agreed in a retention term for 2 years in the steering committee.

mxsasha commented 1 year ago

We discussed this several times and came up with a number of storage mechanisms:

The main issue is that none of these formats are particularly friendly to making multiple exports in the future and merging them. However, a PostgreSQL dump is probably the friendliest to future schema changes (and contains the current schema version in the django_migrations table data), so that is probably the best option for now.

Another inconvenience is that these dumps are hard to filter for specific dates, whereas after the dump, we probably only want to delete records beyond a certain age.