graphprotocol / support

Community support for the Hosted Service of The Graph
https://thegraph.com/

Cleanup ethereum_blocks table to save disk #45

Closed Amxx closed 4 years ago

Amxx commented 4 years ago

After just 10 days of running my own graph-node, the VM ran out of disk. The cause is the ethereum_blocks table in Postgres, which has grown to over 8 GB from ~850,000 blocks.

  oid  |    table_schema    |         table_name         | row_estimate | total_bytes | index_bytes | toast_bytes | table_bytes |   total    |   index    |   toast    |   table
-------+--------------------+----------------------------+--------------+-------------+-------------+-------------+-------------+------------+------------+------------+------------
 16465 | public             | ethereum_blocks            |       878275 |  8538898432 |   155918336 |  7082663936 |  1300316160 | 8143 MB    | 149 MB     | 6755 MB    | 1240 MB

My graph-node is watching 5 blockchains, 4 of which have ~4-second block times. The fifth one is mainnet.

This is clearly not sustainable. We have dedicated (archive) nodes with large disks, but we expected graph-node to be lightweight (which is essential for decentralization).

I understand these blocks are used for handling reorgs, so maybe keep a rolling window that drops an old block whenever a new one is received (a few hours' worth should be enough). The simplest way might be a cron job that runs every hour and drops all blocks older than ~6 hours.

Jannis commented 4 years ago

A good way to solve this in graph-node would be to delete all but the most recent REORG_THRESHOLD/ANCESTOR_COUNT blocks from the database when ingesting blocks. This would establish a fixed-size window of blocks that are kept around. I'd even say this should be the default mode.

Cleaning up old blocks is fairly easy. A CLI flag + env var + cleanup query that is executed frequently enough is all it should take.

I wrote a quick temporary cleanup script that can be used in cron jobs:

#!/bin/bash

# For each network, delete blocks more than 75 behind that network's head
psql graph >/dev/null <<EOF
DELETE FROM ethereum_blocks
      USING ethereum_networks AS network
      WHERE network_name = network.name
        AND number < network.head_block_number - 75;
EOF

# Reclaim disk space
psql graph <<EOF
VACUUM (FULL, VERBOSE, ANALYZE) ethereum_blocks;
EOF

# Confirm the number of remaining blocks per network
psql graph <<EOF
  SELECT network_name, count(*)
    FROM ethereum_blocks
GROUP BY network_name;
EOF
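
If the script above is saved as, say, /usr/local/bin/cleanup-ethereum-blocks.sh (a hypothetical path) and made executable, an hourly crontab entry could look like:

```
# Run the block-cleanup script at the top of every hour
0 * * * * /usr/local/bin/cleanup-ethereum-blocks.sh
```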

lutter commented 4 years ago

The difficult bit is that we need to be careful not to delete blocks that a currently syncing subgraph still needs (i.e., in the window between a subgraph learning about a new block and fetching it from the DB).

One simple way to avoid that is to prune only blocks that come before the minimum current head of any syncing subgraph, or before the chain head if no subgraphs are syncing. In either case, we'd define 'before' with some reasonable backoff (say, 1 block behind syncing subgraphs, and ANCESTOR_COUNT behind the chain head when none are syncing). If this is done continually (say, every 100 blocks as we ingest them), we should never accumulate a large number of blocks in the database.
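
The cutoff rule above can be sketched in bash (the numbers here are illustrative, and graph-node would derive the heads from its own tables rather than hardcoding them):

```shell
#!/bin/bash
# Illustrative values: chain head and the heads of currently syncing subgraphs.
chain_head=1000
ancestor_count=50          # ANCESTOR_COUNT fallback when nothing is syncing
syncing_heads=(900 950)

if [ "${#syncing_heads[@]}" -gt 0 ]; then
  # Stay 1 block behind the slowest syncing subgraph
  slowest=$(printf '%s\n' "${syncing_heads[@]}" | sort -n | head -n1)
  cutoff=$((slowest - 1))
else
  # No syncing subgraphs: keep ANCESTOR_COUNT blocks behind the chain head
  cutoff=$((chain_head - ancestor_count))
fi

echo "blocks below $cutoff are safe to delete"
```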

One warning about VACUUM FULL: it should not be used on a production system, since it takes an exclusive lock and blocks all access to the table. It also needs extra disk space while it runs, because it copies the live contents of the table to a new file; in the worst case you'll temporarily need 2x the disk space. A plain VACUUM (without FULL) takes a much weaker lock and marks the dead space as reusable, though it does not return it to the operating system.