vitabaks / postgresql_cluster

PostgreSQL High-Availability Cluster (based on Patroni). Automating with Ansible.
https://postgresql-cluster.org
MIT License

etcd DB file 2GB #528

Closed mobius77 closed 9 months ago

mobius77 commented 10 months ago

I'm getting this error from Patroni: etcdserver: mvcc: database space exceeded

etcdctl endpoint status -w table
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX |             ERRORS             |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| 127.0.0.1:2379 | c8811744a9ea5c81 |   3.5.7 |  2.1 GB |     false |      false |       124 |    7336826 |            7336826 |   memberID:5014542757107301667 |
|                |                  |         |         |           |            |           |            |                    |                 alarm:NOSPACE  |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+

etcdctl compact and etcdctl defrag didn't help.

How can I increase the etcd size limit, or how can I shrink the DB?

Please help.

vitabaks commented 10 months ago

@mobius77 What is stored in this etcd cluster?

In my case, the etcd cluster used by two Patroni clusters for the last 5 years takes about 370M

postgres@pgnode01:~$ sudo du -sh /var/lib/etcd/
370M    /var/lib/etcd/

Generated by GPT

The error you're encountering, "etcdserver: mvcc: database space exceeded," indicates that your etcd database has reached its storage limit. This can hinder the operation of services relying on etcd, like Patroni. Here are some steps to address this issue:

  1. Increase the Database Size Limit:

    • By default, etcd has a 2 GB limit for its database size. You can increase this limit using the --quota-backend-bytes flag when starting the etcd server. For example, to increase it to 4 GB, you would use --quota-backend-bytes=4294967296 (as 4 GB is 4,294,967,296 bytes).
    • Be cautious when increasing this limit, as it can affect the performance and stability of etcd, especially in clusters with limited resources.
  2. Compact and Defrag the Database:

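    • You mentioned that you've already run etcdctl compact and etcdctl defrag. Compacting removes old revisions of keys and values, but the freed space stays inside the database file until it is defragmented.
    • Ensure you're compacting up to a recent revision to effectively free up space. For example: etcdctl compact <revision>, where the current revision can be read from etcdctl endpoint status -w json.
    • After compacting, run etcdctl defrag on each member of the etcd cluster.
    • Once the space has been reclaimed, clear the NOSPACE alarm with etcdctl alarm disarm. Until the alarm is disarmed, etcd keeps rejecting writes even if the database is back under the quota.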
  3. Free up Space by Deleting Unnecessary Data:

    • Review the data stored in etcd and delete any unnecessary or stale keys. This can help in reducing the database size.
  4. Snapshot and Restore:

    • If the above methods do not significantly reduce the size, consider taking a snapshot of the etcd database and then restoring it. This process can help in cleaning up fragmented space and reducing the overall database size.
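    • Use etcdctl snapshot save to take a snapshot and etcdctl snapshot restore to restore it. On etcd 3.5 and later, etcdutl snapshot restore is the preferred form, and etcdctl snapshot restore is deprecated.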
  5. Monitor and Manage Storage:

    • Regularly monitor the size of the etcd database and set up alerts for when it approaches the limit. This proactive approach can prevent the database from reaching its maximum size and causing operational issues.
  6. Review etcd Usage:

    • If you find that the database size grows rapidly, it may be necessary to review how applications and services interact with etcd. This includes the amount of data being stored, the frequency of updates, and the retention policies for keys and values.
  7. Upgrade etcd (if applicable):

    • Ensure that you are using a recent version of etcd, as newer versions may have improvements in storage management and performance.
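
The monitoring step in item 5 could be sketched roughly as follows. The helper names are hypothetical, the 2147483648 figure assumes the default 2 GB quota mentioned above, and the 80% threshold is only an illustrative choice.

```shell
# Hypothetical helpers; the names are illustrative and not part of etcd.

# Read `etcdctl endpoint status -w json` on stdin and print the first dbSize (bytes).
extract_db_size() {
    grep -Eo '"dbSize":[0-9]+' | grep -Eo '[0-9]+' | head -n 1
}

# usage_percent DB_SIZE_BYTES QUOTA_BYTES: integer percentage, rounded down.
usage_percent() {
    echo $(( $1 * 100 / $2 ))
}

# Example wiring (commented out because it needs a reachable etcd):
# size=$(etcdctl endpoint status -w json | extract_db_size)
# [ "$(usage_percent "$size" 2147483648)" -ge 80 ] && echo "etcd DB above 80% of quota"
```

A check like this can be run from cron or a monitoring agent; the alert threshold should be tuned to how quickly the database grows.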

Before making any changes, especially those involving data deletion or snapshot and restore, ensure you have a recent backup of the etcd data. Also, consider testing these procedures in a non-production environment first to understand their impact and effectiveness.
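
As a minimal sketch of the backup, compact, defrag sequence described above, assuming etcdctl can already reach the cluster (endpoints and TLS options set through the environment): the function names and the snapshot path are hypothetical, and the final alarm disarm step follows the etcd maintenance guide rather than anything stated earlier in this answer.

```shell
# Sketch only: helper names are hypothetical.

# Pull the first "revision" value out of `endpoint status` JSON on stdin.
extract_revision() {
    grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1
}

reclaim_space() {
    # Back up first; the snapshot path is an illustrative placeholder.
    etcdctl snapshot save /tmp/etcd-before-compaction.db || return 1

    rev=$(etcdctl endpoint status --write-out=json | extract_revision)
    [ -n "$rev" ] || { echo "could not read current revision" >&2; return 1; }

    etcdctl compact "$rev" || return 1     # drop key history older than $rev
    etcdctl defrag --cluster || return 1   # release the freed space on every member
    etcdctl alarm disarm                   # clear NOSPACE so writes are accepted again
}
```

Nothing runs until reclaim_space is called explicitly. Afterwards, etcdctl endpoint status -w table should show a smaller DB SIZE and an empty ERRORS column if the procedure worked.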

vitabaks commented 10 months ago

How to debug large db size issue

vitabaks commented 10 months ago

@mobius77 You can increase the database quota in etcd (e.g. to 4 GB) by adding an option to the configuration file /etc/etcd/etcd.conf

ETCD_QUOTA_BACKEND_BYTES="4294967296"

This sets the etcd database size quota to 4 GB. After adding the line to the configuration file, restart etcd for the change to take effect.

https://etcd.io/docs/v3.4/op-guide/configuration/#--quota-backend-bytes
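
For reference, the byte value can be derived rather than typed by hand. The snippet below is only a sketch: the systemd unit name etcd is an assumption about the host, and the commented-out steps need a running cluster.

```shell
# 4 GB expressed in bytes (4 * 1024^3), matching the value shown above.
quota_bytes=$((4 * 1024 * 1024 * 1024))

# Print the line to place in /etc/etcd/etcd.conf.
echo "ETCD_QUOTA_BACKEND_BYTES=\"${quota_bytes}\""

# After editing the file (commented out; unit name "etcd" is an assumption):
# sudo systemctl restart etcd
# etcdctl alarm disarm   # an already raised NOSPACE alarm stays set until disarmed
```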

vitabaks commented 9 months ago

PR (auto_compaction): https://github.com/vitabaks/postgresql_cluster/pull/562
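
For readers who have not used auto compaction, a sketch of how it looks in the same /etc/etcd/etcd.conf style follows. The values are illustrative assumptions and are not taken from the PR; check the PR itself for the defaults it actually ships.

```shell
# /etc/etcd/etcd.conf (illustrative values, not taken from the PR)
ETCD_AUTO_COMPACTION_MODE="periodic"     # compact on a time basis
ETCD_AUTO_COMPACTION_RETENTION="1h"      # keep roughly the last hour of key history
```

Auto compaction only discards old revisions. The database file does not shrink until etcdctl defrag is run, so a member that is already over quota still needs the manual steps discussed earlier in this thread.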