spacemeshos / go-spacemesh

Go Implementation of the Spacemesh protocol full node. 💾⏰💪
https://spacemesh.io
MIT License
760 stars 211 forks source link

Use integer IDs for the references in the database #6027

Open ivan4th opened 5 months ago

ivan4th commented 5 months ago

Description

Currently, go-spacemesh state database size keeps growing and accordingly the performance continues to degrade as less and less of SQLite index data fits in RAM.

A substantial part of the database contents are different hash based IDs that are repeated over and over again. After we had to disable clustered indices on hash IDs (remove WITHOUT ROWID), each row in each table contains an internal integer rowid value. If we add an integer id to each table and mark it as the primary key instead of the hash id, that integer id (let's call it nid, "numeric id" from now on) will not add any extra bytes to the records.

The nids can be used as references instead of the hash IDs, saving a lot of space as in most cases, SQLite will encode them using 2-to-5 bytes sized representation instead of 32 or 24 byte IDs, which are stored as blobs and take up 33 and 25 bytes respectively.

The schema change will be like this (atxs table as an example):

CREATE TABLE atxs
(
    id                  CHAR(32),
    prev_id             CHAR(32),
    epoch               INT NOT NULL,
    effective_num_units INT NOT NULL,
    commitment_atx      CHAR(32),
    nonce               UNSIGNED LONG INT,
    base_tick_height    UNSIGNED LONG INT,
    tick_count          UNSIGNED LONG INT,
    sequence            UNSIGNED LONG INT,
    pubkey              CHAR(32),
    coinbase            CHAR(24),
    received            INT NOT NULL,
    validity            INTEGER DEFAULT false
);

--->

CREATE TABLE atxs
(
    nid                 INT NOT NULL AUTO_INCREMENT,
    id                  CHAR(32) UNIQUE,
    prev_nid            INT,
    epoch               INT NOT NULL,
    effective_num_units INT NOT NULL,
    commitment_atx_nid  INT,
    nonce               UNSIGNED LONG INT,
    base_tick_height    UNSIGNED LONG INT,
    tick_count          UNSIGNED LONG INT,
    sequence            UNSIGNED LONG INT,
    pubkey_nid          INT,
    coinbase_nid        INT,
    received            INT NOT NULL,
    validity            INTEGER DEFAULT false,

    PRIMARY KEY (nid)
);

Inserting the records becomes a bit more tricky, but still can be done in a single SQL command (column order changed for simplicity):

insert into atxs (id, commitment_atx, epoch, …)
select ?1, (select nid from atxs where id = ?2), ?3, …

This involves querying some other ATXs, and SELECTing back the records will involve extra joins as well:

select a.id, c.id, a.epoch, ...
from atxs a
left join atxs c on a.commitment_atx_nid = c.nid
...

... but with the atxs table being much more compact this will not cause substantial performance impact as it happens with joins on the atxs table currently, and the gains due to reduced database size and correspondingly more efficient SQLite cache usage will outweigh these issues. We must do some benchmarks before finalizing the transition, though.

Bringing this a step further, we could augment go-scale with a possibility to replace hash-based IDs with nids in the blobs which are stored in the database. This will mean the need to re-encode the blobs when serving them to other peers, but that probably would not be overly costly operation (this needs to be carefully verified, though).

There are additional possibilities, as suggested by @poszu:

Another place to save storage in the blobs are the poet membership proofs. They are essentially merkle proofs, most of them proving (a different) leaf from the same merkle tree. There's a ton of duplication of the nodes of these merkle proofs. Currently, nodes store a merkle proof for nearly every leaf of the membership merkle tree separately in the blobs.

When implementing these changes the most problematic thing is are slow database migrations. We can do the transition in multiple stages but we must keep in mind that each stage might entail a hour+ database migration with a need for VACUUM. Also we must keep in mind that the later we implement the changes, the longer it will take to update the DBs. On the other hand, if we do not perform this transitions, we might face 200 GiB database a bit too soon, and migrating that amount of data will take unreasonably long time.

fasmat commented 5 months ago

I assume the new schema should look like this?

CREATE TABLE atxs
(
    nid                 INT NOT NULL AUTO_INCREMENT,
    id                  CHAR(32) UNIQUE,
    prev_nid            INT,
    epoch               INT NOT NULL,
    effective_num_units INT NOT NULL,
    commitment_atx_nid  INT,
    nonce               UNSIGNED LONG INT,
    base_tick_height    UNSIGNED LONG INT,
    tick_count          UNSIGNED LONG INT,
    sequence            UNSIGNED LONG INT,
    pubkey_nid          INT,
    coinbase_nid        INT,
    received            INT NOT NULL,
    validity            INTEGER DEFAULT false,

    PRIMARY KEY (nid)
);

if yes I suppose we call the nid field ID and the id field ATXID or similar to make the distinction clearer?

Regarding the suggestion with the ATX merkle proofs: the idea Bartosz and I had today in our call were to only store each PoET proof once and reconstruct the membership proofs on demand when gossiping / syncing the ATX to a peer.

The node already keeps the full PoET proofs for all PoET (rounds) it participated in and was able to fetch the proof afterwards. So any membership proof that uses a root that we have the full proof for doesn't need to be stored in the DB since it can be reconstructed when the ATX needs to be gossiped / broadcast to peers.

For PoET proofs where the node did not participate in its a bit more difficult, since the PoET might not be accessible to the node (i.e. a private PoET). In this case we have 2 options:

  1. request the full PoET proof from the peer that provided the ATX after successfully validating the membership proof and storing that PoET proof locally as well. This has the advantage of being straight forward but less efficient as the second option
  2. Store a partial PoET proof with all paths / leaves that are known. More membership proofs for the same root extend the local representation of that PoET proof and for all ATXs that the node has it can reconstruct the membership proofs, since the existing (partial) merkle-tree contains all the leaves needed for those identities.

The second option can theoretically also be used for all poet proofs (including the nodes own) as it results in the smallest amount of data that needs to be stored to be able to rebroadcast a (valid) ATX and not store any duplicated data. At the drawback that every new (valid) ATX needs to update an existing PoET proof and add leaves to it's merkle tree.

ivan4th commented 5 months ago

@fasmat

I assume the new schema should look like this?

Yes, thanks, updated the schema

if yes I suppose we call the nid field ID and the id field ATXID or similar to make the distinction clearer?

Here I'm not fully convinced we should do that. I'd prefer to avoid using nids directly in Go code as much as possible (except for blob re-encoding, should we do that), making them a "database schema implementation detail", thus I'd use nid column name. Hence hash-based ID remains the ATX ID from the code perspective.

Re PoET proof storage optimization: it looks like that deserves a separate issue indeed, and should probably be implemented separately from the rest of this optimization.

fasmat commented 5 months ago

Re PoET proof storage optimization: it looks like that deserves a separate issue indeed, and should probably be implemented separately from the rest of this optimization.

sure 🙂

pigmej commented 5 months ago

I think before doing that we need to remember about https://github.com/spacemeshos/go-spacemesh/issues/6069