Open ivan4th opened 5 months ago
I assume the new schema should look like this?
CREATE TABLE atxs
(
nid INT NOT NULL AUTO_INCREMENT,
id CHAR(32) UNIQUE,
prev_nid INT,
epoch INT NOT NULL,
effective_num_units INT NOT NULL,
commitment_atx_nid INT,
nonce UNSIGNED LONG INT,
base_tick_height UNSIGNED LONG INT,
tick_count UNSIGNED LONG INT,
sequence UNSIGNED LONG INT,
pubkey_nid INT,
coinbase_nid INT,
received INT NOT NULL,
validity INTEGER DEFAULT false,
PRIMARY KEY (nid)
);
if yes I suppose we call the nid
field ID
and the id
field ATXID
or similar to make the distinction clearer?
Regarding the suggestion with the ATX merkle proofs: the idea Bartosz and I had today in our call were to only store each PoET proof once and reconstruct the membership proofs on demand when gossiping / syncing the ATX to a peer.
The node already keeps the full PoET proofs for all PoET (rounds) it participated in and was able to fetch the proof afterwards. So any membership proof that uses a root that we have the full proof for doesn't need to be stored in the DB since it can be reconstructed when the ATX needs to be gossiped / broadcast to peers.
For PoET proofs where the node did not participate in its a bit more difficult, since the PoET might not be accessible to the node (i.e. a private PoET). In this case we have 2 options:
The second option can theoretically also be used for all poet proofs (including the nodes own) as it results in the smallest amount of data that needs to be stored to be able to rebroadcast a (valid) ATX and not store any duplicated data. At the drawback that every new (valid) ATX needs to update an existing PoET proof and add leaves to it's merkle tree.
@fasmat
I assume the new schema should look like this?
Yes, thanks, updated the schema
if yes I suppose we call the
nid
fieldID
and the id fieldATXID
or similar to make the distinction clearer?
Here I'm not fully convinced we should do that. I'd prefer to avoid using nids directly in Go code as much as possible (except for blob re-encoding, should we do that), making them a "database schema implementation detail", thus I'd use nid
column name. Hence hash-based ID
remains the ATX ID from the code perspective.
Re PoET proof storage optimization: it looks like that deserves a separate issue indeed, and should probably be implemented separately from the rest of this optimization.
Re PoET proof storage optimization: it looks like that deserves a separate issue indeed, and should probably be implemented separately from the rest of this optimization.
sure 🙂
I think before doing that we need to remember about https://github.com/spacemeshos/go-spacemesh/issues/6069
Description
Currently, go-spacemesh state database size keeps growing and accordingly the performance continues to degrade as less and less of SQLite index data fits in RAM.
A substantial part of the database contents are different hash based IDs that are repeated over and over again. After we had to disable clustered indices on hash IDs (remove
WITHOUT ROWID
), each row in each table contains an internal integer rowid value. If we add an integer id to each table and mark it as the primary key instead of the hash id, that integer id (let's call itnid
, "numeric id" from now on) will not add any extra bytes to the records.The nids can be used as references instead of the hash IDs, saving a lot of space as in most cases, SQLite will encode them using 2-to-5 bytes sized representation instead of 32 or 24 byte IDs, which are stored as blobs and take up 33 and 25 bytes respectively.
The schema change will be like this (
atxs
table as an example):Inserting the records becomes a bit more tricky, but still can be done in a single SQL command (column order changed for simplicity):
This involves querying some other ATXs, and SELECTing back the records will involve extra joins as well:
... but with the
atxs
table being much more compact this will not cause substantial performance impact as it happens with joins on theatxs
table currently, and the gains due to reduced database size and correspondingly more efficient SQLite cache usage will outweigh these issues. We must do some benchmarks before finalizing the transition, though.Bringing this a step further, we could augment
go-scale
with a possibility to replace hash-based IDs with nids in the blobs which are stored in the database. This will mean the need to re-encode the blobs when serving them to other peers, but that probably would not be overly costly operation (this needs to be carefully verified, though).There are additional possibilities, as suggested by @poszu:
When implementing these changes the most problematic thing is are slow database migrations. We can do the transition in multiple stages but we must keep in mind that each stage might entail a hour+ database migration with a need for
VACUUM
. Also we must keep in mind that the later we implement the changes, the longer it will take to update the DBs. On the other hand, if we do not perform this transitions, we might face 200 GiB database a bit too soon, and migrating that amount of data will take unreasonably long time.