Rationale
We noticed that the transaction_details table in ScyllaDB takes up ~2 TB, which directly affects the infra cost, and we want to reduce that cost further. Building on our successful experience of serving some data directly from the NEAR Lake AWS S3 storage, we propose moving the TransactionDetails data to object storage (GCS) to further offload ScyllaDB.
Since this structure is stored as bytes (borsh-serialized), the move should not make a big difference to performance. Some increase in latency is expected, but it is not significant enough to justify overpaying for database storage.
What is done
I've introduced a small crate (lib) tx-details-storage that interacts with the provided S3-compatible object storage to store and retrieve raw bytes
Extended the configuration crate with the dedicated config section for tx-details-storage. Updated config.example.toml to reflect new accepted parameters
Refactored tx-indexer to store the finished TransactionDetails to the object storage using the tx-details-storage library. To keep the library as simple as possible, it doesn't handle serialization/deserialization; that is left to the indexer and rpc-server.
Refactored rpc-server to retrieve TransactionDetails from the object storage using the newly introduced library.
Additionally, I've extended the tx-indexer with a few metrics to monitor what's happening there.
Important note: Recently, we've adjusted the rpc-server to return half-baked (not finished, in progress) transaction details from the database cache table. This logic is preserved.
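The split of responsibilities described above (the storage library only moves raw bytes, while the indexer and rpc-server own serialization) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual tx-details-storage API: the `store`/`retrieve` method names, the in-memory `HashMap` standing in for the S3-compatible bucket, the two-field `TransactionDetails`, and the hand-rolled byte encoding standing in for borsh are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the storage library: it only ever sees raw bytes.
/// A HashMap plays the role of the S3-compatible bucket in this sketch.
#[derive(Default)]
pub struct TxDetailsStorage {
    objects: HashMap<String, Vec<u8>>,
}

impl TxDetailsStorage {
    pub fn new() -> Self {
        Self::default()
    }

    /// Store raw bytes under the given key (for example, a transaction hash).
    pub fn store(&mut self, key: &str, data: Vec<u8>) {
        self.objects.insert(key.to_string(), data);
    }

    /// Retrieve raw bytes, or None if the key is unknown.
    pub fn retrieve(&self, key: &str) -> Option<Vec<u8>> {
        self.objects.get(key).cloned()
    }
}

/// Simplified, hypothetical stand-in for the real TransactionDetails struct.
#[derive(Debug, PartialEq, Clone)]
pub struct TransactionDetails {
    pub hash: String,
    pub block_height: u64,
}

impl TransactionDetails {
    /// Caller-side encoding. The real code uses borsh; this hand-rolled
    /// layout (8-byte little-endian height, then the hash bytes) only
    /// illustrates that serialization lives outside the storage library.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(8 + self.hash.len());
        out.extend_from_slice(&self.block_height.to_le_bytes());
        out.extend_from_slice(self.hash.as_bytes());
        out
    }

    /// Caller-side decoding; returns None on malformed input.
    pub fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < 8 {
            return None;
        }
        let (height, hash) = bytes.split_at(8);
        let mut buf = [0u8; 8];
        buf.copy_from_slice(height);
        Some(Self {
            hash: String::from_utf8(hash.to_vec()).ok()?,
            block_height: u64::from_le_bytes(buf),
        })
    }
}
```

In this sketch the indexer side would call `storage.store(&details.hash, details.to_bytes())`, and the rpc-server side would call `storage.retrieve(hash)` and decode the result itself. Keeping the library byte-oriented means the encoding can change without touching the storage code.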
Next steps
We must wait before delivering this change, because the data is not yet in the object storage.
I plan to do it in three phases, followed by a cleanup:
✅ Create a small script that will walk over each transaction present in NEAR Protocol and copy the TransactionDetails from ScyllaDB to GCS. This has to be done for both testnet and mainnet
Start the new tx-indexer (from this PR) to continue collecting the data into the object storage
Replace the rpc-server instances with the new ones that can read from the object storage
(cleanup phase) Stop old tx-indexers, drop transaction_details table from ScyllaDB, downscale ScyllaDB
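The first phase (copying existing TransactionDetails from ScyllaDB to GCS) boils down to a walk-and-copy loop. The sketch below is not the actual migration script: the function name `copy_tx_details` is hypothetical, in-memory maps stand in for the ScyllaDB table and the GCS bucket, and skipping objects that already exist is an assumption made here so that the loop can be re-run safely after an interruption.

```rust
use std::collections::{BTreeMap, HashMap};

/// Copy every record from the legacy table into object storage.
/// Both maps are in-memory stand-ins (assumptions for this sketch):
/// `legacy_table` for the ScyllaDB transaction_details table and
/// `bucket` for the GCS bucket, each keyed by transaction hash.
/// Returns the number of objects newly written.
pub fn copy_tx_details(
    legacy_table: &BTreeMap<String, Vec<u8>>,
    bucket: &mut HashMap<String, Vec<u8>>,
) -> usize {
    let mut copied = 0;
    for (tx_hash, bytes) in legacy_table {
        // Assumption: objects already present are skipped, which makes
        // the copy idempotent across repeated runs.
        if bucket.contains_key(tx_hash) {
            continue;
        }
        // The bytes are copied as-is; no re-serialization is needed
        // because the data is already stored in serialized form.
        bucket.insert(tx_hash.clone(), bytes.clone());
        copied += 1;
    }
    copied
}
```

Running the same function once per network (testnet and mainnet) with separate source tables and destination buckets would cover both cases mentioned above.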
Update from 2024-07-03
The transaction_details table is growing and reaching the limits of the database we have right now. We don't want to scale it, so we need to stop the growth in the short term. I refactored the code a bit, keeping the legacy way of searching for the transaction in the database (Scylla) while we migrate.
I've added an additional metric for this, legacy_database_tx_details, to monitor the migration period. We expect that counter to stay at zero for some time before we can consider the migration finished.
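One way to picture the migration-period lookup and its metric is the sketch below. It rests on assumptions that the description does not confirm: that the object storage is consulted first and the legacy database second, and that the counter is incremented each time the legacy path serves a request. The function name and the in-memory maps are hypothetical, and a plain atomic counter stands in for the real metrics registry.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for the legacy_database_tx_details metric: counts how many
/// lookups had to be served from the legacy database.
pub static LEGACY_DATABASE_TX_DETAILS: AtomicU64 = AtomicU64::new(0);

/// Look up serialized transaction details during the migration period.
/// The HashMaps are in-memory stand-ins for the object storage and the
/// ScyllaDB table; the lookup order is an assumption of this sketch.
pub fn get_tx_details(
    tx_hash: &str,
    object_storage: &HashMap<String, Vec<u8>>,
    legacy_database: &HashMap<String, Vec<u8>>,
) -> Option<Vec<u8>> {
    // Preferred path: the data has already been moved to object storage.
    if let Some(bytes) = object_storage.get(tx_hash) {
        return Some(bytes.clone());
    }
    // Legacy path: only reached when object storage has no such key.
    let bytes = legacy_database.get(tx_hash)?;
    // Count the legacy hit so the migration can be monitored.
    LEGACY_DATABASE_TX_DETAILS.fetch_add(1, Ordering::Relaxed);
    Some(bytes.clone())
}
```

With this shape, a counter that stays flat for long enough indicates that no request depends on the legacy table any more, which is the signal for starting the cleanup phase.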