near / near-indexer-for-explorer

Watch NEAR network and store all the data from NEAR blockchain to PostgreSQL database
https://near-indexers.io/docs/projects/near-indexer-for-explorer
GNU General Public License v3.0
124 stars 56 forks source link

Consider using table partitioning #132

Closed telezhnaya closed 2 years ago

telezhnaya commented 3 years ago

The amount of data grows non-linearly. account_changes table (sorry, that's my favorite one) had 4.5M lines on Jan 1, 2021 (half a year from the genesis, or 2.5 months from enabling transfers). Now we are collecting 5M lines in one week (Jun 28 - Jul 5). It's 49M lines there, and the number grows faster and faster.

I think it's better not to wait for the apocalypse and set up table partitioning. https://www.postgresql.org/docs/10/ddl-partitioning.html The doc says a rule of thumb is that the size of the table should exceed the physical memory of the database server. Could someone please provide the RAM size of our server?

Even without the size of RAM, I want to nominate to partitioning most of our tables: account_changes (49M), action_receipt_actions (29M), action_receipts (28M), blocks (32M), chunks (32M), execution_outcome_receipts (17M), execution_outcomes (28M), receipts (29M), transaction_actions (12M), transactions (12M).

I have never done that on production, but I am ready to learn about it more and play with it on the copy of our DB.

telezhnaya commented 2 years ago

We discussed that in person; main conclusion: INSERT is not our problem for now, and it loks like partitions will not increase the speed of SELECTs. See #189 for more details