waku-org / nwaku

Waku node and protocol.

Add retention policy with GB or MB limitation #1885

Closed: Ivansete-status closed this issue 2 months ago

Ivansete-status commented 10 months ago

Background

We came across an issue in the HK-prod node where the SQLite database reached 39 GB. We need a mechanism to apply a retention policy so that the database doesn't grow beyond a configured limit in GB or MB.

Acceptance criteria

The Waku node with Store mounted should be able to limit the maximum space occupied by the underlying database. This applies to both SQLite and Postgres.

The new retention policy should support the following formats:

--store-message-retention-policy=size:25GB
--store-message-retention-policy=size:25gb
--store-message-retention-policy=size:250MB
--store-message-retention-policy=size:250mb
... any of the case combinations
SionoiS commented 10 months ago

If the cleaning (vacuuming?) requires space, that should also be taken into account.

Ivansete-status commented 10 months ago

If the cleaning (vacuuming?) requires space, that should also be taken into account.

What do you mean sorry?

richard-ramos commented 10 months ago

Some useful queries for this task:

-- sqlite
SELECT page_count * page_size as size FROM pragma_page_count(), pragma_page_size();

This will return the total DB size, including free pages, so it already takes VACUUM into account.


-- postgresql
SELECT pg_size_pretty( pg_database_size('dbname') );
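For reference, SQLite also exposes how many of those pages are currently free; free pages are reused for new rows, but the file itself only shrinks after a VACUUM. A small complementary sketch:

-- sqlite
PRAGMA freelist_count;  -- number of unused (free) pages in the database file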

SionoiS commented 10 months ago

If the cleaning (vacuuming?) requires space, that should also be taken into account.

What do you mean sorry?

The algorithm that deletes and moves messages around requires memory and disk space. If the node is already at max disk capacity, could it fail?

Ivansete-status commented 10 months ago

If the cleaning (vacuuming?) requires space, that should also be taken into account.

What do you mean sorry?

The algorithm that deletes and moves messages around requires memory and disk space. If the node is already at max disk capacity, could it fail?

Ah yes! You are absolutely right! The user might need to leave some space. It would be interesting to add this recommendation to the wakunode2 parameter description.

ABresting commented 8 months ago

Added the changes along with a test case, and tested them on the machine several times. Kindly go through it!

ABresting commented 8 months ago

Weekly update: Added the new retention policy based on DB size. Users can provide the size, e.g. 30gb (which is also the default), in any of the following forms:

--store-message-retention-policy=size:25GB
--store-message-retention-policy=size:25gb
--store-message-retention-policy=size:250MB
--store-message-retention-policy=size:250mb

A test case was also added. Outdated messages/rows are deleted to satisfy the size limit, with a 20% size reduction upon overflowing.
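A minimal sketch of that deletion step, assuming a hypothetical messages table with id and storedAt columns (the actual nwaku schema may differ):

-- Sketch: remove the oldest rows once the size limit is exceeded.
DELETE FROM messages
WHERE id IN (
  SELECT id
  FROM messages
  ORDER BY storedAt ASC
  LIMIT 1000  -- ~20% of the current row count, computed by the node
);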

ABresting commented 7 months ago

Weekly update:

In review: the database bug to delete a limited number of messages/rows.
Upcoming/working: updated retention policy + test + missing test on the timestamp-based retention policy.
Undergoing: MUID concept on the message level.

ABresting commented 7 months ago

Closing this issue; it seems feature-complete. Also made changes related to infra-level testing.

Ivansete-status commented 7 months ago

Hey @ABresting! Although the implementation so far is awesome, I think we cannot close this issue yet because we need to double-check that it works correctly on Postgres.

chair28980 commented 6 months ago

Can be completely de-prioritized per nwaku PM meeting 2023-11-07.

fryorcraken commented 6 months ago

Does not seem to work for PostgreSQL:

nwaku_1              | DBG 2023-11-28 14:11:03.624+00:00 executing message retention policy         topics="waku node" tid=1 file=waku_node.nim:753
nwaku_1              | ERR 2023-11-28 14:11:03.662+00:00 exception in getInt, parseInt              tid=1 file=postgres_driver.nim:424 error="invalid integer: "
export EXTRA_ARGS="--store-message-retention-policy=size:30GB"

What is the justification for the de-prioritization?
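For reference, Postgres can also report the database size directly as an integer number of bytes, which is simpler to parse on the client side than the text produced by pg_size_pretty (e.g. '39 GB'); a small sketch:

-- postgresql
SELECT pg_database_size(current_database());  -- size in bytes as a bigint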

Ivansete-status commented 5 months ago

The current implementation is correct from the nwaku point of view, but we need an additional action to effectively reduce the size occupied by the database.

Due to how Postgres works, the database size is not reduced when rows are deleted; the rows are only marked as "deletable". Therefore, we have these options:

  1. Perform VACUUM. This doesn't give space back to the OS. Instead, the freed space is reserved to be used again by Postgres.
  2. Perform VACUUM FULL. This frees space stored by the database and brings it back to the OS but has a very big drawback, which is that it blocks the whole database for writing and reading.
  3. Use pg_repack. This should be done manually and the Postgres database needs to have this extension installed.
  4. ...

@fryorcraken @ABresting @apentori @jakubgs @yakimant - what is the best approach from your point of view to maintain the Postgres database size under a certain level?
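For concreteness, options 1 and 2 above roughly correspond to the following statements; the messages table name is assumed:

-- postgresql
VACUUM messages;       -- option 1: dead tuples become reusable, but the file does not shrink
VACUUM FULL messages;  -- option 2: returns space to the OS, but locks the table for reads and writes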

fryorcraken commented 5 months ago

When using (1), I guess the trick is to ensure it is performed before the database reaches its maximum size, right? So we may need to operate with some buffer where we delete and VACUUM before the max size is reached, e.g. delete 20% of messages and run VACUUM when the database reaches 95% of the max size.

The 20% needs to be selected so that the VACUUM works efficiently and also so that we don't end up VACUUMing every 10 s.
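A minimal sketch of that 95% check, with a hard-coded 30 GB cap purely for illustration (the real limit would come from the retention-policy configuration):

-- postgresql
SELECT pg_database_size(current_database()) >
       0.95 * 30 * 1024 * 1024 * 1024 AS should_prune;  -- true once usage crosses 95% of the cap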

ABresting commented 5 months ago

When using (1), I guess the trick is to ensure it is performed before the database reaches its maximum size, right? So we may need to operate with some buffer where we delete and VACUUM before the max size is reached, e.g. delete 20% of messages and run VACUUM when the database reaches 95% of the max size.

The 20% needs to be selected so that the VACUUM works efficiently and also so that we don't end up VACUUMing every 10 s.

I believe that initially it was designed to work like this: when the retention limit is reached, remove 20% of the rows.

As you suggest, we can now assume that if the user has given a size limit of X, we can calculate the effective retention limit from it (95% of the input) just to be on the safe side. But the earlier issue was with SQLite; maybe it has been resolved with this PR.

Ivansete-status commented 5 months ago

Hey there!

IMO, approach 1 (VACUUM) would be the best option, but we need to adapt the code because right now it would apply the database reduction each time the retention policy is executed. Let me elaborate with an example:

  1. Scenario:
     a. size limit == 1000 bytes
     b. measured database size == 2000 bytes (the Postgres driver gives this info)
     c. the database has 100 rows
  2. The retention policy is applied because 1.b > 1.a, and then 20% of the rows are deleted.
  3. Scenario afterwards:
     a. size limit == 1000 bytes
     b. measured database size == 2000 bytes (the Postgres driver still measures 2000 bytes because the VACUUM operation doesn't return the space to the OS)
     c. the database has 80 rows
  4. This will repeat and the database will become empty in the end.

( cc @fryorcraken @ABresting )
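One way to avoid re-triggering the deletion on a size figure that never shrinks would be to also consult the live/dead tuple statistics; a sketch, with the messages table name assumed:

-- postgresql
SELECT n_live_tup, n_dead_tup   -- rows still holding data vs. rows whose space VACUUM can reuse
FROM pg_stat_user_tables
WHERE relname = 'messages';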

yakimant commented 5 months ago

@Ivansete-status, I need to read more to understand PG vacuuming. But based on your description, I like approach 1.

yakimant commented 5 months ago

BTW, did you have a look at autovacuum? https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM

Ivansete-status commented 5 months ago

BTW, did you have a look at autovacuum? https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM

Thanks for the comments and suggestions! At first I was kind of reluctant to have an automatic process, but it is something worth considering too. Apparently it doesn't cause too many read/write blocks: https://stackoverflow.com/questions/73655953/is-postgres-autovacuum-process-blocking-reading-and-writing-the-table#:~:text=It%20will%20take%20light%2Dweight,block%20ordinary%20reads%20and%20writes.
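For reference, autovacuum is enabled by default and its per-table aggressiveness can be tuned via storage parameters; the values below are purely illustrative and the messages table name is assumed:

-- postgresql
ALTER TABLE messages SET (
  autovacuum_vacuum_scale_factor = 0.05,  -- vacuum once roughly 5% of the rows are dead
  autovacuum_vacuum_threshold    = 1000   -- plus a fixed minimum number of dead rows
);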

jakubgs commented 5 months ago

I don't really see the point of an automatic VACUUM FULL if that space is going to get filled again shortly.

We don't care if filesystem space is used, we care that we don't hit the filesystem size limit and crash and burn.

yakimant commented 4 months ago

The best article on SQLite vacuuming I've seen: https://blogs.gnome.org/jnelson/2015/01/06/sqlite-vacuum-and-auto_vacuum/ Posting it for posterity.
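On the SQLite side, the incremental auto_vacuum mode described in that article lets free pages be released in bounded steps; a small sketch:

-- sqlite
PRAGMA auto_vacuum = INCREMENTAL;  -- only takes effect after the next VACUUM
VACUUM;                            -- rebuild the file so the setting applies
PRAGMA incremental_vacuum(100);    -- release up to 100 free pages back to the OS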

fryorcraken commented 3 months ago

Note: we are rescoping this under https://github.com/waku-org/pm/issues/119 to match the new process.

Ivansete-status commented 2 months ago

I'm closing this issue as per the work done in https://github.com/waku-org/nwaku/pull/2506.