Skip obsolete logs when recovering

paritytech / parity-db

Experimental blockchain database

Apache License 2.0

263 stars 59 forks source link

Skip obsolete logs when recovering #201

Closed arkpar closed 1 year ago

arkpar commented 1 year ago

When recoveing logs, the database was operating under assumption that re-applying older commits is generally fine. In reality there are consistency issues when a older commit is re-applied without also replaying the whole sequence that leads to the latest commit. Here I've added a file that keeps the last record id. This file gets fsynced after syncing all the data files. Which should guarantee that all previous records are solid. When recovring logs there's now a check for such obsolete records and all such records are skipped.

arkpar commented 1 year ago

Imagine the database has commits A, B, and C fully applied in order. Than after the restart the logs for B and C are still arround. Re-enacting B and C would simply overwrite the same records and the database still ends up in the same state C. The problem is when the log for B is present, but the log for C is missing. We don't want to apply just B because it will break consistency.

Tpt commented 1 year ago

We don't want to apply just B because it will break consistency.

Genuine question (sorry if it's a dumb question): I am not sure to grasp how it breaks consistency. If we apply just B it means the database will be recovered up to B and not up to A if we don't recover. How is it breaking consistency more to recover up to B?

arkpar commented 1 year ago

Genuine question (sorry if it's a dumb question): I am not sure to grasp how it breaks consistency. If we apply just B it means the database will be recovered up to B and not up to A if we don't recover. How is it breaking consistency more to recover up to B?

Remember, that in this premise "C" was also applied previously. So the database has entries added in C. Aplying B over C (without also re-applying C) may violate app logics. E.g. in substrtate C sets both the best block and best finalized block pointer to 2. B sets the best block to 1. If you re-apply B over C you'll end up with best block=1 and best finalized=2 which violates substrate invariants. It is better to just not allow that on the database level.

arkpar commented 1 year ago

On the second thought this implementaiton does not guarantee that the record file contents matches the state on-disk. The problem here is that we dont fsync individual commits, but rather just fsync periodically. There's a simpler way to solve the issue I think. That is to simply make sure the logs are dropped always in order, so that recovery ends up with the actual latest log.

cheme commented 1 year ago

On the second thought this implementaiton does not guarantee that the record file contents matches the state on-disk. The problem here is that we dont fsync individual commits, but rather just fsync periodically. There's a simpler way to solve the issue I think. That is to simply make sure the logs are dropped always in order, so that recovery ends up with the actual latest log.

I did like the direction where we got specific record id register and checked (eg for debugging purpose), not sure if it can be kept in some way.