man-group / ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
http://arcticdb.io

Staged writes can end up in invalid state #1993

Open vasil-pashov opened 1 week ago

vasil-pashov commented 1 week ago

Describe the bug

Currently, the flow for compact_incomplete/sort_merge is:

  1. Read APPEND_DATA keys
  2. Compact the result into TABLE_DATA
  3. Create INDEX_DATA
  4. Remove all APPEND_DATA keys
  5. Create new version key
  6. Update ref key

If the program dies right after the APPEND_DATA keys are deleted but before the version key is created, the symbol is left in an incomplete, unreadable state. It won't appear in the list of incomplete symbols (because there are no APPEND_DATA keys left), and it won't be in the symbol list either, but there will be orphaned TABLE_DATA and INDEX_DATA keys.
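The failure window can be illustrated with a small simulation. This is only a sketch: the dictionary-based store, the `Crash` exception, and `compact_current_order` are hypothetical stand-ins invented for illustration and are not ArcticDB code. Only the key type names and the step order come from the description above.

```python
# Toy in-memory model of the key types named in this issue; not ArcticDB code.
KEY_TYPES = ("APPEND_DATA", "TABLE_DATA", "INDEX_DATA", "VERSION", "REF")


class Crash(Exception):
    """Stands in for the process being killed."""


def compact_current_order(store, symbol, crash_after_step=None):
    """Run the six steps in their current order, optionally 'dying' after one."""
    staged = []
    actions = [
        lambda: staged.extend(store["APPEND_DATA"].get(symbol, [])),              # 1. read APPEND_DATA keys
        lambda: store["TABLE_DATA"].setdefault(symbol, []).append(list(staged)),  # 2. compact into TABLE_DATA
        lambda: store["INDEX_DATA"].setdefault(symbol, []).append("index"),       # 3. create INDEX_DATA
        lambda: store["APPEND_DATA"].pop(symbol, None),                           # 4. remove APPEND_DATA keys
        lambda: store["VERSION"].setdefault(symbol, []).append("v0"),             # 5. create new version key
        lambda: store["REF"].update({symbol: "v0"}),                              # 6. update ref key
    ]
    for number, action in enumerate(actions, start=1):
        action()
        if crash_after_step == number:
            raise Crash(f"killed after step {number}")


store = {key_type: {} for key_type in KEY_TYPES}
store["APPEND_DATA"]["sym"] = ["seg0", "seg1"]  # two staged segments

try:
    compact_current_order(store, "sym", crash_after_step=4)
except Crash:
    pass

has_incompletes = "sym" in store["APPEND_DATA"]  # would it be listed as incomplete?
is_readable = "sym" in store["REF"]              # would it be reachable via the ref key?
has_orphans = "sym" in store["TABLE_DATA"] and "sym" in store["INDEX_DATA"]
print(has_incompletes, is_readable, has_orphans)
```

In this model, a crash after step 4 leaves the symbol neither staged nor referenced, while the compacted data and index entries remain with nothing pointing at them.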

Steps/Code to Reproduce

Start compaction and kill the process right after sort_merge_impl or compact_incomplete_impl.

Expected Results

The symbol should be either in the list of incomplete symbols or in the symbol list.

To achieve this, the sequence of operations must be changed so that the APPEND_DATA keys are deleted only after the ref key is updated.
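A minimal sketch of the proposed ordering, using a hypothetical in-memory model rather than ArcticDB internals (the store layout, `Crash`, `compact_reordered`, and `is_discoverable` are all assumptions made for illustration). It moves the APPEND_DATA deletion to the last step and checks, for a simulated crash after each step, that the symbol is still either staged or referenced.

```python
# Toy in-memory model of the key types named in this issue; not ArcticDB code.
KEY_TYPES = ("APPEND_DATA", "TABLE_DATA", "INDEX_DATA", "VERSION", "REF")


class Crash(Exception):
    """Stands in for the process being killed."""


def compact_reordered(store, symbol, crash_after_step=None):
    """Proposed order: APPEND_DATA keys are removed last, after the ref key update."""
    staged = []
    actions = [
        lambda: staged.extend(store["APPEND_DATA"].get(symbol, [])),              # 1. read APPEND_DATA keys
        lambda: store["TABLE_DATA"].setdefault(symbol, []).append(list(staged)),  # 2. compact into TABLE_DATA
        lambda: store["INDEX_DATA"].setdefault(symbol, []).append("index"),       # 3. create INDEX_DATA
        lambda: store["VERSION"].setdefault(symbol, []).append("v0"),             # 4. create new version key
        lambda: store["REF"].update({symbol: "v0"}),                              # 5. update ref key
        lambda: store["APPEND_DATA"].pop(symbol, None),                           # 6. remove APPEND_DATA keys
    ]
    for number, action in enumerate(actions, start=1):
        action()
        if crash_after_step == number:
            raise Crash(f"killed after step {number}")


def is_discoverable(store, symbol):
    """Expected result from this issue: incomplete (staged) or reachable via ref."""
    return symbol in store["APPEND_DATA"] or symbol in store["REF"]


results = {}
for crash_point in range(1, 7):
    store = {key_type: {} for key_type in KEY_TYPES}
    store["APPEND_DATA"]["sym"] = ["seg0", "seg1"]
    try:
        compact_reordered(store, "sym", crash_after_step=crash_point)
    except Crash:
        pass
    results[crash_point] = is_discoverable(store, "sym")

print(results)
```

The sketch only checks discoverability. It does not model how a retried compaction should treat TABLE_DATA and INDEX_DATA keys left behind by an earlier crash, or staged data that survives a crash between the ref key update and the deletion; those cases would still need handling in a real fix.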

OS, Python Version and ArcticDB Version

all

Backend storage used

No response

Additional Context

No response