sirixdb / sirix

SirixDB is an embeddable, bitemporal, append-only database system and event store, storing immutable lightweight snapshots. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach.
https://sirix.io
BSD 3-Clause "New" or "Revised" License

Copy a resource to a new resource #589

Closed: JohannesLichtenberger closed this issue 9 months ago

JohannesLichtenberger commented 1 year ago

We should be able to reclaim space if desired, and to copy a resource to a new resource asynchronously, starting at a given revision or point in time, while preserving the revision history. We have to specify whether it's okay to generate new nodeKeys, which might differ from those in the source resource for a given subtree.

Nishant763 commented 1 year ago

Hi @JohannesLichtenberger, I would like to contribute to this repo.

JohannesLichtenberger commented 1 year ago

Would you like to work on this issue? You may also work on other issues, if you like!

JohannesLichtenberger commented 1 year ago

@Nishant763 let me know. I might otherwise assign this one to myself, as I want to get this done ASAP :+1:


PrathyushaModala commented 1 year ago

Hey @JohannesLichtenberger, is this issue still open? I would like to work on it.

JohannesLichtenberger commented 1 year ago

Let me know if you need help :-)

PrathyushaModala commented 1 year ago

Can you brief me more on this issue? Please point me to an approach for solving it.

JohannesLichtenberger commented 1 year ago

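The problem we're trying to solve is twofold:

  • reclaim space through deletion of old revisions
  • data protection regulations, which might state that old data must not be preserved for more than n months

As we store file offsets, and due to the sliding snapshot algorithm, it would be infeasible to reclaim space directly. The usual solution for the second problem would be to encrypt revisions and to delete the encryption key.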
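To make the "encrypt revisions and delete the encryption key" idea concrete, here is a minimal sketch using only the JDK's `javax.crypto` API. The class `RevisionKeyStore` and its methods are hypothetical and not part of SirixDB; it also assumes one AES key per revision, held in memory, which a real deployment would replace with a key store that can actually destroy keys.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a SirixDB class): one AES key per revision. */
public class RevisionKeyStore {
    private static final int IV_BYTES = 12;  // commonly recommended nonce size for GCM
    private static final int TAG_BITS = 128; // authentication tag length

    private final Map<Integer, SecretKey> keys = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    /** Encrypts a revision's payload; the key is generated on first use. */
    public byte[] encrypt(int revision, byte[] plain) throws GeneralSecurityException {
        SecretKey key = keys.get(revision);
        if (key == null) {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            key = generator.generateKey();
            keys.put(revision, key);
        }
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] cipherText = cipher.doFinal(plain);
        // Store the IV in front of the ciphertext so decrypt() can recover it.
        return ByteBuffer.allocate(iv.length + cipherText.length)
                .put(iv).put(cipherText).array();
    }

    /** Decrypts a stored payload; fails once the revision's key was shredded. */
    public byte[] decrypt(int revision, byte[] stored) throws GeneralSecurityException {
        SecretKey key = keys.get(revision);
        if (key == null) {
            throw new IllegalStateException("Key for revision " + revision + " was deleted");
        }
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
        return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
    }

    /** Deleting the key makes the revision unreadable without touching its bytes. */
    public void shred(int revision) {
        keys.remove(revision);
    }
}
```

The point of this design is that the append-only file and its offsets stay untouched; only the key disappears. It does not reclaim any space, which is why the copy approach is still needed for the first problem.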

To reclaim space we can copy the resource to a new temporary resource starting from a given revision, apply the change-sets from newer revisions, then delete the old resource and rename the new resource to the old resource's name.

We store JSON files as change-sets (you can commit a few changes and have a look at the format).
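The copy-and-replay step could look roughly like the following toy model. It is a sketch under heavy assumptions: a resource state is just a `Map`, a change-set is a list of key/value changes, and the types `ResourceCompaction` and `Change` are invented for illustration. None of this is SirixDB API, and the real change-set files have their own JSON format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model (hypothetical, not SirixDB code). */
public class ResourceCompaction {

    /** One change; a null value means "delete this key". */
    public static final class Change {
        final String key;
        final String value;

        public Change(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Builds the revisions of the new resource.
     *
     * @param stateAtStart    full state of the old resource at the start revision
     * @param newerChangeSets one change-set per revision after the start revision
     * @return the start state as the first revision, followed by one revision per change-set
     */
    public static List<Map<String, String>> compact(Map<String, String> stateAtStart,
                                                    List<List<Change>> newerChangeSets) {
        List<Map<String, String>> newRevisions = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>(stateAtStart);
        // The copied start state becomes the first revision of the new resource.
        newRevisions.add(new LinkedHashMap<>(current));
        for (List<Change> changeSet : newerChangeSets) {
            for (Change change : changeSet) {
                if (change.value == null) {
                    current.remove(change.key);
                } else {
                    current.put(change.key, change.value);
                }
            }
            // One commit per replayed change-set keeps the newer history intact.
            newRevisions.add(new LinkedHashMap<>(current));
        }
        return newRevisions;
    }
}
```

Everything before the start revision is simply never copied, which is where the space saving comes from. Deleting the old resource and renaming the new one is left out here.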

PrathyushaModala commented 1 year ago

The problem we're trying to solve is twofold:

  • reclaim space through deletion of old revisions
  • data protection regulations, which might state that old data must not be preserved for more than n months

As we store file offsets, and due to the sliding snapshot algorithm, it would be infeasible to reclaim space directly. The usual solution for the second problem would be to encrypt revisions and to delete the encryption key.

To reclaim space we can copy the resource to a new temporary resource starting from a given revision, apply the change-sets from newer revisions, then delete the old resource and rename the new resource to the old resource's name.

We store JSON files as change-sets (you can commit a few changes and have a look at the format).

Okay. Which resource should I work on?

JohannesLichtenberger commented 1 year ago

You can import any JSON file you like, make some modifications (via the JSONiq API, for instance), and do a couple of commits.

JohannesLichtenberger commented 1 year ago

The change tracking stores JSON files in the resources/resourceName/update-operations subfolder for JSON databases. For each transaction (trx) commit, a new file is generated that stores the kinds of changes, the context (ctx) nodes, and the root nodes of the changes made. I think the format of the files is self-explanatory, but ask questions if it isn't.
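To look at those files after a few commits, something like the following JDK-only sketch may help. It assumes the resources/resourceName/update-operations path is relative to the database directory, and the class `UpdateOperationFiles` is hypothetical. The files are sorted by path, which may or may not match commit order, since the file naming scheme is not described here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not a SirixDB class) to inspect change-set files. */
public class UpdateOperationFiles {

    /**
     * Lists the change-set files of one resource.
     *
     * @param databaseDir  assumed root directory of the database
     * @param resourceName name of the resource inside that database
     */
    public static List<Path> list(Path databaseDir, String resourceName) throws IOException {
        Path dir = databaseDir.resolve("resources")
                .resolve(resourceName)
                .resolve("update-operations");
        // Files.list holds an open directory handle, so close the stream when done.
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Printing each returned file with `Files.readString(path)` is then enough to see the kinds of changes and the node keys recorded per commit.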

JohannesLichtenberger commented 1 year ago

@PrathyushaModala any news?