njaard / sonnerie

A simple timeseries database
Other
266 stars 19 forks source link

Allow realtime erase #12

Closed njaard closed 2 years ago

njaard commented 3 years ago

It should be possible to create a special entry that indicates "delete a range", which causes that range to simply not be read.

The special entry can be compacted away.

This will be very difficult.

Rationale

Sometimes you have a lot of records that you want to delete. Right now, you can delete them with a strategic compact --gegnum. For example, I often do compact -M --gegnum 'awk { if ($1 ~ /^prefix/) { } else { print } }' to delete all the records with keys that begin with prefix. This awk-based command can also filter records based on time and date. However, this is slow (linear time). You have to wait for the entire database to be scanned and rewritten before the deletion is complete.

This proposal instead suggests a different kind of record that marks a range to be deleted. This would be a small amount of data that instructs the reader to "ignore these records". That record is written in constant time and quickly, it essentially takes effect immediately and still conforms to all the transactional expectations.

Implementation

I propose a record with a format string of simply "\u007F", (i.e., the "DELETE" character).

The code in merge.rs would have special behavior when it sees the DELETE record; it holds it and applies its effect to each record that meets the criteria encoded in that DELETE record. When it can prove that no more following record can possibly meet the criteria of deletion, it can remove it. More than one deletions can be in effect at any given time, and so it may need to apply and hold multiple deletion operations.

To prevent complexities about having to read-ahead in a transaction file (or in that segment), a DELETE record must be the only record in both its segment and its file-of-segments.

Here you see that there are 5 distinct fields that can be used to do deletions, and they must all simultaneously complied with:

  1. The record's actual key representing "first key"
  2. First timestamp
  3. Last timestamp
  4. Key wildcard
  5. Last key

All of these fields are also possible to, somehow, mark as "we don't actually delete on this criteria".

When we are reading a database and processing a deletion, only the "last key" and "key wildcard" fields can actually have the effect of causing the Merge code to forget about the deletion.

When a major compaction is done, the DELETE command is applied, and then the actual DELETE record is removed in the final task. However when a minor compaction is performed, the DELETE record, and therefor its transaction file, must be retained.

API

The CreateTx struct shall be extended with a delete function that accepts the parameters specifying the criteria for deleted records.

All of the reading functions in DatabaseReader will transparently apply the deletions and never generate the record representing the deletion, except under the circumstance where the DatabaseReader is being used while in the process of a minor compaction (as the record must be retained). I don't know how that mode should be indicated, maybe when include_main_db==false?

All of the internal structs, such as in key_reader.rs will necessarily need to output the DELETE records.

User Interface

In the sonnerie CLI tool, the command can be called delete and has the same parameters as read. It produces no other output other than explaining why it may have failed (which can only be due to impossible constraints like --first-key came after --last-key.

Future Possibilities

I do not propose a way to delete records based on their value. That could be possible one day. Or maybe that's better to do as part of a compaction and just add a special UI for that that's a little easier than awk.

Per this proposal, a deletion must be the one and only record in a transaction file, that means that trying to perform a delete after any record has been added is invalid. Instead, we could have it rewrite the transaction on the fly (i.e., in linear time), while having the first record in the transaction be the DELETE so it can still be applied to previous, already-committed transactions.

njaard commented 2 years ago

Done in https://github.com/njaard/sonnerie/commit/1271faa78792248982742ac9bcd1fca249d98fc1