Closed ysono closed 3 years ago
Maybe this is a silly question: This consideration seems to stem from the need for an empty value. Is there a use-case you have in mind for empty values?
If we're choosing to not distinguish empty-valued entries and nonexistence, then that's a contract we need to specify. The client needs to be aware that using the put
api may result in data deletion (no-op if there was no data before).
An empty value can be interpreted in several ways:
{
"reservation:room:123:alice": _,
"reservation:room:123:bob": _
}
The "room 123" set contains alice and bob.
There can be arbitrary business logic that differentiates nonexistence and emptiness, say a previously deleted shopping cart and a newly created shopping cart.
When an operation reads one key, it might be fine -- say we're trying to append on item to a list-valued entry. But for an operation that reads two keys, user may desire different handling based on whether zero or one or both entries are missing.
If we were building a production-worthy db, I wouldn't make a low-level design decision that impacts high levels so widely. But since we're writing a tool to learn on, design decisions should keep things from becoming too complicated and should enable us to learn more stuff.
I think I'll read up on how DBs handle types, before moving forward.
If we're choosing to not distinguish empty-valued entries and nonexistence, then that's a contract we need to specify.
i agree it would be helpful to establish an API contract for public methods, and lean on those to guide design decisions.
(establishing a contract for get, put, scan, delete, and perhaps has_key)
then maybe it's clear what the underlying implementation must do in order to satisfy the contract?
I think I'll read up on how DBs handle types, before moving forward.
Here's one from Scylla DB: https://docs.scylladb.com/architecture/sstable/sstable2/sstable-data-file/
and Cassandra: http://distributeddatastore.blogspot.com/2013/08/cassandra-sstable-storage-format.html
pub fn get(&self, k: Key) -> Result<Option<Value>>
Perhaps get ought to return Result<Value>
? and get(k)
a non-existent key k
returns an "Err::NotExists"?
Am I misunderstanding the use-case for Option
again? lol
I agree with the need to define a tombstone to be different from the empty value. Sorry it took me a while to wrap around it.
Perhaps now that I am onboard, we can bring the discussion back to your original offering of ways to represent tombstones and values.
🙏🏽
Is the discriminant essentially an enum
which specifies the type of the value?
No problem. I agree that Option
can hide intention. IMO we should go ahead and convert it to explicit enums. We'd lose monad methods like .map()
but IMO they make things harder to read.
The type that the pub fn get()
returns should prob be different from the internal storage::api::*
.
mod storage {
pub enum Datum {
...
}
pub struct Key(Datum)
pub enum Value {
Tombstone,
Datum(Datum),
}
}
mod frontend {
pub use storage::Datum;
pub enum Value {
None, // http 404
Datum(Datum), // http 200 or 204
}
}
Yeah by discriminant I mean what rust calls the integer version of an enumerated type.
The scylla and cassandra docs (thanks for sharing) validated the need for there to be type info serialized. Cassandra uses a one-byte discriminant. Scylla seems to punt it to some deserialization mechanism, which I couldn't figure out at a quick look at the source code. I argue it's a good interface for our db to encode primitive info, support tuple, and make client be responsible for parsing tuples --ie clients don't have to parse primitives.
To clarify - most use cases would put externally serialized data into a bytes
-typed blob in the db. The tuple thing is for db-aware separation of data, like searching by a non-primary key.
Terminology:
Let me attempt to clarify the terminology first. (I'm open to calling things differently.)
One "item" is either a key or a value, and it's represented as
|datum_size|datum|
-- format|size_of(usize)|(variable length)|
-- byte sizeA key-value entry is presented as
|datum_size|datum|datum_size|datum|
-- format|size_of(usize)|(variable length)|size_of(usize)|(variable length)|
-- byte sizeIssue:
Right now, both the tombstone item and the "empty datum" item are presented as
|0usize||
-- serializedSolutions:
Option 1: Change datum_size to be typed
isize
.|size_of(isize)|(variable length)|
-- byte size|-1isize||
-- serialized tombstone|0isize||
-- seriazlied "empty datum"Option 2: Add discriminant into the serialization format.
The presence of discriminant allows us to add explicit data type info in the serialized format, such as integer, string, tuple, maybe even custom types. I'd like to know if this is what other storage engines do, or whether they punt the interpretation to some mechanism outside serde, and if so what this mechanism is.
Let's call "span" := discriminant + datum.
"item" := span_size + span
|span_size|discriminant|datum|
-- format|size_of(usize)|size_of(u16)|(variable length)|
-- byte sizeSpan size needs to come first (i.e. before discriminant) b/c that allows a reading client that is not interested in deserializing datum to seek ahead in file.
The choice of
u16
to represent discriminant is arbitrary. In case we need to deprecate supported discriminants over time, this allows us(pow(2,16) - count_of_active_discriminants)
deprecations, before rolling over to zero.If we go with option 2, we should go ahead and add discriminant for both key and value. Not only would doing so simplify serde, it would add a new feature that keys can be integer, tuple, etc.
For in-memory interpretation, we would have a related change: Both key and value wrap datum.
The primary index (i.e.
struct LSM
) concepts should not need to change.How would this affect secondary indexing?
A secondary index maps { a portion of a value extracted in a specific way : associated values }. It seems to me that Option 2 offers an opportunity to specify the extraction.
At a hand-waving level, one secondary index needs to have the following things:
("pk_cl_offset" := identifier of primary key's commit log and the file offset therein)
("pk_sst_offset" := identifier of one of primary key's sstables and the file offset therein)
Tuple(Str, I64, Str)
, extract the 2ndStr
"This has been an idea dump based on a lot of guesswork in the head. Please point out blindspots, inconsistencies, gotchas, and also examples of rela-life DBs if you find them!