kkli08 / KV-Store

Key-Value Storage Database
MIT License

Serialize kv pairs #47

Closed: kkli08 closed this issue 1 week ago

kkli08 commented 1 week ago
  1. Keys and Values as Byte Strings
    • Treat keys and values as byte sequences (arrays of bytes), so the store imposes no type constraints on either.
    • Serialize each key and value into a byte array (using protobuf).
  2. Data Stored in Sorted Order
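Both points can be illustrated with the standard library alone. This is a minimal sketch that assumes an in-memory `std::map` stands in for the real storage engine (the repository's actual structure is not shown here). `ByteStore` and `keysInOrder` are hypothetical names, and protobuf is left out so the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Keys and values are opaque byte strings. std::string can hold arbitrary
// bytes (including '\0') when built with an explicit length, and it compares
// bytes as unsigned char, so std::map keeps entries in byte-wise
// lexicographic order.
using ByteStore = std::map<std::string, std::string>;

// Returns all keys in the order a range scan would visit them.
std::vector<std::string> keysInOrder(const ByteStore& store) {
    std::vector<std::string> keys;
    keys.reserve(store.size());
    for (const auto& entry : store) {
        keys.push_back(entry.first);
    }
    // Sanity check: each key is strictly greater than the previous one.
    for (std::size_t i = 1; i < keys.size(); ++i) {
        assert(keys[i - 1] < keys[i]);
    }
    return keys;
}
```

For example, inserting the keys `"b"`, `"a\0z"`, `"\xff"`, and `"a"` yields the scan order `"a"`, `"a\0z"`, `"b"`, `"\xff"`: a key sorts before any longer key it is a prefix of, and the byte 0xFF sorts last because bytes compare as unsigned.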
kkli08 commented 1 week ago

How to handle different data types

Strategy 1: Serialization with Type Prefixing

Type-Agnostic Keys: The keys in RocksDB/LevelDB are treated as opaque byte arrays. If you want to support multiple data types (int, string, char, double), you serialize the key along with a type prefix.

Type Prefix: A prefix byte (or a few bytes) is used to indicate the type of the key (e.g., 0x01 for int, 0x02 for string, etc.). This allows the database to know how to interpret the rest of the byte array.

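Serialization Format: Each key is serialized as a type prefix followed by a type-specific encoding of the value. Numeric types like int and double should be encoded in big-endian order so that byte-wise comparison follows their natural ordering. Signed integers and doubles also need their sign bit (and, for negative doubles, the remaining bits) transformed, because raw two's-complement and IEEE 754 bit patterns do not sort correctly for negative values.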

```cpp
#include <cstdint>
#include <string>

enum KeyType {
    INT = 0x01,
    STRING = 0x02,
    CHAR = 0x03,
    DOUBLE = 0x04
};

// Big-endian fixed-length encoding, defined under Strategy 2 below.
std::string serializeFixedLengthInt(int64_t key);

void serializeExamples() {
    // Serialize an int key
    int key = 42;
    std::string serialized_key;
    serialized_key.push_back(static_cast<char>(KeyType::INT));  // Add type prefix
    serialized_key.append(serializeFixedLengthInt(key));        // Serialize key as big-endian

    // Serialize a string key
    std::string string_key = "user1";
    std::string serialized_string_key;
    serialized_string_key.push_back(static_cast<char>(KeyType::STRING));  // Add type prefix
    serialized_string_key.append(string_key);                             // Append raw string bytes
}
```
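The type prefix can be read back with a small dispatch step. This is a minimal sketch assuming keys were built exactly as in the Strategy 1 snippet (one prefix byte, then 8 big-endian bytes for INT or the raw bytes for STRING). `keyTag`, `decodeIntKey`, and `decodeStringKey` are hypothetical helpers, not part of the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

enum KeyType {
    INT = 0x01,
    STRING = 0x02,
    CHAR = 0x03,
    DOUBLE = 0x04
};

// Hypothetical helper: returns the type tag stored in the first byte.
unsigned char keyTag(const std::string& serialized) {
    assert(!serialized.empty());
    return static_cast<unsigned char>(serialized[0]);
}

// Hypothetical helper: decodes an INT key (1 prefix byte + 8 big-endian bytes).
int64_t decodeIntKey(const std::string& serialized) {
    assert(keyTag(serialized) == KeyType::INT && serialized.size() == 9);
    uint64_t bits = 0;
    for (std::size_t i = 1; i < serialized.size(); ++i) {
        // Shift in one byte at a time, most significant byte first.
        bits = (bits << 8) | static_cast<unsigned char>(serialized[i]);
    }
    return static_cast<int64_t>(bits);
}

// Hypothetical helper: returns the raw bytes of a STRING key without the prefix.
std::string decodeStringKey(const std::string& serialized) {
    assert(keyTag(serialized) == KeyType::STRING);
    return serialized.substr(1);
}
```

Because the prefix is the first byte, every INT key (0x01) sorts before every STRING key (0x02), so keys of different types never interleave in a range scan.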

Strategy 2: Fixed-Length Encoding for Numeric Keys

Numeric types such as int and double are serialized into fixed-length byte arrays to ensure they can be compared lexicographically in a meaningful way. For example:

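int32 and int64 are serialized into 4-byte and 8-byte arrays, respectively, using big-endian order. Big-endian places the most significant byte first, so for non-negative (or unsigned) values, byte-by-byte comparison matches numeric order, which is what sorting and binary search rely on. Negative two's-complement values have their top bit set, so they would sort after all positive values unless the sign bit is flipped before encoding.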

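```cpp
#include <cstdint>
#include <string>

// Serialize an int64 into a big-endian, fixed-length (8-byte) string.
// Byte-wise comparison matches numeric order only for non-negative values.
std::string serializeFixedLengthInt(int64_t key) {
    const uint64_t bits = static_cast<uint64_t>(key);  // unsigned for well-defined shifts
    std::string result(8, '\0');                       // 8 bytes for int64
    for (int i = 0; i < 8; ++i) {
        // Byte i (counting from the least significant) goes to index 7 - i.
        result[7 - i] = static_cast<char>((bits >> (i * 8)) & 0xFF);
    }
    return result;
}

// Example: serializeFixedLengthInt(123456789) yields the bytes 00 00 00 00 07 5B CD 15.
```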
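The encoding above is plain big-endian, so negative int64 values (top bit set) sort after positive ones. One common fix, sketched here as an alternative rather than the issue's own code, flips the sign bit for integers; for doubles it inverts all bits of negative values and sets the sign bit of non-negative ones. `toBigEndian`, `encodeOrderedInt64`, `decodeOrderedInt64`, and `encodeOrderedDouble` are hypothetical names, and the double encoding assumes IEEE 754 binary64.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Writes 64 bits as 8 big-endian bytes.
std::string toBigEndian(uint64_t bits) {
    std::string out(8, '\0');
    for (int i = 0; i < 8; ++i) {
        out[7 - i] = static_cast<char>((bits >> (i * 8)) & 0xFF);
    }
    return out;
}

// Signed int64: flipping the sign bit maps INT64_MIN..INT64_MAX onto
// 0..UINT64_MAX in order, so negatives sort before non-negatives.
std::string encodeOrderedInt64(int64_t v) {
    return toBigEndian(static_cast<uint64_t>(v) ^ (uint64_t{1} << 63));
}

// Inverse of encodeOrderedInt64.
int64_t decodeOrderedInt64(const std::string& bytes) {
    assert(bytes.size() == 8);
    uint64_t bits = 0;
    for (unsigned char byte : bytes) {
        bits = (bits << 8) | byte;
    }
    return static_cast<int64_t>(bits ^ (uint64_t{1} << 63));
}

// double (assumes IEEE 754 binary64): negative values have all bits
// inverted so larger magnitudes sort first; non-negative values get the
// sign bit set so they sort after all negatives. NaN is not handled.
std::string encodeOrderedDouble(double v) {
    static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit double");
    uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    const uint64_t signBit = uint64_t{1} << 63;
    bits = (bits & signBit) ? ~bits : (bits | signBit);
    return toBigEndian(bits);
}
```

These bytes differ from the output of `serializeFixedLengthInt` for every value, so the encoder and decoder must agree on the scheme. The double encoding places -0.0 just before +0.0 and defines no meaningful order for NaN.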
kkli08 commented 1 week ago

Update the classes to use a KeyValue class.