spirit-labs / tektite

Tektite DB
http://www.tektitedb.com

SSTable: reduce serialized size by using varints #270

Open purplefox opened 1 week ago

purplefox commented 1 week ago

The serialized size of an SSTable can be reduced by using varints for some fields.

rajatkb commented 1 week ago

@purplefox I'd like to take a stab at this issue. Can you give me some pointers on where to start and what to look at first?

purplefox commented 1 week ago

Hi @rajatkb, the sstable code is here: https://github.com/spirit-labs/tektite/blob/main/sst/sstable.go

As you can see, the metadata fields are fixed-length int fields. The idea was to save some bytes by using varints instead.

rajatkb commented 1 week ago

Got it, thanks. @jlerche pointed me to the relevant code. Going through the layout comments and the code, two things caught my eye.

EDIT: @jlerche pointed out over Slack that uint32 was chosen because memtable sizes are small, so we would never need a larger addressable space.

purplefox commented 6 days ago
rajatkb commented 6 days ago

Makes sense. So for this issue I should focus only on the metadata fields: https://github.com/spirit-labs/tektite/blob/b6552f7244fc5eeedda0607afa7ccfce767f213e/sst/sstable.go#L18

And the key length / offset fields?

Also, for the implementation, will it be enough for now to translate the individual uint32 fields to and from []byte using the varint package?

rajatkb commented 2 days ago

https://github.com/spirit-labs/tektite/pull/301

I have raised a first version that addresses this. However, I am not sure that doing this for the metadata fields is worth the extra complexity in the code. Maybe we could try it for the indexes instead: the keyOffset data, or the keyLength and valueLength data. It might save more space there.