Open prologic opened 9 months ago
That's better. To test this branch (not yet fully working):
git clone https://git.mills.io/prologic/bitcask
cd bitcask
git checkout refactor_trie
cd ..
git clone https://github.com/ostafen/clover
cd clover
go work init
go work use .
go work use ../bitcask
Ahh I think I've found my first problem. These lines https://github.com/ostafen/clover/blob/aa688ad9b8b26dddfe7432c5d94a403c8ad9c2c4/db.go#L92-L95 assume that no database returns an error for "key not found". Bitcask does: it returns a bitcask.ErrKeyNotFound error and a nil value. We don't assume that nil values mean "not found". Hmmm what to do... 🤔
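One possible way to bridge that mismatch is a small wrapper that turns "key not found" into a nil value with no error and passes every other error through. The sketch below is only an illustration: `ErrKeyNotFound` here is a local stand-in for bitcask.ErrKeyNotFound, and `getter`, `memStore` and `getOrNil` are hypothetical names, not part of clover or Bitcask.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrKeyNotFound is a local stand-in for bitcask.ErrKeyNotFound (assumption).
var ErrKeyNotFound = errors.New("key not found")

// getter is a hypothetical minimal view of a KV store's read path.
type getter interface {
	Get(key []byte) ([]byte, error)
}

// memStore is a toy store that, like Bitcask, reports a missing key as an error.
type memStore map[string][]byte

func (m memStore) Get(key []byte) ([]byte, error) {
	v, ok := m[string(key)]
	if !ok {
		return nil, ErrKeyNotFound
	}
	return v, nil
}

// getOrNil maps "key not found" to (nil, nil) and passes other errors through.
func getOrNil(s getter, key []byte) ([]byte, error) {
	v, err := s.Get(key)
	if errors.Is(err, ErrKeyNotFound) {
		return nil, nil
	}
	return v, err
}

func main() {
	s := memStore{"coll:todos": []byte(`{"Size":0,"Indexes":null}`)}

	v, err := getOrNil(s, []byte("coll:todos"))
	fmt.Println(string(v), err)

	v, err = getOrNil(s, []byte("coll:missing"))
	fmt.Println(v == nil, err)
}
```

Using errors.Is rather than `==` keeps the check working if the store ever wraps the sentinel error.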
Nice, that got things working a little better 🥳
$ bitcask -p test.db dump | jq '. | map_values(@base64d)'
{
"key": "coll:todos",
"value": "{\"Size\":0,\"Indexes\":null}"
}
Most tests pass now, except this one:
=== RUN TestUpdateCollection/bitcask
Which appears to be "hanging" hmmm
Hey, @prologic, first of all thank you for the PR and the interest in clover. I have one question for you: is your Bitcask storage engine able to support sorted iteration over keys?
Yes it does.
Once I get this working, are we good to merge this without full transaction support? (which Bitcask has never had support for, until now, which is going to be possible since I'm nearing making a decision to swap out the internal trie implementation that's used)
Is there any advantage in using Bitcask over this storage engine https://github.com/nutsdb/nutsdb? As I understand it, they are both based on the Bitcask model (nutsdb additionally supports transactions). I would be interested in understanding which one would be the better addition to cloverdb.
Based on this comparison of nutsdb vs. others the main advantage of using Bitcask is its use of a trie:
Compared with B+ trees, radix trees have smaller read and write amplifications since they do not store the entire keys in internal nodes
Otherwise I'm not really that familiar with NutsDB myself, and it looks like it was developed around the same time I was developing Bitcask (although I no longer actively use Github to store/collaborate on my projects anymore :/)
I've not done any other types of comparisons either and don't really want to get into "benchmark wars" 🤣 -- As an aside, I've used Bitcask in many production projects, and it's used a fair bit around the place if you look here
I'm also thinking about and planning to extend Bitcask's functionality a bit to support flushing the keyspace out to disk and using something like SSTables in addition to the WAL+LSM and Radix tree already in use. My hope/goal is to be able to use Bitcask for much larger datasets, where currently the limiting factor is "the entire keyspace has to be held in memory".
If I can ask, do you need to run clover on top of Bitcask for any specific project/workload type?
I was intending to use it for a new production project (startup), yes. Why's that? 🤔
BTW, my main concern with Bitcask is its lack of transaction support. Ideally, each storage engine supported by clover should offer the same guarantees (for example, all documents should be inserted or modified in a transaction).
If Bitcask can provide transactions then I'm happy to merge it into the clover code base; otherwise the better option is a separate repository containing a cloverdb-bitcask storage engine, which I can link in the README.
I was intending to use it for a new production project (_startup_), yes. Why's that?
Just interested in the usage :=)
Hmm have a few more tests to figure out why they're failing...
Example:
=== RUN TestSortWithIndex/bitcask
db_test.go:1136:
Error Trace: /Users/prologic/Contributions/clover/db_test.go:1136
/Users/prologic/Contributions/clover/db_test.go:86
Error: Not equal:
expected: 4408
actual : 0
Test: TestSortWithIndex/bitcask
--- FAIL: TestSortWithIndex (0.96s)
BTW, my main concern with Bitcask is its lack of transaction support. Ideally, each storage engine supported by clover should offer the same guarantees (for example, all documents should be inserted or modified in a transaction). If Bitcask can provide transactions then I'm happy to merge it into the clover code base; otherwise the better option is a separate repository containing a cloverdb-bitcask storage engine, which I can link in the README.
I will likely be adding this support, so we should be all good 👌
I was intending to use it for a new production project (_startup_), yes. Why's that?
Just interested in the usage :=)
I basically don't want to reinvent my own "document storage" engine 🤣 You seem to have done a nice job of that already 🤣 -- I have this really long-standing PR which adds List, Hash and SortedSet data structures to Bitcask, but honestly no-one (myself included) has really ever bothered using it 😅 So it just sits there. There is also bitraft, which also uses Bitcask internally, that one day I hope to spend a bit more time with 🤔
BTW, before starting the clover project, I made an attempt to implement a Bitcask-based storage engine too: https://github.com/ostafen/eagle (it's mainly experimental). I also used your project as a reference. I never lost interest in building a really robust Bitcask-based storage engine (but I would definitely need more time), so I was thinking that we could collaborate if you like
I would love that! 😍 I've had many good contributors come and go over the years and many folks love my version of Bitcask 😅 (I do too!) -- It's not perfect, but it works quite well and I use it everywhere. I'd still love to keep improving it, optimizing it and making it one of the best pure-Go KV stores around (although Badger, BBolt and others are pretty good too, but pros/cons 🤷‍♂️)
If you are interested maybe we can continue our discussion about this privately
Sure thing!
Or perhaps you are also OK to continue discussing that here publicly, so we can learn about it too 😅
-- shane.xb.qian
@prologic: could you share some contact info? Email address? @Shane-XB-Qian: I guess this page is definitely not the right context. But if you are interested, you are welcome to join the private discussions too :=)
On my website and twtxt.net/~prologic 👌
Cool, sent you an email
This PR adds (or tries to so far) support for Bitcask, an embedded KV store that uses a WAL+LSM and is optimized for sequential writes, fast low-latency reads and high throughput.
This is still a work-in-progress, as I've had to make changes in Bitcask itself in the refactor_trie branch, which adds support for an Iterator/Cursor (and I may also add support for Transactions too!)
The tests are not yet passing, and I need some help with this actually as I may have gotten some of the implementation wrong 🤔