pmwkaa / sophia

Modern transactional key-value/row storage library.
http://sophia.systems

Roadmap #35

Open jwerle opened 11 years ago

jwerle commented 11 years ago

What is the roadmap for sophia?

pmwkaa commented 11 years ago

Thanks for the interest! :) I truly believe in work that is really personal, something you can put your soul and passion into, with no excuses.

The next release is going to be very important in terms of features and their impact on the project's overall evolution. So, here is the roadmap:

sophia v1.2 (late november - december) (btm ~70% done)

Features to be implemented after that release:

sophia v1.3

Hope you're going to like it ;)

jwerle commented 11 years ago

I love all of it! Let me know how I can help in any way :)

jwerle commented 11 years ago

@pmwkaa I'd like to keep sphia(1) in line with the new features and changes to the code base. What you've mentioned for the roadmap could make this tool really useful as far as replication, backups, restoration, etc.

jwerle commented 11 years ago

@stephenmathieson care to join in on the fun? :)

stephenmathieson commented 11 years ago

hmm.. i haven't played with sophia, but i'm certainly interested

jwerle commented 11 years ago

Join the party! Would love some help with github.com/jwerle/sphia

pcdinh commented 11 years ago

@pmwkaa

sophia v1.3

    secondary indexes

I am a little bit confused when seeing this. Does it mean that sophia supports complex data structures (not just strings or numbers), such as hash-like ones or nested structures like JSON objects?

sophia v1.2 (late november - december) (btm ~70% done)
    completely multi-thread (everything, including cursors and any parallel operations)

What do you mean by "parallel operations"?

Thanks

pmwkaa commented 11 years ago

@jwerle I think that is a great idea :) I will keep you informed about new updates, features, specifications or anything else in v1.2 :) Thanks!

@pcdinh you can already store any document object, such as JSON; the only thing you need is your own custom compare function, which retrieves the keys from within each document and compares them accordingly.
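A minimal sketch of such a custom compare function, assuming values are JSON-like documents that carry an integer field `"id"`, and assuming a comparator of the shape `int cmp(const char *a, size_t asz, const char *b, size_t bsz)`. The names `doc_id_cmp` and `doc_extract_id` are hypothetical, and the actual comparator hook and signature in sophia may differ. Key extraction here is a naive substring search, not a real JSON parser.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read the integer value of "id" from a JSON-like
 * document of length `size`. The buffer need not be NUL-terminated, so only
 * the first 255 bytes are copied into a bounded local buffer and searched.
 * Naive: a string value that itself contains "id": would confuse it.
 * Returns 0 on success, -1 if the field is missing or has no digits. */
static int doc_extract_id(const char *doc, size_t size, long *out)
{
    char buf[256];
    assert(doc != NULL && out != NULL);
    size_t n = size < sizeof(buf) - 1 ? size : sizeof(buf) - 1;
    memcpy(buf, doc, n);
    buf[n] = '\0';

    const char *p = strstr(buf, "\"id\":");
    if (p == NULL)
        return -1;
    char *end;
    *out = strtol(p + 5, &end, 10); /* strtol skips leading whitespace */
    return end == p + 5 ? -1 : 0;   /* end unchanged means no digits */
}

/* Hypothetical comparator: orders documents by their "id" field.
 * Documents without a usable "id" sort before those that have one. */
int doc_id_cmp(const char *a, size_t asz, const char *b, size_t bsz)
{
    long ia, ib;
    int ra = doc_extract_id(a, asz, &ia);
    int rb = doc_extract_id(b, bsz, &ib);
    if (ra != 0 || rb != 0)
        return (ra == 0) - (rb == 0);
    return (ia > ib) - (ia < ib); /* -1, 0 or 1 without overflow */
}
```

The point of the design is that the storage engine never interprets the document; it only asks the comparator for an ordering, so any document format works as long as the comparator can find the key inside it.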

Speaking of secondary indexes, there is no such support yet. Right now it is a plain key-value database. But I think in time I will add support for such functionality. I imagine it will be possible to maintain chained databases and do consistent updates on them in some optimized manner, and to query the different indexes separately. Later, there would be support for online index creation, dropping, etc. But there is still a long way to go in that direction, and it is not a priority right now.
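The "chained databases" idea for secondary indexes can be sketched as a primary table plus a secondary index whose entries are (secondary value, primary key) pairs kept in sorted order, with both updated together so they never disagree. This is a minimal in-memory illustration under assumed, hypothetical names (`struct table`, `table_set`, `table_find_by_age`, an integer `age` field); it is not a sophia API. In a real engine the two updates would have to happen inside one transaction to stay consistent under concurrency.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ROWS 64

/* Hypothetical "primary database": pk -> age, pk used as the array index. */
struct row { int used; int age; };

/* Hypothetical "secondary database": (age, pk) pairs sorted by age, then pk.
 * Including pk makes every index entry unique. */
struct sec_entry { int age; int pk; };

struct table {
    struct row rows[MAX_ROWS];
    struct sec_entry idx[MAX_ROWS];
    size_t nidx;
};

static int sec_cmp(struct sec_entry a, struct sec_entry b)
{
    if (a.age != b.age)
        return (a.age > b.age) - (a.age < b.age);
    return (a.pk > b.pk) - (a.pk < b.pk);
}

static void sec_insert(struct table *t, struct sec_entry e)
{
    size_t i = 0;
    while (i < t->nidx && sec_cmp(t->idx[i], e) < 0)
        i++;
    memmove(&t->idx[i + 1], &t->idx[i], (t->nidx - i) * sizeof e);
    t->idx[i] = e;
    t->nidx++;
}

static void sec_remove(struct table *t, struct sec_entry e)
{
    for (size_t i = 0; i < t->nidx; i++) {
        if (sec_cmp(t->idx[i], e) == 0) {
            memmove(&t->idx[i], &t->idx[i + 1], (t->nidx - i - 1) * sizeof e);
            t->nidx--;
            return;
        }
    }
}

/* Upsert a row and keep the secondary index in step: drop the stale
 * (old_age, pk) entry before adding (new_age, pk). */
void table_set(struct table *t, int pk, int age)
{
    assert(pk >= 0 && pk < MAX_ROWS);
    if (t->rows[pk].used)
        sec_remove(t, (struct sec_entry){ t->rows[pk].age, pk });
    t->rows[pk].used = 1;
    t->rows[pk].age = age;
    sec_insert(t, (struct sec_entry){ age, pk });
}

/* Query through the secondary index: seek to the first entry with this age,
 * then range-scan while the age matches. Returns the number of pks written. */
size_t table_find_by_age(const struct table *t, int age, int *out, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < t->nidx && t->idx[i].age < age)
        i++;
    for (; i < t->nidx && t->idx[i].age == age && n < cap; i++)
        out[n++] = t->idx[i].pk;
    return n;
}
```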

By parallel operations, I mean there will be complete support for use in a multi-threaded user environment, with a real MVCC transaction model. For example, it will be possible to do a consistent database traversal while doing updates at the same time, and get real SERIALIZABLE isolation.
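One common way to get "consistent traversal while updating" is MVCC snapshot reads: writers prepend a new version instead of overwriting, and a reader sees only the newest version committed at or before its snapshot. The sketch below uses hypothetical names (`struct version`, `version_push`, `version_visible`) and assumes one log sequence number (LSN) per commit; it is a generic illustration, not sophia's implementation. Snapshot visibility on its own gives snapshot isolation; full SERIALIZABLE additionally requires conflict detection at commit, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical version record: one committed version of a single key,
 * linked newest-first to the versions it superseded. */
struct version {
    uint64_t lsn;          /* commit sequence number of the writer */
    const char *value;
    struct version *older; /* next older version of the same key */
};

/* A writer never overwrites in place: it prepends a new version and
 * returns the new head of the chain. */
struct version *version_push(struct version *head, struct version *nv,
                             uint64_t lsn, const char *value)
{
    assert(nv != NULL);
    nv->lsn = lsn;
    nv->value = value;
    nv->older = head;
    return nv;
}

/* A reader whose snapshot was taken at `snapshot_lsn` sees the newest
 * version committed at or before it; later commits stay invisible. */
const char *version_visible(const struct version *head, uint64_t snapshot_lsn)
{
    for (const struct version *v = head; v != NULL; v = v->older)
        if (v->lsn <= snapshot_lsn)
            return v->value;
    return NULL; /* the key did not exist yet for this snapshot */
}
```

Because old versions stay reachable until no snapshot needs them, a long traversal keeps seeing one consistent state while writers continue to commit.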

I think the only thing that will change in the v1.2 API from the user's point of view is that sp_begin() will return a transaction pointer. And that's all ;)

For example:

```c
void *db = sp_open(..);
sp_set(db, key, value);   // single-statement transaction; semantics will not change

void *txn = sp_begin(db);
sp_set(txn, key, value);  // multi-statement transaction
sp_set(txn, ...);
sp_get(txn, ...);         // sees changes made by the current transaction or visible before it

sp_commit(txn);           // or sp_rollback(txn), or sp_destroy(txn)
```
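The proposed transaction semantics (writes buffered per transaction, reads seeing the transaction's own changes before committed data, then commit or rollback) can be illustrated with a small in-memory sketch. Everything here (`struct kv`, `txn_begin`, `txn_set`, `txn_get`, `txn_commit`, `txn_rollback`) is hypothetical and is not sophia's API; it models only read-your-own-writes plus commit/rollback, with no concurrency control or isolation between transactions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CAP 16

/* Hypothetical in-memory key/value table (not the sophia API). */
struct kv { const char *key[CAP]; const char *val[CAP]; size_t n; };

static const char *kv_get(const struct kv *t, const char *key)
{
    for (size_t i = 0; i < t->n; i++)
        if (strcmp(t->key[i], key) == 0)
            return t->val[i];
    return NULL;
}

static void kv_set(struct kv *t, const char *key, const char *val)
{
    for (size_t i = 0; i < t->n; i++)
        if (strcmp(t->key[i], key) == 0) { t->val[i] = val; return; }
    assert(t->n < CAP);
    t->key[t->n] = key;
    t->val[t->n] = val;
    t->n++;
}

/* A transaction keeps its writes in a private buffer until commit. */
struct txn { struct kv *db; struct kv writes; };

void txn_begin(struct txn *tx, struct kv *db) { tx->db = db; tx->writes.n = 0; }

void txn_set(struct txn *tx, const char *k, const char *v) { kv_set(&tx->writes, k, v); }

/* Reads check the transaction's own writes first, then committed data. */
const char *txn_get(const struct txn *tx, const char *k)
{
    const char *v = kv_get(&tx->writes, k);
    return v != NULL ? v : kv_get(tx->db, k);
}

/* Commit publishes the buffered writes; rollback simply discards them. */
void txn_commit(struct txn *tx)
{
    for (size_t i = 0; i < tx->writes.n; i++)
        kv_set(tx->db, tx->writes.key[i], tx->writes.val[i]);
    tx->writes.n = 0;
}

void txn_rollback(struct txn *tx) { tx->writes.n = 0; }
```

Buffering writes until commit keeps uncommitted changes out of the shared table, which is why rollback can be as cheap as forgetting the buffer.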

awakmu commented 10 years ago

sophia v1.2 (late november - december) (btm ~70% done) pure mvcc implementation (storage engine is version-aware)

Someone told me that supporting MVCC will make the code base bloated; is that true?

sophia v1.3 multi-process access protocol, replication (probably networked access)

Don't do it! It would be better to develop a sophia storage engine for MySQL or MariaDB. Here is a sample implementation for LevelDB: https://mariadb.atlassian.net/browse/MDEV-3841

pmwkaa commented 10 years ago

Someone told me that supporting MVCC will make the code base bloated; is that true?

Yes, it's partly true. Introducing multi-versioning is a big task, roughly comparable to remaking the whole engine logic. But it depends on the implementation anyway; I managed to make it as simple as possible and without visible performance degradation so far. lmdb, for example, has a very small multi-version B-tree-specific implementation.

Don't do it! It would be better to develop a sophia storage engine for MySQL or MariaDB.

Thanks! I will take a look at it :)

19h commented 10 years ago

I think replication (w/ or w/o networking) isn't supposed to be in a storage engine. That's a higher level issue!

Hot-backup, though, is already a great option.

awakmu commented 10 years ago

I think replication (w/ or w/o networking) isn't supposed to be in a storage engine. That's a higher level issue!

That is what I meant. If you create a storage engine for MySQL, then replication and (non-hot) backup will be handled by MySQL.

mdcallag commented 10 years ago

A MySQL storage engine is a huge effort. It would be nice if there were a cleaner API (handler.h is huge and some behavior is obscure). The LevelDB storage engine that was cited above is a proof-of-concept, but some code from it could be reused here like the code for generating one byte array for a multi-part key. It would be nice if there were a chance for reuse between storage engines that have similar feature sets. But maybe the limited developer time is better spent making Sophia better and then integrating this into Tarantool.
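"Generating one byte array for a multi-part key" means encoding each key part so that a plain byte-wise comparison of the concatenation gives the same order as comparing the parts one by one. The sketch below assumes a hypothetical two-part key `(int32_t id, string name)` and hypothetical function names `key_encode` and `key_cmp`; it is a generic illustration of the technique, not the code from the LevelDB storage engine cited above. Variable-length parts that are not the last part would additionally need escaping or a terminator, which is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoder for a two-part key (int32_t id, string name) into one
 * byte array whose byte-wise order matches tuple order.
 * Returns the encoded length, or 0 if `cap` is too small. */
size_t key_encode(int32_t id, const char *name, unsigned char *out, size_t cap)
{
    assert(name != NULL && out != NULL);
    size_t nlen = strlen(name);
    if (cap < 4 + nlen)
        return 0;
    /* Flip the sign bit so negative ids sort before positive ones, then store
     * big-endian so the most significant byte is compared first. */
    uint32_t u = (uint32_t)id ^ UINT32_C(0x80000000);
    out[0] = (unsigned char)(u >> 24);
    out[1] = (unsigned char)(u >> 16);
    out[2] = (unsigned char)(u >> 8);
    out[3] = (unsigned char)u;
    /* The fixed-width id means the string, as the last part, can be appended as-is. */
    memcpy(out + 4, name, nlen);
    return 4 + nlen;
}

/* Generic byte-wise comparison; a key that is a prefix of another sorts first. */
int key_cmp(const unsigned char *a, size_t alen, const unsigned char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    int r = memcmp(a, b, n);
    if (r != 0)
        return r;
    return (alen > blen) - (alen < blen);
}
```

With keys encoded this way, the engine needs only one generic comparator instead of a type-aware one for each key layout, which is what makes such encoding code reusable across storage engines.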

dyu commented 10 years ago

"and then integrating this into Tarantool." That is the plan I think. :-)

awakmu commented 10 years ago

@mdcallag

I mentioned MySQL here because MySQL doesn't have this feature (a write-optimized storage engine). I have read about the TokuDB storage engine, but although it is GPL'ed, it is patented technology. So we can't use that engine on a production server, right?

mdcallag commented 10 years ago

I am not a lawyer so I won't answer your question about use. TokuDB is distributed as open source and included in MariaDB and Percona/MySQL. My brother works at Tokutek and is happy to speak with potential users.

ghost commented 10 years ago

Hi,

Any idea what the planned release date for sophia v1.3 (or a v1.3 release candidate) is?

pmwkaa commented 10 years ago

Hello,

Do you need some particular feature, like secondary indexes?

It's been a while, and unfortunately I can't give a fixed date right now. Since the last release, I've built a couple of new engine prototypes, trying to improve sophia's behavior on large data sets and its memory management under high load. It took a lot of time, but I believe I'm on the right path now.

sophia v1.2 development status: https://groups.google.com/forum/#!topic/sophia-database/C8zjKliVS3c

ghost commented 10 years ago

Yes, I am interested in secondary indexes. But, I was just curious. Lack of secondary indexes is not a showstopper for my project.

I'd rather have a stable engine than new features, so keep up the good work on your current track.

jwerle commented 10 years ago

@pmwkaa any goodies coming soon?

jcspencer commented 10 years ago

Any news on compression, secondary indexes or networking?

pmwkaa commented 10 years ago

Yes! After building several prototypes to try out new ideas for the internal design, I believe I've found a good one to continue development with.

Work is going according to plan, and upcoming features are:

Lately I've started working on integrating sophia as the disk storage for the Tarantool project: http://tarantool.org https://github.com/tarantool/tarantool

Since I'm now able to devote more time to sophia's integration and development (as part of the Tarantool team), I plan to make a release in July.

Thanks for the interest! :)

jcspencer commented 10 years ago

Sounds great! I'm looking forward to seeing these new features!

pmwkaa commented 10 years ago

Development branch has been published: https://github.com/pmwkaa/sophia/tree/dev

https://groups.google.com/forum/#!topic/sophia-database/C8zjKliVS3c announce and intrigue :)

jwerle commented 10 years ago

Woot! Thanks for a detailed update

On Jul 22, 2014 1:12 PM, "Dmitry Simonenko" notifications@github.com wrote:

Development branch has been published: https://github.com/pmwkaa/sophia/tree/dev

https://groups.google.com/forum/#!topic/sophia-database/C8zjKliVS3c announce and intrigue :)


jwerle commented 10 years ago

@pmwkaa i'm getting really excited with all this dev work

pmwkaa commented 10 years ago

Trying to make it worth the long wait. Hope you guys like it :)

LanceNorskog commented 9 years ago

Lucene is another database you might be interested in. It is the major open source text search engine, and has a modular "codec" plugin design for the actual key-value storage engine.

There have been other projects to use Cassandra as the storage engine for Lucene. The native engine is coded in Java. Sophia might have advantages over it.

http://lucene.apache.org/

pmwkaa commented 9 years ago

Thanks, i'll take a look :)