isar / hive

Lightweight and blazing fast key-value database written in pure Dart.
Apache License 2.0
4.02k stars, 401 forks

The future of Hive #246

Closed: simc closed this issue 2 years ago

simc commented 4 years ago

TLDR: Hive 2.0 will be rewritten in Rust to get amazing performance, multithreaded queries, read and write transactions, and super low memory usage. The code will work 1:1 in the browser.

Situation

I have thought for a long time about how to implement queries correctly in Hive, and I have come to the conclusion that it is not possible with the current architecture. I have reviewed many projects on GitHub that use Hive, and most of them have to create their own suboptimal workarounds for queries. Apart from queries, Hive has another problem: Dart objects use a lot of RAM. Since Hive currently keeps at least the keys in RAM, you can hit the OS memory limit on mobile devices quite quickly.

I also created polls some time ago on the Hive documentation page and there were two very strong takeaways:

  1. Queries are something almost every user wants
  2. An overwhelming majority (86%) of users don't mind breaking changes

Idea

So here is what I have come up with: I will completely rewrite Hive in Rust. I will keep the strengths of the old implementation (like auto migration) and fix its issues. On the VM, Hive will use LMDB as the backend, and in the browser it will use IndexedDB. The VM implementation will provide the same features as IndexedDB to allow easy code sharing. The two main goals of Hive will stay the same: simplicity and performance.

I have a small prototype and the performance is amazing. LMDB has to be some kind of black magic.

Sample

Here is how it is going to work:

The model definition is very similar to current models:

@HiveType(typeId: 0)
class Person {
  @Primary
  int id;

  @HiveField(fieldId: 0)
  @Index(unique: false)
  String name;

  @HiveField(fieldId: 1)
  int age;
}

Hive will then generate extension methods so you can write the following query:

var box = await Hive.openBox<Person>('persons');
box
  .query()
  .where()
  .nameEquals('David')
  .or()
  .nameStartsWith('Lu')
  .filter()
  .ageBetween(18, 50)
  .sortedByAge();

where() vs filter()

The difference between where() and filter() is that where() uses an index, while filter() needs to test each of the resulting elements. Normally a database figures out by itself when to use an index, but Hive will give you the option to choose.

There are multiple reasons for that:

  1. This code will work 1:1 with IndexedDB
  2. You know your data best and can choose the perfect index
  3. The database code will be significantly easier
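To make the distinction concrete, here is a minimal in-memory sketch in plain Dart of what the sample query does conceptually. It is not Hive's implementation: `PersonRecord`, `NameIndex`, `ageBetween` and `sampleQuery` are hypothetical names, and a real index would live in LMDB or IndexedDB rather than in a `Map`. The where() part only looks up ids in the index, while the filter() part has to test every candidate it receives.

```dart
// Hypothetical stand-in for a stored Person record.
class PersonRecord {
  final int id;
  final String name;
  final int age;
  const PersonRecord(this.id, this.name, this.age);
}

// A non-unique index from name to the ids of all records with that name.
// A real database would keep keys sorted (e.g. in a B-tree) so that a
// prefix query is a range scan; this Map version walks the index keys.
class NameIndex {
  final Map<String, List<int>> _ids = {};

  void add(PersonRecord p) =>
      _ids.putIfAbsent(p.name, () => <int>[]).add(p.id);

  // where(): touches only the ids stored under the key.
  List<int> equalTo(String name) => _ids[name] ?? const <int>[];

  // where() with a prefix: inspects index keys, never the records.
  List<int> startsWith(String prefix) => [
        for (final entry in _ids.entries)
          if (entry.key.startsWith(prefix)) ...entry.value,
      ];
}

// filter(): every candidate is loaded and tested one by one.
List<PersonRecord> ageBetween(
        Iterable<PersonRecord> candidates, int lower, int upper) =>
    candidates.where((p) => p.age >= lower && p.age <= upper).toList();

// Mirrors the sample: nameEquals('David').or().nameStartsWith('Lu'),
// then filter().ageBetween(18, 50), then sortedByAge().
List<PersonRecord> sampleQuery(Map<int, PersonRecord> store, NameIndex index) {
  // or() is a union of index hits; the Set removes duplicate ids.
  final ids = <int>{...index.equalTo('David'), ...index.startsWith('Lu')};
  final result = ageBetween(ids.map((id) => store[id]!), 18, 50);
  result.sort((a, b) => a.age.compareTo(b.age));
  return result;
}
```

With this split, the cost of the filter() step grows with the number of candidates that where() returns rather than with the size of the whole box, which is why picking a selective index matters.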

Things to figure out

Blocking Issues (pls upvote)

Other issues

For existing apps using Hive 1.x:

I will continue to keep a Hive 1.x branch for bugfixes etc.

What do you think?

kaboc commented 4 years ago

Rewriting in Rust sounds so interesting. Better performance and less memory usage are attractive and welcome.

However, I'm seriously worried about the compatibility of boxes. Can boxes created with v1.x still be used in v2? It's unclear what other users actually meant in the polls. For me, only breaking changes to the API are acceptable, not to the box format.

I have to decide whether to leave Hive and choose another package for the app I'm currently developing if there is a risk that I need to go through all the trouble to port old boxes to new ones on my own sometime in the future.

Having said that, your plan is exciting all the same. I look forward to seeing the first release of the new major version.

simc commented 4 years ago

Yes, I agree. It is bad that old boxes will not be compatible and that existing apps in production cannot upgrade to the new version without losing their data.

Hive is very young and I still think it is the right path. For future breaking changes there will be auto migration. Unfortunately that is not possible for this change because we switch the backend.

shinayser commented 4 years ago

A noob question: how will you make Rust work with Dart?

simc commented 4 years ago

Using Dart FFI

shinayser commented 4 years ago

> Using Dart FFI

But Dart FFI is only for the C language, not Rust, right?

Will it require the user to use FFI, or are you planning to provide a ready-to-use interface in Dart?

simc commented 4 years ago

> But Dart FFI is only for the C language, not Rust, right?

Rust does provide C interop...

> Will it require the user to use FFI, or are you planning to provide a ready-to-use interface in Dart?

The user will only use Dart and does not even notice the Rust backend.

Mravuri96 commented 4 years ago

You should give https://vlang.io/ a shot 😜

simc commented 4 years ago

> You should give https://vlang.io/ a shot

V is interesting but in my opinion, there are multiple reasons why it is not a good idea to use V at the moment:

MarcelGarus commented 4 years ago

What initially excited me about Hive is that it's a pure Dart library without external dependencies, so it runs everywhere. Obviously, the same would be true if the backend were implemented in Rust, but I begin to wonder: there are loads of existing database implementations in Rust that are far more advanced. There are of course the usual SQLite-ish standards, but also document-based databases like MongoDB and truly innovative approaches like this one. I'm afraid there's nothing about Hive that's fundamentally better than other database solutions, so rather than reinventing the wheel, why not use an existing database and build a nice Dart wrapper around it? Because developers also use the Rust database on its own, there are more users and more contributors, and developers from both the Rust and the Flutter communities benefit from the research, optimizations, and bug fixes implemented on the Rust side. This package could simply focus on providing the most intuitive Dart API possible, which would make maintaining the package easier as well.

simc commented 4 years ago

> What initially excited me about Hive is that it's a pure Dart library without external dependencies

Yes, that was the goal, but it turned out that most users don't want a database that is basically an in-memory KV store. The problem with Dart is that it is kind of slow, its objects are rather memory-hungry, and it lacks essential features needed to implement a more advanced database.

> There are loads of existing database implementations in Rust that are far more advanced

I thought the same thing but the list of candidates is short. In fact, I didn't find a single database that is suitable for mobile devices and our requirements.

Also, to my knowledge, there is no database that is built as a counterpart to IndexedDB. It is not trivial to write a database that works exactly the same in the browser. IndexedDB is very different from most other databases.

> I'm afraid there's nothing about Hive that's fundamentally better than other database solutions

As I said, I don't think there exists a single cross-platform database that also works in the browser and I don't think existing databases can be easily used with Dart and still have great performance. Realm, for example, will never work with Dart because it relies on proxy objects.

So I'm basically writing an abstraction around IndexedDB and LMDB in Rust, which can be compiled to a binary or to WASM.

And then there will be the Dart wrapper around this "backend".

It should be easy to use only the Rust side, for example with React Native.

Edit: I already have a fully working prototype of the LMDB part of the wrapper and not much Rust code is required. The performance is exceptional.

If you have an alternative approach that allows us to have a "real" database which also works in the browser, I'd love to discuss it.

Edit2: Another advantage of this approach is that breaking changes to the binary format will no longer be required, and bugs that corrupt the database will not happen anymore, because storing the data will be handled by LMDB and IndexedDB respectively.

Edit3: Like most other databases, Noria, the one you linked, is for backends and thus not really suitable for mobile devices.

MarcelGarus commented 4 years ago

Okay, I see. I was really expecting more lightweight Rust databases to exist. Then I take back what I said before — the Rust-Dart-architecture seems to be a great fit 😊 I'm looking forward to using this

simc commented 4 years ago

There is one topic where I still need input: Since we are rebuilding Hive anyway, I'd like to make it ready for synchronization from the beginning.

What do you guys think about CouchDB as a backend? Do you know good articles or papers on sync without conflicts? I need an easy to use (for the user) mechanism to avoid or resolve conflicts.

ashim-kr-saha commented 4 years ago

Syncing with CouchDB is exactly what I am looking for in my next project.

A PouchDB implementation in Dart would be a great solution.

frank06 commented 4 years ago

Used CouchDB a long time ago, and while the conflict resolution mechanism was neat, queries were a pain. That might have changed, or not. I thought one of the major drivers of this Hive rewrite was query support.

simc commented 4 years ago

> I thought one of the major drivers of this Hive rewrite was query support.

Yes, the queries you use should be more or less independent of CouchDB.

CouchDB is just an idea and nothing I have decided yet. I just want to figure this out before the first stable release of the new version.

It would be great if someone knows a backend which fits our use case.

jamesdixon commented 4 years ago

Unsolicited advice / thought:

I realize you're planning for the future ahead of time by factoring in sync support, but it's a complex topic and something I personally would leave till after the rewrite 😄

I also say this because I'm not using CouchDB, and while I'd love conflict resolution, I haven't found any really easy way to sync CouchDB changes to Postgres. This seems to be a problem for many who may use another database. My research is limited, but I just wanted to throw it out there: my anecdotal evidence suggests that more people are using Postgres/MySQL/etc., and if CouchDB support for those is weak, it may not be worth the additional effort upfront.

All that said, killer job with Hive thus far. Excited to see what's next.

simc commented 4 years ago

Thanks for your opinion. Yes it would be cool to have a conflict resolution which is independent of the backend database but I have to do a lot of research because I have no idea how to do it 😆

jonataslaw commented 4 years ago

I think it would be prudent to create a second project, "Hive FFI" or something like that. I have 9 applications in production using Hive, and it makes me very afraid to think that users with 300 MB or 500 MB of data in Hive may lose everything after a library update. I am very enthusiastic about testing Hive with Rust; it must be incredible. However, in my opinion, changing the backend is not acceptable for a stable library. If it were a pre-release, it would be justified, but many people use Hive as KV storage for things that SharedPreferences does not do so well. Queries are nice, but maintaining compatibility would be even better. I'm following the thread because if Hive changes, maybe I will need to fork this project, but if there is a risk-free way of migration, I fully support the idea.

simc commented 4 years ago

I don't think it will be possible to automatically migrate the data because the two models are not entirely compatible. But I will maintain a branch that contains the current version so you can just continue to use it.

algodave commented 4 years ago

@leisim In your vision, will the new Hive still allow creating adapters manually? I'm not using code generation in my project; I'm just defining my own class MyModelHiveAdapter extends TypeAdapter<MyModel>.

pishguy commented 4 years ago

When will this version be released so we can use it? 💃 💃 💃

simc commented 4 years ago

> In your vision, will the new Hive still allow creating adapters manually?

@algodave I don't think it will be possible in the same way as it is currently because in order to query your data, Hive needs to understand its structure. Probably there will still be adapters that map objects to Map<int, dynamic>. The keys of this map will be the field ids and the values are the primitive values of the fields (int, double, bool, String, List<int>, List<double>, List<bool>, List<String>). You can customize these adapters.
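A rough sketch of what such a customizable adapter could look like, written as self-contained Dart. `FieldMapAdapter`, `toFields` and `fromFields` are hypothetical names, not a published Hive API; the only constraints taken from the comment above are that the keys are field ids and the values are primitives. The example customizes the mapping by storing a `DateTime` as an `int`.

```dart
// Hypothetical adapter interface: an object is described by a map whose keys
// are field ids and whose values are primitives (int, double, bool, String,
// or lists of those).
abstract class FieldMapAdapter<T> {
  Map<int, dynamic> toFields(T object);
  T fromFields(Map<int, dynamic> fields);
}

class Person {
  final String name;
  final int age;
  final DateTime createdAt;
  Person(this.name, this.age, this.createdAt);
}

// A hand-written (customized) adapter for Person.
class PersonAdapter implements FieldMapAdapter<Person> {
  @override
  Map<int, dynamic> toFields(Person p) => {
        0: p.name,
        1: p.age,
        // DateTime is not one of the primitive types, so it is stored as an int.
        2: p.createdAt.millisecondsSinceEpoch,
      };

  @override
  Person fromFields(Map<int, dynamic> fields) => Person(
        fields[0] as String,
        fields[1] as int,
        // Restored as UTC; the instant is preserved, the local zone is not.
        DateTime.fromMillisecondsSinceEpoch(fields[2] as int, isUtc: true),
      );
}
```

Because every stored value is a primitive under a known field id, the database could index and compare fields such as `1` (age) without knowing anything about the Dart class.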

> When will this version be released so we can use it?

@MahdiPishguy It will probably take another month until I have the first test version.

xylobol commented 4 years ago

@jonataslaw

> I'm following the thread because if Hive changes, maybe I will need to fork this project, but if there is a risk-free way of migration, I fully support the idea.

I've been working on a mission-critical project with Hive, and a major pull was that it's completely written in Dart, so I may need to fork as well. If you're interested, I can keep you posted.

stefanrusek commented 4 years ago

Might I request you create a new library? A complete rewrite, with different behavior, and large api changes is not a new version but a new library. Going down this path means there will be numerous forks of Hive 1, and people would just be better served if you started a new project and let Hive continue to evolve.

algodave commented 4 years ago

@Xylobol @jonataslaw I am one of those who would be interested in a fork

listepo commented 4 years ago

@leisim any news?

stevenspiel commented 4 years ago

@leisim I'm also interested in the progress on this.

wbsantos commented 4 years ago

I'm writing an app in Flutter and thinking about using ArangoDB in the backend.

I don't know if it is the best solution for what you are thinking, but maybe you can keep it on the list of possible databases to autosync hive.

ArangoDB is a multi-paradigm database: it can be a document, graph, or key-value store. I'm not sure if it fits the Hive DB structure, but it is a feature-rich platform you may consider.

frank06 commented 4 years ago

I'm also really interested in the state of Hive 2.0.

I recently open sourced Flutter Data which uses Hive at its core. I wanted to start working on improving performance, but is that worth the effort if 2.0 is near?

In addition, @leisim I was wondering what you think about FD's Relationships API and if it could somehow be integrated back into Hive 2.0's Dart/Flutter layer.

MarcelGarus commented 4 years ago

@frank06 Interesting, I was working on a similar pattern for our Schul-Cloud app. I published the package as hive_cache. Similarly to flutter_data, it tries to model the relationships between entities and takes care of fetching data when necessary. The package provides Entity, Collection, and Connection types for that, along with builders. I'd love it if Hive integrated some common functionality natively.

Also, because Hive 2.0 will run with Rust at its core, there will be a simple-to-use Rust API anyway, so why not use it to run a webserver when the app is running in debug mode? Similarly to how Dart DevTools work, Hive could also provide a webpage with live preview and editing capabilities for the app, making debugging much easier.

frank06 commented 4 years ago

@marcelgarus Very interesting! I was totally unaware of hive_cache. Nice work! Definitely will check it out in more detail.

It would indeed be great to have a common API for Hive relationships (that our projects could extend to use with their own naming). I started from scratch because, while Hive had HiveLists, it didn't have an owner/inverse mechanism for to-one/belongs-to type relationships when I needed them.

britannio commented 4 years ago

I just want a blazingly fast cross-platform key-value store, maybe that could be maintained in a separate package?

themisir commented 4 years ago

+1 @britannio. I just want to save small key-value structured data. If I wanted to store large datasets, I would probably use an SQLite database.

I know this package is pretty new, but lots of projects have already implemented Hive v1, which is currently broken (it has some dependency hell issues).

stefanrusek commented 4 years ago

+1 @britannio @TheMisir I agree. If I need a real DB, I reach for Cloud Firestore.

jonataslaw commented 4 years ago

+1 In fact, I use Hive as a simple database, storing key-value pairs and objects. I don't need features like advanced queries, and Hive serves me well. If there were a version with FFI, I might use it in future projects, but for my current projects, Hive looks great. I still think that Hive should stay in Dart and that there should be a second package, something like Hive-ffi. I believe that would please 100% of the users here, both those who will stay with Hive using Dart and those who will move to Hive using Rust, but in the end, it is leisim who chooses the direction this will take.

2x2xplz commented 4 years ago

@britannio for cross-platform maybe check out ObjectBox?

themisir commented 4 years ago

@2x2xplz I used ObjectBox in a native project (written in pure Java, not Flutter). Sometimes it crashed unexpectedly. I might have done something wrong, but I'm sure I did everything the documentation said. Hive, on the other hand, works like a charm (PS: it is currently broken if you're not on the stable channel).

neckaros commented 4 years ago

> +1 @britannio. I just want to save small key-value structured data. If I wanted to store large datasets, I would probably use an SQLite database.
>
> I know this package is pretty new, but lots of projects have already implemented Hive v1, which is currently broken (it has some dependency hell issues).

The problem is that sqflite does not work on the web.

jonataslaw commented 4 years ago

Off-topic: Guys, does anyone know if leisim (creator of Hive) is okay? He lives in one of the coronavirus epicenters and he disappeared; his last comment was in this post a few months ago.

yringler commented 4 years ago

@leisim wrote

> ...and existing apps in production cannot upgrade to the new version without losing their data.

This was one of the major causes of concern on this thread.

@leisim, is it very difficult to continue supporting just reading from old style boxes? That would allow

  1. iterating through the legacy box
  2. adding each item to a new-style box
  3. verifying that all data was copied over correctly
     3.1 This would require that people override the equality operator.
  4. deleting the legacy box

which could be wrapped up in a method and shipped with the hive package. To migrate data, a possible API shape might be something like this:

// Hypothetical API proposed here; these methods are not part of Hive 1.x.
if (Hive.legacyBoxExists("userdata")) {
  final legacyBox = await Hive.openTransitionalBox<UserData>("userdata");
  await legacyBox.migrateData();

  if (await legacyBox.verify()) {
    await legacyBox.removeFromDisk();
  }
}

Or verify and removeFromDisk could be collapsed into migrateData.

Edit: Even better would be a parameter in openBox, such as { bool migrate }, which would then run the migration.
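The four migration steps above can be sketched as a single helper. This is a hypothetical illustration, not part of Hive: plain `Map`s stand in for the legacy and new boxes, `migrateBox` is an invented name, `convert` turns a legacy value into its new representation, and `matches` takes the place of the overridden equality operator used for verification.

```dart
// Hypothetical migration helper. Plain Maps stand in for a legacy (1.x) box
// and a new (2.x) box; a real implementation would work on files on disk.
bool migrateBox<L, N>(
  Map<Object, L> legacyBox,
  Map<Object, N> newBox, {
  required N Function(L legacy) convert,
  required bool Function(L legacy, N migrated) matches,
}) {
  // Steps 1 and 2: iterate over the legacy box and add each item to the new box.
  for (final entry in legacyBox.entries) {
    newBox[entry.key] = convert(entry.value);
  }
  // Step 3: verify that every entry was copied over correctly.
  final verified = legacyBox.entries.every((e) =>
      newBox.containsKey(e.key) && matches(e.value, newBox[e.key] as N));
  // Step 4: delete the legacy data only after a successful verification.
  if (verified) legacyBox.clear();
  return verified;
}
```

Keeping the deletion behind the verification result means a failed or partial copy leaves the original data untouched, so the migration can simply be retried.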

OlegBezr commented 4 years ago

@leisim Hey! So when should we expect the new version of Hive?

mdrideout commented 4 years ago

Hey @leisim, just continuing to support the idea that you build the new database as a brand-new library. Hive DB 1.0 in its current form perfectly fulfills my needs. I would love to continue using it without worrying about weird dependency stuff.

A new name and a new library for a fundamentally different database I think is very appropriate.

I have spent a ton of time on a production app that uses Hive DB, and it would be very expensive for me to rewrite all of the data management to support a new library and deal with my client apps' update processes.

I'm sure you will find that the open source community is happy to help maintain Hive 1.0 as its own library.

mdrideout commented 4 years ago

> Off-topic: Guys, does anyone know if leisim (creator of Hive) is okay? He lives in one of the Corona Virus epicenters and he disappeared, his last comment was in this post a few months ago.

I noticed on LinkedIn that he started a new job around the same time he stopped updating Hive. The new job looks very cool and could be very demanding. He may need more community support for Hive.

frank06 commented 4 years ago

It's clear that Simon owes us ZERO. He put out an outstanding library that makes our developer lives better.

As an open source maintainer myself, I wouldn't leave users of my library in the dark even if I had a very demanding job.

"Hi all, I apologize for my delayed response. I'm doing well, thanks for worrying about me! I have a very demanding job and don't have time to think about how to move forward with Hive. I'll be in touch as soon as anything changes. Thank you!"

Takes literally one minute.

Being sick is a completely different story.

simc commented 4 years ago

Hey guys, Thanks a lot for your patience and kind words. I'm very sorry for the late response but the last months have been a bit crazy for me and I had zero time for open source work. I hope I can address some of your questions. If I miss some of them, please ask me again ;)

> Hive could also provide a webpage with live preview and editing capabilities for the app, making debugging much easier.

@marcelgarus Yes, making debugging easier is one of the goals of the new version.

> I've been working on a mission-critical project with Hive, and a major pull was that it's completely written in Dart, so I may need to fork as well. If you're interested, I can keep you posted.

@Xylobol @stefanrusek @jonataslaw I hear all of you, and I think it is a good idea to maintain both the new and the "old" version, because their purposes do not overlap 100%. I have not decided the best way to do it, however. Maybe the new version should move to another repo, or we keep the old one as a branch. I might also need some help to maintain both versions, but I'm sure we can solve that.

> is it very difficult to continue supporting just reading from old style boxes?

@yringler I'm not sure whether this is a good idea. It would require a lot of work. Depending on your needs, maybe we can create a migration version. Since the new version will work a bit differently, however, not everything can be migrated.

> Hey! So when should we expect the new version of Hive?

Due to my limited time, the release is not in sight yet. I would guess it will take at least another few months. It looks like I have a bit more time now, so development should speed up.

> A new name and a new library for a fundamentally different database I think is very appropriate.

I'm not sure. I see the pros for creating a new project but I think the new database will be very cool and it would be sad if the old "Hive" would be popular while the new version has only few users due to its bad pub.dev popularity score...

> I noticed on LinkedIn that he started a new job around the same time he stopped updating Hive. New job looks very cool, could be very demanding. He may need more community support for Hive.

@mdrideout Yes my job at BMW and my studies are partly to blame for my little free time in the last months :(

> As an open source maintainer myself, I wouldn't leave users of my library in the dark even if I had a very demanding job.

@frank06 Very sorry for that. I do feel responsible for all the people who depend on the project, despite Hive being open source.

> Being sick is a completely different story.

I have been but I still should've posted an update.

yringler commented 4 years ago

> I'm not sure. I see the pros for creating a new project but I think the new database will be very cool and it would be sad if the old "Hive" would be popular while the new version has only few users due to its bad pub.dev popularity score...

IMO there's no reason to be concerned about that. People will find it with search, and there could be a note in the readme pointing to the new package. I think it will be very popular. People are quick to find great packages (like one I bumped into a while back, hivedb 😉).

Case in point: @ryanheise made a package, just_audio, in a pretty competitive area - playing audio - and it quickly shot up to a very high score.

I totally get the concern - you already have an established, successful "brand", and to start all over is a daunting prospect. But I think it's a great idea, and with the huge benefits which may be realized in the new version - not having to worry about using up all the RAM, queries, crazy speed - maybe it's better to start fresh and see how far the rabbit hole goes.

Whatever you decide, I'll be looking forward to the new version.

themisir commented 4 years ago

What if there were a package (something like hive_migrate) to migrate existing data from hive-v1 to hive-v2 instead of doing this automatically? I don't think it would be hard.

  1. List v1 boxes
  2. Read data from v1 boxes
  3. Save data to v2 boxes
  4. Clean up (remove v1 boxes to free up unused space).

That's it. All of our data migrated to v2.

simc commented 4 years ago

@TheMisir Yes this is a good idea but the new version will only support objects (not Maps, ints etc.) so they cannot be migrated. It would be very easy if we decided to create a new package for the new version because you could just import both and do the migration yourself.

yringler commented 4 years ago

> not Maps, ints etc.

Meaning, it can store ints as part of objects, but not by themselves. For example box.put('key', 4) would not be supported anymore.

simc commented 4 years ago

@yringler exactly