meilisearch / heed

A fully typed LMDB wrapper with minimum overhead 🐦
https://docs.rs/heed
MIT License

Dupsort support #182

Closed AureliaDolo closed 1 year ago

AureliaDolo commented 1 year ago

Fixes #41

Previous work on this issue #59

I chose to pass flags to Env::raw_init_database to reuse it in [create/open]_poly_multi_database.

The only difference between PolyDatabase and PolyMultiDatabase is the return type of get: for PolyMultiDatabase it's a RoRange over all the values stored under the key.

For now, I only added the duplicate-key case to the get method. If you're OK with the core of this PR, I plan to add it to the other doctests as well, especially the one that documents the PolyMultiDatabase struct.
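
To make the difference concrete, here is a hedged usage sketch (method and codec names assumed from this PR, lifetimes and error handling simplified, not the final API):

// Hypothetical usage, inside a function returning heed::Result<()>.
// A PolyDatabase lookup yields at most one value...
let single: Option<&[u8]> = poly_db.get::<_, ByteSlice, ByteSlice>(&rtxn, b"key")?;

// ...while a PolyMultiDatabase lookup yields a RoRange over *all* the
// values stored under that key, in duplicate-sort order.
let mut dups = poly_multi_db.get::<_, ByteSlice, ByteSlice>(&rtxn, b"key")?;
while let Some((k, v)) = dups.next().transpose()? {
    assert_eq!(k, &b"key"[..]);
    println!("{} bytes", v.len());
}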

Tests

Comparison methods

I'm not sure exactly what these methods should return, especially for get_greater_than. If we keep returning just a single key/value pair, the caller can use get to obtain the iterator, or we could return an iterator starting at the next key. For get_greater_than, what if the key we compare against has multiple values? For last, should we return the last value of the last key? Moreover, we could add equivalents of these methods that take both a key and a value.
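
A tiny hypothetical dataset makes these questions concrete:

// Hypothetical dupsort contents:
//   "apple"  -> 1, 3      (two duplicate values, kept sorted)
//   "banana" -> 2
//
// get_greater_than("apple"): skip to ("banana", 2), or return
// ("apple", 3) because "apple" still has a later duplicate value?
// last(): presumably ("banana", 2), the last value of the last key.
// A hypothetical equivalent taking both a key and a value, e.g.
// get_greater_than_dup("apple", 1) -> ("apple", 3), would remove
// the ambiguity.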

Set-related methods

len should return the total number of key/value entries, not just the number of keys.
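
With the dataset sketched above:

// "apple"  -> 1, 3
// "banana" -> 2
//
// Counting key/value entries gives len() == 3; counting keys gives 2.
// LMDB's mdb_stat reports ms_entries, the number of data items
// (duplicates included), so returning 3 matches the underlying store.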

Iterators

I see no change in behavior here. Maybe note in the docs that a prefix iteration on a key that has multiple values is equivalent to getting that key?

Modification

Should delete remove all the values for the key? Should we add a delete_key_value method that takes both a key and a value?
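
A hedged sketch of the two deletion flavors (delete_key_value is the hypothetical name from the question above; codecs simplified):

// Drops the key together with *all* of its duplicate values
// (mdb_del with a NULL data argument):
//     fn delete(&self, wtxn: &mut RwTxn, key: &[u8]) -> Result<bool>
//
// Drops a single (key, value) pair and leaves the other duplicates
// intact (mdb_del with a non-NULL data argument):
//     fn delete_key_value(&self, wtxn: &mut RwTxn, key: &[u8], data: &[u8])
//         -> Result<bool>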

Conversion

AureliaDolo commented 1 year ago

To sum up what we said earlier today:

Dbi type alias

type Dbi = u32;

Update EnvInner accordingly.

Flags

DatabaseBuilder

To avoid duplicating the create/open_xxx_database methods; see the Builder pattern. One method per supported flag:

pub struct DatabaseBuilder<'e> {
    env: &'e Env,
    name: Option<String>,
    // key and data codec TypeIds
    types: Option<(TypeId, TypeId)>,
    database_flags: DatabaseFlags,
}
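
A hedged sketch of how such a builder could be used (every method name below is assumed, not final):

// Hypothetical usage: one builder method per supported flag.
let db = env
    .database_builder()            // assumed constructor on Env
    .name("words")                 // fills `name`
    .types::<Str, ByteSlice>()     // fills `types` with the two TypeIds
    .dup_sort()                    // sets MDB_DUPSORT in `database_flags`
    .create(&mut wtxn)?;           // forwards the flags to Env::raw_init_database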

Duplicate keys support

AureliaDolo commented 1 year ago

I unimplemented put_reserved, as MDB_RESERVE seems incompatible with MDB_DUPSORT:

MDB_RESERVE - [...] This flag must not be specified if the database was opened with #MDB_DUPSORT.

AureliaDolo commented 1 year ago

The as_uniform conversion won't be as straightforward, as there is a PolyDatabase under the hood.

Off the top of my head, the solution I'm thinking of is to:

And find a better name for PolyMultiDatabase

AureliaDolo commented 1 year ago

Duplicate keys support

  • a new, specific iterator to iterate over the values of one duplicated key
  • a boolean on the existing iterators to choose between iterating on data items or on unique keys

I don't think it's a good idea to leak duplicate key support everywhere.

Kerollmops commented 1 year ago

The as_uniform conversion won't be as straightforward, as there is a PolyDatabase under the hood.

What's the issue? Note that I wanted to delete the PolyDatabase type from the crate and only have a Database<K, V> type, maybe. Would it help?

  • put_reserved is not supported

We could return an error in this particular case with a user-friendly error message.

  • MDB_APPENDDUP behaves differently than MDB_APPEND

I am not sure of the exact difference. Is MDB_APPENDDUP used to append new data to a specific key? We should introduce a new Database::append_dup(&self, wtxn, key) method that accepts the key under which to append duplicate data.
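
A hedged signature sketch of that idea (codecs elided, body omitted):

// Hypothetical: a put with the MDB_APPENDDUP flag, appending `data`
// at the end of the sorted duplicates of `key`. LMDB only accepts
// this when the input is already in duplicate-sort order.
pub fn append_dup(&self, wtxn: &mut RwTxn, key: &[u8], data: &[u8]) -> Result<()> {
    /* mdb_cursor_put(cursor, key, data, MDB_APPENDDUP) */
    unimplemented!()
}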

  • harder constraint on put during a mutable iteration: the key must be the same size.

Indeed, but that is not something we will handle other than with a runtime error. Do we have to check the length of the keys ourselves? If so, we can do it only when DUPSORT is enabled.

AureliaDolo commented 1 year ago

Hello,

We did some tests with our data on this branch, and we ran into the limits of LMDB's dupsort. In a nutshell, when dupsort is enabled the values are keys of a sub-database, hence the same limitations apply as for regular keys.

Namely, it worked until we encountered an MDB_BAD_VALSIZE error because some values are bigger than mdb_env_get_maxkeysize (511 bytes).

https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/lmdb.h#L275-L286 https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/mdb.c#L650-L668
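
In other words (a hedged sketch; the helper and error message are hypothetical):

// With MDB_DUPSORT, each value is stored as a key of a hidden sub-DB,
// so it shares the mdb_env_get_maxkeysize limit (511 bytes by default).
fn check_dup_value_size(value: &[u8], max_key_size: usize) -> Result<(), String> {
    if value.len() > max_key_size {
        // LMDB itself fails with MDB_BAD_VALSIZE at insert time here.
        return Err(format!(
            "value of {} bytes exceeds maxkeysize of {} bytes",
            value.len(),
            max_key_size
        ));
    }
    Ok(())
}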

So we won't be able to use the dupsort feature.

As for your previous comment:

Note that I wanted to delete the PolyDatabase type from the crate and only have a Database<K, V> type, maybe. Would it help?

I still think it's necessary to have two separate APIs for dupsort and non-dupsort databases. There won't be any conversion method to implement between them, so it's OK.

We could return an error in this particular case with a user-friendly error message.

:+1:

I am not sure of the exact difference. Is MDB_APPENDDUP used to append new data to a specific key? We should introduce a new Database::append_dup(&self, wtxn, key) method that accepts the key on which to append a dup data.

https://github.com/meilisearch/heed/blob/ad6766f98c30e118277b0951c1ad8491d3fe4e18/heed/src/db/multi.rs#L1710-L1714

This part shows the difference in behavior. I don't think we could do that as the behavior depends on the flag the database was opened with.

Indeed, but that is not something that we will handle differently than with a runtime error. Do we have to test the length of the keys by ourselves? If so, we can do it only when DUPSORT is enabled.

This would need an if at each call for everyone, which can be mitigated with a dupsort-specific database. I'd define all mutable-iteration methods as unsafe, specify in the safety section that keys must be of the same length, and let users take their own risks.
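
Something like this hedged sketch (names hypothetical):

impl RwIter<'_> {
    /// Overwrites the entry the cursor currently points to.
    ///
    /// # Safety
    ///
    /// If the database was opened with MDB_DUPSORT, `key` must have
    /// exactly the same length as the key the cursor points to;
    /// otherwise LMDB's behavior is erroneous.
    pub unsafe fn put_current(&mut self, key: &[u8], data: &[u8]) -> Result<bool> {
        /* mdb_cursor_put(self.cursor, key, data, MDB_CURRENT) */
        unimplemented!()
    }
}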

Kerollmops commented 1 year ago

Hey @AureliaDolo 👋

Namely, it worked until we encountered an MDB_BAD_VALSIZE error because some values are bigger than mdb_env_get_maxkeysize (511 bytes).

Sad to hear this too! We will not be able to use the DUPSORT flag in Meilisearch if this default limit of 511 bytes also applies to the values...

Weren't you using bdb (BerkeleyDB) before? Isn't it already the case that the values are limited to 511 bytes?

So we won't be able to use the dupsort feature.

It seems that redb doesn't support multi-value, but I advise you to look at sanakirja: much harder to use, but it supports multi-values in a much better (internal) way and has better performance than LMDB.

I still think it's necessary to have two separate APIs for dupsort and non-dupsort databases. There won't be any conversion method to implement between them, so it's OK.

Why can't we return a Database<ByteSlice, ByteSlice> by default and let the user use the Database::remap method to change the key/data codec?
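
A hedged sketch of that flow (remap_key_type already exists on heed's Database; the builder methods are the assumed ones from earlier):

// Open untyped by default, then let the user remap the codecs.
let raw: Database<ByteSlice, ByteSlice> =
    env.database_builder().name("words").dup_sort().create(&mut wtxn)?;
let typed: Database<Str, ByteSlice> = raw.remap_key_type::<Str>();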

https://github.com/meilisearch/heed/blob/ad6766f98c30e118277b0951c1ad8491d3fe4e18/heed/src/db/multi.rs#L1710-L1714

This part shows the difference in behavior. I don't think we could do that as the behavior depends on the flag the database was opened with.

Cannot the Database::append method always be used to append a new key/data pair at the end of the database, and the to-be-introduced Database::append_dup method always be used to append new data associated with the specified key?

This would need an if at each call for everyone, which can be mitigated with a dupsort-specific database. I'd define all mutable-iteration methods as unsafe, specify in the safety section that keys must be of the same length, and let users take their own risks.

A runtime if is nothing, especially when it will always take the same branch. We already do a lot of safety verifications.

[...] specify in the safety section that keys must be of the same length, and let users take their own risks.

Isn't LMDB checking for the length already?

darnuria commented 1 year ago

Weren't you using bdb (BerkeleyDB) before? Isn't it already the case that the values are limited to 511 bytes?

From the BDB reference manual: https://docs.oracle.com/database/bdb181/html/api_reference/C/dbset_flags.html

DB_DUPSORT

Permit duplicate data items in the database; that is, insertion when the key of the key/data pair being inserted already exists in the database will be successful. The ordering of duplicates in the database is determined by the duplicate comparison function. If the application does not specify a comparison function using the DB->set_dup_compare() method, a default lexical comparison will be used. It is an error to specify both DB_DUPSORT and DB_RECNUM.

Calling DB->set_flags() with the DB_DUPSORT flag affects the database, including all threads of control accessing the database.

If the database already exists when DB->open() is called, the DB_DUPSORT flag must be the same as the existing database or an error will be returned.

and, from the same reference manual:

If the database supports sorted duplicates, the new data value is inserted at the correct sorted location.

Yeah, there was no such limitation (BDB is not bound to mmap semantics). It's not an issue: we will find a way to avoid key duplication rather than ask LMDB to do what it cannot.

But MDB_DUPSORT, within its limitations, is nice! It's just that part of our data doesn't fit. :)

Lots of people store a "Pascal-esque byte string" inside the value, like PowerDNS or Knot, to circumvent the limitation; it's just costly in terms of write I/O if you don't know everything you need to write at once.
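
For reference, a minimal self-contained sketch of that trick, assuming little-endian u32 length prefixes:

// Pack several logical values into one LMDB value so that no single
// dupsort value is needed: each item is prefixed with its length.
fn pack(values: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for v in values {
        out.extend_from_slice(&(v.len() as u32).to_le_bytes());
        out.extend_from_slice(v);
    }
    out
}

fn unpack(mut bytes: &[u8]) -> Vec<&[u8]> {
    let mut out = Vec::new();
    while bytes.len() >= 4 {
        let len = u32::from_le_bytes(bytes[..4].try_into().unwrap()) as usize;
        out.push(&bytes[4..4 + len]);
        bytes = &bytes[4 + len..];
    }
    out
}

The write cost mentioned above comes from having to rewrite the whole packed value whenever a single item changes.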

Isn't LMDB checking for the length already?

Yeah, it is, hence the MDB_BAD_VALSIZE: it's returned at insert time.

About sanakirja

It seems that redb doesn't support multi-value but I advise you to look at sanakirja, much harder to use but supports multi-values in a much better (internal) way and has better performances than LMDB.

Looks nice, but LMDB's crash-free behavior (for our kind of workload) in safe mode is really pristine! Performance-wise it's also fast. :)

But I will definitely take a look!

Redb has https://docs.rs/redb/latest/redb/struct.MultimapTable.html, but I have to check it out.

https://github.com/cberner/redb/blob/master/tests/multimap_tests.rs#L40

mikedilger commented 1 year ago

I was just looking for the delete-by-key-and-value method and came back here to mention that I couldn't find it.

Kerollmops commented 1 year ago

Superseded by #200 and therefore closed.