Utilize persy index feature

fogti commented 5 years ago

This crate could probably utilize the persy index feature, which could reduce backend boilerplate code, althrough it would make the file format incompatible.

Partial solution for file format incompatibility:

On (first?) LOAD, check if file format is old or new, deserialize accordingly.
On SAVE, we need to know a preference (e.g. old or new), assume new, but provide a function to serialize to the old file format.

shockham commented 5 years ago

Sounds good, will have a look into it when I get a chance

fogti commented 5 years ago

Ok, possible usage scenario:

if KV<K,V> where K,V:persy::IndexType, use a simple index directly as backend
otherwise, use an index to store the PersyID of the single record in the database, so that we can easily overwrite it (instead of deleting + append a new one to the segment)

That probably requires either specialization or a split of the KV type into two types + one common trait.

fogti commented 5 years ago

Another optimization (overlapping with 'otherwise') would be: cache the PersyID of the main record on first read. possible implementation

That optimization is not so complex, but already has a (consistent) performance benefit.

$ cargo bench
     Running target/release/deps/benches-3fd929b700ccf6db

running 3 tests
test benches::bench_get_int        ... bench:       2,918 ns/iter (+/- 101)
test benches::bench_insert_get_int ... bench: 182,488,462 ns/iter (+/- 67,986,967)
test benches::bench_insert_int     ... bench: 178,663,504 ns/iter (+/- 57,918,853)

test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 0 filtered out

$ git checkout cached
$ cargo bench
     Running target/release/deps/benches-3fd929b700ccf6db

running 3 tests
test benches::bench_get_int        ... bench:       1,345 ns/iter (+/- 84)
test benches::bench_insert_get_int ... bench: 118,919,583 ns/iter (+/- 39,605,769)
test benches::bench_insert_int     ... bench: 118,166,067 ns/iter (+/- 45,283,092)

test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 0 filtered out

fogti commented 5 years ago

OT question: do we really need to reload the DB on every read (and write) and does that currently work as intended? the database file is locked by default, no other process can (most of the time, except corner cases) access it

fogti commented 5 years ago

I've sent the following e-mail to the persy maintainer:

Hi tglman,

I don't have a gitlab account, so I post this here.

It would probably be useful to (at least) be able to use an Persy Index to store KV-pairs where the value is an Vec / &[u8], so that one can store arbitrary data in the value (which is useful in the case of ValueMode::REPLACE, where you would be required to use a combination of index (N entries) + segment (N entries) to achieve the same behavoir, but probably slower

(K --INDEX-> ID --SEGMENT-RECORD-> V)

or a combination of index (with 1 entry) + segment (with 1 entry) with a serialized hashmap).

I don't really know about a reason, why Persy should allow String's as Index K/V's, but not Vec / &[u8]

References: https://github.com/mount-tech/typedb/issues/6

-- zserik

Edit: development branch for byte vectors (Vec<u8>).

fogti commented 5 years ago

cc @tglman

tglman commented 5 years ago

Hi @zserik,

Yes it should be possible to support Vec<u8> as a key, I've just to double check if can be Ord (or just do a custom Implementation of it).

fogti commented 5 years ago

@tglman I think it's especially important to support Vec<u8> as a value, but because both index keys and values are based upon the same infrastructure around IndexType, that doesn't really matter. (And yes, it implements PartialOrd and Ord, if the underlying element type supports it: impl traits by Vec<*>, rust std source code) In case you didn't notice: I already did a Proof-of-concept implementation here.

tglman commented 5 years ago

hi @zserik,

That's look cool, I will check and if you do not mind merge it to the main repository in the next days.

fogti commented 5 years ago

oh, one things missing: ByteVec must be exported from lib.rs...

Side note: is it really required that IndexType's must impl Display? That's the only thing that makes the ByteVec newtype neccessary.

For future readers: merged commit @ github gitlab

tglman commented 5 years ago

@zserik Ported to the main repository :)

fogti commented 5 years ago

impl using index feature: https://github.com/zserik/typedb/commit/1612bcf7fc1bbc5a94d23c8a7b5035579940b0e2

reads are as slow as before #7
writes are a bit faster
memory footprint is probably smaller

fogti commented 4 years ago

EWONTFIX

mount-tech / typedb

Utilize persy index feature #6