Closed fogti closed 4 years ago
Sounds good, will have a look into it when I get a chance
Ok, possible usage scenario:
KV<K,V> where K,V:
persy::IndexType
, use a simple index directly as backendPersyID
of the single record in the database, so that we can easily overwrite it (instead of deleting + append a new one to the segment)That probably requires either specialization or a split of the KV type into two types + one common trait.
Another optimization (overlapping with 'otherwise') would be: cache the PersyID of the main record on first read. possible implementation
That optimization is not so complex, but already has a (consistent) performance benefit.
$ cargo bench
Running target/release/deps/benches-3fd929b700ccf6db
running 3 tests
test benches::bench_get_int ... bench: 2,918 ns/iter (+/- 101)
test benches::bench_insert_get_int ... bench: 182,488,462 ns/iter (+/- 67,986,967)
test benches::bench_insert_int ... bench: 178,663,504 ns/iter (+/- 57,918,853)
test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 0 filtered out
$ git checkout cached
$ cargo bench
Running target/release/deps/benches-3fd929b700ccf6db
running 3 tests
test benches::bench_get_int ... bench: 1,345 ns/iter (+/- 84)
test benches::bench_insert_get_int ... bench: 118,919,583 ns/iter (+/- 39,605,769)
test benches::bench_insert_int ... bench: 118,166,067 ns/iter (+/- 45,283,092)
test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 0 filtered out
OT question: do we really need to reload the DB on every read (and write) and does that currently work as intended? the database file is locked by default, no other process can (most of the time, except corner cases) access it
I've sent the following e-mail to the persy maintainer:
Hi tglman,
I don't have a gitlab account, so I post this here.
It would probably be useful to (at least) be able to use an Persy Index to store KV-pairs where the value is an Vec
/ &[u8], so that one can store arbitrary data in the value (which is useful in the case of ValueMode::REPLACE, where you would be required to use a combination of index (N entries) + segment (N entries) to achieve the same behavoir, but probably slower (K --INDEX-> ID --SEGMENT-RECORD-> V)
or a combination of index (with 1 entry) + segment (with 1 entry) with a serialized hashmap).
I don't really know about a reason, why Persy should allow String's as Index K/V's, but not Vec
/ &[u8] References: https://github.com/mount-tech/typedb/issues/6
-- zserik
Edit: development branch for byte vectors (Vec<u8>
).
cc @tglman
Hi @zserik,
Yes it should be possible to support Vec<u8>
as a key, I've just to double check if can be Ord (or just do a custom Implementation of it).
@tglman
I think it's especially important to support Vec<u8>
as a value, but because both index keys and values are based upon the same infrastructure around IndexType
, that doesn't really matter.
(And yes, it implements PartialOrd
and Ord
, if the underlying element type supports it: impl
traits by Vec<*>
, rust std source code)
In case you didn't notice: I already did a Proof-of-concept implementation here.
hi @zserik,
That's look cool, I will check and if you do not mind merge it to the main repository in the next days.
@zserik Ported to the main repository :)
impl using index feature: https://github.com/zserik/typedb/commit/1612bcf7fc1bbc5a94d23c8a7b5035579940b0e2
EWONTFIX
This crate could probably utilize the persy index feature, which could reduce backend boilerplate code, althrough it would make the file format incompatible.
Partial solution for file format incompatibility:
LOAD
, check if file format is old or new, deserialize accordingly.SAVE
, we need to know a preference (e.g. old or new), assume new, but provide a function to serialize to the old file format.