khonsulabs / bonsaidb

A developer-friendly document database that grows with you, written in Rust
https://bonsaidb.io/
Apache License 2.0

Custom key implementation confusion #314

Closed a-0-dev closed 9 months ago

a-0-dev commented 9 months ago

Just found this project yesterday and so far it looks very, very nice! However, a major hiccup for me is how to implement custom keys. I have many structs (= documents) which are identified by a wrapper type around uuid::Uuid. I first thought "well, I'll just use some u128 as primary key then and write a view where I can query by Uuid", but it seems like any object you query on needs to implement Key, so this is no workaround.

Since bonsaidb does not implement Key for Uuid itself, I have to do this myself. After some fiddling around, I wrote the following, which compiles just fine:

mod keys {
    use std::fmt::Display;

    use bonsaidb::core::key::{ByteSource, Key, KeyEncoding};
    use serde::{Deserialize, Serialize};
    use uuid::Uuid;

    // Error type definition
    #[derive(Debug)]
    pub(crate) struct KeyEncodingError;
    impl Display for KeyEncodingError {
        fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
            write!(f, "Some key encoding error")
        }
    }
    impl std::error::Error for KeyEncodingError {}

    // NewId definition
    #[derive(Clone, Debug, Serialize, Deserialize, PartialEq, Eq, PartialOrd, Ord)]
    pub(crate) struct NewId(Uuid);

    impl<'k> Key<'k> for NewId {
        const CAN_OWN_BYTES: bool = false;

        fn first_value() -> Result<Self, bonsaidb::core::key::NextValueError> {
            Ok(NewId {
                0: Uuid::from_u128(0u128).into(),
            })
        }

        fn from_ord_bytes<'e>(
            bytes: bonsaidb::core::key::ByteSource<'k, 'e>,
        ) -> Result<Self, Self::Error> {
            let b = match bytes {
                ByteSource::Owned(v) => v,
                ByteSource::Borrowed(v) | ByteSource::Ephemeral(v) => v.to_owned(),
            };
            match TryInto::<&[u8; 16]>::try_into(b.as_slice()) {
                Ok(array) => Ok(NewId {
                    0: Uuid::from_bytes(array.to_owned()),
                }),
                Err(_) => Err(KeyEncodingError),
            }
        }
    }

    impl KeyEncoding for NewId {
        type Error = KeyEncodingError;
        // LENGTH is the encoded size in bytes: a Uuid is 16 bytes (128 bits).
        const LENGTH: Option<usize> = Some(16);

        fn describe<Visitor>(visitor: &mut Visitor)
        where
            Visitor: bonsaidb::core::key::KeyVisitor,
        {
            // ?!
        }
        fn as_ord_bytes(&self) -> Result<std::borrow::Cow<'_, [u8]>, Self::Error> {
            Ok(std::borrow::Cow::Owned(Vec::from(self.0.as_bytes())))
        }
    }
}

I do have some questions on this, which may be interesting for other devs as well:

  1. Does this make any sense to you?
  2. Is this amount of boilerplate necessary or is there already an easier way?
  3. Can bonsaidb use Uuids as keys at all? It looks like any new key added to the database must be greater than all existing keys, which is exactly not how Uuids work
  4. What does KeyEncoding::describe(...) do?

I'm very grateful for your work on this, and I would love to use this in one of my projects and contribute as much as possible. Thanks!

ecton commented 9 months ago

Does this make any sense to you? Is this amount of boilerplate necessary or is there already an easier way?

Since Key is implemented for [u8; N], you can skip much of the boilerplate by calling through to the implementations for the underlying type, e.g., <[u8; 16]>::from_ord_bytes(bytes). We've talked about adding a way to do an encode_as conversion to the derive macro, which could simplify this type of conversion.

For types that you're in control of, Key can usually just be derived, avoiding the need to implement these traits manually. However, due to the orphan rule, this type of conversion can't be made fully automatic.
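To illustrate the "call through to the underlying type" idea without depending on the bonsaidb crate, here is a minimal sketch. The `OrdBytes` trait, the `IncorrectLength` error and the array-backed `NewId` are hypothetical stand-ins invented for this example; they are not BonsaiDb's `Key`/`KeyEncoding` traits, whose real signatures also involve `ByteSource` and lifetimes. Only the delegation pattern is meant to carry over.

```rust
use std::borrow::Cow;
use std::convert::TryFrom;

/// Hypothetical stand-in for a key trait (NOT BonsaiDb's `Key`/`KeyEncoding`).
trait OrdBytes: Sized {
    type Error: std::fmt::Debug;
    fn as_ord_bytes(&self) -> Cow<'_, [u8]>;
    fn from_ord_bytes(bytes: &[u8]) -> Result<Self, Self::Error>;
}

/// Hypothetical error returned when the input is not exactly 16 bytes.
#[derive(Debug, PartialEq)]
struct IncorrectLength;

/// The "library" implementation for the underlying byte array.
impl OrdBytes for [u8; 16] {
    type Error = IncorrectLength;

    fn as_ord_bytes(&self) -> Cow<'_, [u8]> {
        Cow::Borrowed(&self[..])
    }

    fn from_ord_bytes(bytes: &[u8]) -> Result<Self, Self::Error> {
        <[u8; 16]>::try_from(bytes).map_err(|_| IncorrectLength)
    }
}

/// Wrapper type; the 16 bytes stand in for the bytes of a `uuid::Uuid`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct NewId([u8; 16]);

/// The wrapper reuses the array's error type and delegates both directions,
/// so no custom error type or manual length check is needed.
impl OrdBytes for NewId {
    type Error = <[u8; 16] as OrdBytes>::Error;

    fn as_ord_bytes(&self) -> Cow<'_, [u8]> {
        self.0.as_ord_bytes()
    }

    fn from_ord_bytes(bytes: &[u8]) -> Result<Self, Self::Error> {
        <[u8; 16]>::from_ord_bytes(bytes).map(NewId)
    }
}
```

With a real `Uuid`, the same shape would presumably wrap the decoded array with `Uuid::from_bytes` and encode via `Uuid::as_bytes`, as the original post already does.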

Can bonsaidb use Uuids as keys at all? It looks like any new key added to the database must be greater than all existing keys, which is exactly not how Uuids work

Nothing prevents out-of-order insertion. It's just that data is naturally ordered based on the Key encoding. Inserting random or sequential data is completely fine.
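As a rough illustration of "insertion order does not matter, storage order follows the key encoding", the sketch below uses a standard-library `BTreeMap` as a stand-in for an ordered key-value store. This is an assumption made for the example and says nothing about BonsaiDb's actual storage engine; `id_to_key` and `stored_order` are hypothetical helper names.

```rust
use std::collections::BTreeMap;

/// Encode a 128-bit id as big-endian bytes, so that comparing the byte
/// arrays lexicographically gives the same order as comparing the numbers.
fn id_to_key(id: u128) -> [u8; 16] {
    id.to_be_bytes()
}

/// Insert the ids in the order given and return them in the order the
/// ordered map stores them (sorted by encoded key).
fn stored_order(ids: &[u128]) -> Vec<u128> {
    let mut store: BTreeMap<[u8; 16], String> = BTreeMap::new();
    for &id in ids {
        // Keys may arrive in any order; the map places each one by its bytes.
        store.insert(id_to_key(id), format!("doc {}", id));
    }
    store.keys().map(|key| u128::from_be_bytes(*key)).collect()
}
```

Calling `stored_order(&[300, 5, 42])` inserts the keys out of order, yet iteration yields them sorted by their encoded bytes, which is the behaviour described above for randomly generated ids such as v4 Uuids.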

What does KeyEncoding::describe(...) do?

It describes the data that is encoded in the key, which will one day allow generic decoding of keys stored in a database by a tool such as a database query tool. Since such a tool can't know your types, it needs a way to know how to encode keys to be able to browse collections or make view queries. This function can be used to create key type manifests that could be given to such a tool.

Thank you for trying out BonsaiDb!

a-0-dev commented 9 months ago

Wow, thanks a lot for your quick reply! That clarified a few things, and my code is getting cleaner now.

I still don't get how to use this describe thing though, not even after reading how it's used for existing implementations of KeyEncoding and the changelog entry. Is it safe to just leave the function with an empty body?

ecton commented 9 months ago

You can safely leave it blank or call through to the u8 array's implementation as well. There aren't any tools that utilize this feature yet.

a-0-dev commented 9 months ago

Alright, thank you very much! :)

a-0-dev commented 9 months ago

Follow-up to this one: I wrote myself a helper struct which makes getting started with arbitrary structs as keys easier (see code below, I called it AsKey<T>). I know it's terrible in terms of performance, but I can still tune it later on. However, the test case, also included below, panics with the error message:

invalid KeyEncoding::describe implementation -- imbalanced visit calls

I assume this may be due to the missing implementation of describe(...), but I'm not sure. Any idea what may cause this?

use std::{fmt::Display, ops::{Deref, DerefMut}};

use bonsaidb::{core::{
    key::{Key, KeyEncoding},
    schema::{Collection, Schema, SerializedCollection},
}, local::{config::{Builder, StorageConfiguration}, Database}};
use serde::{de::DeserializeOwned, Deserialize, Deserializer, Serialize};
use uuid::Uuid;

/*
    Definition of some type which cannot derive `Key`
*/

#[derive(Serialize, Deserialize, Debug, Clone, Default, PartialEq, Eq, PartialOrd, Ord)]
struct SomeId(Uuid);

impl SomeId {
    pub fn new() -> Self {
        SomeId(Uuid::new_v4())
    }
}

/*
    BonsaiDB schema
*/

#[derive(Schema, Debug)]
#[schema(name = "ubisync", collections = [DbElement])]
struct MySchema;

#[derive(Debug, Serialize, Deserialize, Collection, PartialEq, Clone)]
#[collection(name = "elements", views = [])]
struct DbElement {
    #[natural_id]
    id: AsKey<SomeId>,
    content: String,
}

/*
    Test
*/

#[test]
fn add_element() {
    let db =
        Database::open::<MySchema>(StorageConfiguration::new("./test.bonsaidb")).unwrap();
    let test_element = DbElement {
        id: AsKey::new(SomeId::new()),
        content: "test".to_string()
    };
    let element_doc = DbElement::push(test_element.clone(), &db).unwrap();

    let retrieved_element = DbElement::get(&element_doc.contents.id, &db).unwrap();

    assert_eq!(
        Some(test_element),
        retrieved_element.map(|coll_doc| coll_doc.contents)
    )
}

/*
    Custom Error type
 */

#[derive(Debug)]
pub(crate) struct KeyEncodingError;

impl Display for KeyEncodingError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "Some key encoding error")
    }
}

impl std::error::Error for KeyEncodingError {}

/*
    Generic key-wrapping struct definition
 */

trait SerdeCompatibleKey: Serialize + DeserializeOwned + Default + Clone + Send + Sync {}
impl<T> SerdeCompatibleKey for T where T: Serialize + DeserializeOwned + Default + Clone + Send + Sync {}

#[derive(Clone, Debug, Default, Serialize, PartialEq, Eq, PartialOrd, Ord)]
struct AsKey<T: SerdeCompatibleKey>(T);

impl<T: SerdeCompatibleKey> AsKey<T>
{
    pub fn new(key: T) -> Self {
        AsKey(key)
    }
}

impl<T: SerdeCompatibleKey> Deref for AsKey<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl<T: SerdeCompatibleKey> DerefMut for AsKey<T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.0
    }
}

impl<'de, T> Deserialize<'de> for AsKey<T>
where
    T: SerdeCompatibleKey,
{
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let t = T::deserialize(deserializer)?;
        Ok(AsKey(t))
    }
}

impl<'k, T: SerdeCompatibleKey> Key<'k> for AsKey<T>
{
    const CAN_OWN_BYTES: bool = false;

    fn first_value() -> Result<Self, bonsaidb::core::key::NextValueError> {
        Ok(AsKey::default())
    }

    fn from_ord_bytes<'e>(
        bytes: bonsaidb::core::key::ByteSource<'k, 'e>,
    ) -> Result<Self, Self::Error> {
        match serde_json::from_slice(&*bytes) as Result<T, _> {
            Ok(k) => Ok(AsKey(k)),
            Err(_) => Err(KeyEncodingError),
        }
    }
}

impl<T: SerdeCompatibleKey> KeyEncoding for AsKey<T>
{
    type Error = KeyEncodingError;
    const LENGTH: Option<usize> = None;

    fn describe<Visitor>(_: &mut Visitor)
    where
        Visitor: bonsaidb::core::key::KeyVisitor,
    {
    }

    fn as_ord_bytes(&self) -> Result<std::borrow::Cow<'_, [u8]>, Self::Error> {
        Ok(std::borrow::Cow::Owned(
            serde_json::to_vec(&self.0).unwrap(),
        ))
    }
}

ecton commented 9 months ago

Oops, I forgot there is a feature using the key description. list_available_schemas() returns a SchemaSummary, which has a KeyDescription at various locations. This KeyDescription uses the describe API, and the schema summaries are gathered as the database is opened.

You should be able to use visitor.visit_type(KeyKind::Bytes) inside of describe() for now. I think the fact an empty implementation fails is a bug, however.
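The following toy model sketches why a describe function that never calls the visitor can be rejected while one that reports a single type is accepted. Everything here is a local stand-in: `ToyKind`, `ToyVisitor`, `Describe`, `Recorder` and `description_of` are hypothetical names for this example, and the check is only an assumed simplification, not BonsaiDb's real `KeyVisitor`, `KeyKind` or its internal validation.

```rust
/// Toy stand-in for a key kind (NOT BonsaiDb's `KeyKind`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyKind {
    Bytes,
}

/// Toy stand-in for a visitor that receives the key's description.
trait ToyVisitor {
    fn visit_type(&mut self, kind: ToyKind);
}

/// Toy stand-in for the describing half of a key trait.
trait Describe {
    fn describe<V: ToyVisitor>(visitor: &mut V);
}

/// Records every kind reported to it.
#[derive(Default)]
struct Recorder {
    seen: Vec<ToyKind>,
}

impl ToyVisitor for Recorder {
    fn visit_type(&mut self, kind: ToyKind) {
        self.seen.push(kind);
    }
}

/// A key whose describe body is empty, like the one in the failing test.
struct EmptyKey;
impl Describe for EmptyKey {
    fn describe<V: ToyVisitor>(_: &mut V) {}
}

/// A key that reports itself as opaque bytes, like the suggested fix.
struct BytesKey;
impl Describe for BytesKey {
    fn describe<V: ToyVisitor>(visitor: &mut V) {
        visitor.visit_type(ToyKind::Bytes);
    }
}

/// Assumed rule for this toy: exactly one type must be reported.
fn description_of<K: Describe>() -> Result<ToyKind, &'static str> {
    let mut recorder = Recorder::default();
    K::describe(&mut recorder);
    match recorder.seen.as_slice() {
        [kind] => Ok(*kind),
        _ => Err("imbalanced visit calls"),
    }
}
```

In this model `description_of::<EmptyKey>()` returns an error, while `description_of::<BytesKey>()` succeeds, mirroring the effect of adding the single `visit_type` call suggested above.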

a-0-dev commented 9 months ago

Right, that fixed it :+1: