wnfs-wg / rs-wnfs

Rust implementation of the WebNative FileSystem (WNFS) specification
https://github.com/wnfs-wg
Apache License 2.0
145 stars 23 forks source link

Refactor: Consistent way of doing (De)Serialization #84

Closed matheus23 closed 1 year ago

matheus23 commented 2 years ago

Central to our inconsistency is the fact that we can't implement Serialize and Deserialize for our core structs like PublicDirectory or PrivateFile, since they have additional context requirements during serialization or deserialization.


There's two dimensions to this issue:

  1. We have a need for abstractions for serialization / deserialization functions. Usually those would just be the Serialize or Deserialize traits from serde, but since we have extra context requirements, we need to copy & modify these traits to become more "powerful" (i.e. have access to a BlockStore & be async -> AsyncSerialize). Such abstractions are then needed for abstract building blocks like Link<T>, which needs T to be some kind of serializable/deserializable, but should be flexible enough to support data structures like PublicDirectory.
  2. Since we can't implement Serialize and Deserialize for our structs directly, we also can't directly use #[derive(Deserialize, Serialize)]. We have resorted to different ways of solving this problem, although we're mostly using structs named *Serde that only contain easily de/serializable data.

For (1) we both have the AsyncSerialize trait for public data (notice its serialize is an async fn): https://github.com/wnfs-wg/rs-wnfs/blob/ce7d988157884115e32d7db412b7d4cedf56d23e/wnfs/src/common/async_serialize.rs#L34-L49

But on the private side we didn't have a need to abstract yet, so we only used custom functions: (this serialization only needs some randomness for encryption, since the key can be derived from the ratchet within PrivateDirectory) https://github.com/wnfs-wg/rs-wnfs/blob/ce7d988157884115e32d7db412b7d4cedf56d23e/wnfs/src/private/directory.rs#L1140-L1147 (but we need to be provided the key for decryption) https://github.com/wnfs-wg/rs-wnfs/blob/ce7d988157884115e32d7db412b7d4cedf56d23e/wnfs/src/private/directory.rs#L1170-L1173

And for (2) in order to make the above easier, we're using these *Serde structs (Notice what #[derive(Serialize, Deserialize)]s): (notice how the *Serde variant uses Cid instead of PublicLink) https://github.com/wnfs-wg/rs-wnfs/blob/ce7d988157884115e32d7db412b7d4cedf56d23e/wnfs/src/public/directory.rs#L41-L56 (for PrivateDirectory the PrivateNodeHeader turns into a Vec<u8>, the ciphertext of a PrivateNodeHeader) https://github.com/wnfs-wg/rs-wnfs/blob/ce7d988157884115e32d7db412b7d4cedf56d23e/wnfs/src/private/directory.rs#L43-L58


One problem of deriving Serialize and Deserialize for a related struct like PrivateDirectorySerde is that, this intermediate data structure mostly gets built up only to be torn down immediately afterwards. Ideally we can skip such work.

While we're looking at these issues, there's a similar concern with building up Ipld as an intermediate data structure, e.g. for deserializing PublicNode into appropriate elements, depending on what the type field deserializes to: https://github.com/wnfs-wg/rs-wnfs/blob/ce7d988157884115e32d7db412b7d4cedf56d23e/wnfs/src/public/node.rs#L268-L292

matheus23 commented 1 year ago

We should consider using https://github.com/ipld/serde_ipld_dagcbor

Turns out we can make generated tagged enum serializers and deserializers work with it:

#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize)]
#[serde(tag = "type")]
pub enum Test {
    #[serde(rename = "wnfs/priv/dir")]
    X {
        x: usize,
    },
    #[serde(rename = "wnfs/priv/file")]
    Y {
        y: usize,
    },
}

#[test]
fn test_serialization() {
    let x = Test::X { x: 10 };
    let vec = serde_ipld_dagcbor::to_vec(&x).unwrap();
    println!("{vec:?}");

    let y: Test = serde_ipld_dagcbor::from_slice(&vec).unwrap();
    println!("{y:?}");
}
matheus23 commented 1 year ago

My example above doesn't work if Test contains CID somewhere, since it'd call .deserialize_any on the CID deserializer which errors out.

It works if you use the "normal" way of tagging sum types in serde and that way is faster for deserialization anyway, so we've changed to that in the spec: https://github.com/wnfs-wg/spec/pull/62

matheus23 commented 1 year ago

Dig likes to use a pattern like this:

struct Foo {
    link: Link<Something>,
    // ...  
}

#[derive(Serialize, Deserialize)]
struct FooSerializable {
    link: Cid,
    // ...
}

pub trait Storable {
    type Serializable: Deserialize + Serialize;
    async fn to_serializable(&self, store: &impl BlockStore) -> Self::Serializable;
    fn store(&Self::Foo, store: &impl BlockStore) -> Result<Cid>;
}

impl<T: Serialize + Deserialize> Storable for T {

}

Also, major note: We don't need our stuff to openly be Serializable. We only care about serializing e.g. PrivateDirectory to dag-cbor. We don't want to be able to serialize it to toml. So we shouldn't make our stuff be based around that abstraction (e.g. re-using Serializable in Link or PrivateLink or BlockStore. We can do our own custom trait that's simpler & tailored to our needs, i.e. an AsyncSerialize with less serde dependencies).

matheus23 commented 1 year ago

Since #248 most of our structs aren't openly (De)Serialize anymore, since only internal structs (e.g. PrivateDirectoryContentSerializable) are now serializable and we're using https://github.com/ipld/serde_ipld_dagcbor.

I'd consider this closed :)