Closed matheus23 closed 1 year ago
We should consider using https://github.com/ipld/serde_ipld_dagcbor
Turns out we can make generated tagged enum serializers and deserializers work with it:
#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize)]
#[serde(tag = "type")]
pub enum Test {
#[serde(rename = "wnfs/priv/dir")]
X {
x: usize,
},
#[serde(rename = "wnfs/priv/file")]
Y {
y: usize,
},
}
#[test]
fn test_serialization() {
let x = Test::X { x: 10 };
let vec = serde_ipld_dagcbor::to_vec(&x).unwrap();
println!("{vec:?}");
let y: Test = serde_ipld_dagcbor::from_slice(&vec).unwrap();
println!("{y:?}");
}
My example above doesn't work if Test
contains CID
somewhere, since it'd call .deserialize_any
on the CID
deserializer which errors out.
It works if you use the "normal" way of tagging sum types in serde and that way is faster for deserialization anyway, so we've changed to that in the spec: https://github.com/wnfs-wg/spec/pull/62
Dig likes to use a pattern like this:
struct Foo {
link: Link<Something>,
// ...
}
#[derive(Serialize, Deserialize)]
struct FooSerializable {
link: Cid,
// ...
}
pub trait Storable {
type Serializable: Deserialize + Serialize;
async fn to_serializable(&self, store: &impl BlockStore) -> Self::Serializable;
fn store(&Self::Foo, store: &impl BlockStore) -> Result<Cid>;
}
impl<T: Serialize + Deserialize> Storable for T {
}
Also, major note: We don't need our stuff to openly be Serializable
. We only care about serializing e.g. PrivateDirectory
to dag-cbor. We don't want to be able to serialize it to toml.
So we shouldn't make our stuff be based around that abstraction (e.g. re-using Serializable
in Link
or PrivateLink
or BlockStore
. We can do our own custom trait that's simpler & tailored to our needs, i.e. an AsyncSerialize
with less serde dependencies).
Since #248 most of our structs aren't openly (De)Serialize
anymore, since only internal structs (e.g. PrivateDirectoryContentSerializable
) are now serializable and we're using https://github.com/ipld/serde_ipld_dagcbor.
I'd consider this closed :)
Central to our inconsistency is the fact that we can't implement
Serialize
andDeserialize
for our core structs likePublicDirectory
orPrivateFile
, since they have additional context requirements during serialization or deserialization.Key
for serialization and deserialization used for en/decryption as well as someimpl RngCore
for randomness during serialization for generating the IV (initialization vector) for encryptionLink
abstraction which allows postponing serialization of CID references in structures as well as lazily loading what's behind a CID reference into an in-memory structure. This means we need aBlockStore
during serialization to callput_block
on for getting the actual CIDs to serialize. It also means that serialization can be asynchronous. We don't technically have an issue implementingDeserialize
, since we can just initialize ourLink
with the deserializedCid
s.There's two dimensions to this issue:
Serialize
orDeserialize
traits from serde, but since we have extra context requirements, we need to copy & modify these traits to become more "powerful" (i.e. have access to aBlockStore
& be async ->AsyncSerialize
). Such abstractions are then needed for abstract building blocks likeLink<T>
, which needsT
to be some kind of serializable/deserializable, but should be flexible enough to support data structures likePublicDirectory
.Serialize
andDeserialize
for our structs directly, we also can't directly use#[derive(Deserialize, Serialize)]
. We have resorted to different ways of solving this problem, although we're mostly using structs named*Serde
that only contain easily de/serializable data.For (1) we both have the
AsyncSerialize
trait for public data (notice itsserialize
is anasync fn
): https://github.com/wnfs-wg/rs-wnfs/blob/ce7d988157884115e32d7db412b7d4cedf56d23e/wnfs/src/common/async_serialize.rs#L34-L49But on the private side we didn't have a need to abstract yet, so we only used custom functions: (this serialization only needs some randomness for encryption, since the key can be derived from the ratchet within
PrivateDirectory
) https://github.com/wnfs-wg/rs-wnfs/blob/ce7d988157884115e32d7db412b7d4cedf56d23e/wnfs/src/private/directory.rs#L1140-L1147 (but we need to be provided the key for decryption) https://github.com/wnfs-wg/rs-wnfs/blob/ce7d988157884115e32d7db412b7d4cedf56d23e/wnfs/src/private/directory.rs#L1170-L1173And for (2) in order to make the above easier, we're using these
*Serde
structs (Notice what#[derive(Serialize, Deserialize)]
s): (notice how the*Serde
variant usesCid
instead ofPublicLink
) https://github.com/wnfs-wg/rs-wnfs/blob/ce7d988157884115e32d7db412b7d4cedf56d23e/wnfs/src/public/directory.rs#L41-L56 (forPrivateDirectory
thePrivateNodeHeader
turns into aVec<u8>
, the ciphertext of aPrivateNodeHeader
) https://github.com/wnfs-wg/rs-wnfs/blob/ce7d988157884115e32d7db412b7d4cedf56d23e/wnfs/src/private/directory.rs#L43-L58One problem of deriving
Serialize
andDeserialize
for a related struct likePrivateDirectorySerde
is that, this intermediate data structure mostly gets built up only to be torn down immediately afterwards. Ideally we can skip such work.While we're looking at these issues, there's a similar concern with building up
Ipld
as an intermediate data structure, e.g. for deserializingPublicNode
into appropriate elements, depending on what thetype
field deserializes to: https://github.com/wnfs-wg/rs-wnfs/blob/ce7d988157884115e32d7db412b7d4cedf56d23e/wnfs/src/public/node.rs#L268-L292