zakarumych / alkahest

Fantastic serialization library

Example of how to use Bincoded? #12

Closed: vlovich closed this issue 1 year ago

vlovich commented 1 year ago

The documentation says to use Bincoded, but it's not clear to me exactly how to go about doing that.

#[derive(serde::Serialize, serde::Deserialize)]
struct SomeType {
}

#[alkahest(Formula)]
struct MyStruct {
   v: alkahest::Bincoded<SomeType>,
}

fn foo() {
   let v = SomeType {};
   let s = MyStruct {
     // what goes here?
   };
}

I have a similar question about how to actually access v once MyStruct is deserialized.

zakarumych commented 1 year ago

No values of your type MyStruct are needed. Formula types are only used at the type level.

To serialize a value, its type must implement Serialize<FormulaType>, where FormulaType is the type implementing Formula that you wish to use. Serialize can be derived for structures and enums. The derive macro uses an attribute argument to determine the FormulaType for which Serialize will be implemented. For each field of the structure for which you derive Serialize, there must be a field with the same name and position in the specified FormulaType, and the field's type must implement Serialize<FormulaFieldType>, where FormulaFieldType is the type of the corresponding field in the Formula type.

The Deserialize derive macro works almost the same way.
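To make the field-by-field rule concrete, here is a minimal std-only model of the formula/value split. The traits Formula, Serialize and Deserialize below are hypothetical local stand-ins that only mirror the shape described in this comment; they are not alkahest's real traits or its derive output, and the little-endian u32 layout is an assumption chosen for illustration.

```rust
use std::convert::TryInto;

// Hypothetical std-only model of the formula/value split.
// These are NOT alkahest's traits; they only mirror the idea.

/// A formula describes a wire layout and is used at the type level only.
trait Formula {}
impl Formula for u32 {}

/// A value type writes itself in the layout described by `F`.
trait Serialize<F: Formula> {
    fn serialize(self, out: &mut Vec<u8>);
}

/// A value type reads itself from the layout described by `F`.
trait Deserialize<F: Formula>: Sized {
    /// Returns the value and the number of bytes consumed.
    fn deserialize(input: &[u8]) -> Option<(Self, usize)>;
}

impl Serialize<u32> for u32 {
    fn serialize(self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_le_bytes()); // assumed little-endian
    }
}

impl Deserialize<u32> for u32 {
    fn deserialize(input: &[u8]) -> Option<(Self, usize)> {
        let bytes: [u8; 4] = input.get(..4)?.try_into().ok()?;
        Some((u32::from_le_bytes(bytes), 4))
    }
}

/// Formula type: never constructed, it only names fields `x` then `y`.
#[allow(dead_code)]
struct PointFormula {
    x: u32,
    y: u32,
}
impl Formula for PointFormula {}

/// Value type: same field names and order as `PointFormula`.
#[derive(Debug, PartialEq)]
struct Point {
    x: u32,
    y: u32,
}

// Hand-written stand-in for a derive: each field is written
// using the formula's field type (`u32`) as its formula.
impl Serialize<PointFormula> for Point {
    fn serialize(self, out: &mut Vec<u8>) {
        <u32 as Serialize<u32>>::serialize(self.x, out);
        <u32 as Serialize<u32>>::serialize(self.y, out);
    }
}

impl Deserialize<PointFormula> for Point {
    fn deserialize(input: &[u8]) -> Option<(Self, usize)> {
        let (x, a) = <u32 as Deserialize<u32>>::deserialize(input)?;
        let (y, b) = <u32 as Deserialize<u32>>::deserialize(&input[a..])?;
        Some((Point { x, y }, a + b))
    }
}
```

PointFormula is never instantiated; it exists only so that the Serialize and Deserialize impls can name the layout they agree on.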

Now, you want to use the Bincoded<T> type in a formula. T implements Serialize<Bincoded<T>> if T: serde::Serialize, which means you can do the following:

use alkahest::{alkahest, serialize, Bincoded};

#[alkahest(Formula)]
struct MyFormula {
    v: Bincoded<SomeType>,
}

#[alkahest(Serialize<MyFormula>, Deserialize<MyFormula>)]
struct MyStruct {
    v: SomeType,
}

fn foo() {
    let v = SomeType {};
    let s = MyStruct { v };

    let mut buffer = [0u8; 8]; // Should be enough for value `s`.
    serialize::<MyFormula, _>(s, &mut buffer);
}

vlovich commented 1 year ago

I didn't see this pattern of arranging things anywhere in the documentation. It feels conceptually a bit awkward to have to keep parallel definitions of the formula and the value struct in sync. Is there any way to have them be one and the same (like the examples in the README) and instead put annotations on the field?

Also, the compiler reports an error on Deserialize saying "expected lifetime", without giving any further indication.

Finally, I'm curious whether there's a way to override the formula for a nested struct:

#[alkahest(Serialize<InternalFormula>, Deserialize<InternalFormula>)]
struct Internal {
   v: SomeType
}
#[alkahest(Formula)]
struct InternalFormula {
  v: Bincoded<SomeType>
}

#[alkahest(Formula, Serialize, Deserialize)]
struct TypeToSerialize {
  internal: Internal,
}

The snippet above fails with (in addition to the lifetime issue):

the method `write_field` exists for struct `WithFormula<Internal>`, but its trait bounds were not satisfied
the following trait bounds were not satisfied:
`Internal: Formula`

zakarumych commented 1 year ago

Documentation is far from perfect, I agree.

For many kinds of Formula, you may just as well implement Serialize and Deserialize on the same type. The only requirement is that the fields' types also implement Formula and Serialize<Self>/Deserialize<Self>. This holds for all primitive types and for as many other types as possible.

But not for Bincoded<T>. If Bincoded<T> were defined as a newtype, struct Bincoded<T>(pub T), it would be possible to implement Bincoded<T>: Serialize<Bincoded<T>> where T: serde::Serialize.
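A sketch of the newtype variant, assuming a hypothetical wrapper Encoded<T>(pub T) in place of Bincoded<T> and a hypothetical ToBytes trait standing in for serde::Serialize plus a bincode encoder. When the wrapper is both the formula and the value, a struct that contains it can also be its own formula. The length-prefixed layout is an assumption.

```rust
// Hypothetical sketch: `Encoded<T>` plays the role of a newtype
// `Bincoded<T>(pub T)` and `ToBytes` stands in for `serde::Serialize`.
// Neither is alkahest's API.
trait Formula {}

trait Serialize<F: Formula> {
    fn serialize(self, out: &mut Vec<u8>);
}

/// Stand-in for `serde::Serialize` plus a bincode-style encoder.
trait ToBytes {
    fn to_bytes(&self) -> Vec<u8>;
}

/// Newtype wrapper: the formula and the value are the same type.
struct Encoded<T>(pub T);
impl<T> Formula for Encoded<T> {}

impl<T: ToBytes> Serialize<Encoded<T>> for Encoded<T> {
    fn serialize(self, out: &mut Vec<u8>) {
        let bytes = self.0.to_bytes();
        // Assumed layout: u32 length prefix, then the encoded bytes.
        out.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
        out.extend_from_slice(&bytes);
    }
}

/// Example payload with a trivial stand-in encoding.
struct Note(String);
impl ToBytes for Note {
    fn to_bytes(&self) -> Vec<u8> {
        self.0.as_bytes().to_vec()
    }
}

/// Because `Encoded<Note>` is its own formula, `Message` can be too:
/// no separate `MessageFormula` struct is needed.
struct Message {
    note: Encoded<Note>,
}
impl Formula for Message {}

impl Serialize<Message> for Message {
    fn serialize(self, out: &mut Vec<u8>) {
        <Encoded<Note> as Serialize<Encoded<Note>>>::serialize(self.note, out);
    }
}
```

The trade-off is that the caller now has to wrap the payload in Encoded(...) when building a value.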

Overriding the formula of a field in a type that implements both Formula and Serialize is not implemented, and I'm not sure it would be a good thing. It could be even more confusing than it is now.

The idea behind all these type shenanigans is that the caller of the serialize functions may define its own types and derive Serialize<Formula> for them while being 100% sure they will be compatible with whatever Deserialize<Formula> impl exists on the receiving end. It also lets such a type be as cheap to construct as possible, which matters because it is thrown away immediately after serialization.

With serde, for example, users often have to keep complex values around for serialization instead of constructing them on the fly. That may cause other problems or simply be impossible, and each serialization then begins with lots of allocations and iterator collections. Alkahest is flexible in this regard. You may simply put #[alkahest(Formula, Serialize, Deserialize)] on your types and use them as you would with serde. But in places where constructing the value to serialize is expensive, make an ad-hoc Serialize type that simply references and iterates over the required data instead of copying it.
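A minimal sketch of the ad-hoc Serialize type idea, using hypothetical std-only traits rather than alkahest's API: an owned User and a borrowing UserRef both serialize with the same UserFormula, so the caller can pick whichever is cheaper to build. All names and the length-prefixed string layout are assumptions.

```rust
// Hypothetical std-only sketch; names and layout are assumptions,
// not alkahest's API.
trait Formula {}

trait Serialize<F: Formula> {
    fn serialize(self, out: &mut Vec<u8>);
}

/// Leaf formula for length-prefixed UTF-8 strings (type-level only).
struct Str;
impl Formula for Str {}
impl Formula for u32 {}

impl Serialize<u32> for u32 {
    fn serialize(self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_le_bytes());
    }
}

// Owned and borrowed strings produce the same bytes for `Str`.
impl<S: AsRef<str>> Serialize<Str> for S {
    fn serialize(self, out: &mut Vec<u8>) {
        let s = self.as_ref();
        out.extend_from_slice(&(s.len() as u32).to_le_bytes());
        out.extend_from_slice(s.as_bytes());
    }
}

/// The formula: a name, then an age.
#[allow(dead_code)]
struct UserFormula {
    name: Str,
    age: u32,
}
impl Formula for UserFormula {}

/// Owned value type, e.g. one you keep around anyway.
struct User {
    name: String,
    age: u32,
}

/// Ad-hoc value type that only borrows what it needs.
struct UserRef<'a> {
    name: &'a str,
    age: u32,
}

impl Serialize<UserFormula> for User {
    fn serialize(self, out: &mut Vec<u8>) {
        <String as Serialize<Str>>::serialize(self.name, out);
        <u32 as Serialize<u32>>::serialize(self.age, out);
    }
}

impl<'a> Serialize<UserFormula> for UserRef<'a> {
    fn serialize(self, out: &mut Vec<u8>) {
        <&str as Serialize<Str>>::serialize(self.name, out);
        <u32 as Serialize<u32>>::serialize(self.age, out);
    }
}
```

Because both value types target the same formula, the receiving side does not need to know which one the sender used.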

vlovich commented 1 year ago

Not trying to critique the docs; it's an impressive library. I'm just trying to understand how things are structured.

Any thoughts about why my code is failing to compile with expected lifetime for Deserialize using a separate formula?

The other thing I wanted to highlight is that the need for a separate formula is infectious. If I have a struct with a Bincoded field, then any struct that includes it must also have a separate formula and can't use the default annotation. Consider:

#[alkahest(Serialize<SomeSharedStructFormula>, Deserialize<SomeSharedStructFormula>)]
struct SomeSharedStruct {
  field: Bincoded<..>,
}

#[alkahest(Formula)]
struct SomeSharedStructFormula {}

#[alkahest(Formula, Serialize, Deserialize)]
struct Struct1 {
  field: SomeSharedStruct,
}

#[alkahest(Formula, Serialize, Deserialize)]
struct Struct2 {
  field: SomeSharedStruct,
}

Today, that doesn't work. I have to explicitly define Struct1Formula and Struct2Formula, which then use SomeSharedStructFormula for field. Ideally, Struct1 and Struct2 would either detect the underlying formula for field automagically or support something like:

#[alkahest(Formula, Serialize, Deserialize)]
struct Struct2 {
  #[alkahest(FieldFormula<SomeSharedStructFormula>)]
  field: SomeSharedStruct,
}

Magic would be best, but I'm not sure how easy it is.

> Alkahest is flexible in this regard. You may simply put #[alkahest(Formula, Serialize, Deserialize)] on types and proceed to use it like serde. But in places where construction of the value to serialize is expensive - make ad-hoc Serialize type that will simply reference and iterate over the data required instead of copying it.

I'd love to learn more about this. It's not exactly clear to me how to set up this high-performance mode.

vlovich commented 1 year ago

Ugh. Figured out the Deserialize piece: it needed Deserialize<'_, T>.

zakarumych commented 1 year ago

My earlier comment probably misled you into thinking that the Deserialize attribute argument takes no lifetime.
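As for why the lifetime is there at all: a deserialized value may borrow directly from the input buffer. The sketch below uses a hypothetical Deserialize<'de, F> trait (not alkahest's exact definition) to show a zero-copy &'de str next to an owned String, both read with the same hypothetical Str formula.

```rust
use std::convert::TryInto;

// Hypothetical sketch of why a deserialize trait carries a lifetime;
// this is not alkahest's exact trait definition.
trait Formula {}

/// Length-prefixed UTF-8 string formula (type-level only).
struct Str;
impl Formula for Str {}

/// `'de` ties the produced value to the input buffer, which is what
/// allows zero-copy results such as `&'de str`.
trait Deserialize<'de, F: Formula>: Sized {
    fn deserialize(input: &'de [u8]) -> Option<Self>;
}

/// Zero-copy: the returned `&str` points into `input`.
impl<'de> Deserialize<'de, Str> for &'de str {
    fn deserialize(input: &'de [u8]) -> Option<Self> {
        let len_bytes: [u8; 4] = input.get(..4)?.try_into().ok()?;
        let len = u32::from_le_bytes(len_bytes) as usize;
        let end = 4usize.checked_add(len)?;
        std::str::from_utf8(input.get(4..end)?).ok()
    }
}

/// Owned: works for any input lifetime, at the cost of a copy.
impl<'de> Deserialize<'de, Str> for String {
    fn deserialize(input: &'de [u8]) -> Option<Self> {
        <&'de str as Deserialize<'de, Str>>::deserialize(input).map(str::to_owned)
    }
}
```

Writing '_ at the use site, as in Deserialize<'_, T>, lets the compiler pick that lifetime.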

It is not possible to infer SomeSharedStructFormula from the SomeSharedStruct field type, because there's no one-to-one connection: a type may implement Serialize<F> for many F types. So automagic is ruled out. As for an additional attribute, I don't see how it's better than a separate struct.
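A small std-only illustration of why the formula cannot be inferred from the value type, using hypothetical traits and formula names: u32 implements Serialize for two different formulas, so a generic serialize function needs the formula spelled out by the caller, much like serialize::<MyFormula, _> earlier in this thread.

```rust
// Hypothetical std-only traits and formula names, for illustration only.
trait Formula {}

trait Serialize<F: Formula> {
    fn serialize(self, out: &mut Vec<u8>);
}

/// Two different formulas for the same logical number.
struct U32Le;
struct U64Le;
impl Formula for U32Le {}
impl Formula for U64Le {}

// One value type, two formulas: nothing ties `u32` to a single formula.
impl Serialize<U32Le> for u32 {
    fn serialize(self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_le_bytes());
    }
}

impl Serialize<U64Le> for u32 {
    fn serialize(self, out: &mut Vec<u8>) {
        out.extend_from_slice(&u64::from(self).to_le_bytes());
    }
}

/// The caller must name `F`; it cannot be inferred from `T`.
/// For example, `serialize_with(7u32)` does not compile because `F` is ambiguous.
fn serialize_with<F: Formula, T: Serialize<F>>(value: T) -> Vec<u8> {
    let mut out = Vec::new();
    value.serialize(&mut out);
    out
}
```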

Regarding the high-performance mode: depending on the values, you may or may not gain performance from using the advanced tools available in Alkahest.

For instance, you may wish to serialize a runtime-sized array of values. The simple approach is to allocate a Vec, put the values there, and then serialize it. That works well in serde, Alkahest, and probably any other serialization library for Rust. In Alkahest, however, you may construct an iterator and serialize it directly: define an ad-hoc type that contains the iterator and any other required parts, and derive Serialize<FormulaType> for it. There's no reason to allocate memory and copy values into a Vec if they aren't already there.
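A sketch of the iterator approach with hypothetical std-only traits (not alkahest's API): a Vec<u32> and an ad-hoc IterSeq wrapper around an iterator produce identical bytes for the same hypothetical SeqU32 formula, but the wrapper never collects into a Vec. The count-then-elements layout is an assumption.

```rust
// Hypothetical std-only sketch; `SeqU32` and `IterSeq` are illustrative
// names, not alkahest types.
trait Formula {}

trait Serialize<F: Formula> {
    fn serialize(self, out: &mut Vec<u8>);
}

/// Formula: u32 element count, then each element as a little-endian u32.
struct SeqU32;
impl Formula for SeqU32 {}

/// Shared writer; needs the length up front, hence `ExactSizeIterator`.
fn write_seq<I: ExactSizeIterator<Item = u32>>(iter: I, out: &mut Vec<u8>) {
    out.extend_from_slice(&(iter.len() as u32).to_le_bytes());
    for x in iter {
        out.extend_from_slice(&x.to_le_bytes());
    }
}

/// Simple approach: the caller first collects into a `Vec`.
impl Serialize<SeqU32> for Vec<u32> {
    fn serialize(self, out: &mut Vec<u8>) {
        write_seq(self.into_iter(), out);
    }
}

/// Ad-hoc value type: wraps an iterator, so nothing is collected.
struct IterSeq<I>(I);

impl<I: ExactSizeIterator<Item = u32>> Serialize<SeqU32> for IterSeq<I> {
    fn serialize(self, out: &mut Vec<u8>) {
        write_seq(self.0, out);
    }
}
```

Requiring ExactSizeIterator is one design choice for a length-prefixed layout; a format that writes the count after the elements would not need it.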

During deserialization, you may choose not to deserialize part of the value eagerly, for example the value behind Bincoded<T>. Use Lazy<Bincoded<T>> as the field type on a struct that derives Deserialize<FormulaType>, and decide later when and whether that data needs to be parsed.
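The lazy idea, sketched without alkahest: the hypothetical LazyBytes type below only records where the payload sits in the input and decodes it when get is called. This is not alkahest's Lazy; the message layout and the UTF-8 decoding step (standing in for bincode) are assumptions.

```rust
use std::convert::TryInto;

// Hypothetical sketch of lazy decoding; `LazyBytes`, `Message` and the
// byte layout are assumptions, not alkahest's `Lazy`.

/// Remembers where the payload sits in the input; decodes only on demand.
struct LazyBytes<'de> {
    raw: &'de [u8],
}

impl<'de> LazyBytes<'de> {
    /// Runs `decode` (a stand-in for a bincode call) over the raw bytes.
    fn get<T>(&self, decode: impl FnOnce(&'de [u8]) -> Option<T>) -> Option<T> {
        decode(self.raw)
    }
}

/// Assumed layout: u32 id, u32 payload length, then payload bytes.
struct Message<'de> {
    id: u32,
    payload: LazyBytes<'de>,
}

/// Reads the header eagerly but leaves the payload undecoded.
fn read_message(input: &[u8]) -> Option<Message<'_>> {
    let id_bytes: [u8; 4] = input.get(..4)?.try_into().ok()?;
    let len_bytes: [u8; 4] = input.get(4..8)?.try_into().ok()?;
    let id = u32::from_le_bytes(id_bytes);
    let len = u32::from_le_bytes(len_bytes) as usize;
    let end = 8usize.checked_add(len)?;
    let raw = input.get(8..end)?;
    Some(Message { id, payload: LazyBytes { raw } })
}
```

If a caller only needs msg.id, the payload is never decoded.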

Type duplication (triplication) may sound scary, but consistency between the types is checked at compile time.

vlovich commented 1 year ago

Is it possible to implement SerializeRef for Bincoded? I'm trying to figure out the size of a formula containing a Bincoded type ahead of time so that I know how much to allocate.

zakarumych commented 1 year ago

Serialize<Bincoded<T>> is implemented for T where T: serde::Serialize. That means it is also implemented for &T, since &T implements serde::Serialize whenever T: serde::Serialize.
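A std-only sketch of the &T point and of sizing a buffer ahead of time. ToBytes stands in for serde::Serialize (including its blanket impl for references), Encoded is a simplified non-generic formula, and Counter is a hypothetical sink that only counts bytes; none of these are alkahest items. Measuring through &note leaves note available for the real serialization pass.

```rust
// Hypothetical std-only sketch. `ToBytes` stands in for `serde::Serialize`,
// `Encoded` is a simplified non-generic formula, and `Counter` is an
// illustrative sink; none of these are alkahest items.
trait Formula {}

/// Output abstraction: the same serializer can store bytes or only count them.
trait Sink {
    fn put(&mut self, bytes: &[u8]);
}

impl Sink for Vec<u8> {
    fn put(&mut self, bytes: &[u8]) {
        self.extend_from_slice(bytes);
    }
}

/// Counts bytes without storing them.
struct Counter(usize);
impl Sink for Counter {
    fn put(&mut self, bytes: &[u8]) {
        self.0 += bytes.len();
    }
}

trait Serialize<F: Formula> {
    fn serialize<S: Sink>(self, out: &mut S);
}

trait ToBytes {
    fn to_bytes(&self) -> Vec<u8>;
}

// Like serde, references get a blanket impl...
impl<T: ToBytes + ?Sized> ToBytes for &T {
    fn to_bytes(&self) -> Vec<u8> {
        (**self).to_bytes()
    }
}

/// Assumed layout: u32 length prefix, then the encoded bytes.
struct Encoded;
impl Formula for Encoded {}

// ...so this single impl covers both `T` and `&T`.
impl<T: ToBytes> Serialize<Encoded> for T {
    fn serialize<S: Sink>(self, out: &mut S) {
        let bytes = self.to_bytes();
        out.put(&(bytes.len() as u32).to_le_bytes());
        out.put(&bytes);
    }
}

/// Size pass: runs the normal serializer against a counting sink.
fn serialized_size<F: Formula, T: Serialize<F>>(value: T) -> usize {
    let mut counter = Counter(0);
    value.serialize(&mut counter);
    counter.0
}

/// Example payload.
struct Note {
    text: String,
}
impl ToBytes for Note {
    fn to_bytes(&self) -> Vec<u8> {
        self.text.as_bytes().to_vec()
    }
}
```

In this sketch the size pass still encodes the payload once in order to count it; a real encoder could count without buffering.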

Implementing SerializeRef is not necessary: all bounds must use the Serialize trait.