Closed mina86 closed 11 months ago
I would avoid expanding the scope of borsh spec with varint/smallvec specializations. I would treat these types as application-specific ones and leave app developers to optimize their custom types on their end.
So my question is how do I implement BorshSchema for such a type? There’s no Definition for an application-specific encoding. The options seem to be:
Perhaps it would make sense to have Definition::AppSpecific with at least some rudimentary description of the format (e.g. min and max encoded length). For varint, for example, this would mean a definition "VarInt<u32>" → Definition::AppSpecific(1..5).
I think this may also relate to https://github.com/near/borsh-rs/issues/181. Perhaps it would make sense to extend Sequence and Enum by adding length_size and tag_size fields respectively? So currently we’d have Sequence { length_size: 4, elements: ... } and Enum { tag_size: 1, variants: ... }. This would allow expressing smallvec and enums with different tag representation.
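To make the proposal concrete, here is a minimal std-only sketch of what such definitions could look like. Everything in it is hypothetical: `ProposedDefinition`, its field types, and `distinct_header_values` are illustration names, not the borsh crate's `Definition` type, and `AppSpecific` is written with explicit `min_len`/`max_len` fields on the assumption that "1..5" means "1 to 5 bytes".

```rust
/// Hypothetical mirror of the schema definitions proposed above.
/// This is NOT the borsh crate's `Definition` type.
#[derive(Debug, Clone, PartialEq)]
pub enum ProposedDefinition {
    /// A sequence whose element count is stored in `length_size` bytes.
    Sequence { length_size: u8, elements: String },
    /// An enum whose discriminant is stored in `tag_size` bytes.
    Enum { tag_size: u8, variants: Vec<(String, String)> },
    /// An opaque, application-specific encoding with known size bounds.
    AppSpecific { min_len: usize, max_len: usize },
}

/// Number of distinct values the length or tag header can hold,
/// or `None` when the definition has no fixed-width header.
pub fn distinct_header_values(def: &ProposedDefinition) -> Option<u128> {
    let width = match def {
        ProposedDefinition::Sequence { length_size, .. } => *length_size,
        ProposedDefinition::Enum { tag_size, .. } => *tag_size,
        ProposedDefinition::AppSpecific { .. } => return None,
    };
    // Sketch only: widths above 8 bytes are treated as unsupported.
    if width > 8 {
        return None;
    }
    Some(1u128 << (8 * u32::from(width)))
}
```

With this shape, today's encodings would be `Sequence { length_size: 4, .. }` and `Enum { tag_size: 1, .. }`, while a smallvec with an 8-bit length would be `Sequence { length_size: 1, .. }`.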
A vector of varints Vec<VarInt> can be serialized as Vec<u8> first and then presented as that to borsh, if the compression that varint provides is required. The info about the total number of VarInts will be lost; the info about the total number of bytes will not. So it will look like a Sequence { elements: "u8".to_string() } with respect to the schema.
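A minimal sketch of that approach, using only the standard library. It assumes the varint format is unsigned LEB128 (the thread does not fix one), and `write_varint` and `encode_as_byte_vec` are hypothetical helper names. The outer framing follows how Borsh encodes a Vec<u8>: a little-endian u32 byte count followed by the bytes.

```rust
/// Append `value` to `out` as an unsigned LEB128 varint
/// (assumed format: 7 payload bits per byte, high bit = "more follows").
pub fn write_varint(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Pack the values as varints, then frame the packed bytes the way a
/// Vec<u8> is framed: u32 little-endian byte count, then the bytes.
pub fn encode_as_byte_vec(values: &[u32]) -> Vec<u8> {
    let mut payload = Vec::new();
    for &v in values {
        write_varint(v, &mut payload);
    }
    // Sketch only: a real implementation should reject payloads over u32::MAX bytes.
    let mut out = (payload.len() as u32).to_le_bytes().to_vec();
    out.extend_from_slice(&payload);
    out
}
```

Note that the prefix records the number of payload bytes, not the number of varints, which is exactly the information loss described above.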
It's about the same with Rust's String at the moment. A String is essentially a Vec<VarInt>. It's serialized as Vec<u8>, with the info about the total number of characters lost in serialized form, and it has a "string" Declaration for itself and an empty Definition. (second option in comment)
Similarly to String, one can define a type VarintsVec(Vec<VarInt>), serialize and deserialize the contents as Vec<u8>, with error checking during deserialization (on the lengths of the encountered varints), and define BorshSchema as a special "varint_vector" Declaration with an empty Definition.
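The error checking mentioned here could look roughly like the following std-only sketch. It assumes u32 values encoded as unsigned LEB128; `VarintError` and `decode_varints` are hypothetical names, and the input is the byte payload after the Vec<u8> length prefix has already been stripped.

```rust
/// Errors a varint payload can produce (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum VarintError {
    /// The payload ended in the middle of a varint.
    Truncated,
    /// A varint does not fit into 32 bits.
    Overflow,
}

/// Decode a byte payload into u32 values, assuming unsigned LEB128.
pub fn decode_varints(bytes: &[u8]) -> Result<Vec<u32>, VarintError> {
    let mut values = Vec::new();
    let mut current: u32 = 0;
    let mut shift: u32 = 0;
    for &byte in bytes {
        let chunk = u32::from(byte & 0x7f);
        // The fifth byte (shift 28) may carry only 4 bits; a sixth is never valid.
        if shift > 28 || (shift == 28 && chunk > 0x0f) {
            return Err(VarintError::Overflow);
        }
        current |= chunk << shift;
        if byte & 0x80 == 0 {
            // High bit clear: this varint is complete.
            values.push(current);
            current = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    if shift != 0 {
        return Err(VarintError::Truncated);
    }
    Ok(values)
}
```

This sketch accepts non-canonical encodings (for example, padded with trailing zero groups); a stricter format would need an extra check for those.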
A SmallVec type will on average be 127 bytes long (with the minimal nonzero length of a type defined as 1 byte according to #209), and defining a header_size field in Definition::Sequence to save 3 bytes on the header of an average ~120-byte payload doesn't appear to be a big gain compared to just using Vec.
> It's about the same with rust's String at the moment. A String is essentially a Vec<VarInt>. It's serialized as Vec<u8> with info about total characters lost in serialized form, and having a string Declaration for itself and empty Definition.
That’s not quite the same though. In the String case, I can deserialise Vec<u8> and then convert it to String with no additional allocations. With Vec<VarInt> I’d have to first deserialise Vec<u8> and then allocate a new (say) Vec<VarInt<u32>>.
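The String half of that argument can be checked with the standard library alone. In this small sketch, `string_reuses_buffer` is a hypothetical helper that compares the buffer address before and after the conversion; `String::from_utf8` validates the bytes and takes ownership of the existing allocation instead of copying it.

```rust
use std::string::FromUtf8Error;

/// Convert bytes to a String and report whether the String still points
/// at the Vec's original heap buffer (i.e. nothing was copied).
pub fn string_reuses_buffer(bytes: Vec<u8>) -> Result<bool, FromUtf8Error> {
    let before = bytes.as_ptr();
    // Validates UTF-8, then wraps the same allocation.
    let text = String::from_utf8(bytes)?;
    Ok(text.as_ptr() == before)
}
```

No equivalent in-place conversion exists from a packed byte buffer to a vector of decoded u32 values, since each decoded element has a different size from its encoded form.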
However, this is a bit beside the point. Of course, I can always write serialisation which can be described by BorshSchema. The question is what to do when the serialisation I’m using cannot be described by BorshSchema.
Say I’d like to use varint in borsh. Or have a custom SmallVec type which is encoded with 8-bit length rather than 32-bit length.
This is easy enough to do by implementing custom BorshSerialize and BorshDeserialize. However, BorshSchema becomes an issue. Varint could be modelled as a nested enum with 256 variants. Similarly SmallVec could be modeled as an enum with 256 variants each being an array. That’s hardly a clean solution though.
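For illustration, the wire format of such a SmallVec could be sketched as below, using only the standard library and a byte payload for simplicity. `encode_small` and `decode_small` are hypothetical names; a real type would put this logic inside its BorshSerialize and BorshDeserialize implementations.

```rust
/// Encode `items` with a single-byte length prefix instead of Borsh's
/// usual 4-byte one. Returns `None` if there are more than 255 items.
pub fn encode_small(items: &[u8]) -> Option<Vec<u8>> {
    if items.len() > u8::MAX as usize {
        return None;
    }
    let mut out = Vec::with_capacity(1 + items.len());
    out.push(items.len() as u8);
    out.extend_from_slice(items);
    Some(out)
}

/// Decode one length-prefixed run from the front of `input`.
/// Returns the items and the remaining unread bytes, or `None` if
/// the input is empty or shorter than the prefix claims.
pub fn decode_small(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let (&len, rest) = input.split_first()?;
    let len = len as usize;
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}
```

The encoding itself is trivial; the difficulty raised here is only that a schema with a fixed 32-bit sequence length has no way to describe the one-byte prefix.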
Do you guys have any thoughts on that?