near / borsh

Binary Object Representation Serializer for Hashing
https://borsh.io/

Website | Join Community | Implementations | Benchmarks | Specification

Why do we need yet another serialization format? Borsh is the first serializer that prioritizes the following qualities that are crucial for security-critical projects:

Implementations

| Platform | Repository | Latest Release |
| --- | --- | --- |
| Rust | borsh-rs | Latest released version |
| TypeScript, JavaScript | borsh-js | Latest released version |
| TypeScript | borsh-ts | Latest released version |
| Java, Kotlin, Scala, Clojure, etc. | borshj | |
| Go | borsh-go | Latest released version |
| Python | borsh-construct-py | Latest released version |
| AssemblyScript | borsh-as | Latest released version |
| C# | Hexarc.Borsh | Latest released version |
| C++ | borsh-cpp (work-in-progress) | |
| C++20 | borsh-cpp20 (work-in-progress) | |
| Elixir | borsh-ex | Latest released version |

Benchmarks

We ran the following benchmarks on the objects that blockchain projects care about most: blocks, block headers, transactions, and accounts. The object structures are taken from the NEAR Protocol blockchain, and the graphs were built with Criterion.

The benchmarks were run on Google Cloud n1-standard-2 (2 vCPUs, 7.5 GB memory).

Block header serialization speed vs. block header size in bytes (size corresponds only roughly to serialization complexity, which is why the graph is not smooth):

[Graph: block header serialization speed]

Block header deserialization speed vs. block header size in bytes:

[Graph: block header deserialization speed]

Block serialization speed vs. block size in bytes:

[Graph: block serialization speed]

Block deserialization speed vs. block size in bytes:

[Graph: block deserialization speed]

See complete report here.

Specification

In short, Borsh is a non-self-describing binary serialization format. It is designed to serialize any object to a canonical and deterministic sequence of bytes.
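
To illustrate what "non-self-describing" means in practice, here is a minimal hand-written sketch using only the Rust standard library. The `Account` type and the `encode_account` / `decode_account` helpers are hypothetical names invented for this example; they are not part of any Borsh implementation. The sketch assumes the integer and bool encodings given in the specification table, and the strict rejection of bool bytes other than 0 and 1 is a choice of this sketch.

```rust
// Hypothetical type used only for illustration.
struct Account {
    id: u32,
    active: bool,
}

// Writes field values only, in declaration order.
// No field names, type tags, or struct length are emitted.
fn encode_account(a: &Account) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&a.id.to_le_bytes()); // u32 as 4 little-endian bytes
    out.push(if a.active { 1 } else { 0 }); // bool as a single byte
    out
}

// The reader must already know the layout: nothing in the bytes describes it.
fn decode_account(bytes: &[u8]) -> Option<Account> {
    if bytes.len() != 5 {
        return None;
    }
    let id = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let active = match bytes[4] {
        0 => false,
        1 => true,
        _ => return None, // sketch assumption: any other byte is invalid
    };
    Some(Account { id, active })
}
```

Encoding `Account { id: 258, active: true }` with this sketch yields the five bytes `02 01 00 00 01`. The same value always produces the same bytes, which is what makes the output suitable for hashing.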

General principles:

Formal specification:

Each entry below gives the informal type, its Rust EBNF *, and the serialization pseudocode.

Integers
Rust EBNF *: integer_type: ["u8" | "u16" | "u32" | "u64" | "u128" | "i8" | "i16" | "i32" | "i64" | "i128" ]
Pseudocode:
    little_endian(x)

Floats
Rust EBNF *: float_type: ["f32" | "f64" ]
Pseudocode:
    err_if_nan(x)
    little_endian(x as integer_type)

Unit
Rust EBNF *: unit_type: "()"
Pseudocode: We do not write anything

Bool
Rust EBNF *: boolean_type: "bool"
Pseudocode:
    if x {
      repr(1 as u8)
    } else {
      repr(0 as u8)
    }

Fixed sized arrays
Rust EBNF *: array_type: '[' ident ';' literal ']'
Pseudocode:
    for el in x {
      repr(el as ident)
    }

Dynamic sized array
Rust EBNF *: vec_type: "Vec<" ident '>'
Pseudocode:
    repr(x.len() as u32)
    for el in x {
      repr(el as ident)
    }

Struct
Rust EBNF *: struct_type: "struct" ident fields
Pseudocode:
    repr(fields)

Fields
Rust EBNF *: fields: [named_fields | unnamed_fields]

Named fields
Rust EBNF *: named_fields: '{' ident_field0 ':' ident_type0 ',' ident_field1 ':' ident_type1 ',' ... '}'
Pseudocode:
    repr(ident_field0 as ident_type0)
    repr(ident_field1 as ident_type1)
    ...

Unnamed fields
Rust EBNF *: unnamed_fields: '(' ident_type0 ',' ident_type1 ',' ... ')'
Pseudocode:
    repr(x.0 as ident_type0)
    repr(x.1 as ident_type1)
    ...

Enum
Rust EBNF *: enum: 'enum' ident '{' variant0 ',' variant1 ',' ... '}'
             variant: ident [ fields ] ?
Pseudocode: Suppose X is the number of the variant that the enum takes.
    repr(X as u8)
    repr(x.X as fieldsX)

HashMap
Rust EBNF *: hashmap: "HashMap<" ident0, ident1 ">"
Pseudocode:
    repr(x.len() as u32)
    for (k, v) in x.sorted_by_key() {
      repr(k as ident0)
      repr(v as ident1)
    }

HashSet
Rust EBNF *: hashset: "HashSet<" ident ">"
Pseudocode:
    repr(x.len() as u32)
    for el in x.sorted() {
      repr(el as ident)
    }

Option
Rust EBNF *: option_type: "Option<" ident '>'
Pseudocode:
    if x.is_some() {
      repr(1 as u8)
      repr(x.unwrap() as ident)
    } else {
      repr(0 as u8)
    }

String
Rust EBNF *: string_type: "String"
Pseudocode:
    encoded = utf8_encoding(x) as Vec<u8>
    repr(encoded.len() as u32)
    repr(encoded as Vec<u8>)
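
The pseudocode for the primitive, array, Option, and String rows can be sketched directly in Rust with the standard library alone. The `write_*` function names below are hypothetical helpers made up for this illustration, fixed to a few concrete element types for brevity; a real implementation such as borsh-rs exposes a different, trait-based API.

```rust
// Integers: fixed-width, little-endian.
fn write_u32(out: &mut Vec<u8>, x: u32) {
    out.extend_from_slice(&x.to_le_bytes());
}

// Floats: NaN is rejected, otherwise the IEEE 754 bit pattern is
// written as a little-endian integer of the same width.
fn write_f64(out: &mut Vec<u8>, x: f64) -> Result<(), String> {
    if x.is_nan() {
        return Err("NaN cannot be serialized".to_string());
    }
    out.extend_from_slice(&x.to_bits().to_le_bytes());
    Ok(())
}

// Bool: a single byte, 1 for true and 0 for false.
fn write_bool(out: &mut Vec<u8>, x: bool) {
    out.push(x as u8);
}

// Fixed sized array: elements back to back, no length prefix.
fn write_u16_array<const N: usize>(out: &mut Vec<u8>, x: &[u16; N]) {
    for el in x {
        out.extend_from_slice(&el.to_le_bytes());
    }
}

// Dynamic sized array: u32 length prefix, then the elements.
// Note: `as u32` silently truncates lengths above u32::MAX in this sketch.
fn write_u16_vec(out: &mut Vec<u8>, x: &[u16]) {
    write_u32(out, x.len() as u32);
    for el in x {
        out.extend_from_slice(&el.to_le_bytes());
    }
}

// Option: tag byte 1 followed by the value, or tag byte 0 alone.
fn write_option_u32(out: &mut Vec<u8>, x: Option<u32>) {
    match x {
        Some(v) => {
            out.push(1);
            write_u32(out, v);
        }
        None => out.push(0),
    }
}

// String: u32 length of the UTF-8 encoding (in bytes), then the bytes.
fn write_string(out: &mut Vec<u8>, x: &str) {
    let encoded = x.as_bytes();
    write_u32(out, encoded.len() as u32);
    out.extend_from_slice(encoded);
}
```

With these helpers, the vector `[1u16, 2]` becomes `02 00 00 00 01 00 02 00`, while the fixed sized array `[1u16, 2]` becomes just `01 00 02 00`: the only difference is the length prefix. The string length counts UTF-8 bytes, not characters.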

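The struct, enum, HashMap, and HashSet rows compose the simpler rules. The sketch below is a hand-written illustration, not the API of any Borsh library: `Point`, `Shape`, and the `write_*` helpers are hypothetical names. It assumes that enum variants are numbered from 0 in declaration order and that map keys and set elements are sorted with the key type's natural ordering.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical types used only for illustration.
struct Point(i32, i32); // struct with unnamed fields

enum Shape {
    Empty,                  // variant 0, no fields
    Circle { radius: u32 }, // variant 1, named fields
    At(Point),              // variant 2, unnamed fields
}

// Struct: fields in declaration order, nothing else.
fn write_point(out: &mut Vec<u8>, p: &Point) {
    out.extend_from_slice(&p.0.to_le_bytes());
    out.extend_from_slice(&p.1.to_le_bytes());
}

// Enum: variant index as u8, then the fields of that variant.
fn write_shape(out: &mut Vec<u8>, s: &Shape) {
    match s {
        Shape::Empty => out.push(0),
        Shape::Circle { radius } => {
            out.push(1);
            out.extend_from_slice(&radius.to_le_bytes());
        }
        Shape::At(p) => {
            out.push(2);
            write_point(out, p);
        }
    }
}

// HashMap: u32 length, then (key, value) pairs sorted by key, so the
// output does not depend on the map's internal iteration order.
fn write_map(out: &mut Vec<u8>, m: &HashMap<u16, bool>) {
    let mut entries: Vec<(&u16, &bool)> = m.iter().collect();
    entries.sort_by_key(|e| *e.0);
    out.extend_from_slice(&(entries.len() as u32).to_le_bytes());
    for (k, v) in entries {
        out.extend_from_slice(&k.to_le_bytes());
        out.push(*v as u8);
    }
}

// HashSet: u32 length, then the elements in sorted order.
fn write_set(out: &mut Vec<u8>, s: &HashSet<u16>) {
    let mut items: Vec<u16> = s.iter().copied().collect();
    items.sort();
    out.extend_from_slice(&(items.len() as u32).to_le_bytes());
    for el in items {
        out.extend_from_slice(&el.to_le_bytes());
    }
}
```

Sorting before writing is what makes the encoding canonical: two maps with the same entries produce identical bytes regardless of insertion order. For example, `Shape::Circle { radius: 5 }` encodes as `01 05 00 00 00` in this sketch.
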
Note: