NoProto: Flexible, Fast & Compact Serialization with RPC

GitHub | Crates.io | Documentation

Features

Lightweight

Zero dependencies
no_std support, WASM ready
Most compact non-compiling storage format

Stable

Safely accept untrusted buffers
Passes Miri compiler safety checks
Panic and unwrap free

Easy

Extensive Documentation & Testing
Full interop with JSON: import and export JSON values
Thoroughly documented & simple data storage format

Fast

Zero copy deserialization
Most updates are append only
Deserialization is incremental

Powerful

Native byte-wise sorting
Supports recursive data types
Supports most common native data types
Supports collections (list, map, struct & tuple)
Supports arbitrary nesting of collection types
Schemas support default values and non-destructive updates
Transport agnostic RPC framework

Why ANOTHER Serialization Format?

NoProto combines the performance of compiled formats with the flexibility of dynamic formats:

Compiled formats like Flatbuffers, Cap'n Proto and Bincode have amazing performance and extremely compact buffers, but you MUST compile the data types into your application. This means that if the schema of the data changes, the application must be recompiled to accommodate the new schema.

Dynamic formats like JSON, MessagePack and BSON give you the flexibility to store any data with any schema at runtime, but the buffers are fat and performance is somewhere between horrible and hopefully acceptable.

NoProto takes the performance advantages of compiled formats and implements them in a flexible format.

NoProto is a key-value database focused format:

Byte Wise Sorting: Ever try to store a signed integer as a sortable key in a database? NoProto can do that. Almost every data type is stored in the buffer as byte-wise sortable, meaning buffers can be compared at the byte level for sorting without deserializing.
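One common way to make signed integers byte-wise sortable is to flip the sign bit and store the result in big-endian order. The sketch below shows that general technique using only the standard library. The helper names `sortable_i32` and `from_sortable_i32` are hypothetical and not part of the no_proto API, and NoProto's exact encoding should be checked against its Data & Schema Format documentation.

```rust
/// Encode an i32 so that comparing the bytes lexicographically gives the
/// same order as comparing the integers numerically.
/// (Illustrative helper, not part of the no_proto API.)
fn sortable_i32(value: i32) -> [u8; 4] {
    // Flipping the sign bit maps i32::MIN..=i32::MAX onto 0..=u32::MAX while
    // preserving order; big-endian puts the most significant byte first.
    ((value as u32) ^ 0x8000_0000).to_be_bytes()
}

/// Reverse of `sortable_i32`.
fn from_sortable_i32(bytes: [u8; 4]) -> i32 {
    (u32::from_be_bytes(bytes) ^ 0x8000_0000) as i32
}

fn main() {
    let mut keys: Vec<[u8; 4]> = [7, -3, 0, i32::MIN, i32::MAX, -100]
        .iter()
        .map(|v| sortable_i32(*v))
        .collect();

    // Sort the raw bytes only; nothing is decoded during the sort.
    keys.sort();

    let decoded: Vec<i32> = keys.into_iter().map(from_sortable_i32).collect();
    assert_eq!(decoded, vec![i32::MIN, -100, -3, 0, 7, i32::MAX]);
}
```

A plain two's complement big-endian encoding would sort every negative number after the positive ones, which is why the sign bit flip is needed.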

Primary Key Management: Compound sortable keys are extremely easy to generate, maintain and update with NoProto. You don't need a custom sort function in your key-value store, you just need this library.
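As a rough illustration of why compound keys need no custom sort function, the sketch below concatenates fixed-width, big-endian fields into one key and lets plain byte comparison do the ordering. The `(region, user_id, score)` layout and the `compound_key` helper are hypothetical; this is not NoProto's tuple encoding, only the idea behind it.

```rust
/// Build a compound key from (region, user_id, score).
/// Fixed-width big-endian fields sort field by field, left to right.
/// (Hypothetical layout for illustration only.)
fn compound_key(region: u8, user_id: u32, score: i16) -> Vec<u8> {
    let mut key = Vec::with_capacity(1 + 4 + 2);
    key.push(region);
    key.extend_from_slice(&user_id.to_be_bytes());
    // Flip the sign bit so negative scores sort before positive ones.
    key.extend_from_slice(&((score as u16) ^ 0x8000).to_be_bytes());
    key
}

fn main() {
    let mut keys = vec![
        compound_key(2, 10, 5),
        compound_key(1, 300, -7),
        compound_key(1, 300, 4),
        compound_key(1, 2, 900),
    ];

    // Byte-wise comparison, the same ordering an ordered key-value store applies.
    keys.sort();

    assert_eq!(keys[0], compound_key(1, 2, 900));
    assert_eq!(keys[1], compound_key(1, 300, -7));
    assert_eq!(keys[2], compound_key(1, 300, 4));
    assert_eq!(keys[3], compound_key(2, 10, 5));
}
```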

UUID & ULID Support: NoProto is one of the few formats that come with first class support for these popular primary key data types. It can easily encode, decode and generate these data types.
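For readers unfamiliar with ULIDs, the sketch below packs one by hand following the public ULID layout: a 48-bit millisecond timestamp followed by 80 bits of randomness, most significant byte first. It does not use NoProto's own ULID type; `ulid_bytes` and `ulid_timestamp_ms` are hypothetical helpers, and the random component is passed in so the example has no dependencies.

```rust
/// Pack a ULID-style 16-byte identifier: 48-bit millisecond timestamp
/// followed by 80 bits of randomness, big-endian.
/// (Illustrative helper, not the no_proto API.)
fn ulid_bytes(timestamp_ms: u64, random: u128) -> [u8; 16] {
    let ts = (timestamp_ms & 0xFFFF_FFFF_FFFF) as u128; // keep the low 48 bits
    let rnd = random & ((1u128 << 80) - 1); // keep the low 80 bits
    ((ts << 80) | rnd).to_be_bytes()
}

/// Recover the timestamp stored in the first 6 bytes.
fn ulid_timestamp_ms(id: &[u8; 16]) -> u64 {
    (u128::from_be_bytes(*id) >> 80) as u64
}

fn main() {
    // A real generator would draw the random bits from a secure source.
    let older = ulid_bytes(1_600_000_000_000, 0xFFFF_FFFF);
    let newer = ulid_bytes(1_600_000_000_001, 0x1);

    // Identifiers created later sort later, regardless of the random bits.
    assert!(older < newer);
    assert_eq!(ulid_timestamp_ms(&newer), 1_600_000_000_001);
}
```

Because the timestamp occupies the leading bytes, keys of this shape keep insertion order under byte-wise comparison, which is what makes them convenient primary keys.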

Fastest Updates: NoProto is the only format that supports all mutations without deserializing. It can do the common database read -> update -> write operation 50x to 300x faster than other dynamic formats. See Benchmarks.
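To show why skipping deserialization helps a read -> update -> write cycle, here is a minimal sketch using a toy record layout. The layout, the `User` struct and all function names are assumptions made for this example and have nothing to do with NoProto's real buffer format; the point is only the difference between a full round trip and an in-place write.

```rust
/// Toy record layout (NOT NoProto's format): [age: u16 BE][name_len: u8][name bytes].
/// Names longer than 255 bytes are out of scope for this sketch.
struct User {
    age: u16,
    name: String,
}

fn encode(user: &User) -> Vec<u8> {
    let mut buf = Vec::with_capacity(3 + user.name.len());
    buf.extend_from_slice(&user.age.to_be_bytes());
    buf.push(user.name.len() as u8);
    buf.extend_from_slice(user.name.as_bytes());
    buf
}

fn decode(buf: &[u8]) -> Option<User> {
    let age = u16::from_be_bytes([*buf.get(0)?, *buf.get(1)?]);
    let len = *buf.get(2)? as usize;
    let name = String::from_utf8(buf.get(3..3 + len)?.to_vec()).ok()?;
    Some(User { age, name })
}

/// Slow path: decode every field, change one, encode everything again.
fn update_age_roundtrip(buf: &[u8], age: u16) -> Option<Vec<u8>> {
    let mut user = decode(buf)?;
    user.age = age;
    Some(encode(&user))
}

/// Fast path: overwrite the two bytes that hold the age and touch nothing else.
fn update_age_in_place(buf: &mut [u8], age: u16) -> Option<()> {
    buf.get_mut(0..2)?.copy_from_slice(&age.to_be_bytes());
    Some(())
}

fn main() {
    let original = encode(&User { age: 30, name: "Billy Joel".to_string() });

    let slow = update_age_roundtrip(&original, 31).unwrap();

    let mut fast = original.clone();
    update_age_in_place(&mut fast, 31).unwrap();

    // Both paths produce the same bytes; the fast path never decoded the name.
    assert_eq!(slow, fast);
    assert_eq!(decode(&fast).unwrap().age, 31);
}
```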

Comparison With Other Formats

Compared to Apache Avro

- Far more space efficient
- Significantly faster serialization & deserialization
- All values are optional (no void or null type)
- Supports more native types (like unsigned ints)
- Updates without deserializing/serializing
- Works with `no_std`
- Safely handles untrusted data

Compared to Protocol Buffers

- Comparable serialization & deserialization performance
- Updating buffers is an order of magnitude faster
- Schemas are dynamic at runtime, no compilation step
- Supports more types and better nested type support
- Byte-wise sorting is a first class operation
- Updates without deserializing/serializing
- Safely handles untrusted data
- All values are optional and can be inserted in any order

Compared to JSON / BSON

- Far more space efficient
- Significantly faster serialization & deserialization
- Deserialization is zero copy
- Has schemas / type safe
- Supports byte-wise sorting
- Supports raw bytes & other native types
- Updates without deserializing/serializing
- Works with `no_std`
- Safely handles untrusted data

Compared to Flatbuffers / Bincode

- Data types can change or be created at runtime
- Updating buffers is an order of magnitude faster
- Supports byte-wise sorting
- Updates without deserializing/serializing
- Works with `no_std`
- Safely handles untrusted data
- All values are optional and can be inserted in any order

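| Format | Zero-Copy | Size Limit | Mutable | Schemas | Byte-wise Sorting |
|---|---|---|---|---|---|
| Runtime Libs | | | | | |
| NoProto | ✓ | ~4GB | ✓ | ✓ | ✓ |
| Apache Avro | ✗ | 2^63 Bytes | ✗ | ✓ | ✓ |
| JSON | ✗ | Unlimited | ✓ | ✗ | ✗ |
| BSON | ✗ | ~16MB | ✓ | ✗ | ✗ |
| MessagePack | ✗ | Unlimited | ✓ | ✗ | ✗ |
| Compiled Libs | | | | | |
| FlatBuffers | ✓ | ~2GB | ✗ | ✓ | ✗ |
| Bincode | ✓ | ? | ✓ | ✓ | ✗ |
| Protocol Buffers | ✗ | ~2GB | ✗ | ✓ | ✗ |
| Cap'n Proto | ✓ | 2^64 Bytes | ✗ | ✓ | ✗ |
| Veriform | ✗ | ? | ✗ | ✗ | ✗ |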

Quick Example

use no_proto::error::NP_Error;
use no_proto::NP_Factory;

// An ES6 like IDL is used to describe schema for the factory
// Each factory represents a single schema
// One factory can be used to serialize/deserialize any number of buffers
let user_factory = NP_Factory::new(r#"
    struct({ fields: {
        name: string(),
        age: u16({ default: 0 }),
        tags: list({ of: string() })
    }})
"#)?;

// create a new empty buffer
let mut user_buffer = user_factory.new_buffer(None); // optional capacity

// set the "name" field
user_buffer.set(&["name"], "Billy Joel")?;

// read the "name" field
let name = user_buffer.get::<&str>(&["name"])?;
assert_eq!(name, Some("Billy Joel"));

// set a nested value, the first tag in the tag list
user_buffer.set(&["tags", "0"], "first tag")?;

// read the first tag from the tag list
let tag = user_buffer.get::<&str>(&["tags", "0"])?;
assert_eq!(tag, Some("first tag"));

// close buffer and get internal bytes
let user_bytes: Vec<u8> = user_buffer.finish().bytes();

// open the buffer again
let user_buffer = user_factory.open_buffer(user_bytes);

// read the "name" field again
let name = user_buffer.get::<&str>(&["name"])?;
assert_eq!(name, Some("Billy Joel"));

// get the age field
let age = user_buffer.get::<u16>(&["age"])?;
// returns default value from schema
assert_eq!(age, Some(0u16));

// close again
let user_bytes: Vec<u8> = user_buffer.finish().bytes();

// we can now save user_bytes to disk, 
// send it over the network, or whatever else is needed with the data

# Ok::<(), NP_Error>(())

Guided Learning / Next Steps:

Schemas - Learn how to build & work with schemas.
Factories - Parsing schemas into something you can work with.
Buffers - How to create, update & compact buffers/data.
RPC Framework - How to use the RPC Framework APIs.
Data & Schema Format - Learn how data is saved into the buffer and schemas.

Benchmarks

While it's difficult to properly benchmark libraries like these in a fair way, I've made an attempt in the table below. These benchmarks are available in the bench folder and you can easily run them yourself with `cargo run --release`.

The format and data used in the benchmarks were taken from the Flatbuffers benchmarks GitHub repo. You should always benchmark/test your own use case for each library before choosing what to use.

Legend: Ops / Millisecond, higher is better

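| Format / Lib | Encode | Decode All | Decode 1 | Update 1 | Size (bytes) | Size (Zlib) |
|---|---|---|---|---|---|---|
| Runtime Libs | | | | | | |
| NoProto | | | | | | |
| no_proto | 1393 | 1883 | 55556 | 9524 | 308 | 198 |
| Apache Avro | | | | | | |
| avro-rs | 156 | 57 | 56 | 40 | 702 | 337 |
| FlexBuffers | | | | | | |
| flexbuffers | 444 | 962 | 24390 | 294 | 490 | 309 |
| JSON | | | | | | |
| json | 609 | 481 | 607 | 439 | 439 | 184 |
| serde_json | 938 | 646 | 644 | 403 | 446 | 198 |
| BSON | | | | | | |
| bson | 129 | 116 | 123 | 90 | 414 | 216 |
| rawbson | 130 | 1117 | 17857 | 89 | 414 | 216 |
| MessagePack | | | | | | |
| rmp | 661 | 623 | 832 | 202 | 311 | 193 |
| messagepack-rs | 152 | 266 | 284 | 138 | 296 | 187 |
| Compiled Libs | | | | | | |
| Flatbuffers | | | | | | |
| flatbuffers | 3165 | 16393 | 250000 | 2532 | 264 | 181 |
| Bincode | | | | | | |
| bincode | 6757 | 9259 | 10000 | 4115 | 163 | 129 |
| Postcard | | | | | | |
| postcard | 3067 | 7519 | 7937 | 2469 | 128 | 119 |
| Protocol Buffers | | | | | | |
| protobuf | 953 | 1305 | 1312 | 529 | 154 | 141 |
| prost | 1464 | 2020 | 2232 | 1040 | 154 | 142 |
| Abomonation | | | | | | |
| abomonation | 2342 | 125000 | 500000 | 2183 | 261 | 160 |
| Rkyv | | | | | | |
| rkyv | 1605 | 37037 | 200000 | 1531 | 180 | 154 |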

Encode: Transfer a collection of fields of test data into a serialized Vec<u8>.
Decode All: Deserialize the test object from the Vec<u8> into all fields.
Decode 1: Deserialize the test object from the Vec<u8> into one field.
Update 1: Deserialize, update a single field, then serialize back into Vec<u8>.
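As a worked reading of the table, the sketch below turns the "Update 1" column for the runtime libraries into throughput ratios relative to no_proto. The numbers are copied from the table above; the `speedup` helper is only illustrative, and the ratios describe this one benchmark, not your workload.

```rust
/// Ratio of two throughputs measured in ops/ms.
/// (Illustrative helper for reading the benchmark table.)
fn speedup(ours: f64, theirs: f64) -> f64 {
    ours / theirs
}

fn main() {
    // "Update 1" throughput in ops/ms, copied from the runtime libs rows.
    let no_proto = 9524.0;
    let others = [
        ("avro-rs", 40.0),
        ("flexbuffers", 294.0),
        ("json", 439.0),
        ("serde_json", 403.0),
        ("bson", 90.0),
        ("rawbson", 89.0),
        ("rmp", 202.0),
        ("messagepack-rs", 138.0),
    ];

    for (name, ops) in others.iter() {
        println!("{}: {:.1}x", name, speedup(no_proto, *ops));
    }
}
```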

Runtime VS Compiled Libs: Some formats require data types to be compiled into the application, which increases performance but means data types cannot change at runtime. If data types need to mutate during runtime or can't be known before the application is compiled (like with databases), you must use a format that doesn't compile data types into the application, like JSON or NoProto.

Complete benchmark source code is available in the bench folder. Suggestions for improving the quality of these benchmarks are appreciated.

NoProto Strengths

If your use case fits any of the points below, NoProto might be a good choice for your application.

Flexible At Runtime

If you need to work with data types that will change or be created at runtime, you normally have to pick something like JSON, since highly optimized formats like Flatbuffers and Bincode depend on compiling the data types into your application (making everything fixed at runtime). When it comes to formats that can change/implement data types at runtime, NoProto is the fastest format we're aware of (if you know of one that might be faster, let us know!).

Safely Accept Untrusted Data

The worst-case failure mode for NoProto buffers is junk data. While other formats can be exposed to denial of service attacks or allow unsafe memory access, there is no such failure case with NoProto. There is no way to construct a NoProto buffer that would degrade the performance of the host application or lead to unsafe memory access. Also, there is no panic-causing code in the library, meaning it will never crash your application.

Extremely Fast Updates

If you have a workflow in your application that is read -> modify -> write with buffers, NoProto will usually outperform every other format, including Bincode and Flatbuffers. This is because NoProto never actually deserializes; it doesn't need to. This includes complicated mutations like pushing a value onto a nested list or replacing entire structs.

All Fields Optional, Insert/Update In Any Order

Many formats require that all values be present to close the buffer. Further, they may require data to be inserted in a specific order to accommodate the encoding/decoding scheme. With NoProto, all fields are optional and any update/insert can happen in any order.

Incremental Deserializing

You only pay for the fields you read, no more. There is no deserializing step in NoProto; opening a buffer performs no operations. Once you start asking for fields, the library will navigate the buffer using the format rules to get just what you asked for and nothing else. If you have a workflow in your application where you read a buffer and only grab a few fields inside it, NoProto will outperform most other libraries.

Bytewise Sorting

Almost all of NoProto's data types are designed to serialize into bytewise sortable values, including signed integers. When used with Tuples, making database keys with compound sorting is extremely easy. When you combine that with first class support for UUIDs and ULIDs, NoProto makes an excellent tool for parsing and creating primary keys for databases like RocksDB, LevelDB and TiKV.

no_std Support

If you need a serialization format with low memory usage that works in no_std environments, NoProto is one of the few good choices.

Stable

NoProto will never cause a panic in your application. It has zero panics or unwraps, meaning there is no code path that could lead to a panic. Fallback behavior is to provide a sane default path or bubble an error up to the caller.

CPU Independent

All numbers and pointers in NoProto buffers are always stored in big endian, so you can safely create buffers on any CPU architecture and know that they will work with any other CPU architecture.
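To make the CPU independence point concrete, the sketch below writes and reads a 32-bit value with explicit big-endian conversions from the standard library, so the bytes are identical on little-endian and big-endian hosts. `write_u32_be` and `read_u32_be` are hypothetical helpers for illustration, not functions from the no_proto crate.

```rust
/// Append a u32 in big-endian order, independent of the host CPU.
/// (Illustrative helper, not the no_proto API.)
fn write_u32_be(buf: &mut Vec<u8>, value: u32) {
    buf.extend_from_slice(&value.to_be_bytes());
}

/// Read a big-endian u32 at `at`, or None if fewer than 4 bytes remain.
fn read_u32_be(buf: &[u8], at: usize) -> Option<u32> {
    let s = buf.get(at..at.checked_add(4)?)?;
    // `s` is exactly 4 bytes long here, so the indexing cannot go out of bounds.
    Some(u32::from_be_bytes([s[0], s[1], s[2], s[3]]))
}

fn main() {
    let mut buf = Vec::new();
    write_u32_be(&mut buf, 0x0102_0304);

    // Most significant byte first, on every architecture.
    assert_eq!(buf, vec![1u8, 2, 3, 4]);
    assert_eq!(read_u32_be(&buf, 0), Some(0x0102_0304));
}
```

Using native-endian conversions (`to_ne_bytes`) instead would produce different bytes on different hosts, which is the portability problem a fixed byte order avoids.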

When to use Flatbuffers / Bincode / Cap'n Proto

If you can safely compile all your data types into your application, all the buffers/data are trusted, and you don't intend to mutate buffers after they're created, Bincode/Flatbuffers/Cap'n Proto is a better choice for you.

When to use JSON / BSON / MessagePack

If your data changes so often that schemas don't really make sense, or the format you use must be self describing, JSON/BSON/MessagePack is a better choice. That said, I'd argue that if you can make schemas work, you should. Once you can use a format with schemas, you save a ton of space in the resulting buffers and performance is far better.

Limitations

Structs and Tuples cannot have more than 255 items.
Lists and Maps cannot have more than 2^16 (~64k) items.
You cannot nest more than 255 levels deep.
Struct field names cannot be longer than 255 UTF8 bytes.
Enum/Option types are limited to 255 options and each option cannot be more than 255 UTF8 Bytes.
Map keys cannot be larger than 255 UTF8 bytes.
Buffers cannot be larger than 2^32 bytes or ~4GB.

Unsafe

This library makes use of unsafe to get better performance. Generally speaking, it's not possible to have a high performance serialization library without unsafe. It is only used where the performance improvement is significant, and additional checks are performed so that the worst case for any unsafe block is junk data in a buffer.
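The sketch below illustrates the kind of check that keeps a malformed buffer from doing worse than producing no value: every offset and length is validated before any bytes are touched. It is written in safe Rust against a toy length-prefixed layout; `read_bytes` is a hypothetical helper and does not show the library's actual unsafe blocks.

```rust
/// Read a length-prefixed byte string at `offset` from an untrusted buffer.
/// Toy layout (NOT NoProto's format): [len: u16 BE][len bytes].
/// Returns None instead of panicking when the buffer is malformed.
fn read_bytes(buf: &[u8], offset: usize) -> Option<&[u8]> {
    // checked_add guards against offsets near usize::MAX overflowing.
    let header = buf.get(offset..offset.checked_add(2)?)?;
    let len = u16::from_be_bytes([header[0], header[1]]) as usize;
    let start = offset.checked_add(2)?;
    // `get` returns None if the claimed length runs past the end of the buffer.
    buf.get(start..start.checked_add(len)?)
}

fn main() {
    let good = [0u8, 3, b'a', b'b', b'c'];
    assert_eq!(read_bytes(&good, 0), Some(&b"abc"[..]));

    // The length prefix claims 255 bytes but only one follows.
    let truncated = [0u8, 255, b'a'];
    assert_eq!(read_bytes(&truncated, 0), None);

    // Offsets outside the buffer are rejected as well.
    assert_eq!(read_bytes(&good, 1000), None);
    assert_eq!(read_bytes(&good, usize::MAX), None);
}
```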

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
