ocapn / syrup

Syrup is a simple binary way of preserving data on the wire, with perhaps a few extra calories.
Apache License 2.0
20 stars 4 forks source link

Syrup is a lightweight and easy-to-develop (and reasonably easy to read) serialization of [[https://preserves.gitlab.io/preserves/][Preserves]]. You can also think of it as an extension of the ideas of [[https://people.csail.mit.edu/rivest/Sexp.txt][canonical s-expressions]] and [[https://en.wikipedia.org/wiki/Bencode][Bencode]].

For the sake of simplicity, let's look at examples in Python.

We've got bytestrings and strings and integers and booleans:

+BEGIN_SRC python

from syrup import syrup_encode, syrup_decode, Symbol

bytestrings

syrup_encode(b"a bytestring") b'12:a bytestring'

strings

syrup_encode("a string") b'8"a string'

symbols (maybe only lispy people care)

syrup_encode(Symbol('foo')) b"3'foo"

integers

syrup_encode(42) b'42+' syrup_encode(0) b'0+' syrup_encode(-123) b'123-'

floats (single and double precision, but Python only supports double)

(encoding looks ugly... floats are complicated, IEEE 754 bla bla)

syrup_encode(123.456) b'D@^\xdd/\x1a\x9f\xbew'

booleans

syrup_encode(True) b't' syrup_encode(False) b'f'

+END_SRC

But maybe you want to combine things. So ok, we have lists, sets, and dictionaries:

+BEGIN_SRC python

lists

syrup_encode(["foo", 123, True]) b'[3"foo123+t]'

dictionaries

syrup_encode({"species": "cat", ... "name": "Tabatha", ... "age": 12}) b'{3"age12+4"name7"Tabatha7"species3"cat}'

sets

syrup_encode({"cookie", "milk", "napkin"}) b'#4"milk6"cookie6"napkin$'

+END_SRC

When reading, whitespace is ignored (but we recommend using the official [[https://preserves.gitlab.io/preserves/][Preserves]] syntax if you're trying to make things human-readable).

+BEGIN_SRC python

syrup_decode(b'{3"age 12+ 4"name 7"Tabatha 7"species 3"cat}') {'age': 12, 'name': 'Tabatha', 'species': 'cat'}

+END_SRC

Arbitrary nesting is allowed, including in keys and sets, if the platform supports it.

Sets and dictionaries are both ordered, dictionaries by keys. Items and keys are sorted by comparing them in sorted form. (This might make Tony Garnock-Jones say, "yuck!") But in this way, if two applications both agree to use Syrup, it is an acceptable form for canonicalization.

Maybe this seems almost good enough, but you need your own types. For this purpose, Syrup also provides a Record type:

+BEGIN_SRC python

from syrup import record

We could encode a date as a single iso8601 string

syrup_encode(record('isodate', '2020-05-01T14:08:11')) b'<4"date19"2020-05-01T14:08:11>'

But records permit multiple arguments,

so we could encode it that way too

syrup_encode(record('date', 2020, 5, 1, 14, 8, 11)) b'<4"date2020+5+1+14+8+11+>'

+END_SRC

Here's nearly everything you need to know, taken right from a comment in the Racket implementation:

+BEGIN_SRC racket

;; Booleans: t or f ;; Single flonum: F (big endian) ;; Double flonum: D (big endian) ;; Positive integers: + ;; Negative integers: - ;; Bytestrings: 3:cat ;; Strings: 3"cat (utf-8 encoded) ;; Symbols: 3'cat (utf-8 encoded) ;; Dictionary: {} ;; Lists: [] ;; Records: <

+END_SRC

(Sorry, records look a bit confusing there, since =<>= are actually used rather than just a placeholder for a variable.)

There's only one other key detail. Writing out a Syrup structure should always canonicalize it. The good news: this is fairly easy to do via recursion, since only dictionaries and sets are unordered. Dictionaries are ordered by their keys, sets by their items (and dictionaries must not include the same key twice). Simply write out the keys/items first, then sort them by the bytes, from lower to higher.

That's it, really. Easy-peasy.

In comparison to [[https://people.csail.mit.edu/rivest/Sexp.txt][canonical s-expressions]], syrup uses similar syntax when limited to lists and bytestrings, except that in writing uses =[]= rather than =()=. But as you can see above, it also defines a lot of other types, too.

Syrup is also similar to [[https://en.wikipedia.org/wiki/Bencode][Bencode]], in that it supports integers, bytestrings, lists, and dictionaries. However its syntax for lists and dictionaries are different when syrup is written.

For funsies, the syrup decoders that ship in this repository will accept Bencode or canonical s-expression syntax (except for canonical s-expression "display hints" syntax, but those never made sense anyway... use records instead).

+BEGIN_SRC python

syrup_decode(b'd3:agei12e4:name5:Missy7:species3:cate') {b'age': 12, b'name': b'Missy', b'species': b'cat'}

+END_SRC

But, Syrup uses ={}= instead of =de= for dictionaries when encoding itself.

+BEGIN_SRC python

syrup_encode({b'age': 12, b'name': b'Missy', b'species': b'cat'}) b'{3:agei12e4:name5:Missy7:species3:cat}'

+END_SRC

Implementations in [[file:./impls/][impls/]] subdirectory:

External implementations:

Apache v2