karalabe / ssz

Opinionated 0-alloc SSZ codec for Go
https://github.com/ethereum/consensus-specs/blob/dev/ssz/simple-serialize.md
BSD 3-Clause "New" or "Revised" License

Simple Serialize (SSZ)... v15

Package ssz provides a zero-allocation, opinionated toolkit for working with Ethereum's Simple Serialize (SSZ) format through Go. The focus is on code maintainability, only secondarily striving towards raw performance.

Please note, this repository is a work in progress. The API is unstable and breaking changes will regularly be made. Do not depend on this in publicly available modules.

This package is heavily inspired by the code generated by, and contained within, fastssz!

Goals and objectives

Expectations

Whilst we aim to become the SSZ encoder of go-ethereum - and more generally, a go-to encoder for all Go applications that need to work with Ethereum data blobs - there is no guarantee that this outcome will occur. At the present moment, this package is still in the design and experimentation phase and is not ready for a formal proposal.

There are several possible outcomes from this experiment:

Design

Responsibilities

The ssz package splits the responsibility between user code and library code in the way pictured below:

Scope

Weird stuff

The Simple Serialize spec has schema definitions for mapping SSZ data to JSON. We believe in separation of concerns. This library does not concern itself with encoding/decoding from formats other than SSZ.

How to use

First up, you need to add the package to your project:

go get github.com/karalabe/ssz

Static types

Some types in Ethereum will only contain a handful of statically sized fields. One example is a Withdrawal:

type Address [20]byte

type Withdrawal struct {
    Index     uint64
    Validator uint64
    Address   Address
    Amount    uint64
}

To encode/decode such an object via SSZ, it needs to implement the ssz.StaticObject interface:

type StaticObject interface {
    // SizeSSZ returns the total size of an SSZ object.
    SizeSSZ() uint32

    // DefineSSZ defines how an object would be encoded/decoded.
    DefineSSZ(codec *Codec)
}

func (w *Withdrawal) SizeSSZ() uint32 { return 44 }

func (w *Withdrawal) DefineSSZ(codec *ssz.Codec) {
    ssz.DefineUint64(codec, &w.Index)        // Field (0) - Index          -  8 bytes
    ssz.DefineUint64(codec, &w.Validator)    // Field (1) - ValidatorIndex -  8 bytes
    ssz.DefineStaticBytes(codec, &w.Address) // Field (2) - Address        - 20 bytes
    ssz.DefineUint64(codec, &w.Amount)       // Field (3) - Amount         -  8 bytes
}
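
To make the 44 bytes less magical, here is a minimal standard-library sketch of the byte layout that the DefineSSZ calls above describe. It assumes the SSZ convention of little-endian integers packed back to back; encodeWithdrawal is a hypothetical helper written for this illustration and is not part of the ssz package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Withdrawal mirrors the struct from the text, with the address inlined.
type Withdrawal struct {
	Index     uint64
	Validator uint64
	Address   [20]byte
	Amount    uint64
}

// encodeWithdrawal is a hypothetical hand-rolled encoder (not the library's
// code) that lays the four static fields out back to back.
func encodeWithdrawal(w *Withdrawal) []byte {
	buf := make([]byte, 44)
	binary.LittleEndian.PutUint64(buf[0:8], w.Index)      // Field (0) -  8 bytes
	binary.LittleEndian.PutUint64(buf[8:16], w.Validator) // Field (1) -  8 bytes
	copy(buf[16:36], w.Address[:])                        // Field (2) - 20 bytes
	binary.LittleEndian.PutUint64(buf[36:44], w.Amount)   // Field (3) -  8 bytes
	return buf
}

func main() {
	w := &Withdrawal{Index: 1, Validator: 2, Address: [20]byte{0xaa}, Amount: 3}
	blob := encodeWithdrawal(w)
	fmt.Printf("size: %d, ssz: %#x\n", len(blob), blob)
}
```

The point of the library is that you never write this by hand; the sketch only shows where the number returned by SizeSSZ comes from.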

To encode the above Withdrawal into an SSZ stream, use either ssz.EncodeToStream or ssz.EncodeToBytes. The former will write into a stream directly, whilst the latter will write into a bytes buffer directly. In both cases you need to supply the output location to avoid GC allocations in the library.

func main() {
    out := new(bytes.Buffer)
    if err := ssz.EncodeToStream(out, new(Withdrawal)); err != nil {
        panic(err)
    }
    fmt.Printf("ssz: %#x\n", out.Bytes())
}

To decode an SSZ blob, use either ssz.DecodeFromStream or ssz.DecodeFromBytes, with the same disclaimers about allocations. Note, decoding requires knowing the size of the SSZ blob in advance. Unfortunately, this is a limitation of the SSZ format.
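
The size requirement is easier to see in a small sketch. An SSZ blob carries no length prefix or terminator, so the decoder has to be told where the object ends; for a static object that degenerates into a plain length check. The code below is a standard-library illustration only, and decodeWithdrawal is a hypothetical helper, not a function of the ssz package.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Withdrawal mirrors the struct from the text, with the address inlined.
type Withdrawal struct {
	Index     uint64
	Validator uint64
	Address   [20]byte
	Amount    uint64
}

// withdrawalSize is the static SSZ size of a Withdrawal.
const withdrawalSize = 44

// decodeWithdrawal is a hypothetical hand-rolled decoder. The blob itself has
// no framing, so the caller must hand over exactly one object's worth of data.
func decodeWithdrawal(blob []byte) (*Withdrawal, error) {
	if len(blob) != withdrawalSize {
		return nil, errors.New("unexpected blob size for Withdrawal")
	}
	w := new(Withdrawal)
	w.Index = binary.LittleEndian.Uint64(blob[0:8])
	w.Validator = binary.LittleEndian.Uint64(blob[8:16])
	copy(w.Address[:], blob[16:36])
	w.Amount = binary.LittleEndian.Uint64(blob[36:44])
	return w, nil
}

func main() {
	blob := make([]byte, withdrawalSize)
	blob[0], blob[36] = 7, 9

	w, err := decodeWithdrawal(blob)
	if err != nil {
		panic(err)
	}
	fmt.Printf("index: %d, amount: %d\n", w.Index, w.Amount)

	// A truncated blob cannot be told apart from a valid one without the size.
	if _, err := decodeWithdrawal(blob[:43]); err != nil {
		fmt.Println("short blob rejected:", err)
	}
}
```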

Dynamic types

Most data types in Ethereum will contain a cool mix of static and dynamic data fields. Encoding those is much more interesting, yet still proudly simple. One such data type would be an ExecutionPayload as seen below:

type Hash      [32]byte
type LogsBloom [256]byte

type ExecutionPayload struct {
    ParentHash    Hash
    FeeRecipient  Address
    StateRoot     Hash
    ReceiptsRoot  Hash
    LogsBloom     LogsBloom
    PrevRandao    Hash
    BlockNumber   uint64
    GasLimit      uint64
    GasUsed       uint64
    Timestamp     uint64
    ExtraData     []byte
    BaseFeePerGas *uint256.Int
    BlockHash     Hash
    Transactions  [][]byte
    Withdrawals   []*Withdrawal
}

Do note, we've reused the previously defined Address and Withdrawal types. You'll need those too to make this part of the code work. The uint256.Int type is from the github.com/holiman/uint256 package.

To encode/decode such an object via SSZ, it needs to implement the ssz.DynamicObject interface:

type DynamicObject interface {
    // SizeSSZ returns either the static size of the object if fixed == true, or
    // the total size otherwise.
    SizeSSZ(fixed bool) uint32

    // DefineSSZ defines how an object would be encoded/decoded.
    DefineSSZ(codec *Codec)
}

If you look at it more closely, you'll notice that it's almost the same as ssz.StaticObject, except the type of SizeSSZ is different, here taking an extra boolean argument. The method name/type clash is deliberate: it guarantees at compile time that dynamic objects cannot end up in static ssz slots and vice versa.

func (e *ExecutionPayload) SizeSSZ(fixed bool) uint32 {
    // Start out with the static size
    size := uint32(512)
    if fixed {
        return size
    }
    // Append all the dynamic sizes
    size += ssz.SizeDynamicBytes(e.ExtraData)           // Field (10) - ExtraData    - max 32 bytes (not enforced)
    size += ssz.SizeSliceOfDynamicBytes(e.Transactions) // Field (13) - Transactions - max 1048576 items, 1073741824 bytes each (not enforced)
    size += ssz.SizeSliceOfStaticObjects(e.Withdrawals) // Field (14) - Withdrawals  - max 16 items, 44 bytes each (not enforced)

    return size
}

As opposed to the static Withdrawal from the previous section, ExecutionPayload has both static and dynamic fields, so we can't just return a pre-computed literal number.
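
As a rough illustration of where the dynamic part of the size comes from, the sketch below recomputes it with plain Go. It assumes the SSZ rules that a byte list costs its length, that every item in a list of dynamic items costs a 4-byte offset plus its own length, and that a list of static items costs count times item size. The helper names are made up for this example and only mimic what we assume the library's Size helpers compute.

```go
package main

import "fmt"

// offsetSize is the width of an SSZ offset in bytes.
const offsetSize = 4

// sizeDynamicBytes: a dynamic byte list contributes exactly its length.
func sizeDynamicBytes(b []byte) uint32 {
	return uint32(len(b))
}

// sizeSliceOfDynamicBytes: each item contributes one offset plus its content.
func sizeSliceOfDynamicBytes(items [][]byte) uint32 {
	var size uint32
	for _, item := range items {
		size += offsetSize + uint32(len(item))
	}
	return size
}

// sizeSliceOfStatic: static items are packed back to back, no offsets needed.
func sizeSliceOfStatic(count int, itemSize uint32) uint32 {
	return uint32(count) * itemSize
}

func main() {
	extra := []byte("hello")
	txs := [][]byte{{0x01, 0x02}, {0x03}}

	fixed := uint32(512) // static fields plus the three 4-byte offsets
	total := fixed +
		sizeDynamicBytes(extra) +
		sizeSliceOfDynamicBytes(txs) +
		sizeSliceOfStatic(2, 44) // two Withdrawals, 44 bytes each

	fmt.Println("fixed:", fixed, "total:", total)
}
```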

The codec itself is very similar to the static example before:

func (e *ExecutionPayload) DefineSSZ(codec *ssz.Codec) {
    // Define the static data (fields and dynamic offsets)
    ssz.DefineStaticBytes(codec, &e.ParentHash)                                           // Field  ( 0) - ParentHash    -  32 bytes
    ssz.DefineStaticBytes(codec, &e.FeeRecipient)                                         // Field  ( 1) - FeeRecipient  -  20 bytes
    ssz.DefineStaticBytes(codec, &e.StateRoot)                                            // Field  ( 2) - StateRoot     -  32 bytes
    ssz.DefineStaticBytes(codec, &e.ReceiptsRoot)                                         // Field  ( 3) - ReceiptsRoot  -  32 bytes
    ssz.DefineStaticBytes(codec, &e.LogsBloom)                                            // Field  ( 4) - LogsBloom     - 256 bytes
    ssz.DefineStaticBytes(codec, &e.PrevRandao)                                           // Field  ( 5) - PrevRandao    -  32 bytes
    ssz.DefineUint64(codec, &e.BlockNumber)                                               // Field  ( 6) - BlockNumber   -   8 bytes
    ssz.DefineUint64(codec, &e.GasLimit)                                                  // Field  ( 7) - GasLimit      -   8 bytes
    ssz.DefineUint64(codec, &e.GasUsed)                                                   // Field  ( 8) - GasUsed       -   8 bytes
    ssz.DefineUint64(codec, &e.Timestamp)                                                 // Field  ( 9) - Timestamp     -   8 bytes
    ssz.DefineDynamicBytesOffset(codec, &e.ExtraData, 32)                                 // Offset (10) - ExtraData     -   4 bytes
    ssz.DefineUint256(codec, &e.BaseFeePerGas)                                            // Field  (11) - BaseFeePerGas -  32 bytes
    ssz.DefineStaticBytes(codec, &e.BlockHash)                                            // Field  (12) - BlockHash     -  32 bytes
    ssz.DefineSliceOfDynamicBytesOffset(codec, &e.Transactions, 1_048_576, 1_073_741_824) // Offset (13) - Transactions  -   4 bytes
    ssz.DefineSliceOfStaticObjectsOffset(codec, &e.Withdrawals, 16)                       // Offset (14) - Withdrawals   -   4 bytes

    // Define the dynamic data (fields)
    ssz.DefineDynamicBytesContent(codec, &e.ExtraData, 32)                                 // Field (10) - ExtraData
    ssz.DefineSliceOfDynamicBytesContent(codec, &e.Transactions, 1_048_576, 1_073_741_824) // Field (13) - Transactions
    ssz.DefineSliceOfStaticObjectsContent(codec, &e.Withdrawals, 16)                       // Field (14) - Withdrawals
}

Most of the DefineXYZ methods are similar to before. However, you might spot two distinct sets of method calls, DefineXYZOffset and DefineXYZContent. You'll need to use these for dynamic fields:

To encode the above ExecutionPayload, do just as we have done with the static Withdrawal object.
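
The offset/content split mirrors how SSZ lays out a container: first a fixed part holding the static fields and a 4-byte offset per dynamic field, then the dynamic contents in the same order. Below is a minimal standard-library sketch of that two-pass layout for a made-up two-field type; Record and encodeRecord are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Record is a hypothetical container with one static and one dynamic field.
type Record struct {
	Number uint64 // static:  8 bytes in the fixed part
	Extra  []byte // dynamic: 4-byte offset in the fixed part, content after it
}

// encodeRecord writes the fixed part first, then appends the dynamic content.
func encodeRecord(r *Record) []byte {
	const fixedSize = 8 + 4
	buf := make([]byte, fixedSize, fixedSize+len(r.Extra))

	// Pass 1: static fields and offsets, in field order.
	binary.LittleEndian.PutUint64(buf[0:8], r.Number)
	binary.LittleEndian.PutUint32(buf[8:12], fixedSize) // Extra starts right after the fixed part

	// Pass 2: dynamic contents, in the same field order.
	buf = append(buf, r.Extra...)
	return buf
}

func main() {
	blob := encodeRecord(&Record{Number: 1, Extra: []byte{0xde, 0xad}})
	fmt.Printf("size: %d, ssz: %#x\n", len(blob), blob)
}
```

With more than one dynamic field, each subsequent offset would be the previous offset plus the previous field's content length, which is why the offsets and the contents have to be defined in two separate runs.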

Asymmetric types

For types defined in perfect isolation - dedicated for SSZ - it's easy to define the fields with the perfect types, and perfect sizes, and perfect everything. Generating or writing an elegant encoder for those is easy.

In reality, often you'll need to encode/decode types which already exist in a codebase, which might not map so cleanly onto the SSZ defined structure spec you want (e.g. you have one union type of ExecutionPayload that contains all the Bellatrix, Capella, Deneb, etc fork fields together) and you want to encode/decode them differently based on the context.

Most SSZ libraries will not permit you to do such a thing. Reflection based libraries cannot infer the context in which they should switch encoders, nor can they represent multiple encodings at the same time. Generator based libraries again have no meaningful way to specify optional fields based on different constraints and contexts.

The only way to handle such scenarios is to write the encoders by hand. Furthermore, encoding might depend on what's in the struct, whilst decoding might depend on what it's contained within. That is completely asymmetric, so our unified codec definition approach from the previous sections cannot work.

For these scenarios, this package has support for asymmetric encoders/decoders, where the caller can independently implement the two paths with their unique quirks.

To avoid having a real-world example's complexity overshadow the point we're trying to make here, we'll just convert the previously demoed Withdrawal encoding/decoding from the unified codec version to a separate encoder and decoder version.

func (w *Withdrawal) DefineSSZ(codec *ssz.Codec) {
    codec.DefineEncoder(func(enc *ssz.Encoder) {
        ssz.EncodeUint64(enc, w.Index)         // Field (0) - Index          -  8 bytes
        ssz.EncodeUint64(enc, w.Validator)     // Field (1) - ValidatorIndex -  8 bytes
        ssz.EncodeStaticBytes(enc, &w.Address) // Field (2) - Address        - 20 bytes
        ssz.EncodeUint64(enc, w.Amount)        // Field (3) - Amount         -  8 bytes
    })
    codec.DefineDecoder(func(dec *ssz.Decoder) {
        ssz.DecodeUint64(dec, &w.Index)        // Field (0) - Index          -  8 bytes
        ssz.DecodeUint64(dec, &w.Validator)    // Field (1) - ValidatorIndex -  8 bytes
        ssz.DecodeStaticBytes(dec, &w.Address) // Field (2) - Address        - 20 bytes
        ssz.DecodeUint64(dec, &w.Amount)       // Field (3) - Amount         -  8 bytes
    })
}

To encode the above Withdrawal into an SSZ stream, use the same calls as before. Everything is seamless.
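
If you are wondering how one DefineSSZ method can host two independent code paths, the toy model below shows one way such a dispatch could work: the codec holds either an encoder or a decoder and only runs the matching callback. All types here are defined locally for the sketch and merely borrow the names from the text; this is an assumption about the general idea, not the library's actual implementation.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Encoder and Decoder are toy stand-ins, not the ssz package's types.
type Encoder struct{ buf []byte }

type Decoder struct {
	buf []byte
	err error
}

// Codec holds either an encoder or a decoder, never both.
type Codec struct {
	enc *Encoder
	dec *Decoder
}

// DefineEncoder runs the callback only when the codec is encoding.
func (c *Codec) DefineEncoder(fn func(enc *Encoder)) {
	if c.enc != nil {
		fn(c.enc)
	}
}

// DefineDecoder runs the callback only when the codec is decoding.
func (c *Codec) DefineDecoder(fn func(dec *Decoder)) {
	if c.dec != nil {
		fn(c.dec)
	}
}

// EncodeUint64 appends v to the encoder's buffer in little-endian order.
func EncodeUint64(enc *Encoder, v uint64) {
	var tmp [8]byte
	binary.LittleEndian.PutUint64(tmp[:], v)
	enc.buf = append(enc.buf, tmp[:]...)
}

// DecodeUint64 consumes 8 bytes from the decoder, recording an error if short.
func DecodeUint64(dec *Decoder, v *uint64) {
	if dec.err != nil {
		return
	}
	if len(dec.buf) < 8 {
		dec.err = errors.New("short buffer")
		return
	}
	*v = binary.LittleEndian.Uint64(dec.buf[:8])
	dec.buf = dec.buf[8:]
}

// Counter is a hypothetical type with separate encode and decode paths.
type Counter struct{ Value uint64 }

func (c *Counter) DefineSSZ(codec *Codec) {
	codec.DefineEncoder(func(enc *Encoder) { EncodeUint64(enc, c.Value) })
	codec.DefineDecoder(func(dec *Decoder) { DecodeUint64(dec, &c.Value) })
}

func encode(c *Counter) []byte {
	codec := &Codec{enc: new(Encoder)}
	c.DefineSSZ(codec)
	return codec.enc.buf
}

func decode(blob []byte, c *Counter) error {
	codec := &Codec{dec: &Decoder{buf: blob}}
	c.DefineSSZ(codec)
	return codec.dec.err
}

func main() {
	blob := encode(&Counter{Value: 42})

	var out Counter
	if err := decode(blob, &out); err != nil {
		panic(err)
	}
	fmt.Printf("blob: %#x, decoded: %d\n", blob, out.Value)
}
```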

Checked types

If your types are using strongly typed arrays (e.g. [32]byte, and not []byte) for static lists, the above code works just fine. However, some types might want to use []byte as the field type, but have it still behave as if it was [32]byte. This poses an issue, because if the decoder only sees []byte, it cannot figure out how much data you want to decode into it. For those scenarios, we have checked methods.

The previous Withdrawal is a good example. Let's replace the type Address [20]byte alias, with a plain []byte slice (not a [20]byte array, rather an opaque []byte slice).

type Withdrawal struct {
    Index     uint64
    Validator uint64
    Address   []byte
    Amount    uint64
}

The code for the SizeSSZ remains the same. The code for DefineSSZ changes ever so slightly:

func (w *Withdrawal) DefineSSZ(codec *ssz.Codec) {
    ssz.DefineUint64(codec, &w.Index)                   // Field (0) - Index          -  8 bytes
    ssz.DefineUint64(codec, &w.Validator)               // Field (1) - ValidatorIndex -  8 bytes
    ssz.DefineCheckedStaticBytes(codec, &w.Address, 20) // Field (2) - Address        - 20 bytes
    ssz.DefineUint64(codec, &w.Amount)                  // Field (3) - Amount         -  8 bytes
}

Notably, the ssz.DefineStaticBytes call from our old code (which was given a [20]byte array) is replaced with ssz.DefineCheckedStaticBytes. The latter method operates on an opaque []byte slice, so if we want it to behave like a static sized list, we need to tell it how large it needs to be. This will result in a runtime check to ensure that the size is correct before decoding.

Note, checked methods entail a runtime cost. When decoding such opaque slices, we can't blindly fill the fields with data, rather we need to ensure that they are allocated and that they are of the correct size. Ideally only use checked methods for prototyping or for pre-existing types where you just have to run with whatever you have and can't change the field to an array.
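
To give a feel for the extra work, here is a standard-library sketch of what a checked codec step has to do with an opaque slice. Both helpers are hypothetical and written for this example; in particular, rejecting a wrongly sized slice on encode and allocating on decode are our assumptions about reasonable behavior, not a description of the library's internals.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errSizeMismatch = errors.New("static bytes size mismatch")
	errShortInput   = errors.New("input shorter than static size")
)

// encodeCheckedStaticBytes appends field to dst, but only if the slice has
// exactly the size the SSZ schema expects.
func encodeCheckedStaticBytes(dst []byte, field []byte, size int) ([]byte, error) {
	if len(field) != size {
		return dst, errSizeMismatch
	}
	return append(dst, field...), nil
}

// decodeCheckedStaticBytes fills field with size bytes from src and returns
// the remaining input. The slice is (re)allocated if it has the wrong length.
func decodeCheckedStaticBytes(src []byte, field *[]byte, size int) ([]byte, error) {
	if len(src) < size {
		return src, errShortInput
	}
	if len(*field) != size {
		*field = make([]byte, size) // the runtime cost an array would avoid
	}
	copy(*field, src[:size])
	return src[size:], nil
}

func main() {
	var address []byte // opaque slice standing in for a [20]byte

	rest, err := decodeCheckedStaticBytes(make([]byte, 24), &address, 20)
	if err != nil {
		panic(err)
	}
	fmt.Println("decoded:", len(address), "bytes, remaining:", len(rest))

	if _, err := encodeCheckedStaticBytes(nil, address[:19], 20); err != nil {
		fmt.Println("encode rejected:", err)
	}
}
```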

Monolithic types

We've seen previously, that asymmetric codecs can be used to implement custom serialization logic for types that might encode in a variety of ways depending on their data content.

One very specific subset of that scenario is the Ethereum consensus typeset. Whenever a new fork is released, a number of types are slightly modified, usually by adding new fields to existing structs. In the beacon specs, this usually results in an explosion of types: a new base type for fork X is created (e.g. BeaconBlockBodyBellatrix), but any type including that also needs to be re-created for fork X (e.g. BeaconBlockBellatrix), resulting in cascading type creations. Case in point, there are 79 consensus types in Prysm, most of which are copies of one another with tiny additions.

This design is definitely clean and works well if these containers are used just as data transmission objects or storage objects. However, operating on hundreds of types storing the same thing in a live codebase is unwieldy. In go-ethereum we've always used monolithic types that encode just right according to the RLP specs of EL forks and thus this library aims to provide similar support for the SSZ world too.

We define a monolithic type as a container that can be encoded/decoded differently, based on what fork the codec runs in. To give an example, let's look at the previous ExecutionPayload, but instead of using it to represent a single possible consensus form, let's define all possible fields across all possible forks:

type ExecutionPayloadMonolith struct {
    ParentHash    Hash
    FeeRecipient  Address
    StateRoot     Hash
    ReceiptsRoot  Hash
    LogsBloom     LogsBloom
    PrevRandao    Hash
    BlockNumber   uint64
    GasLimit      uint64
    GasUsed       uint64
    Timestamp     uint64
    ExtraData     []byte
    BaseFeePerGas *uint256.Int
    BlockHash     Hash
    Transactions  [][]byte
    Withdrawals   []*Withdrawal // Appears in the Shanghai fork
    BlobGasUsed   *uint64       // Appears in the Cancun fork
    ExcessBlobGas *uint64       // Appears in the Cancun fork
}

Not much difference versus what we've used previously, but note, the fields that are fork-specific must all be nil-able (Withdrawals is a slice that can be nil and the blob gas fields are *uint64, which again can be nil).

Like before, we need to implement the SizeSSZ method:

func (obj *ExecutionPayloadMonolith) SizeSSZ(sizer *ssz.Sizer, fixed bool) uint32 {
    // Start out with the static size (fields present in every fork)
    size := uint32(508)
    if sizer.Fork() >= ssz.ForkShanghai {
        size += 4
    }
    if sizer.Fork() >= ssz.ForkCancun {
        size += 16
    }
    if fixed {
        return size
    }
    // Append all the dynamic sizes
    size += ssz.SizeDynamicBytes(sizer, obj.ExtraData)
    size += ssz.SizeSliceOfDynamicBytes(sizer, obj.Transactions)
    if sizer.Fork() >= ssz.ForkShanghai {
        size += ssz.SizeSliceOfStaticObjects(sizer, obj.Withdrawals)
    }
    return size
}

This time, it was a bit more complex:

Similarly to how SizeSSZ needs to be fork-enabled, DefineSSZ goes through a transformation:

func (obj *ExecutionPayloadMonolith) DefineSSZ(codec *ssz.Codec) {
    // Define the static data (fields and dynamic offsets)
    ssz.DefineStaticBytes(codec, &obj.ParentHash)                                                                    // Field  ( 0) -    ParentHash -  32 bytes
    ssz.DefineStaticBytes(codec, &obj.FeeRecipient)                                                                  // Field  ( 1) -  FeeRecipient -  20 bytes
    ssz.DefineStaticBytes(codec, &obj.StateRoot)                                                                     // Field  ( 2) -     StateRoot -  32 bytes
    ssz.DefineStaticBytes(codec, &obj.ReceiptsRoot)                                                                  // Field  ( 3) -  ReceiptsRoot -  32 bytes
    ssz.DefineStaticBytes(codec, &obj.LogsBloom)                                                                     // Field  ( 4) -     LogsBloom - 256 bytes
    ssz.DefineStaticBytes(codec, &obj.PrevRandao)                                                                    // Field  ( 5) -    PrevRandao -  32 bytes
    ssz.DefineUint64(codec, &obj.BlockNumber)                                                                        // Field  ( 6) -   BlockNumber -   8 bytes
    ssz.DefineUint64(codec, &obj.GasLimit)                                                                           // Field  ( 7) -      GasLimit -   8 bytes
    ssz.DefineUint64(codec, &obj.GasUsed)                                                                            // Field  ( 8) -       GasUsed -   8 bytes
    ssz.DefineUint64(codec, &obj.Timestamp)                                                                          // Field  ( 9) -     Timestamp -   8 bytes
    ssz.DefineDynamicBytesOffset(codec, &obj.ExtraData, 32)                                                          // Offset (10) -     ExtraData -   4 bytes
    ssz.DefineUint256(codec, &obj.BaseFeePerGas)                                                                     // Field  (11) - BaseFeePerGas -  32 bytes
    ssz.DefineStaticBytes(codec, &obj.BlockHash)                                                                     // Field  (12) -     BlockHash -  32 bytes
    ssz.DefineSliceOfDynamicBytesOffset(codec, &obj.Transactions, 1048576, 1073741824)                               // Offset (13) -  Transactions -   4 bytes
    ssz.DefineSliceOfStaticObjectsOffsetOnFork(codec, &obj.Withdrawals, 16, ssz.ForkFilter{Added: ssz.ForkShanghai}) // Offset (14) -   Withdrawals -   4 bytes
    ssz.DefineUint64PointerOnFork(codec, &obj.BlobGasUsed, ssz.ForkFilter{Added: ssz.ForkCancun})                    // Field  (15) -   BlobGasUsed -   8 bytes
    ssz.DefineUint64PointerOnFork(codec, &obj.ExcessBlobGas, ssz.ForkFilter{Added: ssz.ForkCancun})                  // Field  (16) - ExcessBlobGas -   8 bytes

    // Define the dynamic data (fields)
    ssz.DefineDynamicBytesContent(codec, &obj.ExtraData, 32)                                                          // Field  (10) -     ExtraData - ? bytes
    ssz.DefineSliceOfDynamicBytesContent(codec, &obj.Transactions, 1048576, 1073741824)                               // Field  (13) -  Transactions - ? bytes
    ssz.DefineSliceOfStaticObjectsContentOnFork(codec, &obj.Withdrawals, 16, ssz.ForkFilter{Added: ssz.ForkShanghai}) // Field  (14) -   Withdrawals - ? bytes
}

The above code is eerily similar to our previous codec, yet, weirdly strange. Wherever fork specific fields appear, the methods get suffixed with OnFork and get passed the rule as to which fork to apply in (e.g. ssz.ForkFilter{Added: ssz.ForkCancun}). There are good reasons for both:

Lastly, to encode the above ExecutionPayloadMonolith into an SSZ stream, we can't use the tried and proven ssz.EncodeToStream, since that will not know what fork we'd like to use. Rather, again, we need to call an OnFork version:

func main() {
    out := new(bytes.Buffer)
    if err := ssz.EncodeToStreamOnFork(out, new(ExecutionPayloadMonolith), ssz.ForkCancun); err != nil {
        panic(err)
    }
    fmt.Printf("ssz: %#x\n", out.Bytes())
}

As a side note, although the SSZ library has the Ethereum hard-forks included (e.g. ssz.ForkCancun and ssz.ForkDeneb), there is nothing stopping a user of the library from using their own fork enum (e.g. mypkg.ForkAlice and mypkg.ForkBob). Just type it with ssz.Fork and make sure 0 means some variation of unknown/present in all forks.
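
A small sketch of why the zero value matters: if a filter's Added field is left at 0, the field should be treated as present everywhere. The code below uses a local Fork type as a stand-in for ssz.Fork and a toy ForkFilter with only the Added field seen in the text; the active method is a hypothetical helper, written under the assumption that forks are ordered integers.

```go
package main

import "fmt"

// Fork is a local stand-in for the ssz.Fork type mentioned in the text.
type Fork int

// A custom fork enum. The zero value is reserved for "unknown / all forks".
const (
	ForkUnknown Fork = iota
	ForkAlice
	ForkBob
)

// ForkFilter is a toy version of the filter from the text.
type ForkFilter struct {
	Added Fork // first fork in which the field exists; 0 means always present
}

// active reports whether a field guarded by the filter exists in fork.
func (f ForkFilter) active(fork Fork) bool {
	if f.Added == ForkUnknown {
		return true
	}
	return fork >= f.Added
}

func main() {
	onlyFromBob := ForkFilter{Added: ForkBob}

	fmt.Println("in Alice:", onlyFromBob.active(ForkAlice))
	fmt.Println("in Bob:  ", onlyFromBob.active(ForkBob))
	fmt.Println("no rule: ", ForkFilter{}.active(ForkAlice))
}
```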

Generated encoders

More often than not, the Go structs that you'd like to serialize to/from SSZ are simple data containers. Without some particular quirk you'd like to explicitly support, there's little reason to spend precious time counting the bits and digging through a long list of encoder methods to call.

For those scenarios, the library also supports generating the encoding/decoding code via a Go command:

go run github.com/karalabe/ssz/cmd/sszgen --help

Inferred field sizes

Let's go back to our very simple Withdrawal type from way back.

type Withdrawal struct {
    Index     uint64
    Validator uint64
    Address   [20]byte
    Amount    uint64
}

This seems like a fairly simple thing that we should be able to automatically generate a codec for. Let's try:

go run github.com/karalabe/ssz/cmd/sszgen --type Withdrawal

Calling the generator on this type will produce the following (very nice I might say) code:

// Code generated by github.com/karalabe/ssz. DO NOT EDIT.

package main

import "github.com/karalabe/ssz"

// SizeSSZ returns the total size of the static ssz object.
func (obj *Withdrawal) SizeSSZ() uint32 {
    return 8 + 8 + 20 + 8
}

// DefineSSZ defines how an object is encoded/decoded.
func (obj *Withdrawal) DefineSSZ(codec *ssz.Codec) {
    ssz.DefineUint64(codec, &obj.Index)        // Field  (0) -     Index -  8 bytes
    ssz.DefineUint64(codec, &obj.Validator)    // Field  (1) - Validator -  8 bytes
    ssz.DefineStaticBytes(codec, &obj.Address) // Field  (2) -   Address - 20 bytes
    ssz.DefineUint64(codec, &obj.Amount)       // Field  (3) -    Amount -  8 bytes
}

It has everything we would have written ourselves: SizeSSZ and DefineSSZ... and it also has a lot of useful comments we for sure wouldn't have written ourselves. Generator for the win!

Ok, but this was too easy. All the fields of the Withdrawal object were primitive types of known lengths, so there's no heavy lifting involved at all. Let's take a look at a juicier example.

Explicit field sizes

For our complex test, let's pick our dynamic ExecutionPayload type from before, but let's make it as hard as it gets and remove all size information from the Go types (e.g. instead of using [32]byte, we can make it extra hard by using []byte only).

Now, obviously, if we were to write serialization code by hand, we'd take advantage of our knowledge of what each of these fields is semantically, so we could provide the necessary sizes for a decoder to use. If we want to, however, generate the serialization code, we need to share all that "insider-knowledge" with the code generator somehow.

The standard way in the Go world is through struct tags. Specifically in the context of this library, it will be through the ssz-size and ssz-max tags. These follow the convention set previously by other Go SSZ libraries:

type ExecutionPayload struct {
    ParentHash    []byte        `ssz-size:"32"`
    FeeRecipient  []byte        `ssz-size:"20"`
    StateRoot     []byte        `ssz-size:"32"`
    ReceiptsRoot  []byte        `ssz-size:"32"`
    LogsBloom     []byte        `ssz-size:"256"`
    PrevRandao    []byte        `ssz-size:"32"`
    BlockNumber   uint64
    GasLimit      uint64
    GasUsed       uint64
    Timestamp     uint64
    ExtraData     []byte        `ssz-max:"32"`
    BaseFeePerGas *uint256.Int
    BlockHash     []byte        `ssz-size:"32"`
    Transactions  [][]byte      `ssz-max:"1048576,1073741824"`
    Withdrawals   []*Withdrawal `ssz-max:"16"`
}

Calling the generator as before, just with the ExecutionPayload, yields the code below:

// Code generated by github.com/karalabe/ssz. DO NOT EDIT.

package main

import "github.com/karalabe/ssz"

// SizeSSZ returns either the static size of the object if fixed == true, or
// the total size otherwise.
func (obj *ExecutionPayload) SizeSSZ(fixed bool) uint32 {
    var size = uint32(32 + 20 + 32 + 32 + 256 + 32 + 8 + 8 + 8 + 8 + 4 + 32 + 32 + 4 + 4)
    if fixed {
        return size
    }
    size += ssz.SizeDynamicBytes(obj.ExtraData)
    size += ssz.SizeSliceOfDynamicBytes(obj.Transactions)
    size += ssz.SizeSliceOfStaticObjects(obj.Withdrawals)

    return size
}

// DefineSSZ defines how an object is encoded/decoded.
func (obj *ExecutionPayload) DefineSSZ(codec *ssz.Codec) {
    // Define the static data (fields and dynamic offsets)
    ssz.DefineCheckedStaticBytes(codec, &obj.ParentHash, 32)                           // Field  ( 0) -    ParentHash -  32 bytes
    ssz.DefineCheckedStaticBytes(codec, &obj.FeeRecipient, 20)                         // Field  ( 1) -  FeeRecipient -  20 bytes
    ssz.DefineCheckedStaticBytes(codec, &obj.StateRoot, 32)                            // Field  ( 2) -     StateRoot -  32 bytes
    ssz.DefineCheckedStaticBytes(codec, &obj.ReceiptsRoot, 32)                         // Field  ( 3) -  ReceiptsRoot -  32 bytes
    ssz.DefineCheckedStaticBytes(codec, &obj.LogsBloom, 256)                           // Field  ( 4) -     LogsBloom - 256 bytes
    ssz.DefineCheckedStaticBytes(codec, &obj.PrevRandao, 32)                           // Field  ( 5) -    PrevRandao -  32 bytes
    ssz.DefineUint64(codec, &obj.BlockNumber)                                          // Field  ( 6) -   BlockNumber -   8 bytes
    ssz.DefineUint64(codec, &obj.GasLimit)                                             // Field  ( 7) -      GasLimit -   8 bytes
    ssz.DefineUint64(codec, &obj.GasUsed)                                              // Field  ( 8) -       GasUsed -   8 bytes
    ssz.DefineUint64(codec, &obj.Timestamp)                                            // Field  ( 9) -     Timestamp -   8 bytes
    ssz.DefineDynamicBytesOffset(codec, &obj.ExtraData, 32)                            // Offset (10) -     ExtraData -   4 bytes
    ssz.DefineUint256(codec, &obj.BaseFeePerGas)                                       // Field  (11) - BaseFeePerGas -  32 bytes
    ssz.DefineCheckedStaticBytes(codec, &obj.BlockHash, 32)                            // Field  (12) -     BlockHash -  32 bytes
    ssz.DefineSliceOfDynamicBytesOffset(codec, &obj.Transactions, 1048576, 1073741824) // Offset (13) -  Transactions -   4 bytes
    ssz.DefineSliceOfStaticObjectsOffset(codec, &obj.Withdrawals, 16)                  // Offset (14) -   Withdrawals -   4 bytes

    // Define the dynamic data (fields)
    ssz.DefineDynamicBytesContent(codec, &obj.ExtraData, 32)                            // Field  (10) -     ExtraData - ? bytes
    ssz.DefineSliceOfDynamicBytesContent(codec, &obj.Transactions, 1048576, 1073741824) // Field  (13) -  Transactions - ? bytes
    ssz.DefineSliceOfStaticObjectsContent(codec, &obj.Withdrawals, 16)                  // Field  (14) -   Withdrawals - ? bytes
}

Points of interest to note:

Cross-validated field sizes

We've seen that the size of a field can either be deduced automatically, or it can be provided to the generator explicitly. But what happens if we provide an ssz struct tag for a field of known size?

type Withdrawal struct {
    Index     uint64   `ssz-size:"8"`
    Validator uint64   `ssz-size:"8"`
    Address   [20]byte `ssz-size:"32"` // Deliberately wrong tag size
    Amount    uint64   `ssz-size:"8"`
}

go run github.com/karalabe/ssz/cmd/sszgen --type Withdrawal

failed to validate field Withdrawal.Address: array of byte basic type tag conflict: field is 20 bytes, tag wants [32] bytes

The code generator will take into consideration the information in both the field's Go type and the struct tag, and will cross validate them against each other. If there's a size conflict, it will abort the code generation.

This functionality can be very helpful in detecting refactor issues, where the user changes the type of a field, which would result in a different encoding. By having the field tagged with an ssz-size, such an error would be detected.

As such, we'd recommend always tagging all SSZ encoded fields with their sizes. It results in both safer code and self-documenting code.
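
The cross-check itself is conceptually simple. The sketch below reproduces the idea at runtime with reflect, purely for brevity; we assume the real generator inspects source code rather than live values, and validateSizes is a hypothetical helper that only understands uint64 fields and byte arrays.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

type Withdrawal struct {
	Index     uint64   `ssz-size:"8"`
	Validator uint64   `ssz-size:"8"`
	Address   [20]byte `ssz-size:"32"` // Deliberately wrong tag size
	Amount    uint64   `ssz-size:"8"`
}

// validateSizes compares the size implied by each field's Go type against its
// ssz-size tag and reports the first conflict it finds.
func validateSizes(v interface{}) error {
	typ := reflect.TypeOf(v)
	if typ == nil || typ.Kind() != reflect.Struct {
		return fmt.Errorf("expected a struct value")
	}
	for i := 0; i < typ.NumField(); i++ {
		field := typ.Field(i)
		tag, ok := field.Tag.Lookup("ssz-size")
		if !ok {
			continue // untagged field, nothing to cross-validate
		}
		want, err := strconv.Atoi(tag)
		if err != nil {
			return fmt.Errorf("field %s.%s: malformed ssz-size tag %q", typ.Name(), field.Name, tag)
		}
		var have int
		switch {
		case field.Type.Kind() == reflect.Uint64:
			have = 8
		case field.Type.Kind() == reflect.Array && field.Type.Elem().Kind() == reflect.Uint8:
			have = field.Type.Len()
		default:
			continue // size cannot be inferred from the type in this sketch
		}
		if have != want {
			return fmt.Errorf("field %s.%s: field is %d bytes, tag wants %d bytes",
				typ.Name(), field.Name, have, want)
		}
	}
	return nil
}

func main() {
	if err := validateSizes(Withdrawal{}); err != nil {
		fmt.Println("failed to validate:", err)
	}
}
```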

Monolithic types

This library supports monolithic types that encode differently based on what fork the codec is operating in. Naturally, that is a perfect example of something that would be useful to be able to generate, and indeed, the generator can do so.

type ExecutionPayloadMonolith struct {
    ParentHash    Hash
    FeeRecipient  Address
    StateRoot     Hash
    ReceiptsRoot  Hash
    LogsBloom     LogsBloom
    PrevRandao    Hash
    BlockNumber   uint64
    GasLimit      uint64
    GasUsed       uint64
    Timestamp     uint64
    ExtraData     []byte        `ssz-max:"32"`
    BaseFeePerGas *uint256.Int
    BlockHash     Hash
    Transactions  [][]byte      `ssz-max:"1048576,1073741824"`
    Withdrawals   []*Withdrawal `ssz-max:"16" ssz-fork:"shanghai"`
    BlobGasUsed   *uint64       `             ssz-fork:"cancun"`
    ExcessBlobGas *uint64       `             ssz-fork:"cancun"`
}

Calling the generator as before, just with the ExecutionPayloadMonolith yields the below, much more interesting code:

// Code generated by github.com/karalabe/ssz. DO NOT EDIT.

package main

import "github.com/karalabe/ssz"

// SizeSSZ returns either the static size of the object if fixed == true, or
// the total size otherwise.
func (obj *ExecutionPayloadMonolith) SizeSSZ(sizer *ssz.Sizer, fixed bool) (size uint32) {
    size = 32 + 20 + 32 + 32 + 256 + 32 + 8 + 8 + 8 + 8 + 4 + 32 + 32 + 4
    if sizer.Fork() >= ssz.ForkShanghai {
        size += 4
    }
    if sizer.Fork() >= ssz.ForkCancun {
        size += 8 + 8
    }
    if fixed {
        return size
    }
    size += ssz.SizeDynamicBytes(sizer, obj.ExtraData)
    size += ssz.SizeSliceOfDynamicBytes(sizer, obj.Transactions)
    if sizer.Fork() >= ssz.ForkShanghai {
        size += ssz.SizeSliceOfStaticObjects(sizer, obj.Withdrawals)
    }
    return size
}

// DefineSSZ defines how an object is encoded/decoded.
func (obj *ExecutionPayloadMonolith) DefineSSZ(codec *ssz.Codec) {
    // Define the static data (fields and dynamic offsets)
    ssz.DefineStaticBytes(codec, &obj.ParentHash)                                                                    // Field  ( 0) -    ParentHash -  32 bytes
    ssz.DefineStaticBytes(codec, &obj.FeeRecipient)                                                                  // Field  ( 1) -  FeeRecipient -  20 bytes
    ssz.DefineStaticBytes(codec, &obj.StateRoot)                                                                     // Field  ( 2) -     StateRoot -  32 bytes
    ssz.DefineStaticBytes(codec, &obj.ReceiptsRoot)                                                                  // Field  ( 3) -  ReceiptsRoot -  32 bytes
    ssz.DefineStaticBytes(codec, &obj.LogsBloom)                                                                     // Field  ( 4) -     LogsBloom - 256 bytes
    ssz.DefineStaticBytes(codec, &obj.PrevRandao)                                                                    // Field  ( 5) -    PrevRandao -  32 bytes
    ssz.DefineUint64(codec, &obj.BlockNumber)                                                                        // Field  ( 6) -   BlockNumber -   8 bytes
    ssz.DefineUint64(codec, &obj.GasLimit)                                                                           // Field  ( 7) -      GasLimit -   8 bytes
    ssz.DefineUint64(codec, &obj.GasUsed)                                                                            // Field  ( 8) -       GasUsed -   8 bytes
    ssz.DefineUint64(codec, &obj.Timestamp)                                                                          // Field  ( 9) -     Timestamp -   8 bytes
    ssz.DefineDynamicBytesOffset(codec, &obj.ExtraData, 32)                                                          // Offset (10) -     ExtraData -   4 bytes
    ssz.DefineUint256(codec, &obj.BaseFeePerGas)                                                                     // Field  (11) - BaseFeePerGas -  32 bytes
    ssz.DefineStaticBytes(codec, &obj.BlockHash)                                                                     // Field  (12) -     BlockHash -  32 bytes
    ssz.DefineSliceOfDynamicBytesOffset(codec, &obj.Transactions, 1048576, 1073741824)                               // Offset (13) -  Transactions -   4 bytes
    ssz.DefineSliceOfStaticObjectsOffsetOnFork(codec, &obj.Withdrawals, 16, ssz.ForkFilter{Added: ssz.ForkShanghai}) // Offset (14) -   Withdrawals -   4 bytes
    ssz.DefineUint64PointerOnFork(codec, &obj.BlobGasUsed, ssz.ForkFilter{Added: ssz.ForkCancun})                    // Field  (15) -   BlobGasUsed -   8 bytes
    ssz.DefineUint64PointerOnFork(codec, &obj.ExcessBlobGas, ssz.ForkFilter{Added: ssz.ForkCancun})                  // Field  (16) - ExcessBlobGas -   8 bytes

    // Define the dynamic data (fields)
    ssz.DefineDynamicBytesContent(codec, &obj.ExtraData, 32)                                                          // Field  (10) -     ExtraData - ? bytes
    ssz.DefineSliceOfDynamicBytesContent(codec, &obj.Transactions, 1048576, 1073741824)                               // Field  (13) -  Transactions - ? bytes
    ssz.DefineSliceOfStaticObjectsContentOnFork(codec, &obj.Withdrawals, 16, ssz.ForkFilter{Added: ssz.ForkShanghai}) // Field  (14) -   Withdrawals - ? bytes
}

To make this explicit: the ssz-fork tags have been extracted from the struct definition and mapped into both an updated SizeSSZ method and a new definition style in DefineSSZ.
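As a sanity check on the fixed-size arithmetic in SizeSSZ above, here is a minimal standalone sketch that evaluates it per fork. The Fork type and its constants are local, hypothetical stand-ins for the library's fork enum; only their relative order matters here.

```go
package main

import "fmt"

// Fork is a hypothetical stand-in for the library's fork enum; only the
// ordering (Shanghai before Cancun) matters for this sketch.
type Fork int

const (
	ForkParis Fork = iota
	ForkShanghai
	ForkCancun
)

// staticSize mirrors the fixed part of SizeSSZ above: the static fields plus
// the 4-byte offsets of the dynamic fields that exist in the given fork.
func staticSize(fork Fork) uint32 {
	size := uint32(32 + 20 + 32 + 32 + 256 + 32 + 8 + 8 + 8 + 8 + 4 + 32 + 32 + 4)
	if fork >= ForkShanghai {
		size += 4 // Withdrawals offset
	}
	if fork >= ForkCancun {
		size += 8 + 8 // BlobGasUsed + ExcessBlobGas
	}
	return size
}

func main() {
	fmt.Println(staticSize(ForkParis), staticSize(ForkShanghai), staticSize(ForkCancun))
	// prints: 508 512 528
}
```

The same object therefore has a different static footprint depending on the fork, which is why the sizer needs to know it.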

Do note, this type (or anything embedding it) will require the OnFork versions of ssz.Encode, ssz.Decode, ssz.Hash and ssz.Size to be called, since naturally it relies on a correct fork being set in the codec's context.

Lastly, whilst the library itself supports custom fork enums, there is no support yet for these in the code generator. This will probably be added eventually via a --forks=mypkg or similar CLI flag, but it's a TODO for now.

Go generate

One more note: anyone using the code generator should invoke it from a go:generate directive. It is much simpler, and once the directive is in the code, the generator can always be rerun via go generate.

Multi-type ordering

When generating code for multiple types at once (with one call or many), there's one ordering issue you need to be aware of.

When the code generator finds a field that is a struct of some sort, it needs to decide if it's a static or a dynamic type. To do that, it checks whether the type implements the ssz.StaticObject or the ssz.DynamicObject interface. If it doesn't implement either, the generator will error.

This means, however, that if you have a type that's embedded in another type (e.g. in our examples above, Withdrawal was embedded inside ExecutionPayload in a slice), you need to generate the code for the inner type first, and then the outer type. This ensures that when the generator resolves the interface of the inner type while processing the outer one, the inner type's code is already generated and available.
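The "inner before outer" rule is a dependency ordering. The sketch below is only an illustration of that rule, not part of the generator: generationOrder and the deps map are hypothetical, and the map would have to be written by hand from your own type definitions.

```go
package main

import "fmt"

// generationOrder returns the types in an order where every type appears
// after the types it embeds (depth-first post-order). The deps map is a
// hypothetical input: type name -> names of the struct types it contains.
func generationOrder(roots []string, deps map[string][]string) []string {
	var (
		order []string
		seen  = make(map[string]bool)
		visit func(name string)
	)
	visit = func(name string) {
		if seen[name] {
			return
		}
		seen[name] = true
		for _, inner := range deps[name] {
			visit(inner) // inner types are emitted before the outer one
		}
		order = append(order, name)
	}
	for _, root := range roots {
		visit(root)
	}
	return order
}

func main() {
	deps := map[string][]string{
		"ExecutionPayload": {"Withdrawal"},
	}
	fmt.Println(generationOrder([]string{"ExecutionPayload", "Withdrawal"}, deps))
	// prints: [Withdrawal ExecutionPayload]
}
```

For the two example types this simply confirms that Withdrawal should be generated before ExecutionPayload.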

Merkleization

Half the SSZ spec is about encoding/decoding data into a binary format; the other half is about proving the data via Merkle proofs.

Symmetric API

The same way that encoding/decoding has a "symmetric" and "asymmetric" API, so does merkleization. What's more, the symmetric API is actually exactly the same as for encoding/decoding, with no code changes necessary!

Taking our very simple Withdrawal type and its codec code:

type Address [20]byte

type Withdrawal struct {
    Index     uint64
    Validator uint64
    Address   Address
    Amount    uint64
}

func (w *Withdrawal) SizeSSZ(sizer *ssz.Sizer) uint32 { return 44 }
func (w *Withdrawal) DefineSSZ(codec *ssz.Codec) {
    ssz.DefineUint64(codec, &w.Index)        // Field (0) - Index          -  8 bytes
    ssz.DefineUint64(codec, &w.Validator)    // Field (1) - ValidatorIndex -  8 bytes
    ssz.DefineStaticBytes(codec, &w.Address) // Field (2) - Address        - 20 bytes
    ssz.DefineUint64(codec, &w.Amount)       // Field (3) - Amount         -  8 bytes
}

Hashing this works out of the box. To merkleize the above Withdrawal and calculate its Merkle trie root, use either ssz.HashSequential or ssz.HashConcurrent. The former runs on a single thread and uses 0 allocations, whereas the latter may run on multiple threads concurrently (if large enough fields are present) and uses O(1) memory.

func main() {
    hash := ssz.HashSequential(new(Withdrawal))
    fmt.Printf("hash: %#x\n", hash)
}
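To make concrete what such a root is, the following sketch computes the hash tree root of a Withdrawal by hand, using only the standard library. It is not the library's implementation, and the helper names (uint64Chunk, hashPair, merkleRoot) are hypothetical. It follows the SSZ spec's rule for containers: each of the four fields becomes one 32-byte chunk, and the chunks are hashed pairwise with SHA-256 up to a single root.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

type Address [20]byte

type Withdrawal struct {
	Index     uint64
	Validator uint64
	Address   Address
	Amount    uint64
}

// uint64Chunk packs a uint64 into a 32-byte chunk: little-endian, zero-padded.
func uint64Chunk(v uint64) [32]byte {
	var chunk [32]byte
	binary.LittleEndian.PutUint64(chunk[:8], v)
	return chunk
}

// hashPair returns sha256(left || right), the parent of two tree nodes.
func hashPair(left, right [32]byte) [32]byte {
	var buf [64]byte
	copy(buf[:32], left[:])
	copy(buf[32:], right[:])
	return sha256.Sum256(buf[:])
}

// merkleRoot computes the root of the 4-leaf tree spanning the fields.
func merkleRoot(w *Withdrawal) [32]byte {
	var addr [32]byte
	copy(addr[:], w.Address[:]) // 20 bytes, right-padded with zeros

	left := hashPair(uint64Chunk(w.Index), uint64Chunk(w.Validator))
	right := hashPair(addr, uint64Chunk(w.Amount))
	return hashPair(left, right)
}

func main() {
	fmt.Printf("hash: %#x\n", merkleRoot(new(Withdrawal)))
}
```

Under the spec's rules this should agree with the root the library produces for the same value; the library reaches it without the per-field temporaries used here.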

Asymmetric API

If for some reason you have a type that requires custom encoders/decoders, chances are it will also require a custom hasher. For those cases, this library provides an API surface very similar to that of asymmetric encoding/decoding:

func (w *Withdrawal) DefineSSZ(codec *ssz.Codec) {
    codec.DefineEncoder(func(enc *ssz.Encoder) {
        ssz.EncodeUint64(enc, w.Index)         // Field (0) - Index          -  8 bytes
        ssz.EncodeUint64(enc, w.Validator)     // Field (1) - ValidatorIndex -  8 bytes
        ssz.EncodeStaticBytes(enc, &w.Address) // Field (2) - Address        - 20 bytes
        ssz.EncodeUint64(enc, w.Amount)        // Field (3) - Amount         -  8 bytes
    })
    codec.DefineDecoder(func(dec *ssz.Decoder) {
        ssz.DecodeUint64(dec, &w.Index)        // Field (0) - Index          -  8 bytes
        ssz.DecodeUint64(dec, &w.Validator)    // Field (1) - ValidatorIndex -  8 bytes
        ssz.DecodeStaticBytes(dec, &w.Address) // Field (2) - Address        - 20 bytes
        ssz.DecodeUint64(dec, &w.Amount)       // Field (3) - Amount         -  8 bytes
    })
    codec.DefineHasher(func(has *ssz.Hasher) {
        ssz.HashUint64(has, w.Index)         // Field (0) - Index          -  8 bytes
        ssz.HashUint64(has, w.Validator)     // Field (1) - ValidatorIndex -  8 bytes
        ssz.HashStaticBytes(has, &w.Address) // Field (2) - Address        - 20 bytes
        ssz.HashUint64(has, w.Amount)        // Field (3) - Amount         -  8 bytes
    })
}

To hash the above Withdrawal into a Merkle trie root, use the same calls as before (ssz.HashSequential or ssz.HashConcurrent). Everything is seamless.

Quick reference

The table below is a summary of the methods available for SizeSSZ and DefineSSZ:

If some type you need is missing, please open an issue, so it can be added.

| Type | Size API | Symmetric API | Asymmetric Encoding | Asymmetric Decoding | Asymmetric Hashing |
|---|---|---|---|---|---|
| bool | 1 byte | DefineBool | EncodeBool | DecodeBool | HashBool |
| uint8 | 1 byte | DefineUint8 | EncodeUint8 | DecodeUint8 | HashUint8 |
| uint16 | 2 bytes | DefineUint16 | EncodeUint16 | DecodeUint16 | HashUint16 |
| uint32 | 4 bytes | DefineUint32 | EncodeUint32 | DecodeUint32 | HashUint32 |
| uint64 | 8 bytes | DefineUint64 | EncodeUint64 | DecodeUint64 | HashUint64 |
| [N]byte as bitvector[N] | N bytes | DefineArrayOfBits | EncodeArrayOfBits | DecodeArrayOfBits | HashArrayOfBits |
| bitfield.Bitlist² | SizeSliceOfBits | DefineSliceOfBitsOffset DefineSliceOfBitsContent | EncodeSliceOfBitsOffset EncodeSliceOfBitsContent | DecodeSliceOfBitsOffset DecodeSliceOfBitsContent | HashSliceOfBits |
| [N]uint64 | N * 8 bytes | DefineArrayOfUint64s | EncodeArrayOfUint64s | DecodeArrayOfUint64s | HashArrayOfUint64s |
| []uint64 | SizeSliceOfUint64s | DefineSliceOfUint64sOffset DefineSliceOfUint64sContent | EncodeSliceOfUint64sOffset EncodeSliceOfUint64sContent | DecodeSliceOfUint64sOffset DecodeSliceOfUint64sContent | HashSliceOfUint64s |
| *uint256.Int¹ | 32 bytes | DefineUint256 | EncodeUint256 | DecodeUint256 | HashUint256 |
| *big.Int as uint256 | 32 bytes | DefineUint256BigInt | EncodeUint256BigInt | DecodeUint256BigInt | HashUint256BigInt |
| [N]byte | N bytes | DefineStaticBytes | EncodeStaticBytes | DecodeStaticBytes | HashStaticBytes |
| [N]byte in []byte | N bytes | DefineCheckedStaticBytes | EncodeCheckedStaticBytes | DecodeCheckedStaticBytes | HashCheckedStaticBytes |
| []byte | SizeDynamicBytes | DefineDynamicBytesOffset DefineDynamicBytesContent | EncodeDynamicBytesOffset EncodeDynamicBytesContent | DecodeDynamicBytesOffset DecodeDynamicBytesContent | HashDynamicBytes |
| [M][N]byte | M * N bytes | DefineArrayOfStaticBytes | EncodeArrayOfStaticBytes | DecodeArrayOfStaticBytes | HashArrayOfStaticBytes |
| [M][N]byte in [][N]byte | M * N bytes | DefineCheckedArrayOfStaticBytes | EncodeCheckedArrayOfStaticBytes | DecodeCheckedArrayOfStaticBytes | HashCheckedArrayOfStaticBytes |
| [][N]byte | SizeSliceOfStaticBytes | DefineSliceOfStaticBytesOffset DefineSliceOfStaticBytesContent | EncodeSliceOfStaticBytesOffset EncodeSliceOfStaticBytesContent | DecodeSliceOfStaticBytesOffset DecodeSliceOfStaticBytesContent | HashSliceOfStaticBytes |
| [][]byte | SizeSliceOfDynamicBytes | DefineSliceOfDynamicBytesOffset DefineSliceOfDynamicBytesContent | EncodeSliceOfDynamicBytesOffset EncodeSliceOfDynamicBytesContent | DecodeSliceOfDynamicBytesOffset DecodeSliceOfDynamicBytesContent | HashSliceOfDynamicBytes |
| ssz.StaticObject | Object(nil).SizeSSZ() | DefineStaticObject | EncodeStaticObject | DecodeStaticObject | HashStaticObject |
| []ssz.StaticObject | SizeSliceOfStaticObjects | DefineSliceOfStaticObjectsOffset DefineSliceOfStaticObjectsContent | EncodeSliceOfStaticObjectsOffset EncodeSliceOfStaticObjectsContent | DecodeSliceOfStaticObjectsOffset DecodeSliceOfStaticObjectsContent | HashSliceOfStaticObjects |
| ssz.DynamicObject | SizeDynamicObject | DefineDynamicObjectOffset DefineDynamicObjectContent | EncodeDynamicObjectOffset EncodeDynamicObjectContent | DecodeDynamicObjectOffset DecodeDynamicObjectContent | HashDynamicObject |
| []ssz.DynamicObject | SizeSliceOfDynamicObjects | DefineSliceOfDynamicObjectsOffset DefineSliceOfDynamicObjectsContent | EncodeSliceOfDynamicObjectsOffset EncodeSliceOfDynamicObjectsContent | DecodeSliceOfDynamicObjectsOffset DecodeSliceOfDynamicObjectsContent | HashSliceOfDynamicObjects |

¹Type is from github.com/holiman/uint256.
²Type is from github.com/prysmaticlabs/go-bitfield.

Performance

The goal of this package is to be close in performance to low level generated encoders, without sacrificing maintainability. It should, however, be significantly faster than runtime reflection encoders.

The package includes a set of benchmarks for handling the beacon spec types and test datasets, which you can run with go test ./tests --bench=. (the trailing dot is the benchmark pattern). They can be interesting for some baseline numbers, but they are unrealistic with regard to live beacon state data.

If you want to see the performance on a more realistic piece of data, you'll need to provide a beacon state SSZ object and place it into the project root named state.ssz. You can then run go test --bench=Mainnet ./tests/manual_test.go to explicitly run this one benchmark. A sample output running against a 208MB state export from around June 11, 2024, on a MacBook Pro M2 Max:

go test --bench=Mainnet ./tests/manual_test.go

BenchmarkMainnetState/beacon-state/208757379-bytes/encode-12                  26      45164494 ns/op    4622.16 MB/s          74 B/op          0 allocs/op
BenchmarkMainnetState/beacon-state/208757379-bytes/decode-12                  27      40984980 ns/op    5093.51 MB/s     8456490 B/op      54910 allocs/op
BenchmarkMainnetState/beacon-state/208757379-bytes/merkleize-sequential-12     2     659472250 ns/op     316.55 MB/s         904 B/op          1 allocs/op
BenchmarkMainnetState/beacon-state/208757379-bytes/merkleize-concurrent-12     9     113414449 ns/op    1840.66 MB/s       16416 B/op        108 allocs/op
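
For reading the output above, the MB/s column follows from the state size in the benchmark name and the ns/op column. The short sketch below redoes that arithmetic; throughputMBps is a hypothetical helper, and it assumes 1 MB = 1e6 bytes, which is what matches the reported figures.

```go
package main

import "fmt"

// throughputMBps converts a payload size and a per-operation latency into
// megabytes per second, assuming 1 MB = 1e6 bytes.
func throughputMBps(sizeBytes, nsPerOp float64) float64 {
	return sizeBytes / 1e6 / (nsPerOp / 1e9)
}

func main() {
	const stateSize = 208757379 // bytes, taken from the benchmark name

	// Encoding throughput, from the ns/op of the encode row.
	fmt.Printf("encode: %.2f MB/s\n", throughputMBps(stateSize, 45164494))

	// Ratio of the sequential to the concurrent merkleization ns/op.
	fmt.Printf("concurrent merkleization speedup: %.2fx\n", 659472250.0/113414449.0)
}
```

This reproduces the 4622.16 MB/s encode figure and shows concurrent merkleization running roughly 5.8 times faster than sequential merkleization on that machine.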