whyrusleeping / cbor-gen

Codegen for cbor codecs on your types
MIT License

Considering byte slice interfaces #20

Open whyrusleeping opened 4 years ago

whyrusleeping commented 4 years ago

I have been considering switching to a byte-slice-based interface for a while, but was trying to see how much perf I could scrape out of the Reader/Writer interfaces in Go before giving up.

I think we can still squeeze another 20-30% out of the current interfaces, but going beyond that will get really tricky. The main problems come down to two related things: allocations and interfaces.

A major issue I'm having in optimizing further is how Go decides to put things on the stack vs. the heap. Usually Go's escape analysis does a good job of determining whether something escapes, but it gives up when an argument is passed through an interface method call. So:

```go
var w Writer = new(bytes.Buffer)
w.Write([]byte("foo"))
```

allocates []byte("foo") on the heap, whereas:

```go
b := new(bytes.Buffer)
b.Write([]byte("foo"))
```

allocates it on the stack.

As a result, a large number of allocations that should be on the stack (and thus really fast) end up on the heap instead, which costs more resources.

Additionally, something as simple as looking at the next byte in the stream is very hard to do performantly when dealing with Reader interfaces. Doing:

```go
var b [1]byte
r.Read(b[:])
```

causes a heap allocation for the reasons mentioned above. So we might want to check whether our underlying reader supports ReadByte (like bytes.Buffer does). We can do that with an interface type assertion, but once we start doing that, guess what? The cost of that type assertion starts showing up in performance profiles. As I'm writing this, I have a pprof trace open that shows assertI2I2 taking 1.5% of total Filecoin sync time, and just under half of the total time spent in CborReadHeader is spent asserting interfaces.

So Go just keeps getting in my way here, making it really hard to use its most famous and heavily promoted interfaces for code that needs very high performance.

I'm now considering adding new interfaces that work over byte slices, designed with allocation optimization in mind:

```go
type CborAble interface {
    EncodeCBOR([]byte) ([]byte, error)
    DecodeCBOR([]byte) error
    EncodedLen() int
}
```

This interface lets you pass in your own buffers to save on allocations, and also provides a method to check how many bytes need to be allocated to serialize the object (this can be computed really efficiently). The encode method also returns the final byte buffer, similar to append, in case it needed to allocate more bytes during the call. This also lets the caller pass in one big slice and get back the subslice of the correct length.

With this interface, I'm predicting a 2-3x improvement in marshal/unmarshal perf. (Note: cborgen is already quite a bit faster than any other CBOR library I've tested, even with the Reader/Writer interfaces.)