karalabe / ssz

Opinionated 0-alloc SSZ codec for Go
https://github.com/ethereum/consensus-specs/blob/dev/ssz/simple-serialize.md
BSD 3-Clause "New" or "Revised" License

Possible speedup #3

Open holiman opened 3 months ago

holiman commented 3 months ago

This might speed up the int-slice encoder a bit

diff --git a/encoder.go b/encoder.go
index 3f3a264..42725f0 100644
--- a/encoder.go
+++ b/encoder.go
@@ -198,11 +198,23 @@ func EncodeSliceOfUint64sOffset[T ~uint64](enc *Encoder, ns []T) {
 // EncodeSliceOfUint64sContent is the lazy data writer for EncodeSliceOfUint64sOffset.
 func EncodeSliceOfUint64sContent[T ~uint64](enc *Encoder, ns []T) {
    if enc.outWriter != nil {
-       for _, n := range ns {
+       // Our enc.buf fits four uint64, so we encode four at a time and write once
+       i := 0
+       for ; i+4 <= len(ns); i += 4 {
+           if enc.err != nil {
+               return
+           }
+           binary.LittleEndian.PutUint64(enc.buf[:], (uint64)(ns[i]))
+           binary.LittleEndian.PutUint64(enc.buf[8:], (uint64)(ns[i+1]))
+           binary.LittleEndian.PutUint64(enc.buf[16:], (uint64)(ns[i+2]))
+           binary.LittleEndian.PutUint64(enc.buf[24:], (uint64)(ns[i+3]))
+           _, enc.err = enc.outWriter.Write(enc.buf[:])
+       }
+       for ; i < len(ns); i++ {
            if enc.err != nil {
                return
            }
-           binary.LittleEndian.PutUint64(enc.buf[:8], (uint64)(n))
+           binary.LittleEndian.PutUint64(enc.buf[:8], (uint64)(ns[i]))
            _, enc.err = enc.outWriter.Write(enc.buf[:8])
        }
    } else {

The current benchmarks do not really exercise this codepath, however. AFAICT the only type using it is IndexedAttestation, and the benchmark case uses an AttestationIndices of length 2, so the batched loop never runs and nothing improves. It might be a speedup when encoding a larger list of ints, but yeah, it might be totally moot and not worth it. Feel free to close.

karalabe commented 3 months ago

Yes, the test cases from the specs repo are a bit moot for optimising.