google / flatbuffers

FlatBuffers: Memory Efficient Serialization Library
https://flatbuffers.dev/
Apache License 2.0

[go]Flatbuffer serialization performance is slow compared to protobuf #5790

Closed debabratadebroy closed 4 years ago

debabratadebroy commented 4 years ago

Env: Golang
Version: go version go1.13.3 linux/amd64

With the following IDL file my intention is to measure the serialization speed of FlatBuffers. I am using Go for my analysis.

namespace MyFlat;

struct Vertices {
    x : double;
    y : double;
}

table Polygon {
    polygons : [Vertices];
}

table Layer {
    polygons : [Polygon];
}

root_type Layer;

Here is the code I have written for the measurement:

package main

import (
    "MyFlat"
    "fmt"
    "io/ioutil"
    "log"
    "strconv"
    "time"

    flatbuffers "github.com/google/flatbuffers/go"
)

func calculation(size int, vertices int) {
    b := flatbuffers.NewBuilder(0)
    var polyoffset []flatbuffers.UOffsetT

    rawSize := ((16 * vertices) * size) / 1024
    var vec1 flatbuffers.UOffsetT
    var StartedAtMarshal time.Time
    var EndedAtMarshal time.Time
    StartedAtMarshal = time.Now()
    for k := 0; k < size; k++ {

        MyFlat.PolygonStartPolygonsVector(b, vertices)

        for i := 0; i < vertices; i++ {
            MyFlat.CreateVertices(b, 2.0, 2.4)

        }

        vec1 = b.EndVector(vertices)
        MyFlat.PolygonStart(b)
        MyFlat.PolygonAddPolygons(b, vec1)
        polyoffset = append(polyoffset, MyFlat.PolygonEnd(b))

    }

    MyFlat.LayerStartPolygonsVector(b, size)
    for _, offset := range polyoffset {
        b.PrependUOffsetT(offset)
    }
    vec := b.EndVector(size)
    MyFlat.LayerStart(b)
    MyFlat.LayerAddPolygons(b, vec)
    finalOffset := MyFlat.LayerEnd(b)

    b.Finish(finalOffset)
    EndedAtMarshal = time.Now()

    SeElapseTime := EndedAtMarshal.Sub(StartedAtMarshal).String()
    mybyte := b.FinishedBytes()

    file := "/tmp/myflat_" + strconv.Itoa(size) + ".txt"
    if err := ioutil.WriteFile(file, mybyte, 0644); err != nil {
        log.Fatalln("Failed to write file:", err)
    }

    StartedAt := time.Now()

    layer := MyFlat.GetRootAsLayer(mybyte, 0)

    size = layer.PolygonsLength()
    obj := &MyFlat.Polygon{}
    layer.Polygons(obj, 1)

    for i := 0; i < obj.PolygonsLength(); i++ {
        objVertices := &MyFlat.Vertices{}
        obj.Polygons(objVertices, i)
        fmt.Println(objVertices.X(), objVertices.Y())
    }

    EndedAt := time.Now()
    DeElapseTime := EndedAt.Sub(StartedAt).String()
    fmt.Println(size, ",", vertices, ", ", SeElapseTime, ",", DeElapseTime, ",", (len(mybyte) / 1024), ",", rawSize)
}

func main() {

    data := []int{500000, 1000000, 1500000, 3000000, 8000000}

    for _, size := range data {
        //calculation(size, 5)
        //calculation(size, 10)
        calculation(size, 20)

    }
}

The problem is that I find serialization quite slow compared to protobuf with a similar IDL.

For 3M polygons, serialization takes almost 4.1167037s, where protobuf takes about half of that. Deserialization time for FlatBuffers is very low (microseconds), whereas in protobuf it is quite high. But even adding both together, FlatBuffers performance is lower.

Do you see any optimized way to serialize it? FlatBuffers has a CreateByteVector method for byte vectors, but there is no direct way to serialize a vector of Polygon from an existing vector of a user-defined type.

Thanks!
dbaileychess commented 4 years ago

I would construct the builder with the anticipated size, as that avoids the internal buffer resizing multiple times. Log2(8,000,000) is ~23, so you are copying large portions of data about 23 times unnecessarily.
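For example, a minimal sketch against the calculation() code in the first post (each Vertices struct is 2 doubles = 16 bytes, so the finished buffer is at least size * vertices * 16 bytes plus table overhead):

// Pre-size the builder so the internal buffer does not have to be doubled repeatedly.
estimatedBytes := size * vertices * 16
b := flatbuffers.NewBuilder(estimatedBytes)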

dbaileychess commented 4 years ago

Also, 8,000,000 * 20 * (8 + 8) ≈ 2.4 GB of data, that is a big contiguous array! I would reduce your test size and use a more precise timer to do the comparisons.
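On the timer point, Go's built-in benchmark harness is one way to get stable numbers. Below is a sketch (not from this thread) that reuses the construction loop from the first post; MyFlat is the generated package from that schema, and the file would live in a hypothetical bench_test.go. Run with go test -bench=Serialize -benchmem.

package main

import (
    "testing"

    "MyFlat"

    flatbuffers "github.com/google/flatbuffers/go"
)

// benchmarkSerialize times only the FlatBuffers construction of one Layer with
// `polygons` polygons of `vertices` vertices each (same loop as in the first post).
func benchmarkSerialize(b *testing.B, polygons, vertices int) {
    builder := flatbuffers.NewBuilder(polygons * vertices * 16)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        builder.Reset() // reuse the already-grown buffer between iterations
        offsets := make([]flatbuffers.UOffsetT, 0, polygons)
        for k := 0; k < polygons; k++ {
            MyFlat.PolygonStartPolygonsVector(builder, vertices)
            for j := 0; j < vertices; j++ {
                MyFlat.CreateVertices(builder, 2.0, 2.4)
            }
            vec := builder.EndVector(vertices)
            MyFlat.PolygonStart(builder)
            MyFlat.PolygonAddPolygons(builder, vec)
            offsets = append(offsets, MyFlat.PolygonEnd(builder))
        }
        MyFlat.LayerStartPolygonsVector(builder, polygons)
        for _, off := range offsets {
            builder.PrependUOffsetT(off)
        }
        polyVec := builder.EndVector(polygons)
        MyFlat.LayerStart(builder)
        MyFlat.LayerAddPolygons(builder, polyVec)
        builder.Finish(MyFlat.LayerEnd(builder))
    }
}

func BenchmarkSerialize1Mx5(b *testing.B) { benchmarkSerialize(b, 1000000, 5) }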

debabratadebroy commented 4 years ago

You can ignore the 8M polygons case. For me 1M and 3M are the important ones.

dbaileychess commented 4 years ago

Can you try updating the builder default allocation and post the comparison? Even for 1 and 3 mil, that is 20+ doublings.

debabratadebroy commented 4 years ago

I have tried it. Do you suggest any default value?

debabratadebroy commented 4 years ago

Actually my question is that for 3M polygons with 5 vertices it's taking 1.5 sec. I am using 16 cores and 32 GB RAM. Protobuf is taking half that for the same volume.

dbaileychess commented 4 years ago

Your rawSize variable is close to the default value I would suggest. Just make sure it is in number of bytes.

debabratadebroy commented 4 years ago

I tried with b := flatbuffers.NewBuilder(rawSize * 1024) but don't see much improvement. It's still coming in around 1.3 sec for 3M polygons with 5 vertices.

debabratadebroy commented 4 years ago

Just to add one more point: if I do the same thing in Java, I can see it's done within 800 ms for 3M x 5 vertices.

aardappel commented 4 years ago

See also my response to the same question here: https://stackoverflow.com/questions/60506304/flatbuffer-serialization-performance-is-slow-compared-to-protobuf/60512615

tsingson commented 4 years ago

Interesting serialization comparison, I'll try it out tomorrow.

debabratadebroy commented 4 years ago

Hi @tsingson, are you able to reproduce the same behavior?

tsingson commented 4 years ago

Yes, I reproduced this comparison. Result below: flatc is roughly 4x, protoc is 1x.

test code

test data generator

package generate

import (
    "math/rand"
)

type GenerateData struct {
    X float64
    Y float64
}

func Generate(size int) []GenerateData {
    g := make([]GenerateData, 0)

    for i := 0; i < size; i++ {
        t := GenerateData{
            X: rand.Float64(),
            Y: rand.Float64(),
        }
        g = append(g, t)
    }
    return g
}

main.go

package main

import (
    "fmt"
    "time"

    "github.com/google/flatbuffers/playground/flat"
    "github.com/google/flatbuffers/playground/generate"
    "github.com/google/flatbuffers/playground/proto"
)

func main() {
    StartedAt := time.Now()
    size := 1024 * 1024 * 8
    data := generate.Generate(size)

    DeElapseTime := time.Since(StartedAt).String()
    fmt.Println("size: ", size, "    time: ", DeElapseTime)
    fmt.Println(" len of data: ", len(data))
    fmt.Println("--------------------------------------")

    f := FlatGenerate(data)

    StartedAt = time.Now()
    f.Marshal()

    DeElapseTime = time.Since(StartedAt).String()
    fmt.Println("size: ", size, " flatc   time: ", DeElapseTime)
    fmt.Println(" len of data: ", len(data))
    fmt.Println("--------------------------------------")
    g := ProtoGenerate(data)
    StartedAt = time.Now()

    _, _ = g.Marshal()
    DeElapseTime = time.Since(StartedAt).String()
    fmt.Println("size: ", size, " protoc   time: ", DeElapseTime)
    fmt.Println(" len of data: ", len(data))
    fmt.Println("--------------------------------------")
}

func FlatGenerate(in []generate.GenerateData) *flat.LayerT {
    vlist := make([]*flat.VerticesT, 0)
    for _, v := range in {
        vl := &flat.VerticesT{X: v.X, Y: v.Y}
        vlist = append(vlist, vl)
    }
    pl := &flat.PolygonT{}
    pl.Polygons = vlist
    ly := &flat.LayerT{
        Polygon: []*flat.PolygonT{pl},
    }
    return ly
}

func ProtoGenerate(in []generate.GenerateData) *proto.Layer {
    vlist := make([]*proto.Vertices, 0)
    for _, v := range in {
        vl := &proto.Vertices{X: v.X, Y: v.Y}
        vlist = append(vlist, vl)
    }
    pl := &proto.Polygon{}
    pl.Polygons = vlist
    ly := &proto.Layer{
        Polygon: []*proto.Polygon{pl},
    }
    return ly
}

The f.Marshal() is a shortcut in the flatc-generated Go file layer.go, like this:

func (rcv *LayerT) Marshal() []byte {
    b := flatbuffers.NewBuilder(0)
    b.Finish(rcv.Pack(b))
    return b.FinishedBytes()
}
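For the reverse direction, a matching shortcut could look like this (a sketch; it assumes the GetRootAsLayer accessor and the UnPack method that flatc --gen-object-api generates alongside Pack):

func UnmarshalLayer(buf []byte) *flat.LayerT {
    // GetRootAsLayer is the zero-copy accessor; UnPack copies it into the object-API type.
    return flat.GetRootAsLayer(buf, 0).UnPack()
}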

proto IDL

syntax = "proto3";
package proto;

message Vertices {
    double x = 1;
    double y = 2;
}

message Polygon {
    repeated Vertices polygons = 1;

}

message Layer {
    repeated Polygon polygon = 1;
}

flatc IDL

// Generated from simple.proto

namespace flat;

table Vertices {
  x:double;
  y:double;
}

table Polygon {
  polygons:[Vertices];
}

table Layer {
  polygon:[Polygon];
}

result:

size:  8388608     time:  592.923046ms
 len of data:  8388608
--------------------------------------
size:  8388608  flatc   time:  1.044466263s
 len of data:  8388608
--------------------------------------
size:  8388608  protoc   time:  249.879179ms
 len of data:  8388608
--------------------------------------

I am currently writing a Go verifier for flatc, so I am also considering goroutines or some other technique to improve the encoding performance of the flatc Go output.

The original flatc-generated Go encoder uses a writetable() function to process the encoding sequentially in a loop. I have split the vector / struct / scalar encoding into separate functions (that works for me, so maybe parallel processing in a next round), and I only rewrite the header / root table once all parts of the root table are generated.

I am learning more from the C++ / C implementations, and actively trying to do it the Go way.

tsingson commented 4 years ago

flatc go encoder CPU pprof (why the flatc go encoder is slow)

Ignore main.FlatGenerate; that function prepares the test data.


Flat | Flat% | Sum% | Cum | Cum% | Name | Inlined?
-- | -- | -- | -- | -- | -- | --
2.50s | 26.37% | 26.37% | 2.50s | 26.37% | runtime.memmove |  =============> MARK 1 
0.86s | 9.07% | 35.44% | 4.50s | 47.47% | github.com/google/flatbuffers/benchmark/flat.(*VerticesT).Pack |  
0.50s | 5.27% | 40.72% | 0.50s | 5.27% | runtime.markBits.isMarked | (inline)
0.39s | 4.11% | 44.83% | 1.63s | 17.19% | runtime.scanobject |  
0.38s | 4.01% | 48.84% | 0.78s | 8.23% | math/rand.(*lockedSource).Int63 |  
0.37s | 3.90% | 52.74% | 0.37s | 3.90% | sync.(*Mutex).Unlock | (inline)
0.37s | 3.90% | 56.65% | 0.37s | 3.90% | runtime.memclrNoHeapPointers |  
0.33s | 3.48% | 60.13% | 2.05s | 21.62% | main.FlatGenerate |  
0.24s | 2.53% | 62.66% | 0.34s | 3.59% | runtime.findObject |  
0.22s | 2.32% | 64.98% | 0.22s | 2.32% | runtime.heapBits.bits |  
0.21s | 2.22% | 67.19% | 0.44s | 4.64% | runtime.wbBufFlush1 |  
0.21s | 2.22% | 69.41% | 0.23s | 2.43% | runtime.pageIndexOf | (inline)
0.21s | 2.22% | 71.62% | 0.21s | 2.22% | runtime.(*mspan).init | (inline)
0.21s | 2.22% | 73.84% | 0.21s | 2.22% | github.com/google/flatbuffers/go.WriteUint32 | (inline)
0.21s | 2.22% | 76.05% | 0.44s | 4.64% | github.com/google/flatbuffers/go.(*Builder).WriteVtable |  
0.20s | 2.11% | 78.16% | 0.20s | 2.11% | runtime.nanotime1 |  
0.19s | 2.00% | 80.17% | 0.19s | 2.00% | runtime.(*mspan).refillAllocCache |  
0.19s | 2.00% | 82.17% | 2.89s | 30.49% | github.com/google/flatbuffers/go.(*Builder).Prep |  
0.12s | 1.27% | 83.44% | 0.13s | 1.37% | github.com/google/flatbuffers/go.vtableEqual |  
0.11s | 1.16% | 84.60% | 0.77s | 8.12% | runtime.greyobject |  
0.11s | 1.16% | 85.76% | 0.12s | 1.27% | github.com/google/flatbuffers/go.(*Builder).StartObject |  
0.09s | 0.95% | 86.71% | 0.09s | 0.95% | runtime.heapBitsSetType |  
0.09s | 0.95% | 87.66% | 0.11s | 1.16% | github.com/google/flatbuffers/go.(*Builder).Pad |  
0.08s | 0.84% | 88.50% | 0.09s | 0.95% | runtime.spanOf | (inline)
0.08s | 0.84% | 89.35% | 2.99s | 31.54% | github.com/google/flatbuffers/go.(*Builder).PrependFloat64Slot |  
0.07s | 0.74% | 90.08% | 0.91s | 9.60% | math/rand.Float64 |  
0.07s | 0.74% | 90.82% | 2.90s | 30.59% | github.com/google/flatbuffers/go.(*Builder).PrependFloat64 |  
0.06s | 0.63% | 91.46% | 0.84s | 8.86% | math/rand.(*Rand).Float64 | (inline)
0.06s | 0.63% | 92.09% | 1.58s | 16.67% | github.com/google/flatbuffers/benchmark/generate.Generate |  
0.05s | 0.53% | 92.62% | 2.88s | 30.38% | github.com/google/flatbuffers/benchmark/flat.VerticesAddX | (inline)
0.04s | 0.42% | 93.04% | 0.06s | 0.63% | runtime.stkbucket |  
0.04s | 0.42% | 93.46% | 4.84s | 51.05% | github.com/google/flatbuffers/benchmark/flat.(*PolygonT).Pack |  
0.03s | 0.32% | 93.78% | 0.28s | 2.95% | github.com/google/flatbuffers/go.(*Builder).PrependUOffsetT |  
0.02s | 0.21% | 93.99% | 0.06s | 0.63% | runtime.sweepone |  
0.02s | 0.21% | 94.20% | 0.46s | 4.85% | github.com/google/flatbuffers/benchmark/flat.VerticesEnd |  
0.02s | 0.21% | 94.41% | 0.18s | 1.90% | github.com/google/flatbuffers/benchmark/flat.VerticesAddY | (inline)
0.01s | 0.11% | 94.51% | 0.64s | 6.75% | runtime.newobject |  
0.01s | 0.11% | 94.62% | 1.72s | 18.14% | runtime.mallocgc |  
0.01s | 0.11% | 94.73% | 0.96s | 10.13% | runtime.gcAssistAlloc1 |  
0.01s | 0.11% | 94.83% | 0.48s | 5.06% | runtime.bulkBarrierPreWriteSrcOnly |  
0.01s | 0.11% | 94.94% | 0.22s | 2.32% | github.com/google/flatbuffers/go.(*Builder).PlaceUOffsetT | (inline)
0 | 0.00% | 94.94% | 0.44s | 4.64% | runtime.wbBufFlush.func1 |  
0 | 0.00% | 94.94% | 0.44s | 4.64% | runtime.wbBufFlush |  
0 | 0.00% | 94.94% | 2.55s | 26.90% | runtime.systemstack |  
0 | 0.00% | 94.94% | 0.21s | 2.22% | runtime.sysmon |  
0 | 0.00% | 94.94% | 0.06s | 0.63% | runtime.profilealloc |  
0 | 0.00% | 94.94% | 0.20s | 2.11% | runtime.nanotime | (inline)
0 | 0.00% | 94.94% | 0.21s | 2.22% | runtime.mstart1 |  
0 | 0.00% | 94.94% | 0.25s | 2.64% | runtime.mstart |  
0 | 0.00% | 94.94% | 0.19s | 2.00% | runtime.mallocgc.func1 |  
0 | 0.00% | 94.94% | 0.39s | 4.11% | runtime.makeslice |  
0 | 0.00% | 94.94% | 8.47s | 89.35% | runtime.main |  
0 | 0.00% | 94.94% | 0.06s | 0.63% | runtime.mProf_Malloc |  
0 | 0.00% | 94.94% | 0.19s | 2.00% | runtime.largeAlloc |  
0 | 0.00% | 94.94% | 0.05s | 0.53% | runtime.heapBits.initSpan |  
0 | 0.00% | 94.94% | 2.80s | 29.54% | runtime.growslice |  
0 | 0.00% | 94.94% | 0.95s | 10.02% | runtime.gcDrainN |  
0 | 0.00% | 94.94% | 0.70s | 7.38% | runtime.gcDrain |  
0 | 0.00% | 94.94% | 0.70s | 7.38% | runtime.gcBgMarkWorker.func2 |  
0 | 0.00% | 94.94% | 0.71s | 7.49% | runtime.gcBgMarkWorker |  
0 | 0.00% | 94.94% | 0.96s | 10.13% | runtime.gcAssistAlloc.func1 |  
0 | 0.00% | 94.94% | 0.96s | 10.13% | runtime.gcAssistAlloc |  
0 | 0.00% | 94.94% | 0.12s | 1.27% | runtime.(*mspan).nextFreeIndex |  
0 | 0.00% | 94.94% | 0.27s | 2.85% | runtime.(*mheap).allocSpan |  
0 | 0.00% | 94.94% | 0.27s | 2.85% | runtime.(*mheap).alloc.func1 |  
0 | 0.00% | 94.94% | 0.36s | 3.80% | runtime.(*mheap).alloc |  
0 | 0.00% | 94.94% | 0.24s | 2.53% | runtime.(*mcentral).grow |  
0 | 0.00% | 94.94% | 0.32s | 3.38% | runtime.(*mcentral).cacheSpan |  
0 | 0.00% | 94.94% | 0.32s | 3.38% | runtime.(*mcache).refill |  
0 | 0.00% | 94.94% | 0.44s | 4.64% | runtime.(*mcache).nextFree |  
0 | 0.00% | 94.94% | 0.78s | 8.23% | math/rand.(*Rand).Int63 | (inline)
0 | 0.00% | 94.94% | 8.47s | 89.35% | main.main |  
0 | 0.00% | 94.94% | 0.21s | 2.22% | github.com/google/flatbuffers/go.WriteUOffsetT | (inline)
0 | 0.00% | 94.94% | 2.59s | 27.32% | github.com/google/flatbuffers/go.(*Builder).growByteBuffer | (inline)
0 | 0.00% | 94.94% | 0.08s | 0.84% | github.com/google/flatbuffers/go.(*Builder).PrependSOffsetT |  
0 | 0.00% | 94.94% | 0.44s | 4.64% | github.com/google/flatbuffers/go.(*Builder).EndObject | (inline)
0 | 0.00% | 94.94% | 0.12s | 1.27% | github.com/google/flatbuffers/benchmark/flat.VerticesStart | (inline)
0 | 0.00% | 94.94% | 4.84s | 51.05% | github.com/google/flatbuffers/benchmark/flat.(*LayerT).Pack |  

flatc go encoder MEM pprof

Ignore main.FlatGenerate; that function prepares the test data.

Flat | Flat% | Sum% | Cum | Cum% | Name | Inlined?
-- | -- | -- | -- | -- | -- | --
845.48MB | 62.05% | 62.05% | 845.48MB | 62.05% | main.FlatGenerate |  
388.62MB | 28.52% | 90.57% | 388.62MB | 28.52% | github.com/google/flatbuffers/go.(*Builder).growByteBuffer | (inline)  =====================>  ********  MARK 2
128MB | 9.39% | 99.96% | 516.62MB | 37.91% | github.com/google/flatbuffers/benchmark/flat.(*PolygonT).Pack |  
0 | 0.00% | 99.96% | 1362.09MB | 99.96% | runtime.main |  
0 | 0.00% | 99.96% | 1362.09MB | 99.96% | main.main |  
0 | 0.00% | 99.96% | 388.62MB | 28.52% | github.com/google/flatbuffers/go.(*Builder).PrependFloat64Slot |  
0 | 0.00% | 99.96% | 388.62MB | 28.52% | github.com/google/flatbuffers/go.(*Builder).PrependFloat64 |  
0 | 0.00% | 99.96% | 388.62MB | 28.52% | github.com/google/flatbuffers/go.(*Builder).Prep |  
0 | 0.00% | 99.96% | 388.62MB | 28.52% | github.com/google/flatbuffers/benchmark/flat.VerticesAddX | (inline)
0 | 0.00% | 99.96% | 388.62MB | 28.52% | github.com/google/flatbuffers/benchmark/flat.(*VerticesT).Pack |  
0 | 0.00% | 99.96% | 516.62MB | 37.91% | github.com/google/flatbuffers/benchmark/flat.(*LayerT).Pack |  

flatc go encoder (*Builder): pre-allocated memory is 388.62MB, actual memory used is 128MB.

Some of the relevant builder code:


func (b *Builder) Prep(size, additionalBytes int) {
    // Track the biggest thing we've ever aligned to.
    if size > b.minalign {
        b.minalign = size
    }
    // Find the amount of alignment needed such that `size` is properly
    // aligned after `additionalBytes`:
    alignSize := (^(len(b.Bytes) - int(b.Head()) + additionalBytes)) + 1
    alignSize &= (size - 1)

    // Reallocate the buffer if needed:
    for int(b.head) <= alignSize+size+additionalBytes {
        oldBufSize := len(b.Bytes)
        b.growByteBuffer()  // ==============================>  *******  MARK 3 
        b.head += UOffsetT(len(b.Bytes) - oldBufSize)
    }
    b.Pad(alignSize)
}

func (b *Builder) growByteBuffer() {
    if (int64(len(b.Bytes)) & int64(0xC0000000)) != 0 {
        panic("cannot grow buffer beyond 2 gigabytes")
    }
    newLen := len(b.Bytes) * 2 // ==========================>  ********** MARK 4 
    if newLen == 0 {
        newLen = 1
    }

    if cap(b.Bytes) >= newLen {
        b.Bytes = b.Bytes[:newLen]
    } else {
        extension := make([]byte, newLen-len(b.Bytes))
        b.Bytes = append(b.Bytes, extension...)
    }

    middle := newLen / 2
    copy(b.Bytes[middle:], b.Bytes[:middle])   // ===============>  ******* MARK 5
}

Checking the CPU pprof, MARK 1: the Go runtime.memmove internal function is called by copy() at MARK 5; too many copy() calls are being made, and that slows things down.

Checking the MEM pprof, MARK 2: the builder doubles the pre-allocated memory at MARK 4 and copies the old binary data into the newly allocated memory.

So this looks like the problem that makes the flatc Go encoder slow.

I have some ideas but not a perfect solution, so any suggestion / solution is welcome.

@rw , @somombo

rw commented 4 years ago

Quick thoughts:

1) Have you tried re-using the FlatBufferBuilder using its Reset function? That will re-use the scratch space, which makes it "warm". Right now, you are using a builder that has no scratch space, so it must grow it each time. (A sketch combining all three suggestions follows after point 3 below.)

2) Perhaps more importantly, I'll note that the code in this discussion provided by @tsingson uses the newer "object" FlatBuffers API (which is more ergonomic, but requires allocations). In my experience with golang, the many small allocations you are making, one for each of your "Vertices" type, probably put a lot of memory pressure on golang's GC. Here is the code I'm referring to:

    vlist := make([]*flat.VerticesT, 0)
    for _, v := range in {
        vl := &flat.VerticesT{X: v.X, Y: v.Y}
        vlist = append(vlist, vl)
    }

My understanding is that this code allocates 8 * 1024 * 1024 VerticesT objects on the heap. It may be better to allocate them using the flat (traditional) FlatBuffers types used in the initial post: "Vertices", not "VerticesT".

3) Finally, given that @debabratadebroy is serializing three million Polygon objects in the same FlatBuffer, and those objects are Tables (in the FlatBuffers nomenclature), it's possible that our vtable deduplicator algorithm is causing you problems. Therefore, the next experiment for one of you to run is to see how long serialization takes with increasing numbers of Polygons being serialized: 1, 2, 4, 8, 16, ..., 8 * 1024 * 1024. If the serialization takes much more than linear time, then this is the culprit and we can suggest ways to work around it. (In Rust, we've had interest in providing a pluggable deduplicator implementation, and it might make sense to add that to Go, too).
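To make points 1 to 3 concrete, here is a sketch of tsingson's benchmark rewritten with a reused builder, the traditional builder API, and a doubling size loop. Only the Vertices* functions are visible in the pprof output above; the other generated names (PolygonStartPolygonsVector, LayerStartPolygonVector, LayerAddPolygon, ...) are assumed from the usual flatc Go codegen for that schema, and the import paths follow tsingson's main.go.

package main

import (
    "fmt"
    "time"

    "github.com/google/flatbuffers/playground/flat"
    "github.com/google/flatbuffers/playground/generate"

    flatbuffers "github.com/google/flatbuffers/go"
)

// Builds one Layer with a single Polygon containing the given vertices, using the
// traditional builder API (VerticesStart/VerticesAddX/...), so no VerticesT objects
// are allocated on the heap.
func flatMarshalTraditional(b *flatbuffers.Builder, in []generate.GenerateData) []byte {
    b.Reset() // re-use the builder's already-grown buffer across calls
    offsets := make([]flatbuffers.UOffsetT, len(in))
    for i, v := range in {
        flat.VerticesStart(b)
        flat.VerticesAddX(b, v.X)
        flat.VerticesAddY(b, v.Y)
        offsets[i] = flat.VerticesEnd(b)
    }
    flat.PolygonStartPolygonsVector(b, len(offsets))
    for i := len(offsets) - 1; i >= 0; i-- { // prepend in reverse to keep the original order
        b.PrependUOffsetT(offsets[i])
    }
    vertsVec := b.EndVector(len(offsets))
    flat.PolygonStart(b)
    flat.PolygonAddPolygons(b, vertsVec)
    poly := flat.PolygonEnd(b)

    flat.LayerStartPolygonVector(b, 1)
    b.PrependUOffsetT(poly)
    polyVec := b.EndVector(1)
    flat.LayerStart(b)
    flat.LayerAddPolygon(b, polyVec)
    b.Finish(flat.LayerEnd(b))
    return b.FinishedBytes()
}

func main() {
    b := flatbuffers.NewBuilder(0)
    // Doubling sizes: if serialization time grows much faster than linearly,
    // the vtable deduplication is the likely culprit.
    for n := 1; n <= 8*1024*1024; n *= 2 {
        data := generate.Generate(n)
        start := time.Now()
        out := flatMarshalTraditional(b, data)
        fmt.Println(n, time.Since(start), len(out))
    }
}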

debabratadebroy commented 4 years ago

I can do it for sure and share the details with you.

debabratadebroy commented 4 years ago

Just attaching a few analyses I already did for 1,000 to 3M polygons with different sets of vertices: MeasurementForgit.xlsx

tsingson commented 4 years ago

@rw , thanks for your comments.

Quick thoughts:

  1. Have you tried re-using the FlatBufferBuilder using its Reset function? That will re-use the scratch space, which makes it "warm". Right now, you are using a builder that has no scratch space, so it must grow it each time.

------- Yes, it should be done like this.

  2. Perhaps more importantly, I'll note that the code in this discussion provided by @tsingson uses the newer "object" FlatBuffers API (which is more ergonomic, but requires allocations). In my experience with golang, the many small allocations you are making, one for each of your "Vertices" type, probably put a lot of memory pressure on golang's GC. Here is the code I'm referring to:
  vlist := make([]*flat.VerticesT, 0)
  for _, v := range in {
      vl := &flat.VerticesT{X: v.X, Y: v.Y}
      vlist = append(vlist, vl)
  }

My understanding is that this code allocates 8 * 1024 * 1024 VerticesT objects on the heap. It may be better to allocate them using the flat (traditional) FlatBuffers types used in the initial post: "Vertices", not "VerticesT".

------- Good point. I should improve the test data generation function (read test data from a file...), and use the traditional "Vertices" rather than "VerticesT".

  3. Finally, given that @debabratadebroy is serializing three million Polygon objects in the same FlatBuffer, and those objects are Tables (in the FlatBuffers nomenclature), it's possible that our vtable deduplicator algorithm is causing you problems. Therefore, the next experiment for one of you to run is to see how long serialization takes with increasing numbers of Polygons being serialized: 1, 2, 4, 8, 16, ..., 8 * 1024 * 1024. If the serialization takes much more than linear time, then this is the culprit and we can suggest ways to work around it. (In Rust, we've had interest in providing a pluggable deduplicator implementation, and it might make sense to add that to Go, too).

------- I will try a more serious test again.

More detail here.

I used flatc --proto to convert the .proto to .fbs:

syntax = "proto3";
package proto;

message Vertices {
    double x = 1;
    double y = 2;
}

message Polygon {
    repeated Vertices polygons = 1;

}

message Layer {
    repeated Polygon polygon = 1;
}

The Vertices message in the proto schema should be converted to a struct in the flatc schema.
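For reference, a sketch of that adjusted .fbs, with Vertices as a struct (as in the schema from the first post) so the vertices are stored inline in the vector rather than as separate tables with their own offsets and vtables:

namespace flat;

struct Vertices {
  x:double;
  y:double;
}

table Polygon {
  polygons:[Vertices];
}

table Layer {
  polygon:[Polygon];
}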

tsingson commented 4 years ago

Just attaching a few analyses I already did for 1,000 to 3M polygons with different sets of vertices: MeasurementForgit.xlsx

Thanks for sharing.

I'm trying a more serious test and will run pprof for sure.

rw commented 4 years ago

Can you share the numbers in plaintext, instead of a spreadsheet?

debabratadebroy commented 4 years ago

MeasurementForgit.txt

rw commented 4 years ago

@debabratadebroy All of those times seem pretty fast, could you point out where the slowness is? Also, it would be more convenient if you included the text in a comment here instead of attaching it as a file.

debabratadebroy commented 4 years ago

If you look, in every case the FlatBuffers serialization time is quite high compared to protobuf.

For 1,000,000 polygons with 5 vertices, proto takes 143.342215 ms for serialization and 250.809553 ms for de-serialization, while FlatBuffers took 428.257128 ms. For 3,000,000 polygons with 10 vertices, proto takes 526.430084 ms / 737.654052 ms, while FlatBuffers took 2635.236248 ms.

And more surprisingly, in Java I see better results with FlatBuffers. But with Go the numbers are very high.

rw commented 4 years ago

@debabratadebroy What I see in your text file is this:

Polygon size,Num of vertices,Seri Time prot,Deseri Time proto,Seri Time flat,Deseri Time flat
10000,5,1.810929,2.73897,4.840526, 630ns 
10000,10,2.699907,2.865603,8.128324, 621ns 
10000,20,3.178384,3.791683,13.805347, 499ns 
100000,5,17.988453,29.931799,54.262156, 1.479µs 
100000,10,20.953287,24.290156,89.144263, 731ns 
100000,20,26.784921,35.707918,150.023134, 701ns 
500000,5,85.069564,102.150938,205.527666, 759ns 
500000,10,95.372696,125.155915,374.518318, 668ns 
500000,20,142.959145,171.535879,699.386822, 788ns 
1000000,5,143.342215,250.809553,428.257128, 731ns 
1000000,10,175.240556,254.896566,713.370745, 720ns 
1000000,20,280.443402,351.806425,1150, 762ns 
3000000,5,443.883073,658.381558,1551.458035, 837ns 
3000000,10,526.430084,737.654052,2635.236248, 813ns 
3000000,20,724.353883,1026.814232,3518.614021, 957ns 
5000000,5,714.807835,1274.749182,2014.221521, 1.006µs 
5000000,10,979.767504,1512.634871,3623.323316, 970ns 
5000000,20,1452.606876,1687.081408,,

Perhaps it's an encoding issue with your spreadsheet, but I see mostly nanosecond (ns) measurements. Is this correct?

Edit: I was confused by the header. Are the timing measurements in milliseconds, except for the last field in each row?

debabratadebroy commented 4 years ago

Yes, you are right. Sorry, to plot a graph I had removed the "milliseconds" unit. The last parameter is the de-serialization time of FlatBuffers, which is in nanoseconds; I don't have an issue with that.

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 6 months with no activity. Please comment or this will be closed in 14 days.

danthegoodman1 commented 4 months ago

Is this still the case? I don't want to shoot myself in the foot by trying FlatBuffers in Go if protobuf is faster.