Closed hectorj closed 8 years ago
This has been tried before, and it demonstrated worse performance. See: https://github.com/tinylib/msgp/pull/72
You shouldn't have to address your locals to call a method; in the expression `a.foo()`, `foo` can take `a` as a pointer receiver. Conversely, if you're addressing the struct in order to cast it to an interface, it's already being boxed, so it will live on the heap regardless of the method receiver. In any case, escape analysis should rarely (perhaps never?) conclude that the `MarshalMsg` or `Size` methods cause the receiver to escape.
Your point about escape analysis seems right.
So it is mostly about convenience: at some place in my code I take an `i interface{}`, and later have a type switch checking whether `i` is a `msgp.Encodable`.
With pointer receivers, if I pass just my struct, it is not. I have to pass a pointer to my struct, which doesn't fit my needs.
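A small sketch of that situation, assuming a local `sizer` interface as a stand-in for `msgp.Encodable` (the real interface has a different method set) and two hypothetical types, one with a pointer receiver and one with a value receiver:

```go
package main

import "fmt"

// sizer is a local stand-in for an interface such as msgp.Encodable.
type sizer interface{ Msgsize() int }

// byPointer is hypothetical; its method has a pointer receiver.
type byPointer struct{ A, B, C, D int }

func (v *byPointer) Msgsize() int { return 4 }

// byValue is hypothetical; its method has a value receiver.
type byValue struct{ A int }

func (v byValue) Msgsize() int { return 1 }

// describe reports whether the dynamic type of i satisfies sizer.
func describe(i interface{}) string {
	switch i.(type) {
	case sizer:
		return "sizer"
	default:
		return "not a sizer"
	}
}

func main() {
	fmt.Println(describe(byPointer{}))  // the value's method set lacks Msgsize
	fmt.Println(describe(&byPointer{})) // the pointer's method set has it
	fmt.Println(describe(byValue{}))    // value receiver: the value matches
	fmt.Println(describe(&byValue{}))   // and so does the pointer
}
```

Only the pointer form of `byPointer` reaches the `sizer` case, while both forms of `byValue` do, because the method set of a pointer type also includes the value-receiver methods.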
About performance, I see @zond said:
"Never mind, I just made it all work with my branch, and my first trivial timing of the new performance showed worse performance."
But running the benchmarks on master, performance does not seem to be affected (I only get some small variance going both ways, which was expected):
$ go version
go version go1.5.3 linux/amd64
# master
$ git rev-parse HEAD
cf4d6d402b01d9b359f52fc88be0f582402177c0
$ go install ./...
$ go generate ./...
======== MessagePack Code Generator =======
>>> Input: "defs_test.go"
>>> Wrote and formatted "defgen_test.go"
$ go test -v -cpu=2 ./... -bench .
# [All tests pass]
PASS
BenchmarkLocate-2 20000000 97.8 ns/op 531.83 MB/s 0 B/op 0 allocs/op
BenchmarkReadWriteFloat32-2 20000000 80.0 ns/op
BenchmarkReadWriteFloat64-2 20000000 82.3 ns/op
BenchmarkUnmarshalAsJSON-2 1000000 1698 ns/op 93.59 MB/s 16 B/op 1 allocs/op
BenchmarkCopyToJSON-2 1000000 1989 ns/op 79.92 MB/s 48 B/op 1 allocs/op
BenchmarkStdlibJSON-2 200000 5783 ns/op 29.40 MB/s 920 B/op 36 allocs/op
BenchmarkReadMapHeaderBytes-2 200000000 8.32 ns/op 360.43 MB/s 0 B/op 0 allocs/op
BenchmarkReadArrayHeaderBytes-2 200000000 7.61 ns/op 394.27 MB/s 0 B/op 0 allocs/op
BenchmarkReadNilByte-2 1000000000 2.72 ns/op 367.11 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat64Bytes-2 200000000 9.20 ns/op 978.47 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat32Bytes-2 300000000 5.27 ns/op 948.08 MB/s 0 B/op 0 allocs/op
BenchmarkReadBoolBytes-2 200000000 6.24 ns/op 160.23 MB/s 0 B/op 0 allocs/op
BenchmarkReadTimeBytes-2 100000000 16.3 ns/op 920.73 MB/s 0 B/op 0 allocs/op
BenchmarkSkipBytes-2 10000000 164 ns/op 908.50 MB/s 0 B/op 0 allocs/op
BenchmarkReadMapHeader-2 100000000 20.9 ns/op 95.71 MB/s 0 B/op 0 allocs/op
BenchmarkReadArrayHeader-2 100000000 20.6 ns/op 96.94 MB/s 0 B/op 0 allocs/op
BenchmarkReadNil-2 100000000 16.9 ns/op 59.09 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat64-2 50000000 25.6 ns/op 351.97 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat32-2 100000000 22.2 ns/op 225.27 MB/s 0 B/op 0 allocs/op
BenchmarkReadInt64-2 100000000 24.8 ns/op 161.34 MB/s 0 B/op 0 allocs/op
BenchmarkReadUint64-2 50000000 24.8 ns/op 80.63 MB/s 0 B/op 0 allocs/op
BenchmarkRead16Bytes-2 30000000 40.1 ns/op 448.85 MB/s 0 B/op 0 allocs/op
BenchmarkRead256Bytes-2 20000000 100 ns/op 2574.03 MB/s 0 B/op 0 allocs/op
BenchmarkRead2048Bytes-2 3000000 510 ns/op 4014.73 MB/s 0 B/op 0 allocs/op
BenchmarkRead16StringAsBytes-2 30000000 40.2 ns/op 422.87 MB/s 0 B/op 0 allocs/op
BenchmarkRead256StringAsBytes-2 20000000 107 ns/op 2418.58 MB/s 0 B/op 0 allocs/op
BenchmarkRead16String-2 10000000 127 ns/op 133.53 MB/s 16 B/op 1 allocs/op
BenchmarkRead256String-2 5000000 266 ns/op 971.61 MB/s 256 B/op 1 allocs/op
BenchmarkReadComplex64-2 50000000 31.5 ns/op 317.30 MB/s 0 B/op 0 allocs/op
BenchmarkReadComplex128-2 50000000 38.7 ns/op 464.71 MB/s 0 B/op 0 allocs/op
BenchmarkReadTime-2 30000000 38.9 ns/op 385.45 MB/s 0 B/op 0 allocs/op
BenchmarkSkip-2 5000000 413 ns/op 360.76 MB/s 0 B/op 0 allocs/op
BenchmarkAppendMapHeader-2 200000000 8.02 ns/op 0 B/op 0 allocs/op
BenchmarkAppendArrayHeader-2 200000000 7.92 ns/op 0 B/op 0 allocs/op
BenchmarkAppendFloat64-2 100000000 12.2 ns/op 736.77 MB/s 0 B/op 0 allocs/op
BenchmarkAppendFloat32-2 100000000 10.4 ns/op 479.46 MB/s 0 B/op 0 allocs/op
BenchmarkAppendInt64-2 100000000 18.6 ns/op 0 B/op 0 allocs/op
BenchmarkAppendUint64-2 100000000 19.3 ns/op 0 B/op 0 allocs/op
BenchmarkAppend16Bytes-2 100000000 18.2 ns/op 1156.18 MB/s 0 B/op 0 allocs/op
BenchmarkAppend256Bytes-2 50000000 27.2 ns/op 9584.15 MB/s 0 B/op 0 allocs/op
BenchmarkAppend2048Bytes-2 10000000 109 ns/op 18743.55 MB/s 0 B/op 0 allocs/op
BenchmarkAppend16String-2 100000000 16.7 ns/op 1254.41 MB/s 0 B/op 0 allocs/op
BenchmarkAppend256String-2 50000000 28.5 ns/op 9154.87 MB/s 0 B/op 0 allocs/op
BenchmarkAppend2048String-2 20000000 81.2 ns/op 25293.70 MB/s 0 B/op 0 allocs/op
BenchmarkAppendBool-2 300000000 3.60 ns/op 277.61 MB/s 0 B/op 0 allocs/op
BenchmarkAppendTime-2 50000000 24.2 ns/op 620.99 MB/s 0 B/op 0 allocs/op
BenchmarkWriteMapHeader-2 200000000 8.64 ns/op 0 B/op 0 allocs/op
BenchmarkWriteArrayHeader-2 200000000 8.97 ns/op 0 B/op 0 allocs/op
BenchmarkWriteFloat64-2 100000000 14.4 ns/op 624.26 MB/s 0 B/op 0 allocs/op
BenchmarkWriteFloat32-2 100000000 12.2 ns/op 408.34 MB/s 0 B/op 0 allocs/op
BenchmarkWriteInt64-2 100000000 13.8 ns/op 652.07 MB/s 0 B/op 0 allocs/op
BenchmarkWriteUint64-2 100000000 13.9 ns/op 648.52 MB/s 0 B/op 0 allocs/op
BenchmarkWrite16Bytes-2 50000000 23.5 ns/op 0 B/op 0 allocs/op
BenchmarkWrite256Bytes-2 50000000 33.1 ns/op 0 B/op 0 allocs/op
BenchmarkWrite2048Bytes-2 20000000 86.8 ns/op 0 B/op 0 allocs/op
BenchmarkWriteTime-2 50000000 24.6 ns/op 610.03 MB/s 0 B/op 0 allocs/op
BenchmarkWriteReadFile-2 2000000 766 ns/op 105.71 MB/s
ok github.com/tinylib/msgp/msgp 374.252s
# PR 156
$ git checkout avoid-pointers-receivers
Switched to branch 'avoid-pointers-receivers'
Your branch is up-to-date with 'fork/avoid-pointers-receivers'.
$ git rev-parse HEAD
4416ec38a88dcd4b55b36ff34d92950d684edc1f
$ go install ./...
$ go generate ./...
======== MessagePack Code Generator =======
>>> Input: "defs_test.go"
>>> Wrote and formatted "defgen_test.go"
$ go test -v -cpu=2 ./... -bench .
# [All tests pass]
PASS
BenchmarkLocate-2 20000000 97.9 ns/op 531.23 MB/s 0 B/op 0 allocs/op
BenchmarkReadWriteFloat32-2 20000000 85.8 ns/op
BenchmarkReadWriteFloat64-2 20000000 81.5 ns/op
BenchmarkUnmarshalAsJSON-2 1000000 1891 ns/op 84.05 MB/s 16 B/op 1 allocs/op
BenchmarkCopyToJSON-2 1000000 2206 ns/op 72.05 MB/s 48 B/op 1 allocs/op
BenchmarkStdlibJSON-2 200000 7715 ns/op 22.03 MB/s 920 B/op 36 allocs/op
BenchmarkReadMapHeaderBytes-2 200000000 8.41 ns/op 356.77 MB/s 0 B/op 0 allocs/op
BenchmarkReadArrayHeaderBytes-2 200000000 7.77 ns/op 386.24 MB/s 0 B/op 0 allocs/op
BenchmarkReadNilByte-2 500000000 2.95 ns/op 339.48 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat64Bytes-2 200000000 9.36 ns/op 961.76 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat32Bytes-2 300000000 5.46 ns/op 916.55 MB/s 0 B/op 0 allocs/op
BenchmarkReadBoolBytes-2 200000000 6.66 ns/op 150.10 MB/s 0 B/op 0 allocs/op
BenchmarkReadTimeBytes-2 100000000 16.1 ns/op 930.48 MB/s 0 B/op 0 allocs/op
BenchmarkSkipBytes-2 10000000 171 ns/op 868.05 MB/s 0 B/op 0 allocs/op
BenchmarkReadMapHeader-2 100000000 20.3 ns/op 98.31 MB/s 0 B/op 0 allocs/op
BenchmarkReadArrayHeader-2 100000000 20.6 ns/op 96.97 MB/s 0 B/op 0 allocs/op
BenchmarkReadNil-2 100000000 16.6 ns/op 60.16 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat64-2 50000000 27.5 ns/op 327.17 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat32-2 50000000 24.0 ns/op 208.24 MB/s 0 B/op 0 allocs/op
BenchmarkReadInt64-2 50000000 35.9 ns/op 111.41 MB/s 0 B/op 0 allocs/op
BenchmarkReadUint64-2 50000000 27.7 ns/op 72.26 MB/s 0 B/op 0 allocs/op
BenchmarkRead16Bytes-2 30000000 49.9 ns/op 360.56 MB/s 0 B/op 0 allocs/op
BenchmarkRead256Bytes-2 10000000 137 ns/op 1878.40 MB/s 0 B/op 0 allocs/op
BenchmarkRead2048Bytes-2 2000000 547 ns/op 3743.05 MB/s 0 B/op 0 allocs/op
BenchmarkRead16StringAsBytes-2 30000000 46.2 ns/op 368.24 MB/s 0 B/op 0 allocs/op
BenchmarkRead256StringAsBytes-2 10000000 124 ns/op 2075.45 MB/s 0 B/op 0 allocs/op
BenchmarkRead16String-2 10000000 110 ns/op 154.20 MB/s 16 B/op 1 allocs/op
BenchmarkRead256String-2 5000000 256 ns/op 1010.92 MB/s 256 B/op 1 allocs/op
BenchmarkReadComplex64-2 50000000 37.3 ns/op 267.92 MB/s 0 B/op 0 allocs/op
BenchmarkReadComplex128-2 30000000 49.4 ns/op 364.74 MB/s 0 B/op 0 allocs/op
BenchmarkReadTime-2 50000000 40.7 ns/op 368.44 MB/s 0 B/op 0 allocs/op
BenchmarkSkip-2 3000000 398 ns/op 374.23 MB/s 0 B/op 0 allocs/op
BenchmarkAppendMapHeader-2 200000000 7.87 ns/op 0 B/op 0 allocs/op
BenchmarkAppendArrayHeader-2 200000000 7.82 ns/op 0 B/op 0 allocs/op
BenchmarkAppendFloat64-2 100000000 11.7 ns/op 768.19 MB/s 0 B/op 0 allocs/op
BenchmarkAppendFloat32-2 200000000 9.69 ns/op 515.96 MB/s 0 B/op 0 allocs/op
BenchmarkAppendInt64-2 100000000 20.2 ns/op 0 B/op 0 allocs/op
BenchmarkAppendUint64-2 100000000 18.9 ns/op 0 B/op 0 allocs/op
BenchmarkAppend16Bytes-2 100000000 19.2 ns/op 1093.52 MB/s 0 B/op 0 allocs/op
BenchmarkAppend256Bytes-2 50000000 27.5 ns/op 9494.18 MB/s 0 B/op 0 allocs/op
BenchmarkAppend2048Bytes-2 20000000 109 ns/op 18832.78 MB/s 0 B/op 0 allocs/op
BenchmarkAppend16String-2 100000000 16.5 ns/op 1275.48 MB/s 0 B/op 0 allocs/op
BenchmarkAppend256String-2 50000000 26.6 ns/op 9803.78 MB/s 0 B/op 0 allocs/op
BenchmarkAppend2048String-2 20000000 84.7 ns/op 24227.11 MB/s 0 B/op 0 allocs/op
BenchmarkAppendBool-2 300000000 3.59 ns/op 278.51 MB/s 0 B/op 0 allocs/op
BenchmarkAppendTime-2 50000000 23.9 ns/op 626.37 MB/s 0 B/op 0 allocs/op
BenchmarkWriteMapHeader-2 200000000 8.52 ns/op 0 B/op 0 allocs/op
BenchmarkWriteArrayHeader-2 200000000 8.88 ns/op 0 B/op 0 allocs/op
BenchmarkWriteFloat64-2 100000000 13.9 ns/op 645.32 MB/s 0 B/op 0 allocs/op
BenchmarkWriteFloat32-2 100000000 11.3 ns/op 443.03 MB/s 0 B/op 0 allocs/op
BenchmarkWriteInt64-2 100000000 14.4 ns/op 624.81 MB/s 0 B/op 0 allocs/op
BenchmarkWriteUint64-2 100000000 13.1 ns/op 686.05 MB/s 0 B/op 0 allocs/op
BenchmarkWrite16Bytes-2 100000000 22.0 ns/op 0 B/op 0 allocs/op
BenchmarkWrite256Bytes-2 50000000 31.8 ns/op 0 B/op 0 allocs/op
BenchmarkWrite2048Bytes-2 20000000 84.9 ns/op 0 B/op 0 allocs/op
BenchmarkWriteTime-2 50000000 28.0 ns/op 535.39 MB/s 0 B/op 0 allocs/op
BenchmarkWriteReadFile-2 2000000 730 ns/op 110.88 MB/s
ok github.com/tinylib/msgp/msgp 300.308s
(To show that the small differences between my 2 benchmark runs are just variance, I did a second run with #156:)
PASS
BenchmarkLocate-2 20000000 97.2 ns/op 534.81 MB/s 0 B/op 0 allocs/op
BenchmarkReadWriteFloat32-2 20000000 80.6 ns/op
BenchmarkReadWriteFloat64-2 20000000 81.2 ns/op
BenchmarkUnmarshalAsJSON-2 1000000 1702 ns/op 93.37 MB/s 16 B/op 1 allocs/op
BenchmarkCopyToJSON-2 1000000 2000 ns/op 79.47 MB/s 48 B/op 1 allocs/op
BenchmarkStdlibJSON-2 200000 5663 ns/op 30.02 MB/s 920 B/op 36 allocs/op
BenchmarkReadMapHeaderBytes-2 200000000 7.80 ns/op 384.40 MB/s 0 B/op 0 allocs/op
BenchmarkReadArrayHeaderBytes-2 200000000 8.34 ns/op 359.69 MB/s 0 B/op 0 allocs/op
BenchmarkReadNilByte-2 1000000000 2.84 ns/op 352.63 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat64Bytes-2 200000000 9.26 ns/op 971.95 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat32Bytes-2 300000000 5.34 ns/op 936.68 MB/s 0 B/op 0 allocs/op
BenchmarkReadBoolBytes-2 200000000 6.70 ns/op 149.27 MB/s 0 B/op 0 allocs/op
BenchmarkReadTimeBytes-2 100000000 16.5 ns/op 908.66 MB/s 0 B/op 0 allocs/op
BenchmarkSkipBytes-2 10000000 168 ns/op 884.07 MB/s 0 B/op 0 allocs/op
BenchmarkReadMapHeader-2 100000000 21.2 ns/op 94.43 MB/s 0 B/op 0 allocs/op
BenchmarkReadArrayHeader-2 100000000 20.8 ns/op 96.05 MB/s 0 B/op 0 allocs/op
BenchmarkReadNil-2 100000000 17.1 ns/op 58.40 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat64-2 50000000 35.4 ns/op 254.51 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat32-2 50000000 35.3 ns/op 141.69 MB/s 0 B/op 0 allocs/op
BenchmarkReadInt64-2 50000000 23.2 ns/op 172.10 MB/s 0 B/op 0 allocs/op
BenchmarkReadUint64-2 100000000 25.0 ns/op 80.02 MB/s 0 B/op 0 allocs/op
BenchmarkRead16Bytes-2 20000000 52.1 ns/op 345.16 MB/s 0 B/op 0 allocs/op
BenchmarkRead256Bytes-2 20000000 113 ns/op 2271.98 MB/s 0 B/op 0 allocs/op
BenchmarkRead2048Bytes-2 3000000 468 ns/op 4379.13 MB/s 0 B/op 0 allocs/op
BenchmarkRead16StringAsBytes-2 30000000 46.2 ns/op 367.59 MB/s 0 B/op 0 allocs/op
BenchmarkRead256StringAsBytes-2 20000000 135 ns/op 1913.54 MB/s 0 B/op 0 allocs/op
BenchmarkRead16String-2 10000000 119 ns/op 142.13 MB/s 16 B/op 1 allocs/op
BenchmarkRead256String-2 5000000 312 ns/op 827.54 MB/s 256 B/op 1 allocs/op
BenchmarkReadComplex64-2 50000000 37.4 ns/op 267.07 MB/s 0 B/op 0 allocs/op
BenchmarkReadComplex128-2 50000000 42.5 ns/op 423.25 MB/s 0 B/op 0 allocs/op
BenchmarkReadTime-2 30000000 41.4 ns/op 362.65 MB/s 0 B/op 0 allocs/op
BenchmarkSkip-2 3000000 435 ns/op 341.87 MB/s 0 B/op 0 allocs/op
BenchmarkAppendMapHeader-2 100000000 11.0 ns/op 0 B/op 0 allocs/op
BenchmarkAppendArrayHeader-2 100000000 11.7 ns/op 0 B/op 0 allocs/op
BenchmarkAppendFloat64-2 100000000 19.0 ns/op 472.99 MB/s 0 B/op 0 allocs/op
BenchmarkAppendFloat32-2 100000000 15.4 ns/op 325.56 MB/s 0 B/op 0 allocs/op
BenchmarkAppendInt64-2 50000000 33.7 ns/op 0 B/op 0 allocs/op
BenchmarkAppendUint64-2 50000000 29.4 ns/op 0 B/op 0 allocs/op
BenchmarkAppend16Bytes-2 100000000 21.1 ns/op 993.56 MB/s 0 B/op 0 allocs/op
BenchmarkAppend256Bytes-2 50000000 27.7 ns/op 9420.12 MB/s 0 B/op 0 allocs/op
BenchmarkAppend2048Bytes-2 20000000 110 ns/op 18523.14 MB/s 0 B/op 0 allocs/op
BenchmarkAppend16String-2 100000000 25.3 ns/op 829.57 MB/s 0 B/op 0 allocs/op
BenchmarkAppend256String-2 50000000 33.1 ns/op 7886.48 MB/s 0 B/op 0 allocs/op
BenchmarkAppend2048String-2 20000000 98.0 ns/op 20952.67 MB/s 0 B/op 0 allocs/op
BenchmarkAppendBool-2 300000000 4.31 ns/op 231.91 MB/s 0 B/op 0 allocs/op
BenchmarkAppendTime-2 50000000 26.1 ns/op 574.90 MB/s 0 B/op 0 allocs/op
BenchmarkWriteMapHeader-2 200000000 9.63 ns/op 0 B/op 0 allocs/op
BenchmarkWriteArrayHeader-2 200000000 9.30 ns/op 0 B/op 0 allocs/op
BenchmarkWriteFloat64-2 100000000 15.2 ns/op 593.26 MB/s 0 B/op 0 allocs/op
BenchmarkWriteFloat32-2 100000000 11.7 ns/op 427.58 MB/s 0 B/op 0 allocs/op
BenchmarkWriteInt64-2 100000000 15.0 ns/op 601.99 MB/s 0 B/op 0 allocs/op
BenchmarkWriteUint64-2 100000000 14.6 ns/op 615.61 MB/s 0 B/op 0 allocs/op
BenchmarkWrite16Bytes-2 100000000 23.5 ns/op 0 B/op 0 allocs/op
BenchmarkWrite256Bytes-2 50000000 36.7 ns/op 0 B/op 0 allocs/op
BenchmarkWrite2048Bytes-2 20000000 92.0 ns/op 0 B/op 0 allocs/op
BenchmarkWriteTime-2 50000000 29.4 ns/op 509.40 MB/s 0 B/op 0 allocs/op
BenchmarkWriteReadFile-2 2000000 822 ns/op 98.48 MB/s
You need to benchmark the code in `./_generated` explicitly; the `go` tool will ignore it otherwise. All of those benchmarks are for library support code, not generated methods.
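The `go` tool skips directories whose names begin with `_` when it expands `./...`, which is why that package has to be named explicitly. As a rough sketch of the kind of comparison at stake, the following program benchmarks a pointer-receiver and a value-receiver method with `testing.Benchmark`. The `sample` type and both methods are hypothetical stand-ins, not msgp's generated code, so the numbers say nothing about msgp itself.

```go
package main

import (
	"fmt"
	"testing"
)

// sample is a hypothetical stand-in for a generated type.
type sample struct{ A, B, C, D int64 }

// appendPtr mimics a marshal method with a pointer receiver.
func (s *sample) appendPtr(b []byte) []byte {
	return append(b, byte(s.A), byte(s.B), byte(s.C), byte(s.D))
}

// appendVal mimics the same method with a value receiver.
func (s sample) appendVal(b []byte) []byte {
	return append(b, byte(s.A), byte(s.B), byte(s.C), byte(s.D))
}

func benchPtr(b *testing.B) {
	s := &sample{1, 2, 3, 4}
	buf := make([]byte, 0, 16)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		buf = s.appendPtr(buf[:0])
	}
}

func benchVal(b *testing.B) {
	s := sample{1, 2, 3, 4}
	buf := make([]byte, 0, 16)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		buf = s.appendVal(buf[:0])
	}
}

func main() {
	for name, fn := range map[string]func(*testing.B){"pointer": benchPtr, "value": benchVal} {
		r := testing.Benchmark(fn)
		fmt.Println(name, r.String(), r.MemString())
	}
}
```

Reporting allocations alongside timings matters here, since the difference discussed in this thread shows up as allocations rather than as raw nanoseconds.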
On Wed, May 11, 2016 at 9:24 PM, Hector Jusforgues wrote:
(To show that the small differences in my 2 benchmarks are just variance, I did a second run with #156 https://github.com/tinylib/msgp/pull/156 :
You'll also take a perf hit when you turn a value into `interface{}` because it will have to be both copied and boxed.
On Wed, May 11, 2016 at 9:42 PM, Philip Hofer wrote:
You need to benchmark the code in ./_generated explicitly; the go tool will ignore it otherwise. All of those benchmarks are for library support code, not generated methods.
Ok, the benchmarks in _generated/ indeed show a difference. One more allocation for:
I'll see if I can improve that
To be clear, my benchmark was of my own code, using my fork of msgp.
I was unable to compare my code with the fork vs my code with mainline msgp since I was unable to get my code working with mainline.
I didn't think to benchmark msgp in the fork on its own vs mainline.
That's making me wonder if I missed something with https://github.com/mailru/easyjson/pull/15 or if some difference in the implementations makes it efficient with easyjson but not with msgp...
Gotta do some digging
I'm not sure I understand what you mean, but just to make sure I'll clarify even more :)
I benchmarked my own code using https://github.com/vmihailenco/msgpack vs my own code using a fork of msgp that just added shims to some more types.
This benchmark showed the unreasonable result that the code became slower with msgp.
This was unreasonable because msgpack used reflection, while msgp uses generated hard coded coders.
This made me give up and forget all about it.
TL;DR
I don't believe msgp is slower than msgpack, and I don't necessarily think more indirection via pointers or shims in msgp would make things relevantly slower.
@zond: oh, thanks for the clarification. To clarify too, my last comment was not about your observations but about the results I get from my benchmarks run.
The non-pointer-receiver approach is less efficient for msgp, but it did not seem to be for easyjson (which does something similar to msgp, just for faster JSON marshaling/unmarshaling).
@hectorj Ah, thanks!
Weird, please explain what caused it if you find out :)
Closing for now, as I haven't been able to produce code with the same features and performance using non-pointer receivers, and I don't have enough time to keep trying.
Thanks all for your inputs.
I see from this part of the code that the generator prefers pointer receivers for structs with > 3 fields and for arrays.
This seems unnecessary (these methods do not modify the data they operate on, so they do not require a pointer) and is inconvenient (I'd like to be able to marshal my structs without taking their address, which possibly increases the generated garbage).
Folks at easyjson accepted my PR after checking benchmarks. Would you accept something similar for the msgp generator?
Or is there some benchmark showing that the use of a pointer receiver actually improves performance?