Open · richardartoul opened this issue 5 years ago
@richardartoul, thanks for opening this issue. I'm happy to discuss how to improve performance in this library.
I'll address these in bullets, corresponding to each bullet you wrote:

- … {int32, uint32, int64, uint64, float32, float64, bool, string, []byte, pointer to message} (which would be a truly tremendously wide API). Instead, we could just double the current method count and replace interface{} with *Value (or Value, if the union type can be sufficiently compact).
- … codedBuffer (or maybe wraps the exported proto.Buffer type?). The entries coming from that stream would be a tuple of "tag", "wire type", and "value" (where value could be int64 or []byte and then re-interpreted by the consumer, if it has more type information from the tag number).

@jhump Regarding #4: correct, that's what I would need. I guess my comment was more around whether or not I could introduce such an iterator into this library (so that I could re-use a lot of the logic you've already written).

With recent optimizations, the largest bottleneck in my code right now is the Unmarshal() method, and this would allow me to get around that entirely (the package I'm working on basically receives a stream of marshaled protobufs and compresses them in real time into a custom format, so I don't really need the *dynamic.Message there so much as I need the stream of values).
@jhump I'm hoping to get started on #4 soon. To clarify, are you ok with me adding some kind of exported wrapper around codedBuffer?
Can you not use the exported proto.Buffer (in the "github.com/golang/protobuf/proto" package)?

If not, I don't think a wrapper belongs in the dynamic package, since it is not really specific to dynamic messages. Also, it's not a lot of code -- I wonder if it wouldn't be better to just fork. (That's basically what I did -- there was one thing I wanted from proto.Buffer, but they wouldn't accept a pull request. So I forked it.)
If forking is really that bad of an option, let's first decide where to put this. Maybe a new sub-package in this repo?
I didn't know about proto.Buffer, and I think I'll have the same problem as you: I'll need something like eof() so I can loop until I hit the end.
Forking isn't terrible, just seems like I'll be copying a lot of your code, but maybe it won't be so bad.
How about I start by forking / copying and pasting and see how far I get and then if I reach a point where I think what I have is worth upstreaming or would be beneficial to this library we can talk about it then?
It seems like you might be right though.
@richardartoul, SGTM
I totally understand not wanting to fork. (I made the same appeal to the protobuf project.) I just hesitate to make this repo the home for that. That being said, I'm not necessarily opposed to it. I just want to make sure it's been thought through before adding it.
@richardartoul, I think I may be flip-flopping regarding a public API for the binary protobuf format. I now have need for something similar, too, outside of this repo. So I'm now thinking of the same thing you were: wanting a single, shared public API to use.
So I've created a candidate for this exported API with #196. I need to write some more tests before I can merge that PR. I'm also still debating whether this is the right place for such a package. I think I'll start by following my own advice about forking it in my other project and see how that goes. That will give me some time to think about this API and decide whether to merge a new codec package or to close the PR and just stick with a fork...

If you're curious about the API I've proposed, take a look at the PR and let me know what you think. The codec.Buffer could possibly use some more methods; i.e., there may be some logic still inside of the dynamic package (for marshaling/unmarshaling dynamic messages) that could be moved to that package.
I took a look. It seems like mostly you've just made the buffer public (and added a few new APIs like EncodeMessage). Is that correct? I could definitely use that.
For context on what I'm doing with it, this is my PR: https://github.com/m3db/m3/pull/1597

Basically, buffer.go is copy-pasta of your buffer, and custom_unmarshal.go is where I implement the "efficient" API for unmarshaling (mainly, it can decode top-level scalar fields with no allocations). There is also an explanation in unmarshal.md.

I tried to implement it as an iterator at first, but it didn't work out: my use case requires knowing that the stream is completely valid before doing anything else with it, and rolling back is difficult.
I have a couple more ideas for performance improvements and am wondering how you feel about all of the following:

1. m.values could become []interface{}.
2. Typed getters, e.g. m.GetInt64FieldByFieldNumber(fieldNum). My profiling shows that the library spends a lot of time allocating (and collecting) interface{}. We could alleviate this a lot by adding support for typed APIs and then using union types internally for storing the data.
3. Allow the allocation of []byte fields to be controlled. Right now I have no control over how []byte fields will be allocated. It would be nice if I could configure my own allocation function or somehow indicate that I would rather the library take a subslice of the buffer it is unmarshaling from instead of allocating a new one.
4. Expose what the codedBuffer and the m.unmarshal method do internally, so that I could have an iterator that reads the stream and gives me the value of each field one at a time. That would probably make a huge performance improvement for my workload.

I realize that these changes may not align with your vision for the library, so I just wanted to get your thoughts.