Open · richardartoul opened this issue 5 years ago
@richardartoul, thanks for opening this issue. I'm happy to discuss how to improve performance in this library.
I'll address these in bullets, corresponding to each bullet you wrote:

- … {int32, uint32, int64, uint64, float32, float64, bool, string, []byte, pointer to message} (which would be a truly tremendously wide API). Instead, we could just double the current method count and replace interface{} with *Value (or Value, if the union type can be sufficiently compact).
- … codedBuffer (or maybe wraps the exported proto.Buffer type?). The entries coming from that stream would be a tuple of "tag", "wire type", and "value" (where value could be int64 or []byte and then re-interpreted by the consumer, if it has more type information from the tag number).

@jhump Regarding #4: correct, that's what I would need. I guess my comment was more around whether or not I could introduce such an iterator into this library (so that I could re-use a lot of the logic you've already written).

With recent optimizations, the largest bottleneck in my code right now is the Unmarshal() method, and this would allow me to get around that entirely (the package I'm working on basically receives a stream of marshaled protobufs and compresses them in real time into a custom format, so I don't really need the *dynamic.Message there so much as I need the stream of values).
@jhump I'm hoping to get started on #4 soon. To clarify, are you ok with me adding some kind of exported wrapper around codedBuffer?
Can you not use the exported proto.Buffer (in the "github.com/golang/protobuf/proto" package)?

If not, I don't think a wrapper belongs in the dynamic package, since it is not really specific to dynamic messages. Also, it's not a lot of code -- I wonder if it wouldn't be better to just fork. (That's basically what I did -- there was one thing I wanted from proto.Buffer, but they wouldn't accept a pull request. So I forked it.)
If forking is really that bad of an option, let's first decide where to put this. Maybe a new sub-package in this repo?
I didn't know about proto.Buffer, and I think I'll have the same problem as you: I'll need something like eof() so I can loop until I hit the end.
Forking isn't terrible, just seems like I'll be copying a lot of your code, but maybe it won't be so bad.
How about I start by forking / copying and pasting and see how far I get and then if I reach a point where I think what I have is worth upstreaming or would be beneficial to this library we can talk about it then?
It seems like you might be right though.
@richardartoul, SGTM
I totally understand not wanting to fork. (I made the same appeal to the protobuf project.) I just hesitate to make this repo the home for that. That being said, I'm not necessarily opposed to it. I just want to make sure it's been thought through before adding it.
@richardartoul, I think I may be flip-flopping regarding a public API for the binary protobuf format. I now have need for something similar, too, outside of this repo. So I'm now thinking of the same thing you were: wanting a single, shared public API to use.
So I've created a candidate for this exported API with #196. I need to write some more tests before I can merge that PR. I'm also still debating whether this is the right place for such a package. I think I'll start by following my own advice about forking it in my other project and see how that goes. That will give me some time to think about this API and decide whether to merge a new codec package or to close the PR and just stick with a fork...

If you're curious about the API I've proposed, take a look at the PR and let me know what you think. The codec.Buffer could possibly use some more methods; i.e., there may be some logic still inside of the dynamic package (for marshaling/unmarshaling dynamic messages) that could be moved to that package.
I took a look. It seems like mostly you've just made the buffer public (and added a few new APIs like EncodeMessage). Is that correct? I could definitely use that.
For context on what I'm doing with it, this is my PR: https://github.com/m3db/m3/pull/1597

Basically, buffer.go is copy-pasta of your buffer, and custom_unmarshal.go is where I implement the "efficient" API for unmarshaling (mainly, it can decode top-level scalar fields with no allocations). There is also an explanation in unmarshal.md.

I tried to implement it as an iterator at first, but it didn't work out: my use case requires knowing that the stream is completely valid before doing anything else with it, and rolling back is difficult.
I have a couple more ideas for performance improvements and am wondering how you feel about all of the following:

1. m.values could become []interface{}.
2. Typed getters, e.g. m.GetInt64FieldByFieldNumber(fieldNum). My profiling shows that the library spends a lot of time allocating (and collecting) interface{}. We could alleviate this a lot by adding support for typed APIs and then using union types internally for storing the data.
3. Allow the allocation of []byte fields to be controlled. Right now I have no control over how []byte fields will be allocated. It would be nice if I could configure my own allocation function or somehow indicate that I would rather the library take a subslice of the buffer it is unmarshaling from instead of allocating a new one.
4. Expose what the codedBuffer and the m.unmarshal method do internally, so that I could have an iterator that reads the stream and gives me the value of each field one at a time. That would probably make a huge performance improvement for my workload.

I realize that these changes may not align with your vision for the library, so I just wanted to get your thoughts.