protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev

[CSharp] Inefficient descriptor literals #3253

Closed mkosieradzki closed 3 months ago

mkosieradzki commented 7 years ago

@jskeet I have noticed that protobuf generated assemblies are HUGE and did some investigation.

Today descriptors are stored as base64-encoded string literals (and they are concatenated, which makes it worse). This is just evil and has a negative impact on three fronts:

Why is a base64 string literal so evil? .NET stores strings internally as UTF-16, which basically means: to store 1024 bytes, you need 1366 base64 characters => 2732 bytes, which roughly evaluates to a 2.66 ratio (without length delimiters!).

You also need to do in-memory concatenation (!) and base64 parsing during descriptor initialization, more than doubling the temporary memory footprint.
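The overhead arithmetic above can be checked with a few lines of C# (a minimal sketch; the exact character count differs slightly from the 1366 quoted above because `Convert.ToBase64String` emits `=` padding):

```csharp
using System;

class Base64Overhead
{
    static void Main()
    {
        var payload = new byte[1024];
        string b64 = Convert.ToBase64String(payload);

        // Base64 encodes every 3 bytes as 4 characters, so 1024 bytes -> 1368 chars (with padding).
        // .NET strings are UTF-16, so each character costs 2 bytes in memory.
        int utf16Bytes = b64.Length * sizeof(char);

        Console.WriteLine($"{b64.Length} chars, {utf16Bytes} bytes in memory, ratio {(double)utf16Bytes / payload.Length:F2}");
        // prints: 1368 chars, 2736 bytes in memory, ratio 2.67
    }
}
```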

I have done some basic tests storing the descriptor as a byte array literal returned from a function...

1000 loops x 100 KB takes roughly (on my machine; it's not a proper benchmark, but good enough to prove my point): 16 ms for byte arrays vs. 750 ms for base64 (without concatenation!). That means roughly two orders of magnitude better performance.
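The quick test described above can be sketched roughly like this (a hypothetical reconstruction, not the original benchmark code; the byte-array case is approximated by an array copy, which is close to what initializing a literal costs):

```csharp
using System;
using System.Diagnostics;

class DescriptorInitBenchmark
{
    static void Main()
    {
        const int loops = 1000;
        var raw = new byte[100 * 1024];        // 100 KB of descriptor-like data
        new Random(42).NextBytes(raw);
        string b64 = Convert.ToBase64String(raw);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < loops; i++)
        {
            var copy = (byte[])raw.Clone();    // byte-array variant: a plain memory copy
        }
        sw.Stop();
        Console.WriteLine($"byte[] copy:   {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        for (int i = 0; i < loops; i++)
        {
            var decoded = Convert.FromBase64String(b64);  // base64 variant: full decode each time
        }
        sw.Stop();
        Console.WriteLine($"base64 decode: {sw.ElapsedMilliseconds} ms");
    }
}
```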

I would also consider storing descriptors as manifest resource streams... but haven't done appropriate tests yet.

So basically I would start by replacing the codegen to use a byte array (a debug DLL with a 100 KB byte-array literal is 133 KB, while the base64 version is 346 KB), so it doesn't look like a huge overhead...

jskeet commented 7 years ago

Do you believe this memory/size/performance difference to actually be significant in any real-world scenarios? I would be quite surprised. (I previously had a comment here about the literals being stored in UTF-8. I know attributes are, but it seems string constants elsewhere aren't. I need to track down the details of that.)

If you have 1000 descriptors, each 100K (which are pretty large descriptors), that suggests you've got a pretty enormous application - adding 750ms of time over the course of the entire application lifetime is likely to be irrelevant. That "2 orders of magnitude better performance" sounds important until you recognize that it only happens a single time.

As an example, descriptor.proto is 837 lines - a reasonably large proto. The base64 is just under 8K. I don't personally believe the wasted memory is worth worrying about, compared with the very significant simplicity benefit in having C# code that can be added to a project without having to worry about manifest resource streams.

Using a byte array literal might work, but IIRC there are issues in terms of maximum sizes of byte array literals that compilers can handle, so you'd need to be careful about that. (The reason we concatenate the base64 strings is for that reason too - see https://github.com/jskeet/protobuf-csharp-port/issues/66. It may be that the compilers have all been fixed now, but I'd prefer not to rely on it...)
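For context, the chunked pattern in today's generated code looks roughly like this (a sketch only; the base64 chunks here are short placeholders, not a real descriptor payload):

```csharp
using System;

class FooReflection
{
    // Sketch of the current codegen pattern: the descriptor is split into
    // multiple base64 string literals to stay under compiler limits,
    // concatenated at runtime, then decoded.
    static readonly byte[] descriptorData = Convert.FromBase64String(
        string.Concat(
            "CgxkZXNjcmlwdG9yLnByb3Rv",    // chunk 1 (placeholder)
            "Eg9nb29nbGUucHJvdG9idWY="));  // chunk 2 (placeholder)

    static void Main()
    {
        Console.WriteLine($"decoded {descriptorData.Length} bytes");
    }
}
```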

In short, I very much doubt that this is a truly significant problem in many cases.

You're welcome to investigate generating alternative code, but I'd be somewhat reluctant to move away from the current scheme without significant real world benefits being shown. You should also consider the cost in terms of viewing source code - a byte array literal is likely to create much larger source code than the base64 approach, which can then make it more painful to load/view that source file.

jskeet commented 7 years ago

To strike a more positive note, the good news is that this would be an entirely invisible change in terms of public API, so if we do decide to do it, it won't break compatibility...

I'm definitely not worried about the time aspect - parsing the data is likely to be more expensive than constructing the byte array, and it's once only. But the assembly size and memory are at least interesting.

mkosieradzki commented 7 years ago

If it's UTF-8, how would you explain 346 KB vs 133 KB? I would recommend opening a .NET assembly and trying to find the encoded stream. You will see it's UTF-16...

I am sure this is UTF-16 and it creates a huge overhead during JSON parsing. (There are some attempts to build raw UTF-8 parsers in the corefxlabs repo. BTW, there are so many great areas of improvement shown there, including stackallocs, but that's a story for a different issue in the future ;) )

I have a couple of quite large protos, together about ~150 KB of raw descriptor data. The entire assembly is 800 KB, so I predict about 250-300 KB of savings. This might sound like a nonsense optimization, but it's basically free and, as you have figured out, transparent.

I see a good point about optimizing the wrong area, but there is a reason I would like to do this. In the microservices era it's important to have smaller and faster applications, and there are some scenarios where protobufs are the dominant operations. OS-level memory reuse is often not an option.

Protocol buffers are very useful in microservices and nanoservices and this is definitely a place where you want small and robust assemblies. That's where .NET Core modularization was born...

BTW. We can always go with a CLI-option :).

mkosieradzki commented 7 years ago

I forgot to add one clear benefit for an array: you can do a multiline array without any concatenation, which is tricky with string literals.
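A sketch of what the multiline byte-array form could look like (hypothetical bytes, not real descriptor data): the initializer spans lines naturally with no `string.Concat` needed.

```csharp
using System;

class FooReflection
{
    // A byte-array literal wraps across lines with no concatenation machinery,
    // and decodes to... nothing: it is already the raw descriptor bytes.
    static readonly byte[] DescriptorData = new byte[]
    {
        0x0a, 0x10, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69,
        0x70, 0x74, 0x6f, 0x72, 0x2e, 0x70, 0x72, 0x6f,
    };

    static void Main()
    {
        Console.WriteLine($"descriptor data: {DescriptorData.Length} bytes");
    }
}
```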

Regarding readability - it could be easily fixed by regions.

jskeet commented 7 years ago

@mkosieradzki: Yes, I edited my first comment about UTF-8 vs UTF-16 shortly after posting it.

I still don't think this will be a significant optimization in terms of time - you talk about "there are some scenarios where protobufs are dominant operations" and I agree with that... but making a one time operation faster isn't going to help much. Note that this code isn't even executed until you access something to do with reflection, IIRC. The size-of-assembly argument is stronger, as that's a cost in all cases.

We can't "always go with a CLI option" though - the protobuf team have made it really clear that they want to keep the number of options absolutely minimal. I'm not sure regions will really help much either.

Anyway, I'm sufficiently convinced that it's worth at least trying, and seeing what the results are. Are you happy to put together a PR? If I have to write the code I'm going to view this as fairly low priority (protobufs aren't my day job), but reviewing a PR would be simpler...

mkosieradzki commented 7 years ago

I am definitely happy to create a PR :).

Regarding the significance of decreasing startup time and memory cost, just think about Azure Functions, AWS Lambda or their GCP counterpart - application activation time is definitely a significant problem there. And here we are avoiding: base64 parsing, unnecessary string concatenations, unnecessary GC tainting with temporary strings, and assembly load time due to size alone. I also think that AOT compilers will deal more nicely with the byte array version.

Why is it so important? First of all, the "single time" occurs multiple times even today, but also because longer activation forces keeping pre-activated applications/containers in memory to avoid even more activations. I believe that future serverless schedulers will adjust their activation policy based on past activation time/memory measurements.

jskeet commented 7 years ago

That would be important if it were a really significant portion of startup time, but:

- This only happens when the descriptor is accessed, which is usually only during reflection (including JSON parsing/formatting). Many applications will never need to do that at all, and those that do are likely to need it on-demand rather than during startup.

In short, I think it's a very rare application that would actually see benefit in terms of running time, but as I said before I'm happy if we can improve assembly size.

Due to the "once only" nature of the descriptor parsing, it may well be hard to benchmark the parsing time difference, but we could give it a go once the change is available - I strongly suspect we'd need a very large set of proto descriptors to see the difference though.

mkosieradzki commented 7 years ago

> This only happens when the descriptor is accessed, which is usually only during reflection (including JSON parsing/formatting). Many applications will never need to do that at all, and those that do are likely to need it on-demand rather than during startup.

That's a good point - JSON parsing/formatting is definitely not as common a scenario as raw protobuf parsing/formatting.

So I agree. The only valid reason is the assembly size.

github-actions[bot] commented 3 months ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

github-actions[bot] commented 3 months ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please reopen it.

This issue was closed and archived because there has been no new activity in the 14 days since the inactive label was added.