Significant Performance Regression in ts-proto v2 Compared to v1

HofmannZ commented 3 months ago

Hello ts-proto maintainers,

First, I want to thank you for your work on ts-proto—it's been an invaluable tool for our projects. We recently benchmarked three libraries (ts-proto, google-protobuf, and protobuf-ts) as part of our ongoing efforts to optimize our performance. Based on those benchmarks, we initially selected ts-proto (v1) as it outperformed the other libraries.

Recently, we decided to include ts-proto v2 in our benchmark tests to evaluate any improvements or changes in performance. However, we were surprised to observe a significant performance regression in ts-proto v2 compared to v1.

Here are the benchmark results:

google-protobuf v3.21.4:

readInputs: 7,184 ops/sec (139,186 ns average time)
updateInputs: 6,109 ops/sec (163,671 ns average time)

ts-proto v1.181.2:

readInputs: 7,505 ops/sec (133,227 ns average time)
updateInputs: 6,353 ops/sec (157,404 ns average time)

ts-proto v2.0.3:

readInputs: 6,372 ops/sec (156,918 ns average time)
updateInputs: 4,624 ops/sec (216,218 ns average time)

protobuf-ts v2.9.4:

readInputs: 6,252 ops/sec (159,943 ns average time)
updateInputs: 4,435 ops/sec (225,466 ns average time)

As you can see, the performance of ts-proto v2 is significantly lower than v1, particularly in the updateInputs task, where the operations per second dropped by almost 27%, and the average time increased by about 37%.

Steps to Reproduce:

Benchmark ts-proto v1 and v2 using our test suite.
Compare the performance metrics (ops/sec and average time) for tasks like readInputs and updateInputs.

Expected Behavior:

We expected ts-proto v2 to at least maintain the performance levels of v1, if not improve upon them.

Actual Behavior:

The performance of ts-proto v2 is significantly lower, particularly in the updateInputs task.

Environment:

Node.js version: v20.16.0
ts-proto versions: v1.181.2 and v2.0.3
Operating System: MacOS 14.6.1

Additional Context:

We would greatly appreciate any insights into what might be causing this regression. If there's anything specific in the new version that we should be aware of, or if there are configuration changes we can make to mitigate this, please let us know.

Thank you for your time and for all the work you put into maintaining this library!

stephenh commented 3 months ago

Hi @HofmannZ ! Thanks for filing the issue!

That is an interesting but unfortunate result...

The biggest change in ts-proto v2 is that we'd migrated to a different binary serialization library, @bufbuild/protobuf, as the previous library we sat on top of, protobuf.js, has been stagnant (albeit technically maintained) for quite awhile, around issues like ESM, Bigint, and custom type support that, over the years, have been painful to deal with.

So, in theory this move was a big win in terms of a getting a nice/needed refresh to our overall stack -- but definitely here you about the performance regression.

@timostamm do you have thoughts/comments on ^?

Fwiw @HofmannZ the benchmark repo you linked to might still be private? The link 404s for me...

HofmannZ commented 3 months ago

@stephenh - My apologies! I had mistakenly linked to a private repo, but it's now public.

timostamm commented 3 months ago

We've been waiting for resizable ArrayBuffers - there's potential for a significant boost in write performance. They aren't widely enough available yet, though.

I think that conditional imports like they are used in protobuf.js should be avoided because of issues with bundlers and ESM, but it's possible that perf could be improved without that.

It would be interesting to take a look at flame graphs to see where the time is spent compared to the protobuf.js reader and writer. I'm happy to help take a look and make improvements. ts-proto, protobuf-ts, and protobuf-es would all benefit from that.

stephenh commented 3 months ago

Selfishly that would be really great & appreciated @timostamm ! :-D

Fwiw @HofmannZ since this is important to you, would also be great ofc if you wanted to analyze/dive deeper into the protobufjs/buildbuf flamegraphs, and I assume you can stay on ts-proto 1.x until the performance is back on par.

Thanks!

HofmannZ commented 3 months ago

Hi @stephenh and @timostamm,

Thanks for the insights—much appreciated!

I've done some preliminary research on resizable ArrayBuffers, and they seem particularly promising for improving performance in protocol buffer processing. Here’s why they might be beneficial in this context:

Dynamic Memory Management for Protobuf Messages: Protocol buffers often deal with messages of varying sizes. With traditional fixed-size ArrayBuffers, we either need to over-allocate memory or handle complex resizing logic. Resizable ArrayBuffers would allow the buffer to grow dynamically as more data is encoded or decoded, reducing the need for costly reallocations and memory copying during serialization and deserialization.
Performance Gains in Write Operations: When encoding large or complex protocol buffer messages, resizable ArrayBuffers can optimize memory usage by expanding only when necessary. This can lead to faster write operations, as the overhead of managing buffer sizes manually is minimized, directly impacting performance where it matters most in encoding tasks.
Simplified Buffer Management in Protobuf Implementations: Managing memory efficiently is crucial in high-performance applications. Resizable ArrayBuffers can help simplify the implementation of protocol buffers by removing the need for pre-allocating large buffers or implementing custom resizing logic, making the codebase cleaner and less error-prone.
Alignment with Future Protocol Buffer Features: As protocol buffers evolve to support more complex data types and larger messages, leveraging resizable ArrayBuffers can ensure that the library remains capable of handling these future demands efficiently, keeping pace with modern JavaScript capabilities.

However, I noticed that resizable ArrayBuffers are only supported starting from Node.js version 20. Adopting them would mean dropping support for Node.js version 18, which might be a significant decision for the project. It’s definitely something to weigh as we consider optimization strategies.

Regarding the flame graph analysis, I'm keen to contribute and would love your guidance on a few points:

Excluding Warm-up Rounds: My benchmark code includes warm-up rounds that shouldn't influence the analysis. I plan to exclude these from the flame graph profiling by starting data capture after a set number of iterations. If you have preferred methods or tools for this, I'd appreciate any recommendations.
Focus on Key Iterations: Given that the benchmarks run 10,000 times, I’m considering focusing the flame graph analysis on a representative subset of iterations to avoid excessive data and make it easier to identify performance bottlenecks. What would you suggest as the best approach here?
Comparative Flame Graphs: Creating flame graphs for both protobuf.js and @bufbuild/protobuf could provide valuable insights into where performance differences arise. If this sounds useful, I can generate these profiles and share them for further discussion.

I'd appreciate any suggestions or best practices you might have for generating and interpreting these flame graphs effectively. I'm committed to helping find a solution that brings the performance of ts-proto v2 back on par with v1.

stephenh commented 2 months ago

Hey @HofmannZ ! Thanks for the questions/context on the flamegraphs...

Honestly when I've done flamegraphs in the past, the issue is either pretty obvious ... or it's not, so I'm not sure some of the fancier "focus on key iterations"/etc approaches are necessary.

Hopefully just seeing protobufjs and @bufbuild/protobuf flamegraphs side-by-side will make the issue easy to see; granted, sometimes it doesn't, if it's an issue that is "death by a thousand cuts" and not an obvious/single hotspot.

But, side by side flamegraphs does seem like the best place to start. Thanks!

HofmannZ commented 1 month ago

@stephenh - Sorry for the delay in getting back to this! I was swamped with the US tax deadline (October 15), so it took me a while to follow up.

I’ve now added support for creating flame graphs in our benchmark repository: https://github.com/expatfile/protobuf-benchmarks/commit/5ee4b5b08da1cecbbb6e62f802651d334c4d08a1

To help narrow down the performance issues, I suggest you generate flame graphs using ./run_flame_ts_proto.sh from the root of the benchmark project. Any additional insights would be great!

HofmannZ commented 5 days ago

@stephenh @timostamm - Can you indicate how we can move this item forward?

stephenh commented 5 days ago

@HofmannZ realistically the best way for this to move forward is for you to dig in, find the hot spot(s), and open a PR, either to ts-proto or the underlying bufbuild/protobuf library.

This is just the way open source works--if maintainers don't have the time/resources/"itch" to fix issues personally, then it's up to contributors/potential contributors.

Thanks!

stephenh / ts-proto

Significant Performance Regression in ts-proto v2 Compared to v1 #1098