moleculerjs / moleculer

:rocket: Progressive microservices framework for Node.js
https://moleculer.services/
MIT License

Performance issue with NATS transporter (v2.x.x) #1237

Closed · icebob closed this 1 year ago

icebob commented 1 year ago

Discussed in https://github.com/moleculerjs/moleculer/discussions/1235

Originally posted by **mrprigun**, August 8, 2023

Hello, I encountered certain obstacles in my use case while attempting to execute Moleculer actions. Briefly, I have a service (a gateway for the main app using `moleculer-web`) that looks like this:

```js
module.exports = {
    name: "test",
    actions: {
        hello: {
            rest: "GET /hello",
            handler(ctx) {
                ctx.meta.$responseType = "text/plain";
                return "Hello Moleculer";
            }
        },
        second_call: {
            rest: "GET /second_call",
            handler(ctx) {
                // some actions
                return ctx.call("external.call", params_action_results_above);
            }
        },
    }
};
```

`external.call` is located on another node and the invocation happens via the NATS transporter. By design, the `hello` and `second_call` actions are loaded to process separate tasks. Before publishing to production I benchmarked each action and got these results on my laptop:

- The `hello` action works pretty well: ~600 rps
- The `second_call` action was heavily degraded: I got only ~40 rps

I also received similar results after deployment to a prod-like environment. I believe this overhead comes from the remote call, which is anticipated, but why is it so large? I've tried different load-balancing strategies and `bulkhead` but didn't see any significant improvement. Is there a way to configure Moleculer to improve this behavior, or is it some kind of bug?
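For reproduction context, here is a minimal sketch of what the remote node hosting `external.call` could look like, assuming a plain NATS transporter setup (the node ID, URL, and returned payload are illustrative, not from the original report):

```js
// Hypothetical remote node exposing `external.call` over NATS.
const { ServiceBroker } = require("moleculer");

const broker = new ServiceBroker({
    nodeID: "node-external",              // illustrative node ID
    transporter: "nats://localhost:4222"  // assumes a local NATS server
});

broker.createService({
    name: "external",
    actions: {
        call: {
            handler(ctx) {
                // Stand-in for the real remote work
                return { ok: true, params: ctx.params };
            }
        }
    }
});

broker.start();
```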
icebob commented 1 year ago

It looks like it's an issue with the nats library version 2.x.x. With the previous 1.x.x version the performance is fine. There is an open issue about it (from 2 years ago) in the NATS repo: https://github.com/nats-io/nats.js/issues/438

mrprigun commented 1 year ago

For more context, libs:

"moleculer": "0.14.31",
"nats": "2.15.1",

OS used for local testing: macOS 13.5. Service images in the k8s cluster are based on `node:18-alpine`.

icebob commented 1 year ago

Could you switch back to nats 1.4.12 to check the performance with this version as well?
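(A quick way to pin the older client for an A/B test, assuming an npm-based project:)

```bash
npm install nats@1.4.12
```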

mrprigun commented 1 year ago

Already tried. Initially it was using nats 1.4.12 and the results were pretty similar; that's why I moved to 2.15.1.

icebob commented 1 year ago

What is the NATS server version?

mrprigun commented 1 year ago

2.9.11; to be more concrete, `docker.io/bitnami/nats:2.9.11-debian-11-r0`.

icebob commented 1 year ago

Please try the latest version, 2.9.21.
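(One way to spin that server version up locally, assuming Docker and the official image:)

```bash
docker run --rm -p 4222:4222 nats:2.9.21
```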

mrprigun commented 1 year ago

It appears that I've identified the root cause of the issue. Upon experimenting with various broker settings, I found that disabling the metrics feature resolved the problem. In my project I use a StatsD reporter; with the Prometheus reporter everything functions as anticipated. The StatsD reporter configuration looks like this:

```js
{
    type: 'StatsD',
    options: {
        // Server host
        host: 'localhost',
        // Server port
        port: 8125,
        // Maximum payload size
        maxPayloadSize: 1300,
    }
},
```

So it definitely doesn't seem to be a NATS issue. I'll turn off StatsD for now.
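For reference, a minimal sketch of where this reporter entry sits in the broker options, using the values quoted above; the `enabled: false` toggle is the workaround described here, and the rest of the config is assumed:

```js
// moleculer.config.js (sketch, not the reporter's full option set)
module.exports = {
    transporter: "nats://localhost:4222", // assumed NATS URL
    metrics: {
        enabled: false, // turning metrics off restored throughput
        reporter: [
            {
                type: "StatsD",
                options: {
                    host: "localhost",
                    port: 8125,
                    maxPayloadSize: 1300
                }
            }
        ]
    }
};
```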

icebob commented 1 year ago

It's strange, because I can reproduce this issue without any metrics, just with the 2.x.x nats lib.

icebob commented 1 year ago

I've found the problem inside the nats library. The sending logic was changed to a queue-based approach in the 2.x.x versions. By skipping this logic I could reach 30,000 msg/sec instead of 40 msg/sec. I've opened an issue in the NATS client repo: https://github.com/nats-io/nats.js/issues/581
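For context, a rough serial probe of request throughput between two brokers over NATS, assuming the `external.call` action sketched earlier (this setup is illustrative and will not match icebob's benchmark numbers exactly):

```js
// Rough throughput probe: serial awaited calls, so each request's
// latency (including any client-side queue/flush delay) caps the rate.
const { ServiceBroker } = require("moleculer");

const broker = new ServiceBroker({
    nodeID: "node-bench",
    transporter: "nats://localhost:4222"
});

async function bench(total = 10000) {
    await broker.start();
    await broker.waitForServices("external");

    const start = Date.now();
    for (let i = 0; i < total; i++) {
        await broker.call("external.call", { i });
    }
    const secs = (Date.now() - start) / 1000;
    console.log(`${(total / secs).toFixed(0)} req/sec`);

    await broker.stop();
}

bench().catch(console.error);
```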

mrprigun commented 1 year ago

This is a significant improvement; waiting for this fix then 😃

I've pushed my tests just in case: https://github.com/mrprigun/moleculer-benchmark-test. There are two dedicated nodes and NATS in docker-compose. With the StatsD reporter enabled I get ~170 rps; with the reporter disabled it's ~2k rps on my laptop.

NATS server: 2.9.21
NATS lib: 2.15.1
Node: v18.15.0
OS: macOS 13.5

icebob commented 1 year ago

NATS fixed the issue in 2.16.0. My results:

(screenshot: benchmark results)
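(For anyone hitting this, the fix lands by bumping the client, e.g. with npm:)

```bash
npm install nats@2.16.0
```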