moleculerjs / moleculer


Broker.call waits forever if Redis transporter reconnects while transmitting a message #1031

Closed. FFKL closed this issue 1 year ago.

FFKL commented 2 years ago


Current Behavior

Broker.call waits forever if the Redis transporter reconnects while transmitting a message. We do have requestTimeout, of course, but I don't think this is the correct behavior.

Expected Behavior

The returned Promise must be rejected when the connection is closed.

Failure Information

Example here: https://github.com/FFKL/moleculer-redis-freeze

Steps to Reproduce


  1. Call a remote action that returns a large buffer via the Redis transporter.
  2. Redis closes the subscriber connection with a "scheduled to be closed ASAP for overcoming of output buffer limits" message.
  3. The returned Promise waits forever.

Reproduction code snippet

const { ServiceBroker } = require('moleculer');

const mainBroker = new ServiceBroker({
  logger: console,
  logLevel: 'debug',
  nodeID: 'main',
  transporter: 'redis://localhost:6379',
});

const brokerWithWorker = new ServiceBroker({
  logger: console,
  logLevel: 'debug',
  nodeID: 'with-worker',
  transporter: 'redis://localhost:6379',
});

brokerWithWorker.createService({
  name: 'worker',
  actions: {
    async getBigMessage() {
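      // Note: 60 MB exceeds Redis' default pub/sub output buffer hard limit
      // (client-output-buffer-limit pubsub 32mb 8mb 60), which is what
      // triggers the forced disconnect in step 2 above.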
      return Buffer.alloc(60 * 1024 * 1024);
    },
  },
});

brokerWithWorker.start().then(async () => {
  await mainBroker.start();
  await mainBroker
    .call('worker.getBigMessage')
    // will wait forever!
    .then((res) => mainBroker.logger.info(`Size: ${res.data.length}`))
    .catch((err) => mainBroker.logger.error(`Error: ${err}`));
});


Failure Logs

[2021-11-15T18:53:47.458Z] DEBUG with-worker/TRANSIT: <= Request 'worker.getBigMessage' received from 'main' node.
[2021-11-15T18:53:58.152Z] WARN  main/TRANSPORTER: Redis-sub client is disconnected.
[2021-11-15T18:53:58.230Z] INFO  main/TRANSPORTER: Redis-sub client is connected.
[2021-11-15T18:53:58.231Z] INFO  main/TRANSPORTER: Setting Redis transporter
[2021-11-15T18:53:58.237Z] INFO  main/TRANSPORTER: Redis-pub client is connected.
AndreMaz commented 2 years ago

Thanks for the repro repo, it was really helpful to see what's happening under the hood.

Unfortunately, I don't think it can be fixed on the Moleculer side. I'll try to explain what happens:

  1. The main node connects to Redis and calls the worker.
  2. The worker sends the "giant message" via the send method: https://github.com/moleculerjs/moleculer/blob/b48ec4f3d39951ad433a64c4c36ecdc2af824c24/src/transporters/redis.js#L133-L140
  3. The message is successfully sent to Redis, and Redis does not return any error to the worker node, so on the worker side everything looks fine.
  4. Redis receives the message and checks its size. If the message is larger than the output buffer limit, Redis closes the connections of all subscribers/consumers.
  5. The main node reconnects to Redis, but it doesn't know anything about the "giant message". Since the message is larger than the output buffer, it won't be delivered anywhere, so the main node gets stuck forever.

The main issue is that Redis does not inform the client about the reason for closing the connection; the reason is only logged on the server side. So even if we "break" the await mainBroker.call("worker.getBigMessage"), we don't know the exact cause. What error should Moleculer throw?

I've searched for the "scheduled to be closed ASAP for overcoming of output buffer limits" message. Here's what I found: https://gist.github.com/amgorb/c38a77599c1f30e853cc9babf596d634. The solution is basically to increase the output buffer size.
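For reference, here is a minimal sketch of that workaround done from Node.js with ioredis (the client the Redis transporter uses under the hood). The limit values are illustrative, not a recommendation, and on managed Redis instances you may need to change this through the provider's configuration instead:

const Redis = require('ioredis');

async function raisePubsubBufferLimit() {
  const redis = new Redis('redis://localhost:6379');

  // The default is "pubsub 32mb 8mb 60": 32 MB hard limit, 8 MB soft limit over 60 seconds.
  const current = await redis.config('GET', 'client-output-buffer-limit');
  console.log('Current limits:', current);

  // Raise the pub/sub class so large responses aren't dropped (illustrative values).
  await redis.config('SET', 'client-output-buffer-limit', 'pubsub 256mb 64mb 60');

  redis.disconnect();
}

raisePubsubBufferLimit().catch(console.error);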

FFKL commented 2 years ago

@AndreMaz Thank you for the explanation. It took me a few hours to figure out that this is a Redis failure so I decided to open this issue;) I thought it might be possible to fix this on the Moleculer side, since the forever-pending Promise is effectively a memory leak. Can we keep track of such pending Promises and show a warning message, or configure the Transporter to reject these Promises?
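Roughly, what I have in mind is something like this (only a sketch: it assumes the $transporter.disconnected local broadcast and the broker.localBus emitter, and a real implementation would also need to remove the listener once the call settles):

const { ServiceBroker } = require('moleculer');

const broker = new ServiceBroker({ transporter: 'redis://localhost:6379' });

// Turns the next transporter disconnect into a rejection.
function rejectOnDisconnect(broker) {
  return new Promise((resolve, reject) => {
    broker.localBus.once('$transporter.disconnected', () =>
      reject(new Error('Transporter disconnected while the request was in flight'))
    );
  });
}

// Wraps broker.call so a disconnect fails the pending request instead of letting it hang forever.
function callOrFail(broker, action, params, opts) {
  return Promise.race([broker.call(action, params, opts), rejectOnDisconnect(broker)]);
}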

My solution at the moment:

And the main problem is that in my case it is a heisenbug, because Redis also applies a time limit: a soft limit of 32 megabytes per 10 seconds means that if the client has an output buffer bigger than 32 megabytes for, continuously, 10 seconds, the connection gets closed. So the "giant message" can sometimes be delivered successfully.

AndreMaz commented 2 years ago

It took me a few hours to figure out that this is a Redis failure so I decided to open this issue;)

Yeah, it's definitely a tricky situation. Without your repro repo it would have been really difficult to find the issue.

Can we keep track of such pending Promises and show a warning message or configure Transporter to reject these Promises?

We could track a pending Promise by adding a timeout, but that is already done with requestTimeout. Do you have any suggestions?
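For reference, this is what the existing knob looks like: requestTimeout is a broker option in milliseconds (0, the default, disables it), and a per-call timeout can also be passed in the calling options:

const { ServiceBroker } = require('moleculer');

// Global request timeout: pending calls are rejected with a RequestTimeoutError after 10 seconds.
const broker = new ServiceBroker({
  transporter: 'redis://localhost:6379',
  requestTimeout: 10 * 1000,
});

broker.start().then(() => {
  // The timeout can also be overridden per call.
  return broker.call('worker.getBigMessage', null, { timeout: 5 * 1000 });
});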

Can you use streams in this case?

FFKL commented 2 years ago

@AndreMaz Unfortunately, the transporter sends all the stream chunks to Redis one by one, so in this case nothing changes. I've added the example to the repo.

I think requestTimeout is not the same as a potential cutAllPendingRequests option (it does not reject the pending request immediately on the close event), but it is the only working solution right now.

icebob commented 1 year ago

I'm closing this issue because we don't know how we can solve it inside Moleculer. If you have an idea or a solution, please reopen it or open a PR.