moleculerjs / moleculer


Broker.call waits forever if Redis transporter reconnects while transmitting a message #1031

Closed. FFKL closed this issue 1 year ago.

FFKL commented 2 years ago


Current Behavior

Broker.call waits forever if the Redis transporter reconnects while transmitting a message. We do have requestTimeout, of course, but I don't think this is the correct behavior.

Expected Behavior

The returned Promise must be rejected when the connection is closed.

Failure Information

Example here: https://github.com/FFKL/moleculer-redis-freeze

Steps to Reproduce


  1. Call a remote action that returns a large buffer via the Redis transporter.
  2. Redis closes the subscriber connection with a "scheduled to be closed ASAP for overcoming of output buffer limits" message.
  3. The returned Promise waits forever.

Reproduction code snippet

const { ServiceBroker } = require('moleculer');

const mainBroker = new ServiceBroker({
  logger: console,
  logLevel: 'debug',
  nodeID: 'main',
  transporter: 'redis://localhost:6379',
});

const brokerWithWorker = new ServiceBroker({
  logger: console,
  logLevel: 'debug',
  nodeID: 'with-worker',
  transporter: 'redis://localhost:6379',
});

brokerWithWorker.createService({
  name: 'worker',
  actions: {
    async getBigMessage() {
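      // Note: 60 MB exceeds Redis' default pub/sub output buffer hard limit
      // (client-output-buffer-limit pubsub 32mb 8mb 60), which is what
      // triggers the forced disconnect in step 2 above.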
      return Buffer.alloc(60 * 1024 * 1024);
    },
  },
});

brokerWithWorker.start().then(async () => {
  await mainBroker.start();
  await mainBroker
    .call('worker.getBigMessage')
    // will wait forever!
    .then((res) => mainBroker.logger.info(`Size: ${res.data.length}`))
    .catch((err) => mainBroker.logger.error(`Error: ${err}`));
});


Failure Logs

[2021-11-15T18:53:47.458Z] DEBUG with-worker/TRANSIT: <= Request 'worker.getBigMessage' received from 'main' node.
[2021-11-15T18:53:58.152Z] WARN  main/TRANSPORTER: Redis-sub client is disconnected.
[2021-11-15T18:53:58.230Z] INFO  main/TRANSPORTER: Redis-sub client is connected.
[2021-11-15T18:53:58.231Z] INFO  main/TRANSPORTER: Setting Redis transporter
[2021-11-15T18:53:58.237Z] INFO  main/TRANSPORTER: Redis-pub client is connected.
AndreMaz commented 2 years ago

Thanks for the repro repo, it was really helpful to see what's happening under the hood.

Unfortunately, I don't think it can be fixed on the Moleculer side. I'll try to explain what happens:

  1. The main node connects to Redis and calls the worker.
  2. The worker sends the "giant message" via the send method: https://github.com/moleculerjs/moleculer/blob/b48ec4f3d39951ad433a64c4c36ecdc2af824c24/src/transporters/redis.js#L133-L140
  3. The message is successfully sent to Redis, and Redis does not return any error to the worker node, so on the worker side everything looks fine.
  4. Redis receives the message and checks its size. If the message is larger than the output buffer limit, Redis closes the connections of all subscribers/consumers.
  5. The main node reconnects to Redis, but it doesn't know anything about the "giant message". Since the message is larger than the output buffer, it won't be delivered anywhere, so the main node gets stuck forever.

The main issue is that Redis does not inform the client about the reason for closing the connection; the reason is only logged on the server side. So even if we "break" the await mainBroker.call("worker.getBigMessage"), we don't know the exact cause. What error should Moleculer throw?

I've searched for the "scheduled to be closed ASAP for overcoming of output buffer limits" message. Here's what I found: https://gist.github.com/amgorb/c38a77599c1f30e853cc9babf596d634. The solution is basically to increase the output buffer size.
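For reference, here is a minimal sketch of that workaround done from Node.js with ioredis (the client the Redis transporter uses under the hood). The limit values are illustrative, not a recommendation, and on managed Redis instances you may need to change this through the provider's configuration instead:

const Redis = require('ioredis');

async function raisePubsubBufferLimit() {
  const redis = new Redis('redis://localhost:6379');

  // The default is "pubsub 32mb 8mb 60": 32 MB hard limit, 8 MB soft limit over 60 seconds.
  const current = await redis.config('GET', 'client-output-buffer-limit');
  console.log('Current limits:', current);

  // Raise the pub/sub class so large responses aren't dropped (illustrative values).
  await redis.config('SET', 'client-output-buffer-limit', 'pubsub 256mb 64mb 60');

  redis.disconnect();
}

raisePubsubBufferLimit().catch(console.error);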

FFKL commented 2 years ago

@AndreMaz Thank you for the explanation. It took me a few hours to figure out that this is a Redis failure so I decided to open this issue;) I thought it might be possible to fix this on the Moleculer side, since the forever-pending Promise is effectively a memory leak. Can we keep track of such pending Promises and show a warning message, or configure the Transporter to reject these Promises?
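Roughly, what I have in mind is something like this (only a sketch: it assumes the $transporter.disconnected local broadcast and the broker.localBus emitter, and a real implementation would also need to remove the listener once the call settles):

const { ServiceBroker } = require('moleculer');

const broker = new ServiceBroker({ transporter: 'redis://localhost:6379' });

// Turns the next transporter disconnect into a rejection.
function rejectOnDisconnect(broker) {
  return new Promise((resolve, reject) => {
    broker.localBus.once('$transporter.disconnected', () =>
      reject(new Error('Transporter disconnected while the request was in flight'))
    );
  });
}

// Wraps broker.call so a disconnect fails the pending request instead of letting it hang forever.
function callOrFail(broker, action, params, opts) {
  return Promise.race([broker.call(action, params, opts), rejectOnDisconnect(broker)]);
}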

My solution at the moment:

And the main problem is that in my case it is a heisenbug, because Redis also applies a time limit: a soft limit of 32 megabytes per 10 seconds means that if the client has an output buffer bigger than 32 megabytes for, continuously, 10 seconds, the connection gets closed. So the "giant message" can sometimes be delivered successfully.

AndreMaz commented 2 years ago

It took me a few hours to figure out that this is a Redis failure so I decided to open this issue;)

Yeah, it's definitely a tricky situation. Without your repro repo it would have been really difficult to find the issue.

Can we keep track of such pending Promises and show a warning message or configure Transporter to reject these Promises?

We could track a pending Promise by adding a timeout, but that is already done with requestTimeout. Do you have any suggestions?
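For reference, this is what the existing knob looks like: requestTimeout is a broker option in milliseconds (0, the default, disables it), and a per-call timeout can also be passed in the calling options:

const { ServiceBroker } = require('moleculer');

// Global request timeout: pending calls are rejected with a RequestTimeoutError after 10 seconds.
const broker = new ServiceBroker({
  transporter: 'redis://localhost:6379',
  requestTimeout: 10 * 1000,
});

broker.start().then(() => {
  // The timeout can also be overridden per call.
  return broker.call('worker.getBigMessage', null, { timeout: 5 * 1000 });
});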

Can you use streams in this case?

FFKL commented 2 years ago

@AndreMaz Unfortunately, the transporter sends all the stream chunks to Redis one by one, so in this case nothing changes. I've added the example to the repo.

I think requestTimeout is not the same as a potential cutAllPendingRequests option (it does not reject the pending request immediately on the close event), but it is the only working solution right now.

icebob commented 1 year ago

I'm closing this issue because we don't know how we can solve it inside Moleculer. If you have an idea or a solution, please reopen it or open a PR.