Closed bagusandrian closed 6 months ago
Hi @bagusandrian, I appreciate you taking the time to submit this, but this has been a much-discussed aspect of NSQ (see #510, with many links to other threads). If we ever decide to actually invest in meaningfully changing this aspect of NSQ, I don't think that a retry mechanism is the way to go.
Issue
The current NSQ implementation does not properly handle errors that occur while a message is distributed from a topic to its channels, such as Out of Memory (OOM) or Out of Disk Space. When such an error occurs, some of the channels listed on the topic lose the message.
Changes Made
I have introduced a retry mechanism to address this issue. The code now includes a retry, which is based on the value of msg.maxRetryChannel. This modification aims to improve the reliability of message distribution and prevent message loss when errors such as OOM or disk space exhaustion are encountered.
Proposed Solution
The solution involves checking the value of msg.maxRetryChannel and retrying the message distribution process accordingly. By incorporating this retry mechanism, we aim to enhance the robustness of NSQ in handling errors during message distribution.
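The retry described above could look roughly like the sketch below. It is not the code from this PR or from nsqd: the Message and Channel types, the MaxRetryChannel field (standing in for msg.maxRetryChannel), the distribute function, and the flakyChannel test double are all hypothetical names chosen for illustration. The sketch also assumes the value means "extra attempts per channel after the first failure".

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a stand-in for an NSQ message. MaxRetryChannel mirrors the
// msg.maxRetryChannel value from the description (hypothetical field).
type Message struct {
	Body            []byte
	MaxRetryChannel int
}

// Channel is a hypothetical, minimal view of a channel that can accept a message.
type Channel interface {
	Name() string
	PutMessage(m *Message) error
}

// distribute tries to put m on every channel. A failed put is retried up to
// m.MaxRetryChannel more times. It returns the names of the channels that
// still failed after all attempts.
func distribute(m *Message, channels []Channel) []string {
	var failed []string
	for _, ch := range channels {
		var err error
		for attempt := 0; attempt <= m.MaxRetryChannel; attempt++ {
			if err = ch.PutMessage(m); err == nil {
				break
			}
		}
		if err != nil {
			failed = append(failed, ch.Name())
		}
	}
	return failed
}

// flakyChannel is a test double: it fails the first failN puts, then succeeds.
type flakyChannel struct {
	name  string
	failN int
	calls int
	got   int
}

func (c *flakyChannel) Name() string { return c.name }

func (c *flakyChannel) PutMessage(m *Message) error {
	c.calls++
	if c.calls <= c.failN {
		return errors.New("simulated disk full")
	}
	c.got++
	return nil
}

func main() {
	m := &Message{Body: []byte("hello"), MaxRetryChannel: 2}
	chans := []Channel{
		&flakyChannel{name: "a"},           // succeeds immediately
		&flakyChannel{name: "b", failN: 2}, // succeeds on the third attempt
		&flakyChannel{name: "c", failN: 5}, // never succeeds within the limit
	}
	fmt.Println(distribute(m, chans)) // prints "[c]"
}
```

Note that an immediate retry like this only helps when the failure is transient. If memory or disk stays exhausted, every attempt fails and the message is still lost for that channel, which is why the function reports the failed channels instead of hiding them.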
Impact
These changes should improve the reliability of NSQ, especially in environments where OOM or disk space constraints may arise during message distribution. However, the retry mechanism may have side effects and risks, which should be considered during review.
I welcome feedback and suggestions for further improvements. This enhancement aims to address a critical issue in NSQ's reliability and prevent message loss in error-prone scenarios.