Closed JohnCMcDonough closed 5 years ago
I think this may be the root cause of other issues such as: https://github.com/vernemq/vernemq/issues/556 and https://github.com/vernemq/vernemq/issues/612
Thanks @JohnCMcDonough for your work on this! So apparently this fix for the Hackney socket leak wasn't enough (or it was actually fixing some other issue): https://github.com/vernemq/vernemq/commit/b9e966280a19bd7c312e09bacdb7599870770c0c
Do we have an indication that Hackney 1.12 is actually free of this issue? (cc @larshesel @dergraf )
I've created a PR (#1168) which upgrades hackney to version 1.15.1 - can you test if the PR solves the issue?
In any case, a lot of bugs have been fixed between 1.8.6 and 1.15.1, so it was about time to get it upgraded.
Environment
Expected behavior
Webhooks continue to function, even after a network failure.
Actual behaviour
After receiving many ECONNRESET errors due to networking issues in the cluster, the webhooks no longer function until the VerneMQ instances have been rebooted. We just get a constant stream of connection timeouts in the logs. Even if we remove all load and attempt to connect a single device, it fails.
We've been able to decrease how often this happens by setting:
This delays the onset of the problem but does not solve it. It appears that the version of Hackney used by VerneMQ is hackney/1.8.6, which has a known issue describing this exact problem:
https://github.com/benoitc/hackney/issues/462
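To illustrate why a socket leak could produce the symptoms described here, below is a minimal Python sketch of a fixed-size connection pool. It is a toy model only, not Hackney's or VerneMQ's code: `LeakyPool`, `call_webhook`, `run`, and the `release_on_error` flag are hypothetical names invented for this example. It assumes that a connection which hits a reset is never handed back to the pool, so each reset permanently consumes one slot.

```python
import queue


class LeakyPool:
    """Toy fixed-size connection pool (hypothetical; not Hackney's implementation)."""

    def __init__(self, size):
        self._free = queue.Queue()
        for conn_id in range(size):
            self._free.put(conn_id)

    def checkout(self, timeout=0.01):
        # Raises queue.Empty if no slot frees up in time, which a caller
        # would surface as a connection timeout.
        return self._free.get(timeout=timeout)

    def checkin(self, conn_id):
        self._free.put(conn_id)


def call_webhook(pool, fail, release_on_error):
    """Simulate one webhook call; `fail` stands in for an ECONNRESET."""
    conn = pool.checkout()
    if fail:
        # Assumed bug: on a reset, the slot is only returned when
        # release_on_error is True. Otherwise it is leaked for good.
        if release_on_error:
            pool.checkin(conn)
        return "reset"
    pool.checkin(conn)
    return "ok"


def run(release_on_error, pool_size=3, resets=3):
    """Trigger `resets` failures, then try one healthy call with no load."""
    pool = LeakyPool(pool_size)
    results = [call_webhook(pool, True, release_on_error) for _ in range(resets)]
    try:
        results.append(call_webhook(pool, False, release_on_error))
    except queue.Empty:
        results.append("timeout")
    return results


print(run(release_on_error=False))  # leaking variant
print(run(release_on_error=True))   # variant that returns the slot
```

In this model, once the number of leaked connections reaches the pool size, even a single request with no other load times out until the process is restarted, which matches the behaviour reported above. A larger pool only means more resets are needed before the pool is empty.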