Closed silviucpp closed 7 years ago
Are you able to reproduce the error? Can you give me any reproduction steps, possibly a core file or a stack trace?
We've been running etls in production on different sites for about a year and a half now without running into an uncaught exception, so it's possible it's triggered by a specific chain of operations that isn't part of our use case.
You need to install Nagios and then run:
/usr/lib/nagios/plugins/check_tcp -H HOST_HERE -p PORT_HERE -w .1 -c .2
I've tried to reproduce it overnight using a simple echo server (from ranch example files): https://github.com/kzemek/echo
I've used etls 1.1.2 with Erlang 19.3, and ran check_tcp from Nagios 4.3.1 in a loop:
while /usr/local/sbin/check_tcp -H localhost -p 5555 -w .1 -c .2; do sleep 0.1; done
The overnight test ran on macOS, although I also tried it on Linux in Docker containers.
I couldn't reproduce this issue with the aforementioned setup. Can you confirm that you still encounter the crash using a similar setup? If not, can I ask you for more details on how etls is configured when you encounter the problem?
Hello,
I'll close it then. We are not using etls any longer, and it would take me too much time to try to replicate the problem.
Hello,
We have a Nagios script that checks whether our service is still listening on the specified port. It looks like network errors kill the etls NIF. For instance, the ping from Nagios leads to:
terminate called after throwing an instance of 'std::system_error'
  what():  remote_endpoint: Transport endpoint is not connected
The phrase "Transport endpoint is not connected" corresponds to the enotconn error in the Erlang ssl/gen_tcp code.
Make sure you catch all errors sent by asio.
Silviu
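To illustrate the failure mode described above, here is a minimal sketch of catching this kind of error instead of letting it reach std::terminate. It is not etls code and does not use asio: it assumes a POSIX system and uses getpeername as a stand-in for asio's remote_endpoint, since both fail with ENOTCONN when the peer has already gone away. The helper names peer_or_throw, safe_peer and probe_unconnected_socket are hypothetical.

```cpp
#include <cerrno>
#include <system_error>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper that behaves like a throwing remote_endpoint() call:
// any failure is reported as a std::system_error.
sockaddr_in peer_or_throw(int fd) {
    sockaddr_in addr{};
    socklen_t len = sizeof(addr);
    if (::getpeername(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
        throw std::system_error(errno, std::generic_category(), "remote_endpoint");
    }
    return addr;
}

// Hypothetical wrapper: turns the exception into an error code so that
// nothing escapes into the caller (for a NIF, an escaping exception
// would take down the whole Erlang VM).
std::error_code safe_peer(int fd, sockaddr_in& out) noexcept {
    try {
        out = peer_or_throw(fd);
        return {};
    } catch (const std::system_error& e) {
        return e.code();
    } catch (...) {
        return std::make_error_code(std::errc::io_error);
    }
}

// Queries the peer of a socket that was never connected, which is roughly
// what happens when a port check disconnects before the query runs.
std::error_code probe_unconnected_socket() {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return std::error_code(errno, std::generic_category());
    }
    sockaddr_in peer{};
    std::error_code ec = safe_peer(fd, peer);
    ::close(fd);
    return ec;
}
```

With asio itself, the same effect can be had without exceptions by calling the remote_endpoint overload that takes an error_code reference and checking the code afterwards. The returned code can then be mapped to an Erlang-side error such as enotconn.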