kzemek / etls

An alternative NIF-based implementation of Erlang ssl module.

Sometimes etls crashes the whole VM #10

Closed silviucpp closed 7 years ago

silviucpp commented 7 years ago

Hello,

We have a Nagios script that checks whether our service is still listening on the specified port. It looks like network errors kill the etls NIF. For instance, the ping from Nagios leads to:

terminate called after throwing an instance of 'std::system_error'
  what():  remote_endpoint: Transport endpoint is not connected

The phrase "Transport endpoint is not connected" corresponds to the enotconn error in the Erlang ssl/gen_tcp code.

Make sure you catch all errors sent by asio.

Silviu
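To illustrate the request above, here is a minimal sketch of how a peer-address lookup can report "not connected" as a value instead of an exception. It is not etls or asio code: it uses plain POSIX `getpeername`, and the helper names `peer_address` and `peer_lookup_result` are hypothetical. It assumes the crash comes from a throwing peer lookup on a socket whose remote side is already gone, which is what the `remote_endpoint` prefix in the message suggests.

```cpp
#include <cerrno>
#include <string>
#include <system_error>

#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not from etls or asio): non-throwing peer lookup.
// On failure it stores the errno value in `ec` and returns false.
inline bool peer_address(int fd, sockaddr_in &out, std::error_code &ec) {
    socklen_t len = sizeof(out);
    if (::getpeername(fd, reinterpret_cast<sockaddr *>(&out), &len) != 0) {
        ec.assign(errno, std::generic_category());
        return false;
    }
    ec.clear();
    return true;
}

// Throwing variant. If nothing catches the exception before it leaves the
// thread, std::terminate runs and the whole process (here, the VM) goes down.
inline sockaddr_in peer_address(int fd) {
    sockaddr_in addr{};
    std::error_code ec;
    if (!peer_address(fd, addr, ec))
        throw std::system_error(ec, "remote_endpoint");
    return addr;
}

// Maps the lookup result to an Erlang-style reason string, so the caller
// could hand back something like {error, enotconn} rather than crash.
inline std::string peer_lookup_result(int fd) {
    sockaddr_in addr{};
    std::error_code ec;
    if (peer_address(fd, addr, ec))
        return "ok";
    return ec == std::errc::not_connected ? "enotconn" : ec.message();
}
```

Either style works as long as the throwing variant is always wrapped in a `try`/`catch` at the NIF boundary. The error-code variant simply makes the failure path harder to forget.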

kzemek commented 7 years ago

Are you able to reproduce the error? Can you give me any reproduction steps, possibly a core file or a stack trace?

We've been running etls in production on different sites for about a year and a half now without running into an uncaught exception, so it's possible it's triggered by a specific chain of operations that is not part of our use case.

silviucpp commented 7 years ago

You need to install Nagios and then run:

/usr/lib/nagios/plugins/check_tcp -H HOST_HERE -p PORT_HERE -w .1 -c .2
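For readers without Nagios at hand, the probe can be roughly approximated as follows. This is a sketch under the assumption that `check_tcp`, when run without send/expect options, does little more than open a TCP connection and close it again. The function name `probe_once` is hypothetical, and the `-w`/`-c` response-time thresholds are not modelled.

```cpp
#include <cstdint>

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical stand-in for the check_tcp probe: connect, then hang up at
// once without sending any data. Returns true if the connect succeeded.
inline bool probe_once(const char *ipv4, std::uint16_t port) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return false;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (::inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1) {
        ::close(fd);  // not a valid dotted-quad address
        return false;
    }

    const bool connected =
        ::connect(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) == 0;
    ::close(fd);  // disconnect immediately, as a port check would
    return connected;
}
```

Calling `probe_once("127.0.0.1", PORT_HERE)` in a loop against the listener mimics a monitoring check hitting the port; whether that alone is enough to trigger the crash is not established in this thread.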

kzemek commented 7 years ago

I've tried to reproduce it overnight using a simple echo server (from ranch example files): https://github.com/kzemek/echo

I used etls 1.1.2 with Erlang 19.3, and ran check_tcp from Nagios 4.3.1 in a loop:

while /usr/local/sbin/check_tcp -H localhost -p 5555 -w .1 -c .2; do sleep 0.1; done

The overnight test ran on macOS, although I also tried it on Linux in Docker containers.

I couldn't reproduce this issue with the aforementioned setup. Can you confirm that you still encounter the crash using a similar setup? If not, can I ask you for more details on how etls is configured when you encounter the problem?

silviucpp commented 7 years ago

Hello,

I'll close it then. We are not using etls any longer and it's taking time for me to try to replicate the problem.