nymtech / nym

Nym provides strong network-level privacy against sophisticated end-to-end attackers, and anonymous transactions using blinded, re-randomizable, decentralized credentials.
https://nymtech.net
1.26k stars 232 forks source link

Handling of error case during outbound connection step in `nym_network_requester::core` crashes process #4375

Open numbleroot opened 7 months ago

numbleroot commented 7 months ago

On Nym version nym-binaries-v2023.5-rolo, the top-level process of a gateway with an embedded network requester crashes if the outwards TCP connection upon request by a Nym client results in an error: https://github.com/nymtech/nym/blob/00600ddeeba4ed453c27bd7ea3b144d3d8560141/service-providers/network-requester/src/core.rs#L460:L477

In my case this happens when the destination domain (e.g., analytics.faw.sky.com) doesn't resolve to an IP address (no A record can be found). However, instead of signalling the requesting Nym client via the mixnet and gracefully returning from the function, the entire gateway process is crashed. I believe this is due to a missing shutdown.mark_as_success();, but I might be wrong, of course.

Relevant log lines of the gateway with embedded network requester:

ERROR nym_network_requester::core          > error while connecting to analytics.faw.sky.com:443: failed to lookup address information: No address associated with hostname
DEBUG TaskClient-NetworkRequester-child    > the task client is getting dropped
TRACE TaskClient-NetworkRequester-child    > Notifying stop on unexpected drop

As a test, I added the following lines after line 474 in above referenced code part:

if err.to_string() == "failed to lookup address information: No address associated with hostname" {
    shutdown.mark_as_success();
}

which is of course overly specific to my error case, but does seem to at least prevent the gateway process from crashing. (shutdown needs to become mut in the function parameters).

I got this inspiration from another place in the core.rs source file, handling a similar situation: https://github.com/nymtech/nym/blob/00600ddeeba4ed453c27bd7ea3b144d3d8560141/service-providers/network-requester/src/core.rs#L552:L568

In particular, line 566: https://github.com/nymtech/nym/blob/00600ddeeba4ed453c27bd7ea3b144d3d8560141/service-providers/network-requester/src/core.rs#L566

After adding this line, the relevant log parts of the gateway with embedded network requester change to:

ERROR nym_network_requester::core          > error while connecting to analytics.faw.sky.com:443: failed to lookup address information: No address associated with hostname
TRACE TaskClient-NetworkRequester-child    > the task client is getting dropped (but instructed to not signal)

Could it be the case that the effort to gracefully shut down or drop tasks hadn't reached this part of the code just yet and marking shutdown as success is all that is needed here to fix it? Or are more substantial changes needed here?

octol commented 7 months ago

Yeah we had a few of these bugs with the graceful shutdown system, I'll have a look!