nymtech / nym

Nym provides strong network-level privacy against sophisticated end-to-end attackers, and anonymous transactions using blinded, re-randomizable, decentralized credentials.
https://nymtech.net

[Issue] Suspiciously high rate of write ops and syscalls in gateway process #1975

Open ststefa opened 2 years ago

ststefa commented 2 years ago

Describe the issue

I have been running the nym-gateway for a while as part of the Shipyard program. Most of the time it has two socks clients attached, which are not reachable by anyone except me.

Puzzled by some strange-looking OS-level metrics, I found that the number of context switches performed by the gateway process is suspiciously high.

$ pidstat -w -t -p $(pidof nym-gateway) 5 1
Linux 5.10.0-17-amd64 (he5)     22/08/22     _x86_64_    (3 CPU)

14:31:12      UID      TGID       TID   cswch/s nvcswch/s  Command
14:31:17     1005      5265         -      0.00      0.00  nym-gateway
14:31:17     1005         -      5265      0.00      0.00  |__nym-gateway
14:31:17     1005         -      5266    536.60     32.40  |__tokio-runtime-w
14:31:17     1005         -      5267    539.40     30.40  |__tokio-runtime-w
14:31:17     1005         -      5268    527.00     30.40  |__tokio-runtime-w
14:31:17     1005         -    333326    125.20      5.20  |__sqlx-sqlite-wor
14:31:17     1005         -    333330    122.40      2.40  |__sqlx-sqlite-wor
14:31:17     1005         -    333613    124.20      3.00  |__sqlx-sqlite-wor
14:31:17     1005         -    334007    122.00      2.40  |__sqlx-sqlite-wor
14:31:17     1005         -    334042    123.00      2.80  |__sqlx-sqlite-wor
14:31:17     1005         -    334216    123.80      2.40  |__sqlx-sqlite-wor
14:31:17     1005         -    334386    122.20      3.20  |__sqlx-sqlite-wor
14:31:17     1005         -    334502      0.40      0.20  |__tokio-runtime-w

The process had been running for roughly six days at that point. I then started to take hourly snapshots of the context-switch counts for two weeks to see how they evolve over time. They show that the values are very high on average, but at least constant over time.
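
For reference, a minimal sketch of how such snapshots can be taken (not necessarily the exact setup I used; the log path is arbitrary) is an hourly cron entry that appends a pidstat sample to a file:

# crontab entry (sketch): sample the gateway's context switches once per hour
0 * * * * pidstat -w -t -p "$(pidof nym-gateway)" 5 1 >> /var/log/nym-gateway-cswch.log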

Digging further into this, I found that the process produces a huge number of syscalls, which is most likely the reason for the high number of context switches (as syscalls involve context switches).

Here are roughly ten seconds of tracing the number of syscalls per process (some entries omitted). The last two entries ("sqlx-sqlite-wor" and "tokio-runtime-w") are the threads spawned by nym-gateway:

$ bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
Attaching 1 probe...
^C
...
@[rs:main Q:Reg]: 6
@[in:imuxsock]: 6
@[sudo]: 7
@[systemd]: 9
@[auditd]: 9
@[systemd-network]: 12
@[sshd]: 18
@[unattended-upgr]: 20
@[ntpd]: 20
@[systemd-logind]: 32
@[ulogd]: 34
@[dbus-daemon]: 36
@[nftables-export]: 66
@[bpftrace]: 105
@[systemd-journal]: 183
@[splunkd]: 2582
@[collectd]: 2684
@[sqlx-sqlite-wor]: 27709
@[tokio-runtime-w]: 36179

So that is more than 6k syscalls per second for the "nym threads" combined, which is a lot, though not critically high. Nevertheless, I cannot see how that squares with the fact that the gateway is mostly idling. The primary thing it does is probably to create cover traffic; how can that be so heavy on syscalls?
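
To make sure these numbers really belong to the gateway and not to other processes that happen to share a thread name, the same probe can be restricted to the gateway's PID, roughly like this (a sketch; the PID is passed as a bpftrace positional parameter):

# sketch: count syscalls per thread name, restricted to the nym-gateway process only
$ sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter /pid == $1/ { @[comm] = count(); }' "$(pidof nym-gateway)"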

Side note: the gateway (while being mostly idle) produces constant network traffic of roughly 5 Mbps rx + 4.5 Mbps tx = 9.5 Mbps. This splits up into roughly 60% for port 1789 and 40% for port 9000. That seems like a lot to me for "cover traffic".
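
For completeness, the per-port split can be reproduced with plain nftables counter rules along these lines (a sketch; table and chain names are arbitrary, and this only counts the inbound direction, so a matching output-hook chain would be needed for tx):

# sketch: per-port packet/byte counters for the two gateway ports
$ sudo nft add table inet nymstats
$ sudo nft add chain inet nymstats input '{ type filter hook input priority 0; policy accept; }'
$ sudo nft add rule inet nymstats input tcp dport 1789 counter
$ sudo nft add rule inet nymstats input tcp dport 9000 counter
$ sudo nft list chain inet nymstats input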

Next I sampled roughly ten seconds to trace the number of calls per syscall type (some entries omitted):

$ sudo bpftrace -e 'tracepoint:syscalls:sys_enter_* { @[probe] = count(); }'
Attaching 336 probes...
^C
...
@[tracepoint:syscalls:sys_enter_writev]: 1973
@[tracepoint:syscalls:sys_enter_sendto]: 3440
@[tracepoint:syscalls:sys_enter_fcntl]: 6584
@[tracepoint:syscalls:sys_enter_recvfrom]: 7806
@[tracepoint:syscalls:sys_enter_lseek]: 8350
@[tracepoint:syscalls:sys_enter_write]: 10021
@[tracepoint:syscalls:sys_enter_read]: 11443
@[tracepoint:syscalls:sys_enter_epoll_wait]: 11511
@[tracepoint:syscalls:sys_enter_futex]: 28042

So this looks like a lot of read/write/lock activity. It matches OS-level metrics which show that the process (with two clients attached most of the time) is constantly performing about 300 disk write ops/s, which is also something I cannot see a plausible reason for:

[screenshot: gateway disk write ops]
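
To narrow down where those writes go, one can count write() calls per file descriptor for the gateway PID and then map the busiest descriptors to file names (a sketch):

# sketch: count write() calls per file descriptor of the nym-gateway process
$ sudo bpftrace -e 'tracepoint:syscalls:sys_enter_write /pid == $1/ { @[args->fd] = count(); }' "$(pidof nym-gateway)"
# then resolve the busiest descriptors to file or socket names
$ ls -l /proc/$(pidof nym-gateway)/fd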

Additionally, the amount of CPU system time consumed relative to user time is very high, which is another plausible symptom of the high syscall rate:

[screenshot: gateway cpu %]
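
The system/user split can also be confirmed directly on the host with pidstat's per-thread CPU report (a sketch):

# sketch: per-thread CPU usage; a high %system relative to %usr matches the syscall rate
$ pidstat -u -t -p "$(pidof nym-gateway)" 5 1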

From some experiments with attaching an additional (third) client to the gateway, I would further estimate that each connected client adds roughly another 150 write ops/s:

[screenshot: added a third client]

Taken together, these observations suggest that a gateway will not scale beyond a few (<20 or so) connected clients. A load test (i.e. attaching a lot of clients to one specific gateway) should be done to validate this. A workable way to do this might be to use the members of the current grantees group.
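
A rough sketch of such a load test is below. The client flag names and the gateway identity are assumptions on my side and should be checked against `nym-client --help`:

# sketch: attach N clients to one gateway; <GATEWAY_ID> is a placeholder and the
# flags are assumptions, verify them against the actual nym-client CLI
$ for i in $(seq 1 50); do
>   ./nym-client init --id "loadtest-$i" --gateway <GATEWAY_ID>
>   ./nym-client run --id "loadtest-$i" &
> done
# meanwhile, watch write ops and context switches on the gateway host
$ pidstat -w -d -t -p "$(pidof nym-gateway)" 5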

Expected behaviour

Given the network structure, a gateway should (imho) at least be capable of serving a few hundred clients, and being able to serve a few thousand clients would not be an exaggerated scenario either.

I would think that in order to reach reasonable efficiency, the number of syscalls as well as the number of disk writes will have to come down severely (I am thinking 1-2 orders of magnitude). They might be caused by libraries or frameworks which are currently in use.
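
Given how prominently the sqlx-sqlite worker threads show up in the traces above, one quick check for whether the sqlite layer is behind the disk writes is to count fsync-style calls per thread name on the gateway PID (a sketch):

# sketch: see which gateway threads issue fsync/fdatasync
$ sudo bpftrace -e 'tracepoint:syscalls:sys_enter_fsync,tracepoint:syscalls:sys_enter_fdatasync /pid == $1/ { @[comm] = count(); }' "$(pidof nym-gateway)"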

Steps to Reproduce

Which area of Nym were you using?

sven-hash commented 2 years ago

Do we have an update on the resolution of this problem?

ststefa commented 1 year ago

@tommyv1987, if you need any more input on this topic, please feel free to ask.