Realm got crashed due to panic a few hours later

zhboner / realm

A network relay tool

MIT License


Realm got crashed due to panic a few hours later #38

Closed fuyutsuki closed 3 years ago

fuyutsuki commented 3 years ago

Describe the bug

Realm crashed with a panic after running for a few hours, with the message below:

thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Other, message: "Too many open files" }', src/relay/udp.rs:30:88
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I'm still working on getting a backtrace...

To Reproduce

1. Run ./realm -l 0.0.0.0:50000 -r x.x.x.x:50000 (UDP)
2. Observe the panic

Expected behavior

Realm should not panic.
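As an illustration of what "not panicking" could look like, here is a minimal sketch that returns the bind error to the caller instead of calling `unwrap()`. It is not realm's actual code: the helper names `bind_upstream_socket` and `handle_new_client` are hypothetical, and it assumes the failing call at the reported location is a UDP socket bind, which the panic message suggests but does not confirm.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Hypothetical helper: bind a UDP socket on an OS-chosen port for a new
/// client, returning the error to the caller instead of unwrapping it.
fn bind_upstream_socket() -> io::Result<UdpSocket> {
    // Port 0 asks the OS for any free port.
    let addr: SocketAddr = "0.0.0.0:0".parse().expect("static address is valid");
    UdpSocket::bind(addr)
}

/// Hypothetical per-packet handler: on failure (for example EMFILE or
/// EADDRINUSE), log and drop the packet so existing clients keep working.
fn handle_new_client(client: SocketAddr) -> Option<UdpSocket> {
    match bind_upstream_socket() {
        Ok(sock) => Some(sock),
        Err(e) => {
            eprintln!("cannot allocate socket for {}: {}; dropping packet", client, e);
            None
        }
    }
}
```

With this shape, running out of file descriptors degrades service for new clients only, rather than taking down the whole worker thread.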

Screenshots

None

Environment

Kernel: Linux version 5.13.12-200.fc34.x86_64 (mockbuild@bkernel01.iad2.fedoraproject.org) (gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-1), GNU ld version 2.35.2-4.fc34) #1 SMP Wed Aug 18 13:27:18 UTC 2021
OS: Fedora release 34 (Thirty Four)
rustc: rustc 1.54.0 (Fedora 1.54.0-1.fc34)
Cargo: cargo 1.54.0


Soniccube commented 3 years ago

You should raise the maximum file descriptor limit with ulimit -n 51200.
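To confirm that the new limit actually applies to the running process, one option is to read it back at startup. The sketch below assumes Linux and parses the "Max open files" row of /proc/self/limits; the function names are hypothetical and this is not part of realm.

```rust
use std::fs;

/// Parse the soft and hard "Max open files" limits from the text of
/// /proc/self/limits. A slot is `None` when the value is not a number,
/// which is the case for "unlimited".
fn parse_open_file_limits(limits: &str) -> Option<(Option<u64>, Option<u64>)> {
    let label = "Max open files";
    let line = limits.lines().find(|l| l.starts_with(label))?;
    // Remaining columns: <soft> <hard> <units>
    let mut cols = line[label.len()..].split_whitespace();
    let soft = cols.next()?;
    let hard = cols.next()?;
    let parse = |s: &str| s.parse::<u64>().ok();
    Some((parse(soft), parse(hard)))
}

/// Read the limits of the current process. Returns `None` if the file is
/// missing (non-Linux systems) or the row cannot be found.
fn current_open_file_limits() -> Option<(Option<u64>, Option<u64>)> {
    let text = fs::read_to_string("/proc/self/limits").ok()?;
    parse_open_file_limits(&text)
}
```

Logging the result of `current_open_file_limits()` when the relay starts makes it easy to spot a service manager or shell that did not inherit the raised limit.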

Here are more optimizations for your server: https://github.com/shadowsocks/shadowsocks/wiki/Optimizing-Shadowsocks

fuyutsuki commented 3 years ago

Thanks for the reply, I'll try it!

fuyutsuki commented 3 years ago

Hi, now it crashes with the following error. Is there any solution?

thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 98, kind: AddrInUse, message: "Address already in use" }', src/udp.rs:29:88

zephyrchien commented 3 years ago

It seems there were no more ports available to allocate, so the OS returned an error. I have to point out that there is a potential resource leak in the current code, for which I am responsible.

I wrote the UDP module a few months ago to solve #26. It is not very well designed (I assumed that people seldom use this program to forward UDP, so I tried to minimize the overhead). After the merge (#27), it just works.

There is only one global timer, which eventually clears the state and closes the fds. If the relay keeps receiving UDP packets, the cleanup logic is never executed, which leads to a resource leak. Moreover, the hardcoded TIMEOUT value is 15 min, which makes the problem even more severe.

The solution is simple: set a timeout for each socket and choose a proper TIMEOUT value.
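A per-socket timeout could be sketched roughly as follows. This is only an illustration of the idea, not the fix that went into realm: `Association`, `AssociationTable`, `touch`, and `sweep` are hypothetical names, the type parameter `S` stands in for whatever socket type the relay uses, and the sketch uses only the standard library rather than tokio.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

/// Hypothetical per-client state; `S` stands in for the upstream socket.
struct Association<S> {
    socket: S,
    last_seen: Instant,
}

/// Table of client associations, each tracked with its own idle time.
struct AssociationTable<S> {
    idle_timeout: Duration,
    entries: HashMap<SocketAddr, Association<S>>,
}

impl<S> AssociationTable<S> {
    fn new(idle_timeout: Duration) -> Self {
        Self { idle_timeout, entries: HashMap::new() }
    }

    fn len(&self) -> usize {
        self.entries.len()
    }

    /// Record traffic from `client`, creating its socket on first use.
    fn touch(&mut self, client: SocketAddr, now: Instant, make: impl FnOnce() -> S) -> &S {
        let entry = self.entries.entry(client).or_insert_with(|| Association {
            socket: make(),
            last_seen: now,
        });
        entry.last_seen = now;
        &entry.socket
    }

    /// Remove every association that has been idle for at least
    /// `idle_timeout`. Dropping the removed value closes its socket.
    /// Returns the number of associations removed.
    fn sweep(&mut self, now: Instant) -> usize {
        let before = self.entries.len();
        let timeout = self.idle_timeout;
        self.entries
            .retain(|_, a| now.saturating_duration_since(a.last_seen) < timeout);
        before - self.entries.len()
    }
}
```

The point of the design is that `sweep` is meant to be called on a fixed tick that incoming packets do not reset, so a busy client only refreshes its own entry and idle clients are still reclaimed while traffic continues.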

fuyutsuki commented 3 years ago

I tried to fix it by tweaking the timeout value, but I wasn't able to set a timeout for every single socket. Thank you!