shadowsocks / shadowsocks-rust

A Rust port of shadowsocks
https://shadowsocks.org/
MIT License

Windows server side frequently hangs #899

Closed moqi2011 closed 9 months ago

moqi2011 commented 2 years ago

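The version is shadowsocks-v1.14.3.x86_64-pc-windows-gnu.zip.

After running for a while, it hangs without any warning and clients can no longer connect. When I then go to the server and press Ctrl+C, the program does not exit; instead, some connection-failure logs appear, and then it returns to normal.

Below are the logs printed after pressing Ctrl+C (the Chinese OS error text means "No such host is known."):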

2022-07-17T17:41:50.793324300+08:00 ERROR tcp tunnel 192.168.1.3:54210 -> appcloud2.in.zhihu.com:443 connect failed, error: dns resolve appcloud2.in.zhihu.com:443 error: 不知道这样的主机。 (os error 11001)
2022-07-17T17:55:05.862293900+08:00 ERROR tcp tunnel 192.168.1.3:54304 -> p214-acsegateway.icloud.com.cn:443 connect failed, error: dns resolve p214-acsegateway.icloud.com.cn:443 error: 不知道这样的主机。 (os error 11001)
2022-07-17T17:55:05.992211500+08:00 ERROR tcp tunnel 192.168.1.3:54357 -> p214-acsegateway.icloud.com.cn:443 connect failed, error: dns resolve p214-acsegateway.icloud.com.cn:443 error: 不知道这样的主机。 (os error 11001)

zonyitoo commented 2 years ago

os error 11001

The socket library returned error code 11001 when ssserver was trying to resolve domain names. I am not familiar with Windows, but it seems that 11001 is related to Windows' DNS configuration: https://www.remoteutilities.com/support/kb/socket-error-11001-host-not-found/

moqi2011 commented 2 years ago

It's normal for a network access error to occur, but why would it cause a denial of service for the entire program? Is there any way for it to ignore this error and continue serving subsequent requests?

zonyitoo commented 2 years ago

Well, in this case, there is probably something wrong with your DNS resolver that prevents the trust-dns resolver from resolving domain names. The trust-dns resolver has a default timeout of 5 seconds, so every connection may have to wait at least 5 seconds before learning that DNS resolution failed.

https://github.com/shadowsocks/shadowsocks-rust/blob/f533195d015948268d99f537f89cc2f061a30869/crates/shadowsocks-service/src/dns/mod.rs#L17-L23

You may try starting ssserver with the environment variable SS_SYSTEM_DNS_RESOLVER_FORCE_BUILTIN=1 to use the system's built-in DNS resolution API, and see whether you observe the same problem.
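
To illustrate why a stalled resolver delays every connection, here is a minimal sketch of a bounded wait around a blocking lookup. It uses only the Rust standard library and is not how trust-dns or shadowsocks-rust implement their timeout; `run_with_timeout` and `resolve_with_timeout` are hypothetical names introduced for this example.

```rust
use std::net::{SocketAddr, ToSocketAddrs};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `work` on a helper thread and waits at most
/// `timeout` for its result. Returns `None` on timeout; the helper thread
/// is left to finish on its own.
fn run_with_timeout<T, F>(work: F, timeout: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver is gone; ignore the send error.
        let _ = tx.send(work());
    });
    rx.recv_timeout(timeout).ok()
}

/// Hypothetical helper: resolves `host:port` with the system resolver,
/// giving up after `timeout`. Returns `None` on failure or timeout.
fn resolve_with_timeout(target: &str, timeout: Duration) -> Option<Vec<SocketAddr>> {
    let target = target.to_owned();
    run_with_timeout(
        move || {
            target
                .to_socket_addrs()
                .map(|addrs| addrs.collect::<Vec<_>>())
                .ok()
        },
        timeout,
    )
    .flatten()
}
```

With a 5-second budget, `resolve_with_timeout("example.com:443", Duration::from_secs(5))` returns `None` if the lookup fails or has not finished within 5 seconds, which is the delay each new connection would see while the resolver is broken.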

moqi2011 commented 2 years ago

This configuration may solve the DNS problem, but the root of the problem has not been solved. Other network problems also seem to cause a denial of service for the whole program. For example, with the log below, the program returned to normal when I pressed Ctrl+C.

2022-07-19T11:53:22.930130600+08:00 WARN  handshake failed, maybe wrong method or key, or under replay attacks. peer: 194.247.178.81:43856, error: invalid tag-in
2022-07-19T13:25:06.097286700+08:00 WARN  handshake failed, maybe wrong method or key, or under replay attacks. peer: 192.241.219.61:43512, error: invalid tag-in
2022-07-19T13:25:06.098302100+08:00 WARN  handshake failed, maybe wrong method or key, or under replay attacks. peer: 192.241.219.80:60240, error: invalid tag-in
2022-07-19T13:25:06.099500100+08:00 WARN  handshake failed, maybe wrong method or key, or under replay attacks. peer: 210.3.15.174:56665, error: invalid tag-in

The root cause seems to be that it cannot handle multiple client requests at the same time. Where can I configure the number of threads? @zonyitoo Thanks for your answer.

zonyitoo commented 2 years ago

I don't think it is related to the number of threads, because the whole program runs many coroutines across multiple threads.

https://github.com/shadowsocks/shadowsocks-rust/blob/master/src/monitor/windows.rs#L9

The Ctrl+C signal should be captured by this task, which then kills the program entirely. So what exactly was triggered when you pressed Ctrl+C? I am not a Windows expert, so I really don't know what was happening.

Did you enable fast_open? Try disabling it and testing again.

moqi2011 commented 2 years ago

I don't have fast_open enabled. I also think pressing Ctrl+C should stop the program, but that is not what happens: even after it returns to normal, pressing Ctrl+C sometimes fails to exit the program. It takes about three presses to exit.

zonyitoo commented 2 years ago

Interesting. So who consumed the ctrl+c signal? Do you have any ideas?

moqi2011 commented 2 years ago

Did the log component cause a deadlock? Every time it gets stuck, log output appears when I press Ctrl+C.

moqi2011 commented 2 years ago

Normally a log line should be printed when the event happens, but the timestamps above show that those events did not occur at the moment I pressed Ctrl+C. While the program is stuck, no logs are shown except the message that the port is listening successfully; when I press Ctrl+C, the pending logs are printed.

zonyitoo commented 2 years ago

https://github.com/shadowsocks/shadowsocks-rust/blob/master/Cargo.toml#L58

You could try to compile a binary without any logging facilities. Just remove this "logging" feature and compile.

moqi2011 commented 2 years ago

OK, thanks.

database64128 commented 2 years ago

Because the symptoms match, just in case you didn't already know, if you are using the legacy conhost (not the new Windows Terminal or Git Bash), selecting text will suspend the running program, and pressing ^C will unselect and resume execution of the program.

moqi2011 commented 2 years ago

After disabling the log module, it ran stably for 24 hours without getting stuck.

zonyitoo commented 2 years ago

Since you have disabled the log module, nothing is written to the console while the program is running. So your problem must be related to console output. Did you check the "legacy conhost" behavior that database64128 mentioned?

moqi2011 commented 2 years ago

My system is Windows Server 2019. I tried both cmd and PowerShell. After starting the program I switched it to the background, and I didn't do anything on the computer before it got stuck.

dev4u commented 2 years ago

Disable Quick Edit Mode in the console options.

moqi2011 commented 9 months ago

Disable Quick Edit Mode in the console options.

This is the correct answer.
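
For reference, Quick Edit Mode can also be turned off programmatically through the Windows console API instead of the console's Properties dialog. The following is a minimal sketch, not code from shadowsocks-rust: `without_quick_edit` and `disable_quick_edit` are hypothetical names, while `GetStdHandle`, `GetConsoleMode`, `SetConsoleMode` and the flag values come from the Windows console API.

```rust
/// Console input-mode flags defined by the Windows console API.
const ENABLE_QUICK_EDIT_MODE: u32 = 0x0040;
const ENABLE_EXTENDED_FLAGS: u32 = 0x0080;

/// Hypothetical helper: returns `mode` with Quick Edit cleared.
/// ENABLE_EXTENDED_FLAGS must be set for the Quick Edit bit to be honored.
fn without_quick_edit(mode: u32) -> u32 {
    (mode & !ENABLE_QUICK_EDIT_MODE) | ENABLE_EXTENDED_FLAGS
}

#[cfg(windows)]
mod console {
    use std::ffi::c_void;
    use std::io;

    const STD_INPUT_HANDLE: u32 = -10i32 as u32;

    // `unsafe extern` needs Rust 1.82+; on older toolchains write plain
    // `extern "system"` instead.
    #[link(name = "kernel32")]
    unsafe extern "system" {
        fn GetStdHandle(std_handle: u32) -> *mut c_void;
        fn GetConsoleMode(handle: *mut c_void, mode: *mut u32) -> i32;
        fn SetConsoleMode(handle: *mut c_void, mode: u32) -> i32;
    }

    /// Hypothetical helper: clears Quick Edit on the console attached to stdin.
    pub fn disable_quick_edit() -> io::Result<()> {
        unsafe {
            let handle = GetStdHandle(STD_INPUT_HANDLE);
            let mut mode = 0u32;
            // Both calls return 0 on failure, e.g. when stdin is not a console.
            if GetConsoleMode(handle, &mut mode) == 0 {
                return Err(io::Error::last_os_error());
            }
            if SetConsoleMode(handle, super::without_quick_edit(mode)) == 0 {
                return Err(io::Error::last_os_error());
            }
        }
        Ok(())
    }
}
```

A program could call `console::disable_quick_edit()` once at startup on Windows and ignore the error when no console is attached. The change applies only to the current console session, so the setting in the console options is still what a user would change to make it persistent.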