Closed imReker closed 1 year ago
From the dumped file descriptor info: each DNS query from an app makes shadowsocks create at least 2 handles in VpnService. The first is the `local_dns_path` socket for the UDP query, and the second is the `protect_path` socket for the TCP query.
fd list size = 928
fd list- 1bf: SOCK: socket:[23488393] UNIX / -- /
fd list- 1c0: SOCK: socket:[23472492] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1c2: SOCK: socket:[23490568] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/protect_path -- /dev/socket/dnsproxyd
fd list- 1c3: SOCK: socket:[23478133] UNIX / -- /
fd list- 1c4: SOCK: socket:[23466596] UNIX / -- /
fd list- 1c6: SOCK: socket:[23478364] UNIX / -- /
fd list- 1c7: SOCK: socket:[23466600] UNIX / -- /
fd list- 1c8: SOCK: socket:[23468862] UNIX / -- /
fd list- 1c9: SOCK: socket:[23490569] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/protect_path -- /dev/socket/dnsproxyd
fd list- 1ca: SOCK: socket:[23466603] UNIX / -- /
fd list- 1cb: SOCK: socket:[23478366] UNIX / -- /
fd list- 1cc: SOCK: socket:[23488399] UNIX / -- /
fd list- 1ce: SOCK: socket:[23466607] UNIX / -- /
fd list- 1cf: SOCK: socket:[23484823] UNIX / -- /
fd list- 1d1: SOCK: socket:[23476731] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1d2: SOCK: socket:[23490570] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/protect_path -- /dev/socket/dnsproxyd
fd list- 1d3: SOCK: socket:[23467117] UNIX / -- /
fd list- 1d5: SOCK: socket:[23476736] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1d6: SOCK: socket:[23488418] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/protect_path -- /dev/socket/dnsproxyd
fd list- 1d7: SOCK: socket:[23480411] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1d8: SOCK: socket:[23479978] UNIX / -- /
fd list- 1d9: SOCK: socket:[23461630] UNIX / -- /
fd list- 1da: SOCK: socket:[23467746] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1db: SOCK: socket:[23479988] UNIX / -- /
fd list- 1dc: SOCK: socket:[23467121] UNIX / -- /
fd list- 1dd: SOCK: socket:[23467126] UNIX / -- /
fd list- 1de: SOCK: socket:[23467748] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1df: SOCK: socket:[23480413] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1e0: SOCK: socket:[23466611] UNIX / -- /
fd list- 1e1: SOCK: socket:[23479119] UNIX / -- /
fd list- 1e2: SOCK: socket:[23480797] UNIX / -- /
fd list- 1e3: SOCK: socket:[23478626] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1e4: SOCK: socket:[23480440] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1e5: SOCK: socket:[23466617] UNIX / -- /
fd list- 1e6: SOCK: socket:[23477117] UNIX / -- /
fd list- 1e7: SOCK: socket:[23432212] UNIX / -- /
fd list- 1e8: SOCK: socket:[23480445] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1e9: SOCK: socket:[23480447] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
.................
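To see at a glance how the handles in a dump like the one above split between the two paths, the lines can be tallied by their local socket path. This is only a sketch: the class and method names are made up for illustration, and the line layout is inferred from the dump itself rather than from any documented format.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of shadowsocks-android.
final class FdDumpTally {
    // Counts dump lines per local socket path. Assumed layout (taken from the dump):
    // "fd list- <hex fd>: SOCK: socket:[inode] UNIX <local path> -- <peer path>"
    static Map<String, Integer> countByLocalPath(List<String> lines) {
        final String marker = " UNIX ";
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            int unix = line.indexOf(marker);
            int sep = line.indexOf(" -- ");
            if (unix < 0 || sep < unix + marker.length()) {
                continue; // not a UNIX socket entry, e.g. the "fd list size" line
            }
            String local = line.substring(unix + marker.length(), sep).trim();
            // Keep only the last path component; unnamed sockets are shown as "/".
            String name = local.equals("/")
                    ? "(unnamed)"
                    : local.substring(local.lastIndexOf('/') + 1);
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }
}
```

Feeding it the dump lines gives one count each for `local_dns_path`, `protect_path` and the unnamed sockets, which makes it easier to compare two dumps taken a few seconds apart.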
These seem normal. These fds are not closed? Does your server connection work properly?
Most of them will eventually be closed; otherwise the process crashes with `Too many open files`.
In a very poor network environment, the UDP packet loss rate can be very high, and that is what triggers this issue.
The key point of this issue is the different timeouts on the Java and Rust sides: `LocalDnsWorker` uses Java's `getAllByName`, whose timeout is defined by the system, usually about 90 seconds. On the Rust side, the timeout is 5 seconds.
So when the network is very slow or UDP is filtered, a DNS query sent to the Java side waits on system I/O until the 90s timeout, while the Rust side fails after 5s and returns the failure to the app that made the DNS request. The app then requests again. If the app retries without any interval or retry limit, `LocalDnsWorker.accept` will create thousands of socket FDs within those 90 seconds.
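As a rough back-of-the-envelope check of that claim, the number of handles held at once is about the query arrival rate multiplied by how long each lookup holds its socket. The sketch below just does that arithmetic. The 90s hold time is the empirical figure from this thread; the retry rate is a hypothetical input, and the names are invented for illustration.

```java
// Hypothetical estimate, not measured data.
final class PendingFdEstimate {
    // Handles held at steady state: arrival rate times the time each lookup
    // stays blocked, times the handles each query occupies.
    static long pendingFds(double queriesPerSecond, double holdSeconds, int fdsPerQuery) {
        return Math.round(queriesPerSecond * holdSeconds) * fdsPerQuery;
    }

    // Arrival rate at which the held handles reach a given fd limit.
    static double rateToReach(long fdLimit, double holdSeconds, int fdsPerQuery) {
        return fdLimit / (holdSeconds * fdsPerQuery);
    }
}
```

With a 90s hold time, an app retrying 12 times per second already gives `pendingFds(12, 90, 1) == 1080`, whereas an app that only retries once per 5s failure stays at `pendingFds(0.2, 90, 1) == 18`.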
But I still don't know why the `protect_path` sockets are leaked too.
It is technically not a leak if they are eventually closed? Although I am down to tweak timeouts. Where did you find the 90s timeout?
The 90s timeout is an empirical value taken from the logs; it's not accurate.
Though it's not a traditional 'leak', I think it is still an issue because of the mismatched timeouts and the thousands of DNS retries they cause. Maybe 'denial of service' is more accurate?
Currently, to work around this issue, I set a counter in `LocalDnsWorker.accept`: when there are more than 200 pending DNS queries, `accept` just returns an empty response to sslocal (this limit could be enforced in sslocal as well).
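The counter described above could look roughly like this. It is not the actual patch: `PendingQueryLimiter`, `tryEnter` and `exit` are hypothetical names, and a `Semaphore` stands in for the counter so that the check and the increment happen atomically.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch of a cap on in-flight DNS queries.
final class PendingQueryLimiter {
    private final Semaphore permits;

    PendingQueryLimiter(int maxPending) {
        this.permits = new Semaphore(maxPending);
    }

    // Returns true if the query may proceed. False means the cap is reached
    // and the caller should answer with an empty response right away.
    boolean tryEnter() {
        return permits.tryAcquire();
    }

    // Must be called exactly once for every successful tryEnter(),
    // ideally from a finally block so failed lookups also free their slot.
    void exit() {
        permits.release();
    }

    // Number of further queries that could be admitted right now.
    int available() {
        return permits.availablePermits();
    }
}
```

The handler would call `tryEnter()` before starting the lookup and `exit()` when the lookup finishes or fails, so the number of sockets held by pending lookups stays at or below the cap (200 in the workaround above).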
I think the correct fix is to replace `getAllByName` with dnsjava, which can set a timeout on a query. But we would need to modify it so that the caller can set a `Network` for it to create sockets on.
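For comparison, a different and weaker technique than a resolver with a real per-query timeout is to bound only how long the caller waits, by running the blocking lookup on a worker thread and giving up on the `Future` after a deadline. The sketch below assumes plain `java.util.concurrent`; `BoundedLookup` and `resolve` are hypothetical names. Note that it does not abort the underlying lookup, which may stay blocked until the system timeout, so it only lets the caller answer and close its own socket early.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: bounds the caller's wait, not the lookup itself.
final class BoundedLookup {
    private final ExecutorService pool;

    BoundedLookup(int threads) {
        // Daemon threads, so a stuck lookup never keeps the process alive.
        this.pool = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r, "bounded-lookup");
            t.setDaemon(true);
            return t;
        });
    }

    // Runs the lookup on the pool and waits at most timeoutMillis for it.
    // Returns null if the lookup timed out, failed, or the wait was interrupted.
    <T> T resolve(Callable<T> lookup, long timeoutMillis) {
        Future<T> future = pool.submit(lookup);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            future.cancel(true); // interrupts only; a blocked native lookup may keep running
            return null;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }
}
```

A call such as `lookup.resolve(() -> InetAddress.getAllByName(host), 5000)` would then return within about 5s either way. Because the pool is fixed-size, lookups that are still blocked also cap how many new ones can start, which is a side effect rather than a designed limit.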
Sounds good. I will take a look sometime.
Does this issue go away if you use the "All" Route?
Currently I use ACL with Bypass Lan. I think this issue doesn't exist in the 'All' route case, since DNS queries are not passed to the Java side (so no extra FDs are created) and they have the 5s timeout.
And maybe unix socket connection reuse (ref #2751) is still needed? Because Rust makes 2-3 DNS queries per connection, there is still a small chance of creating over 1000 FDs before the 5s timeout.
@Mygod
I modified a little of dnsjava's code (mainly `Network`-related work and Java 8/Android adaptation) and it works!
The only downside is that dnsjava 3.4.1 doesn't support Android 6.x because of Java NIO. (Older versions support Android 6.x but use blocking sockets, so they may still hit the same issue.)
I'll perform a stress test again tomorrow.
Closing as Android versions too old.
`LocalDnsWorker.accept` will throw `Broken pipe` when UDP is filtered or the network is disconnected. Then, if DNS queries keep coming in, the unix socket handles of the `VpnService` process will exceed the handle limit, which is 1024 (32768 on Android 9.0 and newer), so the `VpnService` process eventually gets `Too many open files` and `Bad file descriptor` exceptions everywhere. Meanwhile, because the Java-side UDP DNS query times out, sslocal sends a TCP DNS query with a 'java protected' socket, which creates the same number of socket handles in sslocal. (Why does sslocal make a TCP query again?) As a result, both `VpnService` and sslocal crash at random times.
Logs:
org.shadowsocks.xx_issue_19bc73ad993aad4d5fe278892d584231_error_session_61279182004D00013C85A04AC568A81B_DNE_5_v2.log
org.shadowsocks.xx_issue_274bc2d242720049275714683d3d4cc5_error_session_6127B31401C900010D2CF9C39D05D8E2_DNE_0_v2.log
org.shadowsocks.xx_issue_2d5e1ddcbf72ff6f25953b540bd48ff5_error_session_6127C4D9026F00011AC0A04AC568A81B_DNE_0_v2.log