josecelano opened 2 months ago
It seems there are a couple of IPs making a lot of requests:
From: 2024-11-15T11:50:26.058130Z (initial record time in logs) To: 2024-11-15T11:59:14.013546Z (final record time in logs)
This is the number of records containing the error per IP:
11249 ***.181.47.215
10321 ***.38.196.30
1499 ***.223.68.238
951 ***.119.120.243
943 ***.0.0.0
935 ***.101.195.3
834 ***.65.200.220
630 ***.203.102.228
574 ***.223.5.76
535 ***.155.138.252
460 ***.151.164.47
428 ***.69.249.56
393 ***.222.100.18
386 ***.29.114.32
363 ***.0.5.10
347 ***.234.37.79
308 ***.89.28.234
301 ***.229.191.13
288 ***.97.50.183
276 ***.158.172.143
274 ***.97.50.172
258 ***.107.158.197
247 ***.229.106.137
241 ***.226.159.156
214 ***.149.119.171
211 ***.166.14.69
206 ***.178.10.168
206 ***.98.131.32
200 ***.163.118.85
195 ***.74.11.23
192 ***.147.21.36
191 ***.141.206.245
181 ***.18.34.133
177 ***.164.14.112
170 ***.44.119.149
151 ***.80.9.163
149 ***.233.109
144 ***.95.40.108
138 ***.80.9.243
136 ***.81.185.211
134 ***.111.17.112
127 ***.52.86.157
127 ***.80.9.208
127 ***.203.250.69
125 ***.182.100.130
121 ***.72.33.10
121 ***.172.5.182
120 ***.80.9.65
119 ***.88.128.13
119 ***.90.177.100
118 ***.87.174.105
114 ***.141.100.59
112 ***.219.30.138
112 ***.225.218.245
110 ***.140.134.59
105 ***.49.128.219
104 ***.19.60.153
100 ***.18.134.24
I've tried with another UDP tracker client and it works:
I've added a new UDP tracker to the demo on a new port, 6868. It's working using our tracker client and also qBittorrent:
$ cargo run --bin udp_tracker_client announce 144.126.245.19:6868 9c38422213e30bff212b30c360d26f9a02136422
Compiling torrust-tracker-client v3.0.0-develop (/home/josecelano/Documents/git/committer/me/github/torrust/torrust-tracker/console/tracker-client)
Finished `dev` profile [optimized + debuginfo] target(s) in 7.78s
Running `/home/josecelano/Documents/git/committer/me/github/torrust/torrust-tracker/target/debug/udp_tracker_client announce '144.126.245.19:6868' 9c38422213e30bff212b30c360d26f9a02136422`
sending connection request...
connection_id: ConnectionId(I64(-9193186782767578663))
sending announce request...
{
  "AnnounceIpv4": {
    "transaction_id": -888840697,
    "announce_interval": 300,
    "leechers": 0,
    "seeders": 1,
    "peers": []
  }
}
Either the clients are not sending the connection ID correctly, or there is a bug when the load is high.
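For reference, this is what a compliant client is expected to do per BEP 15: send a connect request, read the connection_id from the response, and echo that same connection_id in every announce while it is still valid. A minimal sketch using only the Rust standard library (the tracker address is the demo endpoint above; the info-hash and peer ID are placeholders):

```rust
use std::net::UdpSocket;

// Minimal BEP 15 exchange: connect first, then reuse the returned
// connection_id in the announce. Tracker address, info-hash and peer ID
// are placeholders for illustration only.
fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    socket.connect("144.126.245.19:6868")?; // demo tracker from above

    // Connect request: protocol_id (magic constant), action = 0, transaction_id.
    let mut connect = Vec::with_capacity(16);
    connect.extend_from_slice(&0x0417_2710_1980_u64.to_be_bytes());
    connect.extend_from_slice(&0_u32.to_be_bytes()); // action: connect
    connect.extend_from_slice(&0x1234_5678_u32.to_be_bytes()); // transaction_id
    socket.send(&connect)?;

    // Connect response: action, transaction_id, connection_id (last 8 bytes).
    let mut buf = [0_u8; 16];
    socket.recv(&mut buf)?;
    let connection_id: [u8; 8] = buf[8..16].try_into().unwrap();

    // The announce request MUST echo the connection_id it just received.
    let mut announce = Vec::with_capacity(98);
    announce.extend_from_slice(&connection_id);
    announce.extend_from_slice(&1_u32.to_be_bytes()); // action: announce
    announce.extend_from_slice(&0x1234_5679_u32.to_be_bytes()); // transaction_id
    announce.extend_from_slice(&[0_u8; 20]); // info_hash (placeholder)
    announce.extend_from_slice(b"-XX0000-000000000000"); // peer_id (placeholder)
    announce.extend_from_slice(&0_i64.to_be_bytes()); // downloaded
    announce.extend_from_slice(&0_i64.to_be_bytes()); // left
    announce.extend_from_slice(&0_i64.to_be_bytes()); // uploaded
    announce.extend_from_slice(&0_u32.to_be_bytes()); // event: none
    announce.extend_from_slice(&0_u32.to_be_bytes()); // IP: default (sender address)
    announce.extend_from_slice(&0_u32.to_be_bytes()); // key
    announce.extend_from_slice(&(-1_i32).to_be_bytes()); // num_want: default
    announce.extend_from_slice(&6881_u16.to_be_bytes()); // port
    socket.send(&announce)?; // (response handling omitted)

    Ok(())
}
```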
I'm going to enable the trace level to check if the clients are sending the connection ID they get in the connect request.
cc @da2ce7
I think there is another possible cause: if announce requests take longer than 2 minutes to be processed, the connection ID has already expired.
Let's move to symmetrically encrypted connection IDs so that we can return an error response saying when the ID expired (or that it is simply invalid).
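This is not the draft implementation, just a sketch of the idea: pack the issue time into the 64-bit connection ID, encrypt it with a server-side secret, and on every announce decrypt it and compare the recovered issue time with the validity window, so the server can report "expired" separately from "invalid". The toy permutation below stands in for a real 64-bit block cipher, and all names are illustrative:

```rust
// Sketch only: an encrypted connection ID that carries its own issue time.
// `toy_permute`/`toy_unpermute` are NON-cryptographic stand-ins for a real
// 64-bit block cipher keyed with a server-side secret.

const SECRET: u64 = 0x0123_4567_89ab_cdef; // would come from server config

fn toy_permute(x: u64) -> u64 {
    (x ^ SECRET).rotate_left(17)
}

fn toy_unpermute(x: u64) -> u64 {
    x.rotate_right(17) ^ SECRET
}

/// Issue a connection ID embedding its issue time (epoch seconds).
fn issue_connection_id(issue_time: u32, client_fingerprint: u32) -> u64 {
    let packed = ((issue_time as u64) << 32) | client_fingerprint as u64;
    toy_permute(packed)
}

/// Check a connection ID against the current time and the cookie lifetime.
fn check_connection_id(id: u64, now: u32, lifetime: u32, client_fingerprint: u32) -> Result<(), String> {
    let packed = toy_unpermute(id);
    let issue_time = (packed >> 32) as u32;
    let fingerprint = packed as u32;

    if fingerprint != client_fingerprint {
        return Err("invalid connection id".into());
    }
    if issue_time > now {
        return Err("connection id produces a future value".into());
    }
    if now - issue_time > lifetime {
        return Err(format!("connection id expired {}s ago", now - issue_time - lifetime));
    }
    Ok(())
}

fn main() {
    let now = 1_732_017_334; // example epoch seconds, taken from the logs below
    let id = issue_connection_id(now, 0xdead_beef);
    assert!(check_connection_id(id, now + 30, 120, 0xdead_beef).is_ok());
    assert!(check_connection_id(id, now + 300, 120, 0xdead_beef).is_err());
}
```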
@josecelano I have created a draft (unfinished for now) implementation of the encrypted connection ID:
Hi @da2ce7, cool, I will take a look tomorrow. We have to check the performance; I think the hashing for the connection ID was one of the things that took a large percentage of the total request-processing time.
Hi @da2ce7, I have updated the tracker demo after merging your PR. I've checked the logs, and we are still getting many errors. However, now we know the exact reason:
tracker | 2024-11-19T11:55:34.607629Z ERROR process_request:handle_packet{cookie_time_values=CookieTimeValues { issue_time: 1732017334.607587, valid_range: 1732017213.607587..1732017335.607587 }}:handle_request{cookie_time_values=CookieTimeValues { issue_time: 1732017334.607587, valid_range: 1732017213.607587..1732017335.607587 }}:handle_announce{remote_addr=x.x.243.90:55801 announce_request=AnnounceRequest { connection_id: ConnectionId(I64(5089751780245633787)), action_placeholder: AnnounceActionPlaceholder(I32(1)), transaction_id: TransactionId(I32(86974521)), info_hash: InfoHash([3, 193, 2, 218, 90, 252, 215, 97, 138, 105, 135, 38, 246, 146, 158, 204, 101, 126, 251, 181]), peer_id: PeerId([45, 113, 66, 52, 54, 51, 65, 45, 40, 53, 55, 107, 45, 88, 126, 116, 102, 121, 90, 121]), bytes_downloaded: NumberOfBytes(I64(2441216)), bytes_left: NumberOfBytes(I64(5933010514)), bytes_uploaded: NumberOfBytes(I64(0)), event: AnnounceEventBytes(I32(0)), ip_address: Ipv4AddrBytes([0, 0, 0, 0]), key: PeerKey(I32(-1872933448)), peers_wanted: NumberOfPeers(I32(200)), port: Port(U16(6279)) } cookie_valid_range=1732017213.607587..1732017335.607587}: torrust_tracker::servers::udp::handlers: error=connection id produces a future value
tracker | 2024-11-19T11:55:34.608274Z ERROR process_request:handle_packet{cookie_time_values=CookieTimeValues { issue_time: 1732017334.6081936, valid_range: 1732017213.6081936..1732017335.6081936 }}:handle_request{cookie_time_values=CookieTimeValues { issue_time: 1732017334.6081936, valid_range: 1732017213.6081936..1732017335.6081936 }}:handle_announce{remote_addr=x.x.243.90:55801 announce_request=AnnounceRequest { connection_id: ConnectionId(I64(5089751780245633787)), action_placeholder: AnnounceActionPlaceholder(I32(1)), transaction_id: TransactionId(I32(-1123371308)), info_hash: InfoHash([177, 53, 173, 166, 50, 153, 156, 70, 195, 247, 241, 9, 154, 50, 139, 168, 220, 166, 79, 5]), peer_id: PeerId([45, 113, 66, 52, 54, 51, 65, 45, 67, 51, 76, 126, 119, 117, 73, 90, 49, 105, 69, 99]), bytes_downloaded: NumberOfBytes(I64(859106175)), bytes_left: NumberOfBytes(I64(285212672)), bytes_uploaded: NumberOfBytes(I64(450582314)), event: AnnounceEventBytes(I32(0)), ip_address: Ipv4AddrBytes([0, 0, 0, 0]), key: PeerKey(I32(1502953212)), peers_wanted: NumberOfPeers(I32(200)), port: Port(U16(6279)) } cookie_valid_range=1732017213.6081936..1732017335.6081936}: torrust_tracker::servers::udp::handlers: error=connection id produces a future value
It would be convenient to include the unencrypted issue time in the logs to know the exact value. Anyway, I'm going to continue with my debugging plan. I want to check if these failing requests have a previous connect request from the same IP. If not, this could just be spam or a DDoS attack.
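A rough sketch of that check over the trace logs, assuming the connect handler shows up in the log lines as handle_connect and carries the same remote_addr field as handle_announce (both of those names are assumptions here):

```rust
use std::collections::HashSet;
use std::fs;

// Extract the IP (without port) that follows a `remote_addr=` field.
fn remote_ip(line: &str) -> Option<&str> {
    let start = line.find("remote_addr=")? + "remote_addr=".len();
    let rest = &line[start..];
    let end = rest.find(':')?;
    Some(&rest[..end])
}

fn main() -> std::io::Result<()> {
    let logs = fs::read_to_string("./tracker_logs.txt")?;

    let mut connected: HashSet<&str> = HashSet::new();
    let mut failing: HashSet<&str> = HashSet::new();

    for line in logs.lines() {
        if let Some(ip) = remote_ip(line) {
            if line.contains("handle_connect") {
                connected.insert(ip);
            }
            if line.contains("error=connection id") {
                failing.insert(ip);
            }
        }
    }

    // IPs that fail the connection-id check but never appear in a connect span.
    for ip in failing.difference(&connected) {
        println!("{ip}");
    }

    Ok(())
}
```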
@josecelano I will try and make another PR that makes the log messages nicer for this case.
Hi @da2ce7, I'm getting many announce request errors from my BitTorrent client. I think it is because the demo tracker is not responding in time. The server CPU has stayed under 50% over the last 14 days, so it does not look CPU-bound. I suppose the problem is that we log a lot of request errors for this issue, and writing those errors to the log file makes the UDP tracker slower.
I think we should either:
Could it be that some clients don't make the connect request, or that they make it incorrectly?
Here is what the AI says.
Given your requirements, a Counting Bloom Filter (CBF) or a similar probabilistic data structure can be adapted to handle this scenario effectively. Here's how you could implement a solution that addresses your concerns:
Hierarchical Counting Bloom Filters:
Individual IP Layer: Use one CBF to track individual IP addresses. This will allow for fine-grained detection of misbehaving IPs.
Subnet Layer: Implement another level of CBFs where each filter covers a subnet range. This allows for detecting patterns of misbehavior across a subnet without penalizing neighboring IPs inappropriately.
Individual IP CBF:
Subnet CBF:
Granularity: This approach gives you both detailed control over individual IPs and the ability to detect broader patterns of misbehavior within subnets.
Performance: CBFs are efficient in terms of memory usage and speed, allowing for quick lookups and updates even with large datasets.
Flexibility: You can adjust the thresholds for individual IPs and subnets separately, allowing for different levels of tolerance based on your policy or observed behavior patterns.
False Positives: While CBFs have a small chance of false positives, by using multiple levels (individual and subnet), you can mitigate the impact. For example, if an IP is flagged at both levels, it's more likely to be a true positive.
Configuration: You need to decide on the size of the CBFs, the number of hash functions, and the error thresholds for both individual IPs and subnets. This requires some experimentation or simulation to find the right balance between false positives, memory usage, and effectiveness.
Complexity: Managing two layers of CBFs introduces additional complexity in terms of implementation and maintenance.
False Positives at Subnet Level: If a subnet contains both misbehaving and well-behaving IPs, the well-behaving ones might suffer from the actions taken against the subnet.
Decide on the Subnet Size: Determine what constitutes a subnet for your purposes (e.g., /24, /16, etc.).
Initialize CBFs:
Error Handling Logic:
Action Protocol:
Decay Mechanism: Implement a decay or aging process for counts to ensure that past behavior doesn't indefinitely affect current interactions unless the behavior persists.
By employing this hierarchical approach with Counting Bloom Filters, you can effectively manage IP-based errors at different levels of granularity, protecting your network's performance while minimizing the impact on innocent IPs.
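To make that suggestion concrete, here is a minimal, self-contained sketch of the two-layer idea (standard library only; sizes, hash counts, and thresholds are invented for illustration): one counting filter keyed by the full IPv4 address and one keyed by its /24 prefix, with an IP flagged only when both layers exceed a threshold. A real deployment would also need the decay mechanism mentioned above, e.g. periodically halving all counters.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::Ipv4Addr;

/// A tiny counting Bloom filter over u64 keys: `hashes` hash functions
/// indexing into a table of saturating u8 counters.
struct CountingBloom {
    counters: Vec<u8>,
    hashes: u64,
}

impl CountingBloom {
    fn new(size: usize, hashes: u64) -> Self {
        Self { counters: vec![0; size], hashes }
    }

    fn index(&self, key: u64, seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        key.hash(&mut h);
        (h.finish() as usize) % self.counters.len()
    }

    fn increment(&mut self, key: u64) {
        for seed in 0..self.hashes {
            let i = self.index(key, seed);
            self.counters[i] = self.counters[i].saturating_add(1);
        }
    }

    /// Upper-bound estimate of how many times `key` was counted
    /// (collisions can only inflate the counters, never lose counts).
    fn estimate(&self, key: u64) -> u8 {
        (0..self.hashes).map(|s| self.counters[self.index(key, s)]).min().unwrap_or(0)
    }
}

fn main() {
    // Two layers: one keyed by the full IPv4 address, one by its /24 prefix.
    let mut per_ip = CountingBloom::new(1 << 16, 4);
    let mut per_subnet = CountingBloom::new(1 << 12, 4);

    let offender: Ipv4Addr = "198.51.100.23".parse().unwrap();
    let ip_key = u64::from(u32::from(offender));
    let subnet_key = ip_key >> 8; // /24 prefix

    // On every connection-id error, bump both layers.
    for _ in 0..50 {
        per_ip.increment(ip_key);
        per_subnet.increment(subnet_key);
    }

    // Thresholds are policy choices; an IP is flagged only when both its own
    // counter and its subnet counter pass the threshold.
    let flagged = per_ip.estimate(ip_key) >= 20 && per_subnet.estimate(subnet_key) >= 20;
    println!("flagged = {flagged}");
}
```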
Hi @da2ce7, I've downloaded 3 minutes of tracker logs at the trace level.
Socket addresses ordered by number of log records (first 10):
$ grep -oP "\d+\.\d+\.\d+\.\d+:\d+" ./tracker_logs.txt | sort | uniq -c | sort -nr | head
84398 0.0.0.0:6969
4632 *.142.69.160:53234
3311 *.248.77.85:38473
2581 *.241.177.250:43822
2276 *.182.117.236:43124
2216 *.86.246.61:1024
1895 *.118.235.66:6881
1846 *.38.196.30:6816
1845 *.181.47.215:6817
1818 *.181.47.215:6812
IP addresses ordered by number of log records (first 10):
$ grep -oP "\d+\.\d+\.\d+\.\d+" ./tracker_logs.txt | sort | uniq -c | sort -nr | head
85291 0.0.0.0
13435 *.38.196.30
12789 *.181.47.215
5052 *.65.200.220
4869 *.119.120.243
4785 *.79.3.85
4632 *.142.69.160
3895 *.248.77.85
3734 *.165.243.54
3403 *.236.91.11
Socket addresses for a single IP (the second IP by number of log records), ordered by number of log records (first 10):
$ grep -oP "*.38.196.30:.\d+" ./tracker_logs.txt | sort | uniq -c | sort -nr | head
1846 *.38.196.30:6816
1755 *.38.196.30:6812
1728 *.38.196.30:6814
1710 *.38.196.30:6811
1689 *.38.196.30:6817
1602 *.38.196.30:6813
1566 *.38.196.30:6818
1539 *.38.196.30:6815
I've also checked one socket address. The client is making the connect request, but it's not using the received ID in the announce request. It's always using this connection ID:
ConnectionId(I64(-2325979768337461225))
The 0.0.0.0 IP in the lists above is due to these log records:
tracker | 2024-11-20T10:04:09.991917Z TRACE run_udp_server_main{cookie_lifetime=120s}: UDP TRACKER: Udp::run_udp_server (wait for request) local_addr="udp://0.0.0.0:6969"
tracker | 2024-11-20T10:04:09.991942Z TRACE run_udp_server_main{cookie_lifetime=120s}: UDP TRACKER: Udp::run_udp_server::loop (in) local_addr="udp://0.0.0.0:6969"
I guess this is a bad client implementation. I can email you the logs if you want to check other things.
@josecelano I will try and make another PR that makes the log messages nicer for this case.
I've been using the demo tracker logs, and there are a lot of errors verifying the connection ID: