torrust / torrust-demo

CI and configuration for the Torrust live demo.
https://index.torrust-demo.com
0 stars 1 forks source link

Error `connection id could not be verified` #14

Open josecelano opened 2 months ago

josecelano commented 2 months ago

I've been using the demo tracker logs, and there are a lot of errors verifying the connection ID:

tracker  | 2024-09-10T12:07:18.588007Z ERROR process_request:handle_packet:handle_request:handle_announce{remote_addr=x.x.x.x:5330 announce_request=AnnounceRequest { connection_id: ConnectionId(I64(5555803590518745593)), action_placeholder: AnnounceActionPlaceholder(I32(1)), transaction_id: TransactionId(I32(-1689416333)), info_hash: InfoHash([150, 11, 1, 98, 95, 68, 96, 255, 224, 206, 19, 87, 8, 196, 243, 90, 108, 48, 40, 241]), peer_id: PeerId([45, 66, 67, 48, 50, 48, 50, 45, 62, 61, 89, 40, 3, 137, 229, 116, 139, 237, 6, 109]), bytes_downloaded: NumberOfBytes(I64(0)), bytes_left: NumberOfBytes(I64(7667712)), bytes_uploaded: NumberOfBytes(I64(0)), event: AnnounceEventBytes(I32(0)), ip_address: Ipv4AddrBytes([0, 0, 0, 0]), key: PeerKey(I32(46505)), peers_wanted: NumberOfPeers(I32(200)), port: Port(U16(20020)) }}: torrust_tracker::servers::udp::handlers: error=connection id could not be verified
tracker  | 2024-09-10T12:07:18.602411Z ERROR process_request:handle_packet:handle_request:handle_announce{remote_addr=x.x.x.x:3210 announce_request=AnnounceRequest { connection_id: ConnectionId(I64(-8281966802190684203)), action_placeholder: AnnounceActionPlaceholder(I32(1)), transaction_id: TransactionId(I32(1028202677)), info_hash: InfoHash([235, 97, 210, 21, 134, 188, 225, 199, 111, 88, 34, 70, 227, 133, 53, 163, 223, 212, 16, 219]), peer_id: PeerId([45, 66, 67, 48, 49, 57, 50, 45, 179, 134, 156, 221, 255, 153, 186, 188, 248, 43, 205, 103]), bytes_downloaded: NumberOfBytes(I64(0)), bytes_left: NumberOfBytes(I64(0)), bytes_uploaded: NumberOfBytes(I64(0)), event: AnnounceEventBytes(I32(0)), ip_address: Ipv4AddrBytes([0, 0, 0, 0]), key: PeerKey(I32(35442)), peers_wanted: NumberOfPeers(I32(200)), port: Port(U16(17545)) }}: torrust_tracker::servers::udp::handlers: error=connection id could not be verified
josecelano commented 2 weeks ago

It seems there are a couple of IPs making a lot of requests:

From: 2024-11-15T11:50:26.058130Z (initial record time in logs) To: 2024-11-15T11:59:14.013546Z (final record time in logs)

This is the number of records containing the error per IP:

  11249 ***.181.47.215
  10321 ***.38.196.30
   1499 ***.223.68.238
    951 ***.119.120.243
    943 ***.0.0.0
    935 ***.101.195.3
    834 ***.65.200.220
    630 ***.203.102.228
    574 ***.223.5.76
    535 ***.155.138.252
    460 ***.151.164.47
    428 ***.69.249.56
    393 ***.222.100.18
    386 ***.29.114.32
    363 ***.0.5.10
    347 ***.234.37.79
    308 ***.89.28.234
    301 ***.229.191.13
    288 ***.97.50.183
    276 ***.158.172.143
    274 ***.97.50.172
    258 ***.107.158.197
    247 ***.229.106.137
    241 ***.226.159.156
    214 ***.149.119.171
    211 ***.166.14.69
    206 ***.178.10.168
    206 ***.98.131.32
    200 ***.163.118.85
    195 ***.74.11.23
    192 ***.147.21.36
    191 ***.141.206.245
    181 ***.18.34.133
    177 ***.164.14.112
    170 ***.44.119.149
    151 ***.80.9.163
    149 ***.233.109
    144 ***.95.40.108
    138 ***.80.9.243
    136 ***.81.185.211
    134 ***.111.17.112
    127 ***.52.86.157
    127 ***.80.9.208
    127 ***.203.250.69
    125 ***.182.100.130
    121 ***.72.33.10
    121 ***.172.5.182
    120 ***..80.9.65
    119 ***.88.128.13
    119 ***.90.177.100
    118 ***.87.174.105
    114 ***.141.100.59
    112 ***.219.30.138
    112 ***.225.218.245
    110 ***.140.134.59
    105 ***.49.128.219
    104 ***.19.60.153
    100 ***.18.134.24
josecelano commented 2 weeks ago

I've tried with another UDP tracker client and it works:

image

josecelano commented 2 weeks ago

I've added a new UDP tracker to the demo on a new port 6868. It's working using our tracker client and also qBitorrent:

$ cargo run --bin udp_tracker_client  announce 144.126.245.19:6868 9c38422213e30bff212b30c360d26f9a02136422
   Compiling torrust-tracker-client v3.0.0-develop (/home/josecelano/Documents/git/committer/me/github/torrust/torrust-tracker/console/tracker-client)
    Finished `dev` profile [optimized + debuginfo] target(s) in 7.78s
     Running `/home/josecelano/Documents/git/committer/me/github/torrust/torrust-tracker/target/debug/udp_tracker_client announce '144.126.245.19:6868' 9c38422213e30bff212b30c360d26f9a02136422`
sending connection request...
connection_id: ConnectionId(I64(-9193186782767578663))
sending announce request...
{
  "AnnounceIpv4": {
    "transaction_id": -888840697,
    "announce_interval": 300,
    "leechers": 0,
    "seeders": 1,
    "peers": []
  }
}

image

Either, the clients are not sending the connection ID correctly or there is a bug when the load is high.

I'm going to enable the trace level to check if the clients are sending the connection ID they get in the connect request.

cc @da2ce7

josecelano commented 2 weeks ago

I think there is another possible cause. If announce requests take longer than 2 minutes to be processed, the connection ID is expired.

da2ce7 commented 2 weeks ago

Let's move to symmetric encrypted connection ID's so that we can give an error response telling when it expired, (or just invalid).

da2ce7 commented 2 weeks ago

@josecelano I have created a draft (unfinished for now) implmentation of the encrypted connection ID:

https://github.com/torrust/torrust-tracker/pull/1091

josecelano commented 2 weeks ago

@josecelano I have created a draft (unfinished for now) implmentation of the encrypted connection ID:

https://github.com/torrust/torrust-tracker/pull/1091

Hi @da2ce7, cool, I will take a look tomorrow. We have to check the performance, I think the hashing for the connection ID was one of the things that takes a big percentage of the total time of request processing.

josecelano commented 2 weeks ago

Hi @da2ce7, I have updated the tracker demo after merging your PR. I've checked the logs, and we are still getting many errors. However, now we know the exact reason:

tracker  | 2024-11-19T11:55:34.607629Z ERROR process_request:handle_packet{cookie_time_values=CookieTimeValues { issue_time: 1732017334.607587, valid_range: 1732017213.607587..1732017335.607587 }}:handle_request{cookie_time_values=CookieTimeValues { issue_time: 1732017334.607587, valid_range: 1732017213.607587..1732017335.607587 }}:handle_announce{remote_addr=x.x.243.90:55801 announce_request=AnnounceRequest { connection_id: ConnectionId(I64(5089751780245633787)), action_placeholder: AnnounceActionPlaceholder(I32(1)), transaction_id: TransactionId(I32(86974521)), info_hash: InfoHash([3, 193, 2, 218, 90, 252, 215, 97, 138, 105, 135, 38, 246, 146, 158, 204, 101, 126, 251, 181]), peer_id: PeerId([45, 113, 66, 52, 54, 51, 65, 45, 40, 53, 55, 107, 45, 88, 126, 116, 102, 121, 90, 121]), bytes_downloaded: NumberOfBytes(I64(2441216)), bytes_left: NumberOfBytes(I64(5933010514)), bytes_uploaded: NumberOfBytes(I64(0)), event: AnnounceEventBytes(I32(0)), ip_address: Ipv4AddrBytes([0, 0, 0, 0]), key: PeerKey(I32(-1872933448)), peers_wanted: NumberOfPeers(I32(200)), port: Port(U16(6279)) } cookie_valid_range=1732017213.607587..1732017335.607587}: torrust_tracker::servers::udp::handlers: error=connection id produces a future value
tracker  | 2024-11-19T11:55:34.608274Z ERROR process_request:handle_packet{cookie_time_values=CookieTimeValues { issue_time: 1732017334.6081936, valid_range: 1732017213.6081936..1732017335.6081936 }}:handle_request{cookie_time_values=CookieTimeValues { issue_time: 1732017334.6081936, valid_range: 1732017213.6081936..1732017335.6081936 }}:handle_announce{remote_addr=x.x.243.90:55801 announce_request=AnnounceRequest { connection_id: ConnectionId(I64(5089751780245633787)), action_placeholder: AnnounceActionPlaceholder(I32(1)), transaction_id: TransactionId(I32(-1123371308)), info_hash: InfoHash([177, 53, 173, 166, 50, 153, 156, 70, 195, 247, 241, 9, 154, 50, 139, 168, 220, 166, 79, 5]), peer_id: PeerId([45, 113, 66, 52, 54, 51, 65, 45, 67, 51, 76, 126, 119, 117, 73, 90, 49, 105, 69, 99]), bytes_downloaded: NumberOfBytes(I64(859106175)), bytes_left: NumberOfBytes(I64(285212672)), bytes_uploaded: NumberOfBytes(I64(450582314)), event: AnnounceEventBytes(I32(0)), ip_address: Ipv4AddrBytes([0, 0, 0, 0]), key: PeerKey(I32(1502953212)), peers_wanted: NumberOfPeers(I32(200)), port: Port(U16(6279)) } cookie_valid_range=1732017213.6081936..1732017335.6081936}: torrust_tracker::servers::udp::handlers: error=connection id produces a future value

It would be convenient to include the unencrypted issue time in the logs to know the exact value. Anyway, I'm going to continue with my debug plan. I want to check if these failing requests have a previous connect request from the same IP. If not, this could be just spam or a DDoS attack.

da2ce7 commented 2 weeks ago

@josecelano I will try and make another PR that makes the log messages nicer for this case.

josecelano commented 2 weeks ago

Hi @da2ce7, I'm getting many announce request errors from my BitTorrent client. I think it is because the demo tracker is not responding in time. The server CPU has been under 50% in the last 14 days. I suppose the problem is that we have a lot of request errors in the logs for this issue, and that makes the UDP tracker slower because it has to write the error to the log file.

image

I think we should either:

Could it be that some clients don't make the connect request, or they make it wrong?

da2ce7 commented 1 week ago

Here is what the KI says.

Given your requirements, using a Counting Bloom Filter (CBF) or a similar probabilistic data structure can be adapted to handle this scenario effectively. Here's how you could implement a solution that addresses your concerns:

Solution Overview:

  1. Hierarchical Counting Bloom Filters:

    • Individual IP Layer: Use one CBF to track individual IP addresses. This will allow for fine-grained detection of misbehaving IPs.

    • Subnet Layer: Implement another level of CBFs where each filter covers a subnet range. This allows for detecting patterns of misbehavior across a subnet without penalizing neighboring IPs inappropriately.

Implementation Details:

Advantages:

Challenges:

Implementation Steps:

  1. Decide on the Subnet Size: Determine what constitutes a subnet for your purposes (e.g., /24, /16, etc.).

  2. Initialize CBFs:

    • Create one CBF for individual IPs.
    • Create another CBF for subnets, where each bucket represents a subnet.
  3. Error Handling Logic:

    • When an error occurs:
      • Hash the IP to update the individual IP CBF.
      • Extract the subnet from the IP and hash it to update the subnet CBF.
  4. Action Protocol:

    • If an individual IP's count exceeds a threshold, apply rate limiting or other measures to that IP.
    • If a subnet's count exceeds a higher threshold, consider similar measures but at the subnet level.
  5. Decay Mechanism: Implement a decay or aging process for counts to ensure that past behavior doesn't indefinitely affect current interactions unless the behavior persists.

By employing this hierarchical approach with Counting Bloom Filters, you can effectively manage IP-based errors at different levels of granularity, protecting your network's performance while minimizing the impact on innocent IPs.

josecelano commented 1 week ago

Hi @da2ce7, I've downloaded 3-minute tracker logs with trace level.

Socket addresses ordered by number of logs records (10 first ones):

$ grep -oP "\d+\.\d+\.\d+\.\d+:\d+" ./tracker_logs.txt | sort | uniq -c | sort -nr | head
  84398 0.0.0.0:6969
   4632 *.142.69.160:53234
   3311 *.248.77.85:38473
   2581 *.241.177.250:43822
   2276 *.182.117.236:43124
   2216 *.86.246.61:1024
   1895 *.118.235.66:6881
   1846 *.38.196.30:6816
   1845 *.181.47.215:6817
   1818 *.181.47.215:6812

IP addresses ordered by number of logs records (10 first ones):

$ grep -oP "\d+\.\d+\.\d+\.\d+" ./tracker_logs.txt | sort | uniq -c | sort -nr | head
  85291 0.0.0.0
  13435 *.38.196.30
  12789 *.181.47.215
   5052 *.65.200.220
   4869 *.119.120.243
   4785 *.79.3.85
   4632 *.142.69.160
   3895 *.248.77.85
   3734 *.165.243.54
   3403 *.236.91.11

Socket addresses (for a given IP, the second one in number of logs records) ordered by number of logs records (10 first ones):

$ grep -oP "*.38.196.30:.\d+" ./tracker_logs.txt | sort | uniq -c | sort -nr | head
   1846 *.38.196.30:6816
   1755 *.38.196.30:6812
   1728 *.38.196.30:6814
   1710 *.38.196.30:6811
   1689 *.38.196.30:6817
   1602 *.38.196.30:6813
   1566 *.38.196.30:6818
   1539 *.38.196.30:6815

I've also checked one socket address. The client is making the connect request, but it's not using the received ID in the announce request. It's always using this connection ID:

ConnectionId(I64(-2325979768337461225))

The 0.0.0.0 IP on the lists above are due to these log records:

tracker  | 2024-11-20T10:04:09.991917Z TRACE run_udp_server_main{cookie_lifetime=120s}: UDP TRACKER: Udp::run_udp_server (wait for request) local_addr="udp://0.0.0.0:6969"
tracker  | 2024-11-20T10:04:09.991942Z TRACE run_udp_server_main{cookie_lifetime=120s}: UDP TRACKER: Udp::run_udp_server::loop (in) local_addr="udp://0.0.0.0:6969"

I guess this is a bad client implementation. I can email you the logs if you want to check other things.

da2ce7 commented 1 week ago

@josecelano I will try and make another PR that makes the log messages nicer for this case.

https://github.com/torrust/torrust-tracker/pull/1095