Closed josecelano closed 2 years ago
I think we could research a little bit about what other implementations do:
webtorrent
Repo: https://github.com/webtorrent/bittorrent-tracker
Connection id generation: https://github.com/webtorrent/bittorrent-tracker/blob/ff20a05e4830dd62df16da0a549e69ae96843b4d/lib/common-node.js#L12
It's strange, but it seems they use a fixed value.
UPDATE: it's not a fixed connection id. There are two different connection ids, as you can read here:
https://libtorrent.org/udp_tracker_protocol.html
The client's connect request uses the magic number 0x41727101980 as its protocol id, so I have to look for the connection id in the response.
UPDATE 2: it seems they always use the same fixed value, not only in the first connection request.
lafayette
Repo: https://github.com/lafayette/udp-torrent-tracker
It seems they generate a random one, but they do not check it.
troydm
Repo: https://github.com/troydm/udpt
It seems to be the same implementation we are using but in C++:
    static uint64_t _genCiD (uint32_t ip, uint16_t port)
    {
        uint64_t x;
        x = (time(NULL) / 3600) * port; // x will probably overload.
        x = (ip ^ port);
        x <<= 16;
        x |= (~port);
        return x;
    }
elektito
Repo: https://github.com/elektito/pybtracker Language: Python
It generates a random identifier for the connection:
https://github.com/elektito/pybtracker/blob/master/pybtracker/server.py#L25-L31
    self.server.logger.info('Received connect message.')
    if connid == 0x41727101980:
        connid = randint(0, 0xffffffffffffffff)
        self.server.connids[connid] = datetime.now()
        self.server.activity[addr] = datetime.now()
        return struct.pack('!IIQ', 0, tid, connid)
    else:
        return self.error(tid, 'Invalid protocol identifier.'.encode('utf-8'))
The ID is in the range [0 .. 0xffffffffffffffff]
It validates the connection id on each request:
https://github.com/elektito/pybtracker/blob/master/pybtracker/server.py#L47-L57
    # make sure the provided connection identifier is valid
    timestamp = self.server.connids.get(connid, None)
    last_valid = datetime.now() - timedelta(seconds=self.server.connid_valid_period)
    if not timestamp:
        # we didn't generate that connection identifier
        return self.error(tid, 'Invalid connection identifier.'.encode('utf-8'))
    elif timestamp < last_valid:
        # we did generate that identifier, but it's too
        # old. remove it and send an error.
        del self.server.connids[connid]
        return self.error(tid, 'Old connection identifier.'.encode('utf-8'))
I think the connection ids are stored only in memory with a hashmap.
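It validates the connection id on each request following these rules: the id must be one the server generated, and it must not be older than the configured validity period (connid_valid_period).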
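For comparison with our Rust code, the in-memory approach could be sketched like this. It is only an illustration: ConnectionIdStore, ConnectionIdError, register and validate are names made up for this example, not part of pybtracker or torrust-tracker, and the caller is assumed to supply the random ID and the current time.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory store, mirroring the dictionary used by pybtracker.
pub struct ConnectionIdStore {
    issued: HashMap<u64, Instant>,
    valid_for: Duration,
}

#[derive(Debug, PartialEq)]
pub enum ConnectionIdError {
    /// The server never issued this ID (or already dropped it).
    Unknown,
    /// The server issued this ID, but it is too old.
    Expired,
}

impl ConnectionIdStore {
    pub fn new(valid_for: Duration) -> Self {
        Self { issued: HashMap::new(), valid_for }
    }

    /// Records an ID handed out in a connect response.
    /// Generating the random ID itself is out of scope for this sketch.
    pub fn register(&mut self, connection_id: u64, now: Instant) {
        self.issued.insert(connection_id, now);
    }

    /// Checks an ID on a later request; an expired entry is removed.
    pub fn validate(&mut self, connection_id: u64, now: Instant) -> Result<(), ConnectionIdError> {
        let issued_at = match self.issued.get(&connection_id) {
            Some(instant) => *instant,
            None => return Err(ConnectionIdError::Unknown),
        };
        if now.saturating_duration_since(issued_at) > self.valid_for {
            self.issued.remove(&connection_id);
            return Err(ConnectionIdError::Expired);
        }
        Ok(())
    }
}
```

The trade-off is that every issued ID occupies memory until it is validated after expiry or cleaned up some other way.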
Hi @WarmBeer @da2ce7, in the end, the current implementation could be valid.
I think it could be a way to generate expirable ids without storing them in memory or a database.
This is the current implementation.
    pub fn get_connection_id(remote_address: &SocketAddr) -> ConnectionId {
        match std::time::SystemTime::now().duration_since(std::time::UNIX_EPOCH) {
            Ok(duration) => ConnectionId(((duration.as_secs() / 3600) | ((remote_address.port() as u64) << 36)) as i64),
            Err(_) => ConnectionId(0x7FFFFFFFFFFFFFFF),
        }
    }
The goal of the "connection ID" is to prevent IP address spoofing. UDP has no built-in protection against it: any client can change the source IP in a packet, and by using another client's address and port it can impersonate that client. The BitTorrent UDP Tracker protocol introduces this "token", which the client has to send in its subsequent requests.
How it works (from BEP 15):
connect request:
    Offset  Size            Name            Value
    0       64-bit integer  protocol_id     0x41727101980 // magic constant
    8       32-bit integer  action          0 // connect
    12      32-bit integer  transaction_id
    16
connect response from the server:
    Offset  Size            Name            Value
    0       32-bit integer  action          0 // connect
    4       32-bit integer  transaction_id
    8       64-bit integer  connection_id
    16
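The two layouts above can be sketched in Rust as follows. This is a minimal illustration, not the tracker's actual parsing code: the function and constant names are invented here, and it relies on BEP 15 sending all values in network byte order (big endian).

```rust
/// Magic constant that identifies a BEP 15 connect request.
const PROTOCOL_ID: u64 = 0x41727101980;
/// Action code for "connect".
const ACTION_CONNECT: u32 = 0;

/// Reads a big-endian u32 from a slice of exactly 4 bytes.
fn read_u32(bytes: &[u8]) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes);
    u32::from_be_bytes(buf)
}

/// Reads a big-endian u64 from a slice of exactly 8 bytes.
fn read_u64(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    u64::from_be_bytes(buf)
}

/// Parses a connect request and returns its transaction_id, or None if
/// the packet is shorter than 16 bytes or does not match the layout.
pub fn parse_connect_request(packet: &[u8]) -> Option<u32> {
    if packet.len() < 16 {
        return None;
    }
    let protocol_id = read_u64(&packet[0..8]);
    let action = read_u32(&packet[8..12]);
    let transaction_id = read_u32(&packet[12..16]);
    if protocol_id != PROTOCOL_ID || action != ACTION_CONNECT {
        return None;
    }
    Some(transaction_id)
}

/// Builds the 16-byte connect response: action, transaction_id, connection_id.
pub fn build_connect_response(transaction_id: u32, connection_id: u64) -> [u8; 16] {
    let mut response = [0u8; 16];
    response[0..4].copy_from_slice(&ACTION_CONNECT.to_be_bytes());
    response[4..8].copy_from_slice(&transaction_id.to_be_bytes());
    response[8..16].copy_from_slice(&connection_id.to_be_bytes());
    response
}
```

The server echoes the transaction_id it parsed from the request, and the connection_id is whatever value the chosen generation scheme produces.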
The server has to generate a connection id (64-bit integer) with these rules:
It seems the ID should expire in two minutes.
Let's try to find out what our code does:
ConnectionId(((duration.as_secs() / 3600) | ((remote_address.port() as u64) << 36)) as i64)
duration.as_secs() is a u64 representing the seconds passed since Unix Epoch.
The range for values in hex is: [0x0000000000000000 .. 0xFFFFFFFFFFFFFFFF].
Suppose the current time is t1 and the client port is 0001:
Unix Timestamp: 946684800 (seconds since Jan 01 2000)
GMT: Sat Jan 01 2000 00:00:00 GMT+0000
The connection id would be:
Timestamp in seconds = 946684800
Timestamp in hours = 946684800 / 3600 = 262968
Timestamp in hours in hex (64 bits, u64) = 0x0000000000040338 = 0x 0000 0000 0004 0338
Client port = 0001
Client port in hex = 0x0000000000000001 = 0x 0000 0000 0000 0001
Client port shifted 36 bits to the left (<<36) = 0x0000000100000000 = 0x 0000 0001 0000 0000
The OR in the expression is:
"Timestamp in hours in hex" BIT OR "Client port shifted 36 bits to the left (<<36)"
that is:
0x 0000 0000 0004 0338
OR
0x 0000 0001 0000 0000
----------------------
0x 0000 0001 0004 0338
Basically, the port is moved into the upper 32 bits, and the lower half is the number of hours since the Unix Epoch.
If I'm not wrong, this value only changes after one hour. In fact, only the lower 32 bits change, because the hour count increases by one.
I suppose that's a valid implementation. I do not know why it uses 1 hour instead of the 2 minutes the protocol specifies, but I think it can be changed to 2 minutes just by changing the 3600 divisor to 120.
Pros:
@WarmBeer does it make sense to you?
If that's correct, I think we can keep it and just add this explanation to the documentation with some tests. We can test:
Given we only use the port, I suppose it will generate the same ID for all clients using the same port during the same hour. That should not be a problem because you can only impersonate another client if you know its IP and the port that it's using.
My previous hex<->decimal conversions were not exact. These are the right values:
Timestamp in hours 946684800u64 / 3600 = 262968 = 0x_0000_0000_0004_0338 = 262968
Port 0001 = 0x_0000_0000_0000_0001 = 1
Port 0001 << 36 = 0x_0000_0010_0000_0000 = 68719476736
0x_0000_0000_0004_0338 | 0x_0000_0010_0000_0000 = 0x_0000_0010_0004_0338 = 68719739704
    HEX                          BIN                                           DEC
    ------------------------------------------------------------------------------------
    0x_0000_0000_0004_0338 = ... 0000000000000000001000000001100111000 =      262968
    OR
    0x_0000_0010_0000_0000 = ... 1000000000000000000000000000000000000 = 68719476736
    ------------------------------------------------------------------------------------
    0x_0000_0010_0004_0338 = ... 1000000000000000001000000001100111000 = 68719739704
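These values can be checked mechanically. The sketch below reuses the bit arithmetic of get_connection_id but takes the timestamp and port as arguments so the result is reproducible; the name current_scheme_id is made up for this example.

```rust
/// Same bit arithmetic as the current `get_connection_id`, with the
/// Unix timestamp passed in instead of read from the system clock.
pub fn current_scheme_id(unix_seconds: u64, port: u16) -> i64 {
    // Low bits: hours since the Unix Epoch. Bits 36 and up: the client port.
    ((unix_seconds / 3600) | ((port as u64) << 36)) as i64
}
```

For the timestamp 946684800 and port 1, current_scheme_id returns 68719739704, which is 0x_0000_0010_0004_0338. Any timestamp within the same hour gives the same result, and the value increases by one when the next hour starts.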
Given we only use the port, I suppose it will generate the same ID for all clients using the same port during the same hour. That should not be a problem because you can only impersonate another client if you know its IP and the port that it's using.
The Connection ID is supposed to be a secret code that is only sent to the actual owner of an IP address. With this Connection ID, a peer can prove it actually owns the IP address it announces with. If the Connection ID is the same for all clients, it is very easy for a malicious actor to announce as a different IP address: they send a connection request from their own IP, save the Connection ID from the server response, and use it in an announce request with a spoofed IP.
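To make the concern concrete, here is a small sketch, assuming the server would verify an ID by recomputing it with the current formula. The names current_scheme_id_for and id_is_reusable are invented for this example.

```rust
use std::net::SocketAddr;

/// The current formula, with the timestamp passed in for reproducibility.
/// Note that only the port of `remote_address` is used, never the IP.
pub fn current_scheme_id_for(unix_seconds: u64, remote_address: &SocketAddr) -> i64 {
    ((unix_seconds / 3600) | ((remote_address.port() as u64) << 36)) as i64
}

/// True when the ID issued to `attacker` equals the ID the server would
/// compute for `victim` in the same hour.
pub fn id_is_reusable(unix_seconds: u64, attacker: &SocketAddr, victim: &SocketAddr) -> bool {
    current_scheme_id_for(unix_seconds, attacker) == current_scheme_id_for(unix_seconds, victim)
}
```

With an attacker at 203.0.113.5:6881 and a victim at 198.51.100.7:6881 (documentation-range addresses), id_is_reusable returns true, because both use port 6881. The attacker chooses the source port of its own connect request, so matching the victim's port is trivial.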
I suppose we could generate the Connection ID as follows (not tested):
    fn generate_connection_id(time_as_seconds: u32, peer_ip: IpAddress, peer_port: u16) -> i64 {
        let hash = hash((time_as_seconds / 120) + peer_ip + peer_port + SALT)
        let connection_id = (hash truncated to 64 bits) as i64
        return connection_id
    }

    let connection_id = generate_connection_id(SYSTEM_TIME_AS_SECONDS, PEER_IP, PEER_PORT);
We can then verify the Connection ID without having to keep it in memory:
    fn verify_connection_id(connection_id: i64, peer_ip: IpAddress, peer_port: u16) -> Result<(), ()> {
        // Accept the ID for the current two-minute window or for the previous one.
        let current = generate_connection_id(SYSTEM_TIME_AS_SECONDS, peer_ip, peer_port);
        let previous = generate_connection_id(SYSTEM_TIME_AS_SECONDS - 120, peer_ip, peer_port);
        if connection_id == current || connection_id == previous {
            Ok(())
        } else {
            Err(())
        }
    }
With this implementation, the client has no influence on the Connection ID except for the IP and Port. The added SALT will also make it impossible for a client to guess the Connection ID. The Connection ID will then be the same for two minutes (although different for every IP address and Port combination).
To verify whether a Connection ID is valid, we just check the supplied Connection ID against the outcome of generating a Connection ID now. But since the Connection ID updates every two minutes and the Connection ID should also be valid for two minutes after sending it to the client, we also check it against the previous Connection ID from max two minutes ago and also consider it valid if that is a match. This means that in the worst case, a Connection ID is valid for just under 4 minutes.
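A runnable version of this idea could look like the sketch below. It is an untested-in-production illustration with several assumptions: the timestamp and salt are passed as arguments, the peer is a std SocketAddr, and std's DefaultHasher is used only as a stand-in for the hash function. A real tracker would want a keyed cryptographic hash, and DefaultHasher's output is not guaranteed to be stable across Rust releases.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::SocketAddr;

/// Length of one validity window in seconds (two minutes).
const WINDOW_SECONDS: u64 = 120;

/// Derives an ID from the time window, the peer address and a server secret.
/// DefaultHasher is a placeholder, not a cryptographic choice.
pub fn generate_connection_id(unix_seconds: u64, peer: &SocketAddr, salt: &[u8]) -> i64 {
    let mut hasher = DefaultHasher::new();
    salt.hash(&mut hasher);
    (unix_seconds / WINDOW_SECONDS).hash(&mut hasher);
    peer.ip().hash(&mut hasher);
    peer.port().hash(&mut hasher);
    hasher.finish() as i64
}

/// Accepts an ID generated in the current window or in the previous one,
/// so nothing has to be stored per client.
pub fn verify_connection_id(
    connection_id: i64,
    unix_seconds: u64,
    peer: &SocketAddr,
    salt: &[u8],
) -> bool {
    let current = generate_connection_id(unix_seconds, peer, salt);
    let previous =
        generate_connection_id(unix_seconds.saturating_sub(WINDOW_SECONDS), peer, salt);
    connection_id == current || connection_id == previous
}
```

An ID generated at time t for one peer verifies at t and at t + 120, but is rejected at t + 240 or when presented from a different IP or port (barring a hash collision).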
I've edited my reply to also include the peer port.
BEP 15: https://www.bittorrent.org/beps/bep_0015.html
This is what the BEP 15 says about the connection ID:
And this is the current implementation:
Originally posted by @josecelano in https://github.com/torrust/torrust-tracker/issues/60#issuecomment-1210961955