webtorrent / bittorrent-tracker

🌊 Simple, robust, BitTorrent tracker (client & server) implementation
https://webtorrent.io
MIT License
1.77k stars 317 forks source link

`peer_id` from announce query string interpreted as latin1 then converted to UTF-8 before being returned as a transformed and wrong value in the peer list. #460

Open pdelagrave opened 1 year ago

pdelagrave commented 1 year ago

Hello, I'm developing a BitTorrent client from scratch for a toy project and was testing it against this tracker. Using version 10.0.2 of the tracker from the minimages/bittorrent-tracker docker image.

When announcing to the tracker with a non-UTF-8 peer_id: curl 'http://1.2.3.4:8001/announce?info_hash=escaped_20_bytes&peer_id=%F0%5B%E2%6F%AA%33%E5%A7%9B%1E%75%52%13%6E%B5%AD%51%57%21%98&port=1234&compact=0

The request succeeds and there's no error. The peer is added to the tracker's internal list. Then in the response we have the peer list with that new peer where the peer_id has a different value than the one provided in the request. In this example the transformed peer_id will be: c3b05bc3a26fc2aa33c3a5c2a7c29b1e7552136ec2b5c2ad515721c298

It's a problem because peers can't communicate with a peer which the peer_id is misrepresented by the tracker.

Using the NodeJS API we can achieve that result with:

input = 'f05be26faa33e5a79b1e7552136eb5ad51572198';
latin1_str = Buffer.from(input, 'hex').toString('binary');  // As per the Node docs, 'binary' is the same as 'latin1'.
// the latin1_str == 'ð[âoª3å§\u001euR\u0013nµ­QW!'
Buffer.from(latin1_str, 'utf8').toString('hex'); // == 'c3b05bc3a26fc2aa33c3a5c2a7c29b1e7552136ec2b5c2ad515721c298'

In the "trackers" section of BEP0003 the GET request for announce, for the query parameter peer_id:

A string of length 20 which this downloader uses as its id. Each downloader generates its own id at random at the start of a new download. This value will also almost certainly have to be escaped.

BEP0020 also specifies that the usage of peer_id as a client identification method uses only ascii characters. So there shouldn't be a need to parse the bytes as latin1 or UTF-8 or from latin1 to UTF-8.

I didn't dig much into the codebase of the tracker (I'm quite unfamiliar with javascript and nodejs) but I've seen Buffer.from() and 'binary' used around and it might be the cause.

Thanks!

ThaUnknown commented 1 year ago

can you reproduce this on 9.18.6?

pdelagrave commented 1 year ago

Same problem with 9.18.6. Also realized that the client list in the /stats page assumes the peer ids to be text strings and not bytes, they aren't escaped and displayed as is.

github-actions[bot] commented 1 year ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

ThaUnknown commented 1 year ago

could you walk me tru this.... slowly? I might be disabled, the issue lies in ENCODING the data, when making a request to the tracker, not decoding? to me it seems like the issue would lie on both ends?

lirener commented 8 months ago

I may be having a similar issue to you?(我可能遇到了和你类似的问题?) https://github.com/webtorrent/bittorrent-tracker/issues/508