torrust / torrust-index

This repository serves as the backend for the Torrust Index project.
https://torrust.com
GNU Affero General Public License v3.0
51 stars 19 forks source link

Search for alternatives to `serde_bencode` #311

Closed josecelano closed 11 months ago

josecelano commented 1 year ago

Context

We are using the serde_bencode crate by @toby

We use it to encode and decode a torrent file in this mod.

Problems

Bug

We have found a bug recently, but the project seems inactive.

Double parsing

On the other hand, maybe our implementation is not optimal:

pub fn decode_and_validate_torrent_file(bytes: &[u8]) -> Result<(Torrent, InfoHash), DecodeTorrentFileError> {
    let original_info_hash = calculate_info_hash(bytes)?;

    let torrent = decode_torrent(bytes).map_err(|_| DecodeTorrentFileError::InvalidBencodeData)?;

    // Make sure that the pieces key has a length that is a multiple of 20
    if let Some(pieces) = torrent.info.pieces.as_ref() {
        if pieces.as_ref().len() % 20 != 0 {
            return Err(DecodeTorrentFileError::InvalidTorrentPiecesLength);
        }
    }

    Ok((torrent, original_info_hash))
}

We parse the torrent data twice. In the first round, we calculate the info-hash, and in the second round, we build the Torrent struct.

Generic error messages

Another problem is that the error message could be more helpful when there is an error parsing the torrent. It does not give you the line, field or byte number because it's a generic Bencode deserializer. Check this example error.

Other crates are specific for torrent files like this.

Research

In addition, there are a lot of crates for encoding/decoding torrents. Some of them are very new. There could be better options regarding performance, support, documentation, number of contributors, etc.

@da2ce7 suggested researching for alternatives.

After a first fast search, I've found a lot, and one of them looks interesting, at least for the final parsed struct:

pub struct Torrent {
    /// URL of the torrent's tracker.
    pub announce: Option<String>,
    /// Announce list as defined in [BEP 12](http://bittorrent.org/beps/bep_0012.html).
    pub announce_list: Option<AnnounceList>,
    /// Total torrent size in bytes (i.e. sum of all files' sizes).
    pub length: Integer,
    /// If the torrent contains only 1 file then `files` is `None`.
    pub files: Option<Vec<File>>,
    /// If the torrent contains only 1 file then `name` is the file name.
    /// Otherwise it's the suggested root directory's name.
    pub name: String,
    /// Block size in bytes.
    pub piece_length: Integer,
    /// SHA1 hashes of each block.
    pub pieces: Vec<Piece>,
    /// Top-level fields not defined in [BEP 3](http://bittorrent.org/beps/bep_0003.html).
    pub extra_fields: Option<Dictionary>,
    /// Fields in `info` not defined in [BEP 3](http://bittorrent.org/beps/bep_0003.html).
    pub extra_info_fields: Option<Dictionary>,
}

Notice the extra fields that we are not currently parsing.

Repo:https://github.com/ttlajus/lava_torrent

Conclusion

We could search for alternatives and evaluate the options:

  1. Fork the serde_bencode crate and fix the bug.
  2. Use our implementation like the one we did here.
  3. Use a new crate.

cc @da2ce7 @mario-nt

josecelano commented 1 year ago

After a first fast search, I've found:

josecelano commented 1 year ago
Repo name Stars Last commit date Contributors
serde_bencode 1 4 years 0
rust-bencode 34 6 years 11
rust-bdecode 1 3 years 0
bt_bencode 14 1 year 2
bende 3 1 year 2
nom-bencode 8 1 year 0
bip_bencode 289 6 years 7
juicy_bencode 1 1 year 0
rust-bencode 2 8 years 0
serde_bencoded 2 2 years 2
bencode 1 1 year 0
bde 0 1 month 0
bendy 59 4 months 9
bencoderus 2 4 years 0
rust-simple-bencode 1 7 years 0
serde-bencode 52 2 years 7
serde_bencode 2 7 years 0
lava_torrent 30 5 months 5
bencode-decode 0 3 years 0

https://github.com/arjantop/rust-bencode

https://github.com/GGist/bip-rs/tree/master/bip_bencode

Forked by @da2ce7

It is one of the best alternatives.

https://github.com/knightpp/serde_bencoded

Used by 11.

https://github.com/P3KI/bendy

It is one of the best alternatives.

https://github.com/toby/serde-bencode

The one we are using.

It is one of the best alternatives if we forked it and fix the bug.

https://github.com/ttlajus/lava_torrent

It would be one of the best alternatives if it weren't in maintenance mode.

josecelano commented 1 year ago

I have not spent too much time so far comparing all the options but it seems we have three options:

Option 1. serde-bencode

Fork and fix toby/serde-bencode

Pros:

Cons:

Option 2. bip_bencode

Use our fork: https://github.com/torrust/bittorrent-infrastructure-project/tree/develop

Pros:

Cons:

Option 3. lava_torrent (discarded)

I like the completeness of the lava_torrent because they include the extra fields:

pub struct Torrent {
    pub announce: Option<String>,
    pub announce_list: Option<AnnounceList>,
    pub length: Integer,
    pub files: Option<Vec<File>>,
    pub name: String,
    pub piece_length: Integer,
    pub pieces: Vec<Piece>,
    pub extra_fields: Option<Dictionary>,
    pub extra_info_fields: Option<Dictionary>,
}

But it looks like the project is going to stay the same.

Option 4. Custom solution (discarded)

Like: https://github.com/torrust/torrust-parse-torrent/blob/main/src/utils/parse_torrent_verbose.rs

Pros:

Cons:

Current temporary conclusion

I would use the bip_bencode because of the error handling, but:

I think this solution could be similar to building our custom solution. In the end, it's similar to writing our high-level metainfo parser using the fixed/forked serde_bencoded repo. You can see that the high-level package is very simple.

If we want to have it all, we can fix/fork serde_bencode (benefit for the community) and copy the code from the metainfo package (in bip) but use the serde_bencode. The first step could be just fix serde_bencode. In the future, we can improve the other two problems with a new high-level package.

cc @da2ce7

josecelano commented 1 year ago

Finally, I've forked the serde_bencode crate: https://github.com/torrust/torrust-serde-bencode

I'm cleaning the repo a little bit. I'm planning to:

josecelano commented 1 year ago

I talked to @toby and now I'm a maintainer of the serde-bencode, and I can also publish new versions on crates.io.

I plan to finish the current PR in our fork and open a PR on the upstream repo. And then publish a new minor version.

josecelano commented 11 months ago

Finally, the bug was fixed on the original repo/crate and a new version was released:

https://crates.io/crates/serde_bencode/0.2.4

Regarding how to improve error messages in the future, we can build a more verbose torrent parser like this:

https://github.com/torrust/torrust-parse-torrent/blob/main/src/utils/parse_torrent_verbose.rs

I think it's not a priority right now.