sa3paleasm / libtorrent

Automatically exported from code.google.com/p/libtorrent

Implement bep 0038 #447

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Please implement "Finding Local Data Via Torrent File Hints" in libtorrent,
as described at http://bittorrent.org/beps/bep_0038.html/

Thank you

Original issue reported on code.google.com by v.korkod...@gmail.com on 18 Mar 2013 at 6:57

GoogleCodeExporter commented 8 years ago
"Implement BEP 38" could really mean many things, but I agree it is a valuable 
BitTorrent Enhancement Proposal.

Not mentioned in the BEP, I think, is that some torrent info sections include 
MD5 (yes, MD5) hashes of individual _files_.  Not standard, and I'm not sure 
which programs use that extension.  But when included, it is a brilliant way to 
confirm with certainty (particularly when coupled with the hashtable) that 
files match in entirety between very different torrents.
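As a rough illustration of using such a per-file MD5 as a hint, the sketch below hashes a local file and compares it with a hex digest taken from a torrent. The function name and parameters are hypothetical, and the assumption that the hint is a 32-character hex string (as in the non-standard `md5sum` key some torrents carry) is mine, not something stated in the BEP.

```python
import hashlib

def md5_hint_matches(path, md5_hex, chunk_size=1 << 20):
    """Return True if the local file's MD5 equals the hint from the torrent.

    path     -- local file that might already hold the data
    md5_hex  -- hex digest claimed by the torrent (assumed format)
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == md5_hex.lower()
```

A match here only says the local file agrees with what the torrent claims; the claim itself is not verified by this check.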

The BEP recommends not limiting matches to whole files, but also recognizing
matching data piecemeal (e.g. media files with the same payload but different
headers).

The BEP discusses something a bit broader than its title ("Finding Local Data 
Via Torrent File Hints") suggests.  It also associates content across multiple 
torrent files.  If there are sufficient clues to do so (in some cases even 
without complete local data), torrents can be linked together in simple or 
sophisticated ways that allow you to both download and seed the linked files 
through multiple torrent swarms.  This could help retire the problem of dead 
torrents in the future.

One "torrent file hint" that should not be overlooked is a hint initiated by 
the user.  I, for example, can often find multiple torrents that include some 
or all of the same files through research/searches involving data outside of 
the torrent files themselves.  I then know/expect that certain files are the 
same between the torrents.  Currently it is possible to download those torrents 
to the same save directory, rename identical files in the various torrents 
all to the same name, and manually juggle those torrents for downloading 
purposes (selecting different files for download from each torrent will, for 
example, still achieve a level of parallelization in downloading).  In this 
case, though, downloading currently requires a lot of manual intervention and 
does not utilize all the swarms concurrently for the same files.  If one swarm 
is too slow or doesn't have complete sources, the user can manually re-juggle 
which files are selected for which torrents.  When all files are downloaded, 
the user would have to pause everything, select all the files in all the 
torrents, recheck all the torrents, and then resume all the torrents in order 
to seed to everybody BEP-38 style.  Anyway, there should be an option not 
just to automate the BEP 38 stuff, but also to explicitly link files (and 
parts of files) between torrents.

At the libtorrent level, this could be implemented strictly by exposing APIs 
for linking data between different torrents, and handling downloads and seeding 
appropriately for the linked torrents, but leaving the discovery of linkages up 
to torrent application developers to implement.
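To make the idea of a linking API a little more concrete, here is a minimal sketch of what a link record and a piece lookup might look like. Everything in it (`DataLink`, `linked_pieces`, the field names) is hypothetical and is not libtorrent's API; it only shows how a shared byte range could be translated from one torrent's piece grid to another's.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataLink:
    """Hypothetical record: the same bytes appear in two torrents."""
    src_offset: int  # start of the shared range in the source torrent's payload
    dst_offset: int  # start of the same range in the destination torrent's payload
    length: int      # number of shared bytes

def linked_pieces(link, src_piece, src_piece_length, dst_piece_length):
    """Destination piece indices touched by the linked part of src_piece."""
    # Clip the source piece to the shared range.
    start = max(src_piece * src_piece_length, link.src_offset)
    end = min((src_piece + 1) * src_piece_length,
              link.src_offset + link.length)
    if start >= end:
        return []  # this piece holds no linked data
    # Translate the clipped range into the destination torrent's offsets.
    dst_start = start - link.src_offset + link.dst_offset
    dst_end = end - link.src_offset + link.dst_offset
    first = dst_start // dst_piece_length
    last = (dst_end - 1) // dst_piece_length
    return list(range(first, last + 1))
```

With a record like this, an application that discovered (or was told about) a linkage could hand it to the library, which would then know which pieces of the other torrent are affected when a piece completes.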

One exciting thing for me is that this not only improves the BitTorrent 
network, but also creates an opening, in some cases, for multi-protocol clients 
to identify data they need that is available on networks other than BitTorrent 
as well.

Original comment by a...@lovetour.info on 11 May 2013 at 11:55

GoogleCodeExporter commented 8 years ago
> But when included, it is a brilliant way to confirm with certainty
> (particularly when coupled with the hashtable) that files match
> in entirety between very different torrents.

You can't know for sure until you've downloaded the file and compared the 
actual data. There's nothing stopping a malicious torrent creator from putting 
invalid MD5 sums in the .torrent file.

They could however be used as hints, just like the other hints.

If the md5 of file A matches an existing file B, the hashes for the pieces in 
B, adjusted to match the piece alignment in A, could be computed and compared 
to those in A's torrent file.
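A minimal sketch of that check, assuming the local file B is already in memory and that torrent A uses SHA-1 piece hashes: only pieces of A that lie entirely inside the file are hashed, since pieces that span a file boundary (or a short final piece) cannot be checked from B alone. The function and parameter names are made up for illustration.

```python
import hashlib

def verify_realigned_pieces(data, file_offset, piece_length, piece_hashes):
    """Check local file data against torrent A's piece hashes.

    data          -- bytes of local file B (candidate match for file A)
    file_offset   -- byte offset of file A within torrent A's payload
    piece_length  -- piece size of torrent A
    piece_hashes  -- list of 20-byte SHA-1 digests from torrent A
    Returns {piece_index: matched} for full pieces wholly inside the file.
    """
    results = {}
    file_end = file_offset + len(data)
    # First piece that starts at or after the start of the file.
    index = -(-file_offset // piece_length)  # ceiling division
    while (index + 1) * piece_length <= file_end and index < len(piece_hashes):
        # Position of this piece relative to the start of the file.
        start = index * piece_length - file_offset
        chunk = data[start:start + piece_length]
        results[index] = hashlib.sha1(chunk).digest() == piece_hashes[index]
        index += 1
    return results
```

If every checkable piece matches, the MD5 hint was good; if most fail, the hint was wrong or forged and the file should not be reused.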

Original comment by arvid.no...@gmail.com on 11 May 2013 at 9:36

GoogleCodeExporter commented 8 years ago
That's true.
For small files whose size is close to the piece size, a false MD5 could go undetected
when computing the realignment.  As the difference between the file size and
piece size increases, the odds that a collision could be intentionally
created with a different hashtable but the same MD5 become very small,
making the MD5 in those cases a very strong hint.  In any case, the data
would still be verified according to the torrent used to download it, and
then only pieces, after realignment, that compute to the same hash in the
other torrent should be shared.  If the result is many "hashfails" for the
realigned data, the source could be discarded.
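The share-or-discard decision could be sketched roughly as follows. The input is assumed to be a mapping from piece index to whether the realigned data matched the other torrent's hash; the function name and the 50% threshold are arbitrary placeholders, not values from the BEP.

```python
def classify_source(results, max_fail_ratio=0.5):
    """Decide which realigned pieces to share and whether to drop the source.

    results         -- {piece_index: True if the realigned hash matched}
    max_fail_ratio  -- assumed tolerance before the source is discarded
    Returns (shareable_piece_indices, discard_source).
    """
    # Only pieces whose hash matched in the other torrent may be shared there.
    shareable = sorted(i for i, ok in results.items() if ok)
    if not results:
        return shareable, False  # nothing was checked, so nothing to judge
    failed = len(results) - len(shareable)
    discard = failed / len(results) > max_fail_ratio
    return shareable, discard
```

A few failures near file boundaries or differing headers would still leave the matching pieces usable, while a source that fails most checks is dropped.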

It would probably be desirable for a particular file source (torrent) to be
considered authoritative/trusted if the object is to find multiple sources
for that file.  I imagine each new torrent started by the user, or specified
in a general-purpose RSS feed or otherwise scheduled, would be authoritative
for itself (not necessarily the FIRST torrent downloaded that hinted at
having the same files), unless the user specifies that it is a backup source
for one or more previously started/downloaded torrents.

If the authority/trust is equal, e.g.: Start a download of Torrent 1 with
files A,B; without any special instructions from the user start a download
of Torrent 2 which appears to have files A,C -- both have equal trust as
there is no information otherwise.  In those cases, all the files should be
created, but hints should still be used to attempt to share any matching
file/parts of files between the torrents and minimize data
downloaded/multi-source.  The pieces will still have to match their
respective torrent's hashtables to be kept (although the moment at which
this is known is different from an isolated/naïve download, assuming
realignment is necessary).

Original comment by a...@lovetour.info on 12 May 2013 at 12:45