transmission / transmission

Official Transmission BitTorrent client repository
https://transmissionbt.com

Please ignore the padding files inserted by BitComet Client in torrent. #579

Open kirbyzhou opened 6 years ago

kirbyzhou commented 6 years ago

BitComet inserts dummy padding files with zero-filled content into the torrents it creates. These files are useless, and their filenames start with "_____padding_file".

Smart BT clients such as Thunder/XunLei ignore these files while downloading.

For example: image

DanielYWoo commented 6 years ago

+1

ghost commented 5 years ago

This is really more of a bug in BitComet...

DanielYWoo commented 5 years ago

@ymte yes, BitComet hacks the protocol and generates those padding files. It sucks, but I still hope Transmission can ignore them.

Auska commented 5 years ago

+1

fjqingyou commented 4 years ago

I hope a global ignore file can be added, similar to .gitignore, that users can configure.

kirbyzhou commented 4 years ago

very good idea!

chrysn commented 4 years ago

Just ignoring them will not give all the benefits a padding file has (which is to allow deduplication over different torrents / torrent versions with some identical files without having to actually transfer the padding over the wire), and would require user intervention that I'd expect rarely to be accurate in practice.

There's three aspects to it that could largely be addressed individually:

a8underscore commented 4 years ago

any updates on how this is going?

DanielYWoo commented 4 years ago

@chrysn sounds like we need something like a virtual file layer, and intercept all read/write/verify operations on padding files.

chrysn commented 4 years ago

That'd be one way to implement it -- but (without knowing the code) I'd expect that it's only two or three locations in the code that'd need changing, and then a full VFS layer might be extraneous.

DanielYWoo commented 4 years ago

@chrysn To make it simple, what if we just "unselect" the matching files when adding the torrent?

chrysn commented 4 years ago

Possibly. I'd be afraid that this still creates the files and downloads data (for it is needed to verify the piece), and that they'd show as missing during verification, but as I said I'm only commenting from an outsider's perspective.

a8underscore commented 4 years ago

Well, padding files are always filled with 0 bytes, so I don't think we would need to download them if we always know what the content will be.
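
A minimal sketch of that idea, assuming v1 (SHA-1) piece hashes and a piece that consists of the tail of a real file followed by padding. The helper name `verify_piece_with_zero_padding` is hypothetical and this is not Transmission code; it only shows that the padding can be regenerated locally instead of being requested from peers:

```python
import hashlib


def verify_piece_with_zero_padding(real_bytes: bytes, pad_len: int, expected_sha1: bytes) -> bool:
    """Check a v1 piece made of downloaded file bytes followed by zero padding.

    Hypothetical helper: the padding is never downloaded, it is synthesized
    locally on the assumption that padding files contain only zero bytes.
    """
    piece = real_bytes + bytes(pad_len)  # bytes(n) yields n zero bytes
    return hashlib.sha1(piece).digest() == expected_sha1


# Example: a 16-byte piece holding 10 bytes of file data and 6 bytes of padding.
data = b"0123456789"
expected = hashlib.sha1(data + b"\x00" * 6).digest()
print(verify_piece_with_zero_padding(data, 6, expected))
```

If the padding were not actually all zeros, the hash check would simply fail and the piece would have to be fetched the normal way.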

dgcampea commented 2 years ago

This should be possible "in a clean way" if support for BEP 47 gets added and treats these __padding__ files as a special non-standard case.

vt-idiot commented 9 months ago

Part of the issue is that BitComet's "padding" files don't actually comply with BEP 47, right? Any BEP 47-compliant client will send you all the null bytes you want if you try to download .pad files with, e.g., Transmission. None of those peers (libtorrent-based clients, for example) actually have the .pad files stored; all of them know what the rest of the piece that contains them should be.

I assume the same is probably true for BitComet's __padding files, but then you don't have the users of several of the most common clients "at your disposal" when you try to request the rest of that piece with Transmission.

I have no idea how you'd be able to implement something to handle BitComet's version without actually implementing support for their "flavor" of padding entirely/separately.


Out of curiosity, I searched for __padding on my computer and actually found some kicking around from a few years ago. I also still had the torrent.

  1. The torrents themselves are weird as hell. ed2k hashes?
  2. The padding files have no ed2k hashes or filehashes listed, while every other file does. I am assuming this has something to do with how BitComet is able to detect their presence.
  3. Any instance of them I can find (locally) is from... Linux ISO torrents, where the padding files were in addition to no fewer than 4 advertising/spam files. All of the aforementioned torrents should've just been for a single video file. Luckily they use 256 KiB pieces, so the padding and the garbage only amount to ~1 MiB, but a torrent without the baggage wouldn't have needed to be padded in the first place...
{
 "info" : {
  "files" : [
   {
    "ed2k" : "removed",
    "filehash" : "removed",
    "length" : 123,
    "path" : [
     "%b0%e5%9d%80.txt"
    ],
    "path.utf-8" : [
     "地址.txt"
    ]
   },
   {
    "length" : 262011,
    "path" : [
     "_____padding_file_0_%e5%a6%82%e6%9e%9c%e6%82%a8%e7%9c%8b%e5%88%b0%e6%ad%a4%e6%96%87%e4%bb%b6%ef%bc%8c%e8%af%b7%e5%8d%87%e7%ba%a7%e5%88%b0BitComet(%e6%af%94%e7%89%b9%e5%bd%97%e6%98%9f)0.85%e6%88%96%e4%bb%a5%e4%b8%8a%e7%89%88%e6%9c%ac____"
    ],
    "path.utf-8" : [
     "_____padding_file_0_如果您看到此文件,请升级到BitComet(比特彗星)0.85或以上版本____"
    ]
   },

Translated:

_____padding_file_0_If you see this file, please upgrade to BitComet 0.85 or above____

A BEP 47 torrent, like what you'd get from a client based on libtorrent, has padding files that look just like the ones in the actual BEP 47 documentation:

   {
    "attr" : "p",
    "length" : 123456,
    "path" : [
     ".pad",
     "0"
    ]
   },
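
To make the difference between the two flavors concrete, here is a rough sketch that classifies entries of an already decoded `files` list. It assumes the entries are plain dicts with string values, as in the dumps above; `padding_kind` and `BITCOMET_PAD_PREFIX` are hypothetical names, and the BitComet check is only a filename heuristic based on the example above, not a documented format:

```python
from typing import Optional

# Prefix taken from the BitComet example in this comment (heuristic only).
BITCOMET_PAD_PREFIX = "_____padding_file_"


def padding_kind(entry: dict) -> Optional[str]:
    """Return "bep47", "bitcomet", or None for one decoded `files` entry."""
    # BEP 47 marks padding explicitly with a "p" in the `attr` string.
    if "p" in entry.get("attr", ""):
        return "bep47"
    # BitComet has no attribute, so fall back to the last path component.
    path = entry.get("path.utf-8") or entry.get("path") or []
    if path and path[-1].startswith(BITCOMET_PAD_PREFIX):
        return "bitcomet"
    return None


files = [
    {"length": 123, "path": ["地址.txt"]},
    {"length": 262011, "path": ["_____padding_file_0_example____"]},  # made-up name
    {"attr": "p", "length": 123456, "path": [".pad", "0"]},
]
print([padding_kind(f) for f in files])  # [None, 'bitcomet', 'bep47']
```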

Improper or uninformed use of padding goes the other way, too: it also happens with BEP 47 padding (and with aligning every file to a piece boundary, which isn't explicitly required!), not just with BitComet's BS:

Scenario 1

Making a torrent of something like a music album with 10 individual FLAC files of varying sizes, totaling ~500 MiB? Padding isn't going to be disastrous if you enable it, and it will let "someone" grab single files 3 years from now when only partial seeds are left - and partial seeds using a client that supports BEP 47 will never have to store anything "extra" and can even verify against their "incomplete" pieces.

Assuming you used 256 KiB or 512 KiB pieces like you ought to in order to land at 1,000-2,000 pieces - at most there's going to be 2.5 or 5 MiB worth respectively, and in practice, it will be less, closer to 1.25 or 2.5 MiB total. Definitely an inconvenience for clients that don't (yet?) support it, like Transmission, but only a minor one.
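
The arithmetic behind those numbers, as a back-of-the-envelope sketch. It assumes every file is followed by padding up to the next piece boundary and, for the "in practice" figure, that files end at a uniformly random point within a piece; `padding_bounds` is a made-up helper name:

```python
from typing import Tuple

KIB = 1024
MIB = 1024 * KIB


def padding_bounds(num_files: int, piece_size: int) -> Tuple[int, float]:
    """Return (upper_bound, rough_average) padding in bytes.

    Upper bound: every file leaves (almost) a whole piece to pad.
    Rough average: half a piece per file, under the uniform assumption.
    """
    upper = num_files * piece_size
    return upper, upper / 2


for piece in (256 * KIB, 512 * KIB):
    upper, avg = padding_bounds(10, piece)
    print(f"{piece // KIB} KiB pieces: <= {upper / MIB:.2f} MiB, ~{avg / MIB:.2f} MiB on average")
# 256 KiB pieces: <= 2.50 MiB, ~1.25 MiB on average
# 512 KiB pieces: <= 5.00 MiB, ~2.50 MiB on average
```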

Scenario 2

Making a torrent of 10,000 files of sizes varying all the way from 1 byte to 100 MiB, totaling 1 GiB? Maybe it's something like a git repo clone (9,999 tiny files) + a release build (the single 100 MiB file).

Well, it's a 1 GiB torrent, so I'll just use 512 KiB pieces. 1,000-2,000 pieces, perfect!

Got padding enabled? Congratulations, now there are potentially 2.5 GiB of ".pad files" in your 1 GiB torrent, making things miserable for anyone using a client that doesn't support BEP 47 and all the peers stuck sending them the null bytes they're begging for...

The presence of padding files does not imply that all files are piece-aligned.

That's straight out of the spec itself, emphasis mine.

Note the "potentially" above: having e.g. qBittorrent align every file (larger than 0 KiB) to a piece boundary results in each individual (padded) piece representing only one file. I just tried it on a 900 MiB folder with 52 files ranging from ~10-30 MiB, with and without padding, using 32 MiB pieces (don't do this). The result was 51 padding files, and the torrent appears to be 1.605 GiB worth of data when the padding is included. For the actual "Scenario 2" described, that insane 2.5 GiB number could be minimized by setting a more appropriate cut-off for the size of files that get padded, with the 512 KiB piece size itself as a bare minimum.
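
A small sketch of how such a cut-off changes the total. The alignment policy here is an assumption for illustration (pad in front of every file at least `min_aligned_size` bytes long so that it starts on a piece boundary); real clients may apply different rules, and `total_padding` is a hypothetical name:

```python
def total_padding(sizes, piece_size, min_aligned_size=1):
    """Total padding bytes when files >= min_aligned_size are piece-aligned.

    Assumed policy: padding is inserted in front of a qualifying file so it
    starts on a piece boundary; smaller files are packed without padding.
    """
    offset = 0
    padding = 0
    for size in sizes:
        if size >= min_aligned_size:
            gap = -offset % piece_size  # bytes needed to reach the next boundary
            padding += gap
            offset += gap
        offset += size
    return padding


# Toy numbers: three files of 10, 3 and 20 bytes with 16-byte pieces.
print(total_padding([10, 3, 20], 16))                       # 19 (every file aligned)
print(total_padding([10, 3, 20], 16, min_aligned_size=16))  # 3 (only the 20-byte file aligned)
```

With the cut-off set to the piece size, only files that span at least one full piece trigger padding, which is where most of the savings come from when a torrent is dominated by tiny files.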


There are really only a few use cases where BEP-47 makes sense for a v1 torrent in the first place, like season packs of a TV show or something. 10-20 files, never the potential for >1% overhead vs. the rest of the data, lets someone who was only missing an episode or two start (cross-)seeding immediately, etc.

Anywhere else for v1, or where it's required (v2/hybrid)...you have to hope whoever is making the torrent is actually thinking about what they're doing.