Tool to match torrents/magnets with local files automatically and instantly

slrslr commented 7 years ago

It would take torrenting to the next level if the torrent client were aware of my HDD structure (file/folder names, maybe even "file hashes"). When I add torrents, the client would search its internal database/index of my HDD contents and match my local files to the torrent payload files, preventing duplication and saving a lot of time otherwise spent finding data and renaming payload or HDD files to match the torrent's file structure.

Software like Everything (Windows) or FSearch (Linux) indexes millions of files and returns the files matching an entered name in maybe 1 second.

- The torrent client will offer me one or more save paths where it found the target files, or files with the same "hash", without me spending time searching and checking files. I can select the save path under which to consolidate (move) the matching payload files: it will offer paths where downloaded/matched files already are, or a custom/default save path where the remaining files will then be placed. (Not sure if creating symbolic links would be good.)
- The torrent client may even use files from different directories as a source for one torrent's payload files.
- The torrent client will match files whose file name differs from the torrent's file name.
- File indexing can be paused/resumed by the user, and a computer reset/suspend will not break it. The user can select multiple folders on multiple drives to be indexed.
- Indexing runs with low CPU and IO priority.
- The torrent client will hash newly found files regularly, upon start (some clients do not restart often), or on request.
- If an external drive is disconnected and some monitored directory is not available, the torrent client will ask the user to verify the file index configuration instead of deleting big portions of the hash database, or all of it, just because someone forgot to connect the external drive.
- When migrating Windows/Linux, the user should be able to update paths (maybe just by making the database a non-binary/text file where search and replace can be used), so there is no need to re-hash terabytes of data.
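A minimal sketch of what such an index could look like, assuming an SQLite table keyed by path. The function names `build_index`, `candidates` and `relocate` are hypothetical and are not part of qBittorrent or libtorrent. It only records size and mtime (no hashing), skips roots that are currently unavailable instead of purging their rows, and rewrites path prefixes after a migration without re-scanning:

```python
import os
import sqlite3


def build_index(db, roots):
    """Store path, size and mtime for every file under the given roots."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, size INTEGER, mtime REAL)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS files_by_size ON files (size)")
    for root in roots:
        if not os.path.isdir(root):
            # Root unavailable (e.g. external drive unplugged): keep its old
            # rows instead of purging them from the index.
            continue
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                db.execute(
                    "INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
                    (path, st.st_size, st.st_mtime),
                )
    db.commit()


def candidates(db, size):
    """Paths of indexed files with exactly this size."""
    rows = db.execute(
        "SELECT path FROM files WHERE size = ? ORDER BY path", (size,)
    )
    return [row[0] for row in rows]


def relocate(db, old_prefix, new_prefix):
    """Rewrite stored paths after a migration, without re-scanning."""
    n = len(old_prefix)
    db.execute(
        "UPDATE files SET path = ? || substr(path, ?) "
        "WHERE substr(path, 1, ?) = ?",
        (new_prefix, n + 1, n, old_prefix),
    )
    db.commit()
```

Looking up by size is cheap because of the index on `size`; any content verification would still have to happen afterwards.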

Libtorrent issue: https://github.com/arvidn/libtorrent/issues/2838

Match torrents and data, remove torrents based on data, cleanup your disk for unseeded files. Autotorrent2 python script: https://github.com/JohnDoee/autotorrent2?tab=readme-ov-file#autotorrent2

Seeker2 commented 7 years ago

Torrents don't contain file hash values except sort-of in the case of single-file torrents, but even then the hash changes depending on the torrent's piece size. If a file is in a multi-file torrent, it probably shares end pieces with other files making a mess for hashing.

What this means is a file's hash cannot be computed in advance and compared against torrents and that eliminates the possibility of making this anywhere near instant. More details here: https://qbforums.shiki.hu/index.php/topic,4425.0.html
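To illustrate the shared end pieces, here is a small sketch that maps each file of a multi-file (v1-style, no padding files) torrent onto the piece indexes it touches. The helper names and the example sizes are made up for illustration:

```python
from collections import Counter


def pieces_touched(file_sizes, piece_length):
    """For each file (in torrent order), the range of piece indexes it touches."""
    spans = []
    offset = 0  # byte offset of the current file within the whole payload
    for size in file_sizes:
        if size == 0:
            spans.append(range(0))  # empty files touch no piece
        else:
            first = offset // piece_length
            last = (offset + size - 1) // piece_length
            spans.append(range(first, last + 1))
        offset += size
    return spans


def shared_pieces(file_sizes, piece_length):
    """Piece indexes whose bytes come from more than one file."""
    counts = Counter()
    for span in pieces_touched(file_sizes, piece_length):
        counts.update(span)
    return sorted(piece for piece, n in counts.items() if n > 1)


# Three files of 100, 250 and 50 bytes with 128-byte pieces.
print([list(span) for span in pieces_touched([100, 250, 50], 128)])
print(shared_pieces([100, 250, 50], 128))
```

In this example pieces 0 and 2 each contain bytes from two files, so their hashes cannot be checked with only one of those files on disk.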

dwilbanks commented 7 years ago

I've thought about this quite a bit. Some torrents do include a sha1 attribute on each file, however it's very rare.

What could be done is to check for exact file lengths: those exact file lengths can be compared to an existing directory structure, and then in big files look for the individual pieces. Since torrents are broken down into pieces, it's not the entire file that's needed, it's the pieces.

From what I understand, this all is moot because the actual downloading is done by libtorrent, not qbittorrent. Any changes of this nature would need to be done by libtorrent.

Ideally there could be a plugin where libtorrent or qbittorrent says

"I'm looking for piece x of torrent y" or "I'm looking for a segment starting at A bytes that is B bytes long and has a SHA-1 hash of C. The total length of the file is D."

That would be a very simple plugin to write and configure.
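A rough sketch of the second kind of request, assuming the plugin is simply handed a list of candidate paths. `find_segment` and `segment_matches` are hypothetical names; no such plugin interface exists in libtorrent or qBittorrent as far as this thread establishes:

```python
import hashlib
import os


def segment_matches(path, offset, length, sha1_hex, total_length):
    """True if the file has the expected total size and the requested
    byte range hashes to the expected SHA-1 digest."""
    if os.path.getsize(path) != total_length:
        return False
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    return len(data) == length and hashlib.sha1(data).hexdigest() == sha1_hex


def find_segment(candidate_paths, offset, length, sha1_hex, total_length):
    """Return the first candidate containing the segment, or None."""
    for path in candidate_paths:
        if segment_matches(path, offset, length, sha1_hex, total_length):
            return path
    return None
```

The size check runs first because it is nearly free, while hashing requires reading the segment from disk.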

Godangel commented 5 years ago

I'll try to narrow down a possible concept for this feature.

Create a new entity, 'Library', which links to one or more folders/drives. Everything in the Library is considered read-only.
Add new modes to the Add torrent window (I guess just two extra checkboxes) and to the right-click menu:
- mode 1: everything as it is now
- mode 2: search the Library for matching files and copy them to the Download folder (renaming files if necessary), then behave as with a normal torrent
- mode 3: "read-only mode" ("seed-only mode", #1254): search the Library for matching files and seed them from there in read-only mode; do not download any missing parts

As suggested above, files could be searched for in the Library by their size. Matched files should then be verified block by block against the hashes stored in the torrent. This could be done by implementing verification of individual files from a torrent (#257), or by a rougher method: first find candidates for all files in the torrent, then verify all of them at once (this could lead to situations where there are 2+ candidates for one file, but in reality, when dealing with big files, this will be pretty rare).

Surely in many cases the first/last block won't be verified, but that is not a problem: in 'mode 2' the missing parts will be downloaded, and in 'mode 3' the missing parts remain missing, but for big files 99% of their data can be seeded, and that's already good enough. Today it's typical that many files lie scattered across drives or, on the contrary, lie organized on 'collection' archive drives, and can't be seeded because of different filenames and folder structure. Implementing the Library concept could eliminate this problem and help to seed rare files.
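A back-of-the-envelope sketch of how much of a big file sits in pieces that lie entirely inside it, and so can be verified and seeded from the file alone. The helper names and the example numbers are illustrative assumptions, and the possibly shorter final piece of the torrent is ignored:

```python
def interior_pieces(file_offset, file_size, piece_length):
    """Piece indexes lying entirely within the file's byte range.

    file_offset is where the file starts within the torrent payload.
    """
    first = -(-file_offset // piece_length)          # ceiling division
    end = (file_offset + file_size) // piece_length  # exclusive upper bound
    return range(first, max(first, end))


def verifiable_fraction(file_offset, file_size, piece_length):
    """Share of the file's bytes covered by interior pieces."""
    if file_size == 0:
        return 0.0
    covered = len(interior_pieces(file_offset, file_size, piece_length))
    return covered * piece_length / file_size


# A 4 GiB file that starts 1,000,000 bytes into a torrent with 4 MiB pieces.
print(len(interior_pieces(1_000_000, 4 * 1024**3, 4 * 1024**2)))
print(verifiable_fraction(1_000_000, 4 * 1024**3, 4 * 1024**2))
```

With these example numbers, 1023 pieces lie fully inside the file and cover 1023/1024 of its bytes, which is consistent with the rough 99% figure for big files.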

Also, 'Library' could fulfill another common feature request: seeding one file that is included in two different torrents.

Seeker2 commented 5 years ago

"in big files look for the individual pieces. Since torrents are broken down into pieces, it's not the entire file that's needed it's the pieces."

2 identical files in 2 different torrents can have torrent pieces that don't match, because torrents can have different sized pieces and the pieces may not start at the beginning of the file (due to having other variable-length files before it in the same torrent). There is a HUGE DL/UL/IO performance penalty for multi-file torrents as a result of this: https://qbforums.shiki.hu/index.php/topic,2627.msg12725.html#msg12725
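A small demonstration of that effect, using a made-up 16 KiB payload and SHA-1 piece hashes as in v1 torrents. `piece_hashes` is a hypothetical helper, not a libtorrent function:

```python
import hashlib


def piece_hashes(payload, piece_length):
    """Set of SHA-1 digests of consecutive fixed-size pieces of the payload."""
    return {
        hashlib.sha1(payload[i:i + piece_length]).digest()
        for i in range(0, len(payload), piece_length)
    }


# Deterministic pseudo-random 16 KiB "file" shared by all three torrents.
shared_file = b"".join(
    hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(512)
)

alone = piece_hashes(shared_file, 4096)                 # the file by itself
shifted = piece_hashes(b"x" * 13 + shared_file, 4096)   # a 13-byte file comes first
bigger = piece_hashes(shared_file, 8192)                # same file, larger pieces

print(len(alone & shifted), len(alone & bigger))
```

Neither variant shares a single piece hash with the first one, even though the file content is byte-for-byte identical.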

It is possible to generate hashes for each file using either another app or a subroutine in an existing torrent app, but that might be "computationally expensive" (take a LONG time) if it's to be done on terabytes of files.

Godangel commented 5 years ago

"It is possible to generate hashes for each file using either another app or a subroutine in an existing torrent app, but that might be 'computationally expensive' (take a LONG time) if it's to be done on terabytes of files."

As for the idea of having a hash for every file in the torrent and every file in the 'Library' and then simply matching them: it really would be very heavy performance-wise, as we would need to calculate hashes for the entire 'Library' and periodically refresh them as the 'Library' changes.

"2 identical files in 2 different torrents can have torrent pieces that don't match because torrents can have different sized pieces and the pieces may not start at the beginning of the file (due to having other variable-length files before it in the same torrents)."

Yes, 2 identical files will be split into different blocks in 2 different torrents, but does that make the task too difficult? Let's look at it from different sides:

- The search in the 'Library' is done only when adding a torrent, so it's a one-time (although lengthy) task.
- When searching for a matching file in the 'Library' we can go by its size, file extension and partial name match. In most cases we get only one match or none, and only after that do we verify its contents.
- When searching the 'Library' for a match we can at first verify only a few starting blocks (ones that start after the first byte of the file, to exclude "overlapping" blocks), or, as mentioned earlier, match the file without verification at all.
- After finding a matching file we copy it to the download folder ('mode 2') or link it to the torrent ('mode 3'). The latter needs a new mechanism, a proxy, that translates the file path from the torrent to the real file name and location for each file in the torrent (and restricts writing to them).
- Lastly, once we have copied ('mode 2') or linked ('mode 3') the matched files from the 'Library' (one or none for each file in the torrent), we treat it as a typical torrent.

So the 'Library' mechanism is not so heavy on performance:

- It affects only the adding of torrents.
- After the Library search and the copying/linking of files have finished, it is equivalent to adding a partially downloaded torrent.

The other concern mentioned, the performance of multi-file torrents, does not apply to this concept as far as I can see, since we won't do anything different from what the client is doing right now (in 'mode 2'). In 'mode 3' we actually give access to a single file from different torrents, but it's read-only access, so it'll be less heavy (I guess it can even be easier performance-wise than seeding 2 identical files in separate folders). Also, in many cases the first/last "overlapping" blocks won't be verified (as some files will be missing from the 'Library' or will have been edited, and thus are also 'missing'), so it's a bit easier on disk performance. So we get an "incomplete" torrent that seeds the "meat", and that's pretty good. Later on, the mechanism could be enhanced to download those missing "overlapping" first/last blocks to a cache file, so the torrent could be fully seeded.
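The 'mode 3' proxy could be sketched roughly like this. `LibraryProxy` is a hypothetical class for illustration only; a real client would have to hook this into libtorrent's disk I/O layer, which is not shown here:

```python
import os


class LibraryProxy:
    """Maps a torrent's internal file paths to real files in the Library
    and only ever opens them for reading."""

    def __init__(self):
        self._links = {}  # torrent-relative path -> absolute real path

    def link(self, torrent_path, real_path):
        """Attach a Library file to a path from the torrent's file list."""
        self._links[torrent_path] = os.path.abspath(real_path)

    def read(self, torrent_path, offset, length):
        """Return the requested bytes, or None if the file is not linked
        (the corresponding data then simply counts as missing)."""
        real_path = self._links.get(torrent_path)
        if real_path is None:
            return None
        with open(real_path, "rb") as f:  # read-only: never "wb" or "r+b"
            f.seek(offset)
            return f.read(length)
```

Since the same real path can be linked from several proxies, one file on disk could back two different torrents without being copied.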

Godangel commented 5 years ago

Also, the use cases for the 'Library' feature request need clarifying.

Put simply, it is "one-click seeding".

With the current torrent implementation the user is heavily restricted in what they can do with downloaded files. The user can't delete files through Explorer, can't rearrange files into different folders, can't rename files and folders, and can't edit files (subtitles, for example). Any of those changes would make the torrent invalid, and revalidating it without making adjustments in the torrent client would at least revert those changes, or in most cases just make a big mess. The only available options for organizing torrents are using qBittorrent categories, or manually moving the downloaded torrent folder and then pointing the client to its new location. And even then you have to abide by the torrent creator's file/folder structure and provided content. That fulfills most users' use case of downloading something, watching/installing it and then deleting it, but it does not encourage long-term seeding.

To make long-term seeding a more viable option, some new mechanism is needed to share on a per-file basis. Obviously it would be better to do this with something like IPFS, but we are talking about a torrent client here. So linking individual files (from the 'Library') to a torrent is likely the easiest way to do that without messing with the torrent concept as a whole.

For example, I download lots of big, rare torrents with zero to a few seeds at most. It takes literally weeks to complete each of them. But seeding them becomes a problem. Either I keep them as is, with no reorganizing, no renaming, etc. (and even then I have to manually point to their new location after moving them to an archive drive, for EACH torrent, or else do the move separately: through qBittorrent for the torrent's files and through Explorer (totalcmd, actually) for subtitles and other things not included in the torrent). Or I keep an "unorganized" copy of them to seed and an "organized" copy to preserve. The latter also poses a problem: free space is still limited >_> So if the contents of my archive drives could be automatched (even partially) with torrents, I could keep lots of rare torrents alive even with low bandwidth, as they are requested pretty rarely. Yes, I likely won't be a "full" seed, but I'll nevertheless be seeding lots of "meat", and the "meat" is what those torrents are lacking now.

So with the 'Library' feature, all the user needs to do in such a case is right-click a bunch of downloaded but "invalid" tasks in qBittorrent and press "search and seed from Library". Easy for the user, good for the community.

slrslr commented 5 years ago

"some new mechanism is needed to share on a per-file basis"

Yes, the current 20th-century model of torrenting is outdated; you need to be a clicking monkey. We need a cleverer way for the torrent client to handle linking files. The torrent client should have a database of the HDDs' content, automatically attach the appropriate files to torrents, and automatically detect relocated and renamed files. Nowadays torrent clients are for clicking monkeys. In qbt you cannot even bulk-replace torrent configuration, save paths or file names. There is no human-readable text configuration file where you can do it. @sledgehammer999

Seeker2 commented 5 years ago

qBittorrent doesn't even save its working copies of the .torrent files using the original filenames for those .torrent files. It just creates a bunch of garbage filenames for them. Good luck trying to recover a couple of lost .torrent files out of 50+ total .torrent files. (uTorrent did save its working .torrent files using recognizable names.)

sledgehammer999 commented 5 years ago

"a bunch of garbage filenames for them"

Nope. The naming scheme is <torrent_info_hash>.torrent
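For reference, a v1 info-hash is the SHA-1 of the bencoded `info` dictionary of the .torrent file, so the name can be reproduced from the metadata. Below is a minimal sketch with a hand-rolled bencoder; `bencode` and `working_copy_name` are hypothetical helpers, the lowercase-hex form is an assumption, and real clients hash the original bytes of the `info` dictionary rather than re-encoding it:

```python
import hashlib


def bencode(value):
    """Minimal bencoder for ints, bytes, str, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def working_copy_name(info_dict):
    """File name of the form <torrent_info_hash>.torrent (v1 info-hash)."""
    return hashlib.sha1(bencode(info_dict)).hexdigest() + ".torrent"
```

This also shows why the name says nothing about the original .torrent filename: the hash covers only the `info` dictionary.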

Seeker2 commented 5 years ago

That torrent hash is only of value if I am comparing torrents in a BitTorrent client or trying to make magnet links out of them. For all other practical purposes, especially for casual users... these are garbage filenames.

This is a level of utility I took for granted in uTorrent, because it simply made sense to handle torrents that way.

sledgehammer999 commented 5 years ago

@Seeker2 Anyway, we will be migrating to an SQLite database. See #10099

slrslr commented 4 years ago

On-topic comment by the libtorrent maintainer: "in BitTorrent V2 (currently implemented in libtorrent master) the hashing is done per file (more specifically in a merkle hash tree, rooted in each file). This enables functionality to match files to torrents."
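A simplified sketch of such a per-file merkle root, based on one reading of BEP 52 (SHA-256 over 16 KiB blocks, leaves padded with all-zero hashes up to a power of two). `file_merkle_root` is a hypothetical helper that takes the whole file as bytes; it has not been checked against libtorrent's output:

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # leaf block size assumed from BEP 52


def file_merkle_root(data):
    """Merkle root over a file's 16 KiB blocks, or None for an empty file."""
    if not data:
        return None
    layer = [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]
    # Pad the leaf layer with zero hashes until its length is a power of two.
    while len(layer) & (len(layer) - 1):
        layer.append(bytes(32))
    # Hash pairs of nodes until a single root remains.
    while len(layer) > 1:
        layer = [
            hashlib.sha256(layer[i] + layer[i + 1]).digest()
            for i in range(0, len(layer), 2)
        ]
    return layer[0]
```

Because the root depends only on the file's own bytes, it could be computed once per local file and compared against any v2 torrent, regardless of which other files that torrent contains.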

styper commented 3 years ago

I've written a Python script to achieve this. It's not ideal, but it works in some cases. https://github.com/styper/qbit_automatch

Patty-OFurniture commented 1 year ago

I have finished a proof of concept that finds files and copies them to the correct path; after a force recheck the files are available for seeding. It is based on file size and does compare piece hashes.

The .torrent file is required; just an infohash or magnet link won't help. Of course, if you have the metadata, you can export the .torrent.

https://github.com/Patty-OFurniture/TorrentFiller/tree/main

F.A.Q. here https://github.com/Patty-OFurniture/TorrentFiller/issues/4

luzpaz commented 4 weeks ago

@Patty-OFurniture how battle-tested is the code ?

Patty-OFurniture commented 4 weeks ago

"@Patty-OFurniture how battle-tested is the code ?"

6 stars, 0 forks, 1 total issue. Assuming you mean "lots of people have used this and any reported issues have been fixed", the second half is true. I would wager that the first half is not.

luzpaz commented 4 weeks ago

@Patty-OFurniture Good points. Do you use this code often yourself ?

Patty-OFurniture commented 4 weeks ago

This really should be a conversation over on that project page instead of here under qbittorrent. I'm not able to do this right now, but I'd recommend opening a new issue there and referencing it here.

Patty-OFurniture commented 3 weeks ago

Added link to FAQ above, and also here.

https://github.com/Patty-OFurniture/TorrentFiller/issues/4

qbittorrent / qBittorrent

Tool to match torrents/magnets with local files automatically and instantly #6520