ned-martin commented 4 years ago

qBittorrent version and Operating System

4.2.5x64 / Windows 10 (10.0.1xxxx)

What is the problem

When "Create subfolder for torrents with multiple files" is enabled and more than one torrent with the same name, and/or containing one or more files with the same name, begins to download, one overwrites the other, resulting in lost data and the torrent downloading again (which then repeats the process, so the torrents would likely never complete).

What is the expected behavior

The automatically-created subfolder should use common collision avoidance naming rules, for example, Folder, Folder (1), Folder (2)
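A minimal sketch of the "Folder, Folder (1), Folder (2)" rule, written in Python purely for illustration (qBittorrent itself is C++, and `unique_path` is a hypothetical helper, not an existing qBittorrent or libtorrent function):

```python
from pathlib import Path


def unique_path(parent: Path, name: str) -> Path:
    """Return parent/name if it is free, otherwise the first free 'name (n)'."""
    candidate = parent / name
    n = 1
    # Keep trying "name (1)", "name (2)", ... until nothing exists at that path.
    while candidate.exists():
        candidate = parent / f"{name} ({n})"
        n += 1
    return candidate
```

The existence check and the later folder creation are not atomic, so a real implementation would also have to guard against two torrents picking the same free name at the same moment.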

Alternatively, if that's somehow impossible, a much worse solution would be that any new torrent could go into an error-state when it tries to start by detecting a name collision with an existing file and allow the user to choose to rectify the problem in some manner, by renaming the torrent or file/s for example.

Steps to reproduce

(I don't know if all of these options are required, but this is what I have set)
Enable "create subfolder for torrents with multiple files"
Enable "Append .!qB extension to incomplete files"
Default torrent management mode: Automatic
Default save path:
Keep incomplete torrents in:

I'm aware this issue has been reported before and is not specific to 4.2.5, but it seems to still persist and is a fundamental issue preventing the normal use of the program, that I can only assume would be trivial to fix.

glassez commented 4 years ago

"Create subfolder" does not create anything other than what is contained in the torrent. Isn't the user telling qBittorrent to save torrent to a specific location? Why should someone else be responsible for this action?

glassez commented 4 years ago

I wonder what the probability is that different torrents have content with the same names...

ned-martin commented 4 years ago

"Create subfolder" does not create anything other than what is contained in the torrent. Isn't the user telling qBittorrent to save torrent to a specific location? Why should someone else be responsible for this action?

I don't understand what the confusion is here. No, the user is not telling qBittorrent to save to a specific location, as you said yourself, the location is stored in the torrent (or is the name of the torrent). In fact, in my experience, until the torrent starts downloading, the user can't even rename the location - because until the torrent gets the data of the files in it, there's no way to know.

No program should overwrite user data. Surely that's just a given? Would you be ok with any other program doing this? Let's say you're using your internet browser, and you decide to download a picture. It doesn't prompt you for any file collision, and instead just overwrites some other important photo you already had saved. Would you be ok with that? No, of course not. You'd be very angry. Well, I'm not ok with qBittorrent doing it either.

I wonder what the probability is that different torrents have content with the same names...

Very high. It happens all the time. Personally, I had it happen to three different torrents yesterday, losing several GB of data which had taken many hours to download, which is why I posted this.

Here's an example that demonstrates how this happens, and why this is a common problem:

Decide to download a file, let's call it "Centos 8"
Go searching for torrents called "Centos 8"
Find 18 torrents called "Centos 8"
Add all 18 torrents because they all look like they'll struggle to download and no one is seeding
(Note that several of these torrents are magnet-only links, so it's not possible for the user at this stage to actually see what's inside them, and also not possible to rename what's inside them, so it's impossible to find out if they'll conflict with an existing file that's being downloaded or not - and if they start in the middle of the night when the user isn't there to watch them, then they'll overwrite data automatically)
One of the torrents starts downloading, creating a sub-folder called "Centos 8" with a file in it called "Centos8.iso"
Many, many hours later this torrent is completed! Joy!
A second torrent starts to download, creating a folder called "Centos 8", with a file in it called "Centos8.iso"...
Downloaded data is now permanently gone.

There appear to be two ways this goes wrong:

One torrent completes, then a second one starts, and when it completes, it overwrites the first one
Two or more torrents download at the same time, overwriting each other as they go along

Note: all the torrents I had problems with yesterday seemed to contain folders themselves, so the sub-folder that was created was contained within the torrent - it was not auto-created from the torrent name. I am unsure if this is a relevant factor. The problem is the same though - qBittorrent should not overwrite files or folders - it should auto-rename them, or a worse solution, put that torrent into an error state and pause it until the user can come fix the problem.

FranciscoPombal commented 4 years ago

Well, I guess qBittorrent should not silently overwrite data like this without at least some kind of warning just because the user changed an option that does not really imply that such a destructive action could happen.

glassez commented 4 years ago

just because the user changed an option that does not really imply that such a destructive action could happen.

What "option" are you talking about? The problem is that the user adds multiple torrents with same-name files/folders to be stored in the same location.

Let's say you're using your internet browser, and you decide to download a picture. It doesn't prompt you for any file collision, and instead just overwrites some other important photo you already had saved. Would you be ok with that?

No, I'm not OK with it. But it's an incorrect example. Pointing a torrent to a location with existing files is a regular case. There can be files previously downloaded (fully or partially) by the same or another torrent, or even obtained from other sources but matching (fully or partially) the files in the torrent, so that the user may want to use them...

FranciscoPombal commented 4 years ago

@glassez

Pointing a torrent to a location with existing files is a regular case. There can be files previously downloaded (fully or partially) by the same or another torrent, or even obtained from other sources but matching (fully or partially) the files in the torrent, so that the user may want to use them...

Yes, allowing existing files to be used supports use cases such as "cross-seeding", for example. But why not ask the user first? That way it's still possible to use existing files for "cross-seeding" and also prevent overwriting when that is not desirable.

ned-martin commented 4 years ago

@glassez

Pointing a torrent to a location with existing files is a regular case. There can be files previously downloaded (fully or partially) by the same or another torrent, or even obtained from other sources but matching (fully or partially) the files in the torrent, so that the user may want to use them...

Yes, allowing existing files to be used supports use cases such as "cross-seeding", for example. But why not ask the user first? That way it's still possible to use existing files for "cross-seeding" and also prevent overwriting when that is not desirable.

If you want to allow existing files to be "used" by a torrent, then you should add some mechanism to do so - I'd suggest a menu to "import data to existing torrent" or similar.

Either way, this is not really the same situation.

In my situation, all torrents download to one location. All completed torrents are then moved to another location. That's pretty much the only options the UI gives as far as I can tell, but it seems to also be the only logical way to do things.

Here is a simple scenario to indicate why this is a big problem, which I believe is quite common:

User queues up 2 torrents to download and goes to bed. User knows both torrents are the same or similar data, though user doesn't know if they have the same filenames or not, but the user thinks that the chances of either download actually working is low because this seems to be a rare and hard to find torrent, so the user has to queue up both of them to increase the chances of either one actually downloading.
Note that it is not necessarily possible for the user (or the system) to know the exact file names of these torrents at this stage, because the torrents may be magnet links with no associated names or metadata, or the user might be using a third-party search scraper like Jackett or the python ones that are included with qBittorrent which only gives an overall name, not the exact torrent names. Either way, the user doesn't want to spend ages trying to research if the torrents have the same name as each other, because that's absurd - no other software the user has ever used would overwrite stuff like this, so the user assumes that qBittorrent will be able to correctly download torrents - that is, after all, its sole purpose.
In the middle of the night the two torrents find some peers, download their metadata, and suddenly they have the same name and/or contain the same files! The user is asleep and does not know this.
The torrents start to download. The user is still asleep, dreaming of how good it will be in the morning when the data is downloaded.
The torrents will overwrite each other!
The user wakes up to find out that one of the torrents completed in the middle of the night. It even sent the user an email to tell them it was completed! Yay!
Unfortunately, early in the morning before the user woke up, the other torrent then started downloading as well and overwrote the first torrent, and now both of them are at 3% downloaded, or one of them has gone into an error state of "missing files". The user has no data! This sucks!
The user is angry at qBittorrent because it's got such a crap bug that means it can't properly download torrents - pretty much the one job it's supposed to be able to do. The user could just give up and go use one of the many other torrent programs that don't have this ridiculous bug, but the user would like to try to improve the program so logs this bug (again - it's been logged many times before)
The user is really surprised that other people seem to find it hard to understand how this is a serious problem, but in the interests of fairness, the user would like to hear from other people how they use qBittorrent without this issue, and why they don't see how this is a serious issue? Is the user doing something wrong? Is there some way to use qBittorrent the user doesn't realise, where it can reliably download things without overwriting other things?

nokti commented 4 years ago

This is a situation that has happened to me several times, so it's not an uncommon occurrence. If my memory is correct, µTorrent 2.2.1 solves that problem by automatically adding a number to identical folders, like Folder (1) (or was it Folder.1). It would be very useful if qBit could do the same.

conkerts commented 4 years ago

Actually weird, that I searched for same filenames, but haven't found this bug report in the search.

I wonder what the probability is that different torrents have content with the same names...

I'm actually surprised that there are several reports of that, especially in the last few months.

This is a situation that has happened to me several times, so it's not an uncommon occurrence. If my memory is correct, µTorrent 2.2.1 solves that problem by automatically adding a number to identical folders, like Folder (1) (or was it Folder.1). It would be very useful if qBit could do the same.

Yes, looks like it is indeed more common than I expected. At least some warning for the user would be nice and useful. I mean there is no real problem if the files are supposed to be the same. But then only let one copy be actively downloaded.

In the case of the same file name and a different checksum, just refuse to proceed or do it the uTorrent way. Yes, that sounds like an easy solution, too.

ned-martin commented 4 years ago

I mean there is no real problem if the files are supposed to be the same. But then only let one copy be actively downloaded.

No, this would not be good. I often want to download the same file multiple times simultaneously. Just act like every other software on earth, and let the user choose what they want to download, without automatically overwriting anything else!

Two examples:

1: Downloading 2 same files on purpose:

I want to download a file, but sadly I can only find partially-seeded unreliable torrents for it. So I queue up more than one in the hope that later on more peers will come online. So I am now downloading 2 or more of the same file (but different torrents). I want all of these to keep trying to download because the chances of completion for a poorly-seeded torrent are quite low and it might take days or weeks.

2: Downloading 2 same files by accident

I want to download a file, but I only have magnet links. I don't know what the filename is inside the torrent as I don't have that info in the magnet link, so I queue up more than one magnet link in the hope that at least one will download. Then later when I'm asleep, the magnet links find some peers and it turns out they contain the same files - I had no way of knowing that when I queued it up and I can't deal with it as I'm asleep. The system needs to deal with it by renaming one.

conkerts commented 4 years ago

I mean there is no real problem if the files are supposed to be the same. But then only let one copy be actively downloaded.

No, this would not be good. I often want to download the same file multiple times simultaneously. Just act like every other software on earth, and let the user choose what they want to download, without automatically overwriting anything else!

Yeah, well, I agree with you for the most part, especially since your previously mentioned scenario happened exactly to me.

Here is a simple scenario to indicate why this is a big problem, which I believe is quite common:

1. User queues up 2 torrents to download and goes to bed. User knows both torrents are the same or similar data, though user doesn't know if they have the same filenames or not, but the user thinks that the chances of either download actually working is low because this seems to be a rare and hard to find torrent, so the user has to queue up both of them to increase the chances of either one actually downloading.

The point is what I meant: I have no clue about the libtorrent library behind it. But I never encountered something like cross seeding torrent files, etc. I don't think that libtorrent is able to download and write to the same file from multiple torrent files /swarms. But that would be actually a super cool thing, now that I think about it ... That would sort of require the torrent to sync / rescan the file every time a new part got written or something like that ... whatever, I think that would actually belong somewhere into libtorrent development.

The easiest solution is to just create a new folder and put the next torrent into the new folder, to avoid any conflicts. Then yes, the same file could be downloaded at the same time, but you're essentially downloading it twice, causing double the traffic in total. 🤷‍♂️

thalieht commented 4 years ago

Duplicate of #127

Det87 commented 4 years ago

What's so complicated about this? Why doesn't qBittorrent ask by default, if it finds a file with the exact same name whether to (1) recheck it, (2) leave it alone, and do a standard "download (1).exe" or (3) overwrite it?

The problem is that the user adds multiple torrents with same-name files/folders to be stored in the same location.

Okay, sir, but what does this matter? qBittorrent is not intelligent enough to do what file managers do when selecting a file and pressing Ctrl+C and Ctrl+V, or what browsers do when redownloading the same file?

ned-martin commented 4 years ago

What's so complicated about this? Why doesn't qBittorrent ask by default, if it finds a file with the exact same name whether to (1) recheck it, (2) leave it alone, and do a standard "download (1).exe" or (3) overwrite it?

I agree - nothing is complicated about this. It should have been fixed years ago.

However it is a bit more complicated than that, because lots of qBittorrent use is semi-automated (I don't want to wake up in the morning and find that nothing downloaded for the last 12 hours because it was sitting there waiting for me to click on a prompt, or that some third-party script or service has failed to work) these choices need to have a way of picking a default option - in my mind the simplest way to do this is to have the prompt auto-select a default option after x seconds, and have options within settings to select the default option (default: rename), whether to: prompt until user input, not prompt and do the default, or prompt but auto-select the default after X seconds (default).

In my opinion, during normal use, nothing should ever overwrite anything, ever, under any circumstance. It should always put multiple files in a folder and then rename the folder upon conflict (folder (1), folder (2), etc.) rather than renaming the files themselves, to prevent a situation where only some files are renamed. So I don't see the need for a prompt: just auto-rename by default. I believe this is in line with how most people would expect most programs to work. I think a special case should then be created (i.e. a new menu option) to handle the situation where a user intentionally wants to connect existing files that don't have a torrent associated to a new torrent (i.e. recheck it). But it seems that other people must use qBittorrent differently to me, so the above auto-prompt method would provide greater flexibility for more users.

The existing hash check when re-adding an identical torrent that's already been added would remain the same.

Other ways of handling this (which in my opinion are worse) could be to queue the prompts in some manner, so they don't block other downloads and can be dealt with later, or auto-handle conflicts (i.e. rename or store in a special conflicts sub-folder using the hash) but keep a record of them (along with the original name info etc.) and queue those so they could be resolved later, or even the super simple fix - do everything as normal but instead of ever overwriting anything, put anything that conflicts into its own unique hash folder and log a warning for manual intervention later.

Keep in mind that you don't always know what the file/folder names are when you start a torrent - for example, when adding a magnet link without any seeds.

Note also that if moving to a new location upon completion is enabled the final move location would need to be factored into the checks when the download initially starts, which gets more complicated if the user changes the eventual save location (such as changing the category) halfway through downloading...

The problem is that the user adds multiple torrents with same-name files/folders to be stored in the same location.

Okay, sir, but what does this matter? qBittorrent is not intelligent enough to do what file managers do when selecting a file and pressing Ctrl+C and Ctrl+V, or what browsers do when redownloading the same file?

Remember that the user may not know what the filenames are so they might not have meant to do that. I often download magnet links that don't immediately find any seeds, so I have no idea what their filename/s are, and I add multiple magnet links for similar items, as that increases the chances of one of them finding seeds. Then when I am asleep, they may find some seeds, and only then do they get their filenames. If two then happen to have the same file or folder name (which is common), then they will overwrite each other.

Det87 commented 4 years ago

K, but I feel like your posts are too long for ppl to get into this.

Vukodlak commented 3 years ago

Would it be possible to add an option to create a folder for each different torrent and to add the first 8 characters of the torrent hash to either the start or the end of the folder's name (user preference)?

For example, let's say you want to download a torrent for the movie "Scarlet Street (1945)". Its hash is "12345678...", and it has a folder with the movie file and some subtitles in it:
Folder: "Scarlet Street (1945)"
Inside the folder: "Scarlet Street (1945).mp4" "Scarlet Street (1945).srt"

qBittorrent could then create either of these two folder names, depending on the user's preference: "12345678-Scarlet Street (1945)" or "Scarlet Street (1945)-12345678"

Now, if the original torrent only contains one file, like "Scarlet Street (1945).mkv", with the Hash "ABCDEFGHI...", then it would have to create a folder for this one file inside the download folder (maybe trim down very long file names), like this:

"ABCDEFGHI-Scarlet Street (1945)" or "Scarlet Street (1945)-ABCDEFGHI"

If a torrent has a very long name, like "I Killed My Lesbian Wife, Hung Her on a Meat hook, and Now I Have A Three Picture Deal at Disney (1993) H264 AAC.mp4", the longer the name, the more likely it is to create an error, so I imagine it should be trimmed down to something like the first 8 hash characters and the first 30 characters of its name, like this:

"ABCDEFGHI-I Killed My Lesbian Wife, Hu" or "I Killed My Lesbian Wife, Hu-ABCDEFGHI"
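The naming scheme proposed here could be sketched roughly as follows. This is Python for illustration only; `tagged_folder_name` and its parameters are hypothetical, and the limits of 8 hash characters and 30 name characters are simply the figures suggested in this comment:

```python
def tagged_folder_name(
    name: str,
    info_hash: str,
    *,
    hash_first: bool = True,
    hash_chars: int = 8,
    max_name_chars: int = 30,
) -> str:
    """Build a folder name from a trimmed torrent name and a short hash tag."""
    tag = info_hash[:hash_chars]
    # Trim long names, then drop any trailing space left by the cut.
    short_name = name[:max_name_chars].rstrip()
    if hash_first:
        return f"{tag}-{short_name}"
    return f"{short_name}-{tag}"
```

For a single-file torrent the caller would pass the file name without its extension. Two different torrents can still share the first 8 hash characters, so a collision check would remain necessary, just far less often triggered.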

I believe uTorrent or qBittorrent used to work like this for a while, adding part of the hash to the torrents' folder names. It would be good to have something like that for people who download multiple torrents and have trouble with this.

Det87 commented 3 years ago

Cool torrent names.

pxssy commented 3 years ago

Is this fixed yet? Why was this not implemented in v4.3.0?

It's a rather easy fix, isn't it? It already does a checksum first when a file with the same name already exists. I mean, if it's the same name and 0% identical, you'd think it's a pretty good sign that it's not the same file as in the torrent, and it should at the very least throw an error or inform the user.

glassez commented 3 years ago

Well, I have analyzed this issue. Here are my current findings.

We are not talking about the presence of files on disk, but only about the presence of files with the same names in the processed torrents. So when another torrent is added, we need to check its file names against the file names of existing torrents. If a file name is already taken, then we must modify it in some way to avoid the match. The only question here is whether we should modify the names of all the torrent files, or just the name of the root folder.

Applying such processing is not a problem if the torrent you are adding has metadata immediately. But what should we do if you add a torrent using a magnet link? We have to apply this processing between the moment when the metadata is received and the moment when libtorrent starts downloading files. Here again there is the problem of the lack of low-level interaction with libtorrent. After all, when we get metadata_received_alert, we are already "out of business". We can, of course, set the stop_when_ready flag on the torrent being added so that it stops before downloading and we can process the files, but unfortunately, it is triggered before downloading metadata. The only thing that comes to my mind is to track the transition into the downloading_metadata state using the torrent_plugin, and set the stop_when_ready flag at this point. But it seems that we will have to run force_recheck after renaming files, because the torrent will already have been checked before it is paused by the stop_when_ready flag.

Of course all this stuff can be implemented in libtorrent itself (but we can't hope for that). @arvidn, are there any thoughts?
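The name check described in this comment (compare a new torrent's file names with those of torrents already in the session, and rename on a match) could look something like the sketch below. It is Python for illustration only and covers just the bookkeeping, not the libtorrent timing problem. `SessionPaths` is a hypothetical class, it assumes every file of a torrent sits under one root folder, and it picks the "rename only the root folder" option, which is one of the two choices left open above:

```python
from pathlib import PurePosixPath


class SessionPaths:
    """Tracks relative paths already claimed by torrents in the session."""

    def __init__(self) -> None:
        self._claimed: set[str] = set()

    def add_torrent(self, files: list[str]) -> list[str]:
        """Register a torrent's files, renaming its root folder on a clash.

        `files` are paths relative to the save location that share one
        root folder, for example "Centos 8/Centos8.iso".
        """
        root = PurePosixPath(files[0]).parts[0]
        candidate, n = root, 0
        while True:
            # Rebuild every path under the candidate root folder name.
            renamed = [
                str(PurePosixPath(candidate, *PurePosixPath(f).parts[1:]))
                for f in files
            ]
            if not any(path in self._claimed for path in renamed):
                break
            n += 1
            candidate = f"{root} ({n})"
        self._claimed.update(renamed)
        return renamed
```

The comparison here is case-sensitive. On a case-insensitive file system such as the default on Windows, the claimed paths would need to be normalized before comparing.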

conkerts commented 3 years ago

Whoa, I had some fear that there would be some issue with the library, but didn't expect it to be that complex. At this point, I'd even say it's not really qBittorrent's fault/bug if the libtorrent API lacks those features. Should we as users file this as an upstream issue with libtorrent? (Or is there already one?) It sounds like this is the most sensible thing to do. Also then other clients based on libtorrent should have exactly the same problem?

glassez commented 3 years ago

Also then other clients based on libtorrent should have exactly the same problem?

They shouldn't necessarily have the same problem if they implement this feature at the application level. I'm thinking of doing it the same way once the prerequisite core changes (#13123) are merged to allow performing some service actions behind the scenes.

glassez commented 3 years ago

Well, when it is implemented, what exactly should it handle, each individual file or the top-level items only?

article10 commented 3 years ago

"The only question here is whether we should modify the names of all the torrent files, or just the name of the root folder."

The name of the root folder of the newly added torrent should be changed (add 'foldername (1)', (2), etc.) if there is a collision.

In principle, the names of files inside a torrent should not be changed (that would break multi-part rar files etc., file references in .bat files, or even in .exe files to dlls etc).

melicrom commented 3 years ago

Two parts: my experience, and stream-of-consciousness about a fix (which I don't think is as simple as it may look at first glance)

My experience

This has happened to me a couple of times, a few years apart and certainly with different versions. Long enough apart that I forgot it had even happened until the ah-ha moment. One was today on 4.3.3.

In all cases it was a lazy torrent creator who made multiple torrents of some type of "sequential" or "similar in general name, but different in substance" data while leaving the base torrent Name the same. The torrents were likely created one-after-another and it took me a long time to abstractly understand what was going on just by looking at the interface especially because all had low seeds and the overwrite happened slowly. Compounding confusion as others noted is that after an individual file (or parts?) in the torrent has been downloaded it isn't checked again unless you force it, so completed files and in-progress files were actually restarted or overwritten even though they were still listed 100% and Completed in a different torrent. Also, individual files would sometimes say 0% after re-check, but they would be perfectly openable, readable and not corrupt (downloaded from a different torrent); other times they would not be openable.

Empirically encountered cases where filenames (again, from lazy people) were the same.

A parted audiobook where files are just sequential numbers with no extra info and the torrent name is 'someaudiobook' and each part restarts the numbering scheme.
In a lazier example, different audiobooks from the same series that were uploaded at the same time with the same generic torrent name like 'audiobook' and individual files with same sequence of numbers as filenames.
A multilevel course that talks about the same topics but at different levels or 'depth', something that is especially common with languages where filenames cover the topic. Think 'grammar.pdf', 'vocabulary.pdf' etc. but with the same torrent 'name'.

Maybe the real problem in my observed cases is that trackers allow you to have a stupid torrent 'name' that doesn't match the title on the tracker or prevent placement of multiple torrents on the same tracker with the same top level 'name'. This would have eliminated all the cases I experienced.

Adding the "Downloaded" column confirmed my suspicions. I re-downloaded parts of all of these at least three times. Because of low seeds, uploader bandwidth, and NAT, completion time is days to weeks with huge variance between completion of each torrent. All 5, same name.

Fix?

Disclaimer: I know absolutely nothing about the code and it sounds like some of these may not be possible (yet or ever). Suppositions and possibly flawed logic based on my current understanding of how torrents work. Just trying to help.

"The only question here is whether we should modify the names of all the torrent files, or just the name of the root folder." The name of the root folder of the newly added torrent should be changed (add 'foldername (1)', (2), etc.) if there is a collision.

Agree this is generally accepted action from user perspective for the 'same name' situation (ex: same file downloaded multiple times in a browser) But I think there are some complicated corner cases. Sometimes the person may just be reseeding or it may be a partial or less common, a partially corrupted torrent where some files no longer open?

Consider these:

New torrent added and a folder already exists in filesystem with data. Some filenames match, but all hashes do not. Intention is to reseed. Hash could be different for a legitimate-other-torrent or a random reason like corruption. Perhaps measure 'same-ness'. If there's a lot of data present but the hashes are all different it's probably different data // different torrent.
New torrent added and folder already exists in filesystem with data. Intention is to download new torrent but current-folder-data has different hashes than new-torrent-data with some or all of the same filenames.
New torrent added and folder already exists in filesystem with data present, but no filename collisions for individual files. Existing files are not included in the torrent at all. Won't cause download problems, but torrent deletion with files could (mentioned in this or one of the other closed threads).
User adds and starts downloading multiple new torrents that have the same 'name' (==same folder name when added) but different and/or same data and/or overlapping file names (some mix of case 2 and 3). Check needs to be done on each subsequent torrent metadata load. (they're all magnets with low seeds)

Multi-torrent-add (software has more info to make decisions) maybe append first or last part of the torrent hash as suggested by @Vukodlak previously ( could just be 4 or 5 digits) to second folder name after determining there would be file collisions between present torrents and just-downloaded torrent by hash comparison? Would also allow some checking on add to see if a folder matches with hash appended and with acceptably low collision considering this is already a corner case? Does this also cover the centos case @ned-martin mentioned above or could those have the same hash -> could be the same exact torrent and a user changes just the tracker? Also, could be intentional where you want to use the same underlying data as mentioned in thread. (not a case I ever had, but not sure if it's common)

File(s) exist (but don't belong to another torrent, not enough info to make decision): maybe re-check all files by hash and warn the user that files exist with the same name, but hash data doesn't match the torrent, and prompt to overwrite with a "This data is not from this torrent, are you sure you want to do this..."? Maybe tell the user how much doesn't match? I don't know the mechanics of this, but if there are a large number of file-pieces present and populated with data that don't match the hash, this is the only likely possible reason besides large-scale-computer-destroying-corruption?

Also thank you @glassez for your work on the project; while doing search-diligence to find this issue I saw your name a lot.
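The "how much doesn't match" idea could be measured along these lines. This is a rough Python sketch, not how qBittorrent or libtorrent actually rechecks data: `matching_fraction` is a hypothetical helper, it takes the existing data as one in-memory byte string for simplicity, and it assumes BitTorrent v1 style SHA-1 piece hashes over fixed-size pieces:

```python
import hashlib


def matching_fraction(data: bytes, piece_length: int, expected: list[bytes]) -> float:
    """Fraction of the torrent's pieces whose SHA-1 matches the data on disk."""
    if not expected:
        return 0.0
    matches = 0
    for index, wanted in enumerate(expected):
        start = index * piece_length
        piece = data[start:start + piece_length]
        # A missing (empty) piece can never count as a match.
        if piece and hashlib.sha1(piece).digest() == wanted:
            matches += 1
    return matches / len(expected)
```

A result near 0 with plenty of data present would suggest the files belong to some other torrent, while a result near 1 would suggest a reseed. Where to draw the line between the two is a policy decision this sketch does not try to make.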

sakkamade commented 3 years ago

Duplicate of https://github.com/qbittorrent/qBittorrent/issues/127.

milahu commented 3 years ago

What's so complicated about this? Why doesn't qBittorrent ask by default, if it finds a file with the exact same name whether to (1) recheck it, (2) leave it alone, and do a standard "download (1).exe" or (3) overwrite it?

I agree - nothing is complicated about this. It should have been fixed years ago.

"should" ... hehe

a simple solution: download every torrent to (downloads)/(name)/(hash)/

putting the hash second allows to find the torrent by its name

no need for renaming

lonecrane commented 2 years ago

To me, the problem is different and doesn't involve simultaneous downloading. It is that the so-called "cross-seeding" scenario goes wrong for some particular torrents. Actually, I add the torrents one by one, ensuring the previously added ones have finished and been STOPPED. However, they still report "file missing" after restarting qbittorrent.

FranciscoPombal commented 2 years ago

@lonecrane

Currently, cross-seeding only works correctly if the files are exactly the same or if torrent A only has additional files over torrent B, not any modified file. The underlying cause of the issue you describe is the same as this one, whether you start the download of conflicting torrents simultaneously or one after the other.

lonecrane commented 2 years ago

@FranciscoPombal No, before I restart qbittorrent, every torrent has been rechecked to 100% separately. Actually they point to the same files. However, they report "file missing" after restarting qbittorrent. The procedure is listed below; please note that all the files have already been downloaded:

Add the 1st torrent, check "start torrent". It then rechecks automatically and eventually reaches 100%. The statistics panel shows that "Downloaded" is 0 bytes. Stop it.
Add the 2nd torrent, which points to the same files, check "start torrent". It will reach 100% again, and "Downloaded" is 0 bytes as well. Stop it.
Start the two torrents.
Restart qbittorrent. Then the two torrents report "file missing".

FranciscoPombal commented 2 years ago

@lonecrane

What you describe still has to do with this issue.

Add the 2nd torrent, which points to the same files, check "start torrent". It will reach 100% again, and "Downloaded" is 0 bytes as well. Stop it.

In the process of completing the second torrent, qBittorrent clobbers (some of) the files from step 1, but does not update the completion state of the first torrent to reflect that, because each torrent's completion state and save location are treated independently within a session.

Start the two torrents.

All that qBittorrent knows is that each torrent was completed at some point in the past. The checks it makes when resuming are not enough to detect the clobbering that took place in step 2. The result of this is that some of the pieces you seed will be garbage (those that were clobbered in each torrent).

Restart qbittorrent. Then the two torrents report "file missing".

When you restart, qBittorrent reads the .fastresume files and notices the file corruption in both torrents. Notice this corruption is unpredictable. In fact, I think it's even possible for only one torrent's files to be overwritten, but most often both will be affected in this case.
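For readers unsure what "clobbering" means here: one torrent silently overwrites a file that another torrent already wrote at the same path. A minimal Python sketch of the effect follows, assuming (for illustration only) that each torrent consists of a single file treated as one piece. The names `sha1_hex` and `clobber_demo` are made up for this sketch and are not qBittorrent or libtorrent APIs.

```python
import hashlib
import tempfile
from pathlib import Path


def sha1_hex(data: bytes) -> str:
    """v1 torrents verify pieces with SHA-1; here one file stands in for one piece."""
    return hashlib.sha1(data).hexdigest()


def clobber_demo() -> tuple[bool, bool]:
    """Return (torrent A still valid, torrent B still valid) after both 'complete'."""
    data_a = b"content from torrent A"
    data_b = b"content from torrent B"
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp) / "Folder" / "file.bin"  # same relative path in both torrents
        shared.parent.mkdir()
        shared.write_bytes(data_a)  # torrent A finishes and is marked complete
        shared.write_bytes(data_b)  # torrent B finishes at the same path: A's data is clobbered
        on_disk = shared.read_bytes()
        return sha1_hex(on_disk) == sha1_hex(data_a), sha1_hex(on_disk) == sha1_hex(data_b)


print(clobber_demo())  # (False, True)
```

Torrent A is still recorded as complete, but a recheck of its data would now fail, which matches the "garbage pieces" described above.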

lonecrane commented 2 years ago

@FranciscoPombal Thanks. Now I agree that the problem I want to report falls within this issue. But I still don't quite understand what "clobbers" means. Could you explain this action? By the way, I would like to report a strange phenomenon when cross-seeding (before I restart qbittorrent): qbittorrent keeps appending '.!qb' to some files even though the related option has been unchecked.

EdwinKM commented 1 year ago

Can someone explain why the solution of @milahu seems/is so difficult to implement? (incomplete)/(name)/(hash)/

Or just create a unique subfolder for each torrent, based on the torrent's hash or an incremental counter: (incomplete)/(counter)

When finished, it can move the download folder to the "finished" folder with the "correct" name (based on the torrent's filename or content or user input). A collision at this point is easily fixable by appending an incremental number.

The migration for current users with ongoing downloads can cause some challenges. The program has to be backwards compatible.

milahu commented 1 year ago

Can someone explain why the solution of @milahu seems/is so difficult to implement? (incomplete)/(name)/(hash)/

implementation is easy, but this bug is so rare that only few people care. in most cases you can just re-download the lost files

the only challenge is to make this backward-compatible with the old filepath schema. a simple solution is to introduce a new set of download folders (temp + done), so you have, for example

~/Downloads/torrent/temp
~/Downloads/torrent/done
~/Downloads/torrent/temp_cas
~/Downloads/torrent/done_cas

"cas" as in content-addressed store

the cas paths must be different from the non-cas paths

now the user can either/or

manually migrate his old files to the new schema
keep his old files and use both schemas in parallel

EdwinKM commented 1 year ago

@milahu , i am searching for a workaround. Both transmission and qbittorrent have the same issues.

Let's say my "complete downloads" path is "/host/complete". This is set automatically for each new download. If i add 2 torrents and see that they will conflict, i change the path of the conflicting item to "/host/complete/download_01". Now the data is moved from "incomplete" to "complete".

For now i solved this by creating a third folder "incomplete_custom". If i need to rename afterwards, i change "/host/complete" to "/host/incomplete_custom/myfoldername". It should by default create a subfolder per torrent (with a unique name), and it should never move the files to the new location (if the user changes the destination) while the download is not finished.

Small update: exactly my "do not move" behavior is already implemented in qbittorrent. This makes things worse for current qbittorrent: i cannot download files without them interfering with and overwriting each other's data. With Transmission i can rename the destination, so a (half-baked) workaround is actually possible.

Created a warning for users on the truenas forum.

Scripter17 commented 1 year ago

I just put a $20 bounty on this issue using the bountysource link OP posted. Not sure if a bot's going to show up to say that or not, so I'm saying it myself

I run into this issue very often in exhentai (porn site) torrents, especially for galleries that are all the works of a certain artist. At the bare minimum I'd like qbittorrent to mimic the way Windows handles name conflicts (Folder, Folder (1), Folder (2), etc. and File.zip, File (1).zip, File (2).zip for torrents that are just one file). It may be worthwhile to allow the user to define the exact formatting of root folder/file names (using a syntax like strftime or the "run external program when added/finished" feature) so that they can implement other solutions (like using the torrent ID) more easily
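The Windows-style numbering described above could be sketched roughly as follows in Python. This is only an illustration of the naming rule, not how qBittorrent resolves save paths; `unique_name` and its parameters are hypothetical.

```python
from pathlib import Path


def unique_name(parent: Path, name: str, is_dir: bool) -> str:
    """Return `name`, or `name (1)`, `name (2)`, ... if it is already taken in `parent`."""
    if not (parent / name).exists():
        return name
    # For single-file torrents the counter goes before the extension: File (1).zip
    if is_dir:
        stem, suffix = name, ""
    else:
        stem, suffix = Path(name).stem, Path(name).suffix
    counter = 1
    while (parent / f"{stem} ({counter}){suffix}").exists():
        counter += 1
    return f"{stem} ({counter}){suffix}"
```

Note that checking and then creating is racy if several torrents are added at once, so a real client would probably need to reserve the chosen name (for example by creating the folder immediately) before starting the download.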

milahu commented 1 year ago

this problem becomes even more interesting with v2 torrents

with v1 torrents, we have only one hash per torrent, so it's either one hash per directory, or one hash per file

with v2 torrents, we have one hash per file, similar to other content-addressed stores like IPFS, git, perkeep, ...

BitTorrent v2 not only uses a hash tree, but it forms a hash tree for every file in the torrent.

Identical files will always have the same hash and can more easily be moved from one torrent to another (when creating torrents) without having to re-hash anything.

Files that are identical can also more easily be identified across different swarms, since their root hash only depends on the content of the file.

so ideally, we would have a machine-level content-addressed store where all files are stored by their hash, for example

$HOME/.torrent/v2/12/1234567890123456789012345678901234567890123456789012345678901234

v2 torrent hashes are sha256 hashes, so 32 bytes for the raw hash, and 64 characters in base16 = hexdigest (or 43 characters in base64, or 52 characters in base32, both without padding) (base64 is bad for file paths because it's case-sensitive)

partitioning hashes by their first two characters (/12/) makes directory-listing cheaper

then the actual files in the download folder are hardlinked (or symlinked) to that store. collisions of file paths would still have to be handled somehow, but now its cheaper to copy and rename a directory, because it contains only links to the store
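a rough python sketch of such a store, to make the layout concrete. big assumption: it hashes the whole file with plain sha256, which is NOT the BitTorrent v2 per-file merkle root, so the digests will differ from real v2 file hashes. the function names are made up for this sketch.

```python
import base64
import hashlib
import os
from pathlib import Path


def encoded_lengths(raw: bytes) -> dict:
    """Length of a raw digest in each text encoding (padding stripped)."""
    return {
        "raw": len(raw),
        "base16": len(raw.hex()),
        "base64": len(base64.b64encode(raw).rstrip(b"=")),
        "base32": len(base64.b32encode(raw).rstrip(b"=")),
    }


def store_path(store: Path, hex_digest: str) -> Path:
    """Shard by the first two hex characters: <store>/12/1234..."""
    return store / hex_digest[:2] / hex_digest


def add_to_store(store: Path, src: Path, link: Path) -> Path:
    """Move `src` into the store and hardlink it back at `link`."""
    # Assumption: plain SHA-256 of the file content, not the v2 merkle root.
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    target = store_path(store, digest)
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists():
        src.unlink()             # identical content is already in the store
    else:
        os.replace(src, target)  # assumes src and store are on the same filesystem
    link.parent.mkdir(parents=True, exist_ok=True)
    os.link(target, link)        # hardlinks cannot cross filesystems
    return target
```

`encoded_lengths(hashlib.sha256(b"x").digest())` gives 32, 64, 43 and 52 for raw, base16, base64 and base32. `os.link` raises `FileExistsError` if `link` already exists, so path collisions in the download folder surface as errors instead of silent overwrites.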

other CAS filesystems

ideally, this would integrate with other CAS filesystems (content-addressed stores) so the store paths could be

$HOME/.cas/sha256/12/1234567890123456789012345678901234567890123456789012345678901234

or

/cas/sha256/12/1234567890123456789012345678901234567890123456789012345678901234

this reminds me of my rant in nix sha256 is bug not feature. solution: a global /cas filesystem

https://en.wikipedia.org/wiki/Content-addressable_storage

CAS filesystem of git (hash is sha1dc of header + content)

.git/objects/12/34567890123456789012345678901234567890

CAS filesystem of "git2" (hash is sha256 of header + content)

https://stackoverflow.com/questions/60087759/git-is-moving-to-new-hashing-algorithm-sha-256-but-why-git-community-settled-on https://github.com/go-gitea/gitea/issues/13794

create git repo with git init --object-format=sha256

.git/objects/12/34567890123456789012345678901234567890123456789012345678901234

casync uses sha256, but its focus is on network transfers, not filesystem https://en.wikipedia.org/wiki/Casync

CAS filesystem of perkeep (sha224) https://github.com/perkeep/perkeep/issues/625

$HOME/var/perkeep/blobs/sha224/12/34/sha224-12345678901234567890123456789012345678901234567890123456.dat

CAS filesystem of bazel https://github.com/bazelbuild/bazel-buildfarm/issues/568

CAS filesystem of IPFS (but IPFS is just a slow version of bittorrent): block-centric (not file-centric), base32 hashes with non-standard alphabet (?). block-centric feels wrong, because this gives deduplication only when appending to files, but fails when prepending or inserting to files... a block-level deduplication should be handled by the actual filesystem like btrfs or XFS or ZFS. https://docs.ipfs.tech/concepts/content-addressing/#cids-are-not-file-hashes https://cid.ipfs.tech/#QmY7Yh4UquoXHLPFo2XbhXkhBvFoPwmQUSa92pxnxjQuPU

$HOME/.ipfs/blocks/12/3456789012345678901234567890123456789012345678901234567.data

https://stackoverflow.com/questions/1903416/do-any-common-os-file-systems-use-hashes-to-avoid-storing-the-same-content-data

https://en.wikipedia.org/wiki/Single-instance_storage - Single-instance storage (SIS) is a system's ability to take multiple copies of content and replace them by a single shared copy. It is a means to eliminate data duplication and to increase efficiency.

https://en.wikipedia.org/wiki/Data_deduplication

https://github.com/google/casfs - Content-addressable storage, implemented over pyfilesystem2 (python) default sharding: depth=2 width=2

12/34/567890123456789012345678901234567890123456789012345678901234

https://github.com/131/casfs - local content-addressable file system (javascript)

https://github.com/andyleap/casfs - ? (go)

TODO: write a proof-of-concept bittorrent client in python, which can connect to multiple CAS filesystems (on multiple hard drives)

makeasnek commented 1 year ago

Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

Please do not use bountysource. Many devs have had trouble getting paid there. You can check out this lemmy community as an alternative https://lemmy.ml/c/bugbounties

For statements from devs who have been unable to cash out from bountysource see: https://github.com/bountysource/core/issues

milahu commented 1 year ago

You can check out this lemmy community as an alternative https://lemmy.ml/c/bugbounties

or just pay the devs directly, instead of replacing one scam with another scam

makeasnek commented 1 year ago

You can check out this lemmy community as an alternative https://lemmy.ml/c/bugbounties

or just pay the devs directly, instead of replacing one scam with another scam

That's literally what the linked site is. It's a place for OSS projects to post bounties. The projects pay the devs who solve the bugs in those bounties. Like BountySource, but without the middleman. I started this lemmy community because I previously posted a lot of bounties on BountySource for a non-profit I worked for, until BountySource decided to just stop responding to withdrawal requests.

Scripter17 commented 1 year ago

If this issue gets resolved and bountysource doesn't pay out, I'll just give the money directly via paypal or something

arvidn commented 1 year ago

it would be possible to add a flag to libtorrent making it an error if any file already exists. I'm hesitant to do so though, because it's a pretty simple check for a client to do.

However, a more comprehensive check would be to see if a new torrent has any file that clashes with any existing torrent's potential file. That would require a more interesting data structure to be efficient. But it could also be made on the client side.
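A client-side version of that check could be sketched like this in Python. It is a minimal illustration under several assumptions: paths use forward slashes, comparison is case-insensitive (as on typical Windows filesystems), and `PathIndex` with its methods is a hypothetical helper, not part of libtorrent or qBittorrent.

```python
from pathlib import PurePosixPath


class PathIndex:
    """Maps every file path a known torrent may write to the torrent that owns it."""

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}

    @staticmethod
    def _key(save_path: str, rel_path: str) -> str:
        # casefold() so that "A.txt" and "a.txt" are treated as the same file
        return str(PurePosixPath(save_path) / rel_path).casefold()

    def clashes(self, save_path: str, files: list[str]) -> dict[str, str]:
        """Return {relative path: owning torrent id} for paths already claimed."""
        found = {}
        for rel in files:
            owner = self._owner.get(self._key(save_path, rel))
            if owner is not None:
                found[rel] = owner
        return found

    def add(self, torrent_id: str, save_path: str, files: list[str]) -> None:
        """Register all potential files of a torrent, downloaded or not."""
        for rel in files:
            self._owner[self._key(save_path, rel)] = torrent_id
```

A hash map keyed by full path makes each lookup constant time on average, so checking a new torrent costs roughly one lookup per file it contains. Whether a clash should then mean "merge", "warn" or "rename" is left to the caller.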

milahu commented 1 year ago

a more comprehensive check would be to see if a new torrent has any file that clashes with any existing torrent's potential file.

that would be a "merge torrents by default" behavior, which can be unwanted

Scripter17 commented 1 year ago

"Merge by default" can also be very wanted, as is often the case when people on exhentai upload torrents that are actual well-structured folders that all use the same format instead of lazily slapped together zip files. When there's updates you just download the new files and both torrents keep working

It's very important that the user (me) is able to choose that behaviour

arvidn commented 1 year ago

"merge by default" is the current (default) behavior. My reading of this ticket is that users would like "warn before overwriting" to be the default, which is essentially the opposite

milahu commented 1 year ago

a simple solution: download every torrent to (downloads)/(name)/(hash)/

early early draft: add-option-custom-download-path-format

this would be simpler than "warn before overwriting" because it would "split by default" to avoid path collisions when (hash) is part of the format string, so the download process is not blocked, and we dont need more temporary files for "unnamed" files
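to illustrate the idea (this is not the syntax of the draft, which is not shown in this thread): a python sketch of a path format string with the hypothetical fields {name} and {hash}, plus a check whether a given format "splits by default"

```python
import string

ALLOWED_FIELDS = {"name", "hash"}  # hypothetical field names for this sketch


def format_fields(fmt: str) -> set:
    """Collect the field names used in a str.format-style template."""
    return {field for _, field, _, _ in string.Formatter().parse(fmt) if field}


def render_save_path(fmt: str, name: str, infohash: str) -> str:
    """Expand e.g. "{name}/{hash}" into a relative download path."""
    unknown = format_fields(fmt) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields in path format: {sorted(unknown)}")
    return fmt.format(name=name, hash=infohash)


def splits_by_default(fmt: str) -> bool:
    """Paths cannot collide between different torrents if the hash is part of them."""
    return "hash" in format_fields(fmt)
```

for example, `render_save_path("{name}/{hash}", "Linux", "abcd")` gives "Linux/abcd". sanitizing the torrent name (path separators, reserved characters) is left out here and would be needed in practice.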

one downside of my fix is that merging is more complex, because the file paths are more complex, but ideally, we let the user choose an external program (python script, or whatever) to organize the files, for example with symlinks, or by telling qbittorrent to move the files, or by moving files and telling qbittorrent to use the new location

NSQY commented 1 year ago

"merge by default" is the current (default) behavior. My reading of this ticket is that users would like "warn before overwriting" to be the default, which is essentially the opposite

Yes, the client should never overwrite files by default. I have lost irrecoverable data due to accidentally overwriting existing files, and ultimately ended up killing the swarm as I was the last seeder.

milahu commented 1 year ago

TODO: write a proof-of-concept bittorrent client in python, which can connect to multiple CAS filesystems (on multiple hard drives)

done: https://github.com/milahu/cas_torrent

this stores files by the sha256 file hash = sha256 store

bittorrent v2 root hashes are symlinks to the sha256 store = bt2r store

human-readable files are symlinks to the sha256 store = bt2 store

human-readable torrent names are directories with symlinks to files in the bt2 store = las store

directories are merged by default. files are renamed only when they have different content. using symlinks allows to "unmerge" directories with readlink. las symlinks target the bt2 store, so it's visible "to what torrent does this file belong?" (this would not be visible when symlinking directly to the sha256 store)

excelgit commented 10 months ago

syml

Using symlinks or hardlinks would introduce another way to lose data. For computers those things are easy to handle (although data loss will occur if humans implement or interpret them wrongly, can't see that the method is in use, or use a file manager that doesn't handle them correctly). For humans they are not. They will introduce new ways to lose data.

I think qBit should do this:

1: The root name of the download (either file or folder) should be renamed automatically if a new torrent/magnet sees that the root name is already in the incompletes folder. It could be renamed by adding the last 3 characters of the hash code. Not much more than 3; we don't want very long pathnames, which gives new issues.

2: When the download is ready, and automoved to the completed folder, again, there should be a check if the root name already exists, and if so, it should be - again - autorenamed.

Another method would be: autoinsert the starting time in the root name of new downloads, preferably only if it finds the root name is already in use. So: Linux.iso would be: Linux.iso + Linux.231130_185622(1).iso + Linux.231130_185622(2).iso if you start 3 torrents with the same name the same time. Or Linux.iso + Linux.231130_185622.iso + Linux.231130_185658.iso if not started at the same time.

Another method would be: if the root name already exists: ask the user if he wants to insert (a part of) the hash or timestamp in the root name.

Best would be: offer an option, so that the user can select 1 of the 4 methods: the current (wrong) method, or 1 of the 3 above. Always let the user have more control. That is always better. Anyway: the default should always be that a new torrent/magnet never overwrites anything, either in the incompletes or in the completed folder. That's just about always rule #1, for any software!

BTW: not only the files themselves should be autorenamed, but also the name that qBittorrent shows in the GUI for what it is downloading. Otherwise, again, mistakes will be made and data loss will occur. This reminds me of another problem: the name of the torrent/magnet is often different from the root name of the file/folder it is downloading. But that is an issue for another thread.

milahu commented 3 months ago

workaround in qbittorrent-move-to-cas.py: move all finished torrents to ~/cas/btih/{btih}

qbittorrent / qBittorrent

4.2.5 overwrites files if file names are the same #12842
