y500 / libtorrent

Automatically exported from code.google.com/p/libtorrent

Torrent remains at 99% complete but continues to download and never finishes #283

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I have just seen one of my torrents endlessly downloading. Although this 
issue has been seen before by myself and other forum users, it seems quite rare 
(certainly with 0.15), and I have been unable to reproduce it.

I found the torrent in Deluge in the following state: 
Size: 348MB
Progress: 0.5% 
Downloaded: 2.2GB
Active Time: 1d7h

After a force recheck the progress percentage reset to 0%. 

I manually verified the torrent's single video file and it looked to be complete, 
so the percentage returned by the force recheck is definitely wrong.

Restarted Deluge, performed another force recheck and the progress was reported 
as 99.34%. After this restart the torrent completes successfully.

This torrent's progress seems to match what other users are reporting: the 
torrent reaches 99% actual progress (the progress reported in Deluge varies 
wildly from this), but after that point it never completes and continues to 
download.

libtorrent 0.15.9

Reference topics from Deluge: 
http://forum.deluge-torrent.org/viewtopic.php?f=7&t=37427
http://forum.deluge-torrent.org/viewtopic.php?f=7&t=35849
http://dev.deluge-torrent.org/ticket/1174

Original issue reported on code.google.com by caluml...@gmail.com on 26 Jan 2012 at 2:06

GoogleCodeExporter commented 8 years ago
I have had this issue as well, and not that rarely; it is something I picked up 
on after switching to Deluge from Transmission. It has sometimes happened to me 
twice in the same week.

Size: 699MB
Progress: 30%
Downloaded: 1.9GB

Forced recheck and progress reset to 0% and started over again.

libtorrent 0.15.9 under Debian

No other torrent client I have ever used has had so much excessive over-download. 
I understand there is sometimes overhead from duplicate chunks, but never 
anything this large.

Original comment by dgree...@gmail.com on 17 Feb 2012 at 3:38

GoogleCodeExporter commented 8 years ago
Is the regression in progress caused by failed hash checks or redundant/wasted 
download?

It sounds like nothing passes the hash check for you, since rechecking brings 
progress back to 0%.

If anybody could provide me a .torrent file for a swarm they see this behavior 
in, I might be able to reproduce it which would be ideal for figuring out 
what's going on. Otherwise, building libtorrent with verbose logging and 
providing me with logs for a run where this happens would be extremely useful 
as well. Unfortunately this requires rebuilding libtorrent with 
TORRENT_VERBOSE_LOGGING defined. If you attempt this, try to download just a single 
torrent which exhibits this issue, to keep the noise down in the logs.

Original comment by ar...@rasterbar.com on 17 Feb 2012 at 6:37

GoogleCodeExporter commented 8 years ago
"Is the regression in progress caused by failed hash checks or redundant/wasted 
download?"
I am not sure how to determine this in deluge.

I am not sure it is as easy as reproducing with a particular torrent / swarm. I 
had this happen the other day with 3 downloads that were running all at once, 
each on a different tracker and with different content. These were 300-400MB 
torrents that had EACH surpassed 1.2-1.9GB of download!

The randomness sort of precludes picking out a test torrent...

Original comment by dgree...@gmail.com on 20 Feb 2012 at 9:24

GoogleCodeExporter commented 8 years ago
I would be willing to compile for debug mode if a guide were available... The 
single example torrent, however, is an issue... I have been having it happen with 
increased frequency, so the other day I tried copying the .torrent file, removing 
the torrent with its data, then starting the torrent from the .torrent file 
again; this time there was no issue...

In the deluge debug logs the only thing I note is a lot of hits on the ip block 
list, nothing libtorrent or download specific.

So here is a summary of observations:
I am using deluge 1.3.3 with libtorrent 0.15.9 under Debian, I have not seen 
this issue with transmission.
The issue does not appear to be torrent specific.
The progress is not consistent: the % reported has been between roughly 30% and 
60%, and then the "downloaded" size continues to increase without the progress % 
increasing.
The torrents are well seeded; these are large swarms with 100 - 1000 seeds.
No one tracker appears to be at fault; it has happened on several.
Forcing a hash check seems to reset all progress to 0%; the torrent then most 
often downloads normally.
Removing the torrent and running it again is most often fine.

My GUESS is that there is a bug in the hashing functions.

Original comment by dgree...@gmail.com on 23 Feb 2012 at 8:04

GoogleCodeExporter commented 8 years ago
Does deluge not report the number of bytes that failed hash checks?

Does this seem to be more likely to happen on very fast swarms with few seeds? 
Are there any distinct properties of the swarms that fail that you have noticed?

to build with logging enabled, configure libtorrent like this:

   ./configure --enable-logging=verbose

Original comment by ar...@bittorrent.com on 24 Feb 2012 at 3:31

GoogleCodeExporter commented 8 years ago
As far as I know deluge only reports hash check fails in debug logging
mode, and changing modes restarts deluge, thus causing the problem to go
away... I tried running in debug for a while but I was getting hundreds of
megs of logs a minute on downloads that were working fine..

As stated before, all of these torrents are VERY well seeded and very large
(I also have a 50Mbit connection). Last night 3 torrents had this happen:
one was seeded by around 200, the other two had over 3000 seeds. After
forcing a recheck they downloaded in minutes, whereas they had been hung
all night.

If I have time I will try and build libtorrent with verbose logging.

Original comment by dgree...@gmail.com on 24 Feb 2012 at 5:26

GoogleCodeExporter commented 8 years ago
Haven't had time to compile a debug, but here is some more strange behavior.

2 torrents report about 2.1GB downloaded for a 300MB torrent; their % is 
something like 30%. I force a recheck while they are running and they revert to 
1%.
I pause them for a few hours while looking up debugging in deluge.
I decide to force a recheck on the paused torrents again; they both jump from 
1% to about 99%. I resume them and they complete within minutes.

Original comment by dgree...@gmail.com on 24 Feb 2012 at 10:25

GoogleCodeExporter commented 8 years ago
what filesystem are you using?

Original comment by ar...@bittorrent.com on 24 Feb 2012 at 11:32

GoogleCodeExporter commented 8 years ago
The partition for active torrents is ext3.

Original comment by dgree...@gmail.com on 24 Feb 2012 at 11:40

GoogleCodeExporter commented 8 years ago
If you check your logs you'll get lines such as this:
[DEBUG   ] hh:mm:ss alertmanager:123 hash_failed_alert: TORRENT_NAME hash for 
piece xxx failed

Rechecking the torrent will cause it to think every piece is wrong, but if you 
restart deluge and check the file again it will recheck the file correctly and 
you won't lose any data. The report about it fixing itself if you wait long 
enough before rechecking is odd. I guess the best thing to do is to have 
libtorrent spit out verbose debugging messages when doing a full hash check, so 
you can compare them with the hashes in the .torrent file and see whether 
there's a problem in generating the hashes or in comparing them.

Original comment by longinu...@gmail.com on 26 Feb 2012 at 7:18

GoogleCodeExporter commented 8 years ago
On previous debug runs I was not seeing any hash fails, hence my confusion about 
not seeing them in deluge... I have set up a specific log rotation for my 
deluge debug file, so I should be able to run it continually now without filling 
my hard drive..

Maybe I can capture more data.

If I am going to compile libtorrent should I compile 0.15.9 with debug logging 
or should I go to trunk and see if the problem goes away? I was really looking 
forward to uTP support anyway.

Original comment by dgree...@gmail.com on 26 Feb 2012 at 10:05

GoogleCodeExporter commented 8 years ago
trunk is in good shape right now. I think it's fine to test it.

Original comment by arvid.no...@gmail.com on 27 Feb 2012 at 7:03

GoogleCodeExporter commented 8 years ago
I haven't recompiled libtorrent yet for logging, but at least I managed to 
have deluge in full debug mode when some torrents got stuck.

It IS logging hash_failed_alert; in fact it is logging hash_failed_alert almost 
every 15 seconds. So apparently NO data is actually being accepted for these 
torrents?

My debug log for deluge is currently about 2 GB.

Forcing recheck caused progress to drop to 0% for both torrents.

Restarting deluge and then forcing a re-check made the progress of these jump up 
considerably.

Deluge doesn't really log much for the resume / recheck...

Original comment by dgree...@gmail.com on 27 Feb 2012 at 4:06

GoogleCodeExporter commented 8 years ago
dgreekas: You can modify the alert mask for Deluge in 
deluge/core/alertmanager.py, using 
http://www.rasterbar.com/products/libtorrent/manual.html#alerts as a guide. 

Also, you could change log.debug to log.error on line 123; alerts will then be 
logged at error level ('-L error') instead of debug, which will help reduce log 
size.

Original comment by caluml...@gmail.com on 27 Feb 2012 at 4:43
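[Editor's note] An aside on the alert mask mentioned above: it is a plain bitwise OR of category flags. A minimal sketch of the idea follows; the flag names and values below are illustrative stand-ins, not libtorrent's actual constants — in the real python bindings the categories come from lt.alert.category_t, and the manual linked above lists which categories each alert belongs to.

```python
# Illustrative category flag values -- NOT libtorrent's real constants.
# In the actual python bindings these come from lt.alert.category_t.
ERROR_NOTIFICATION = 1 << 0
STORAGE_NOTIFICATION = 1 << 2
STATUS_NOTIFICATION = 1 << 6

def build_alert_mask(*categories):
    """Combine alert categories into a single mask with bitwise OR."""
    mask = 0
    for category in categories:
        mask |= category
    return mask

# A mask covering error and storage events, without the firehose of
# peer/debug messages that was filling the logs in this thread:
mask = build_alert_mask(ERROR_NOTIFICATION, STORAGE_NOTIFICATION)
# ses.set_alert_mask(mask)  # on a real libtorrent session
```

With the real bindings the combined value would be passed to the session's set_alert_mask(); consult the alerts section of the manual for which category hash_failed_alert falls under in your version.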

GoogleCodeExporter commented 8 years ago
That is useful; however, the hash fail alerts are firing every 15s for ONE 
torrent. They are constant... I am not seeing any ban notices, so as far as I 
can see libtorrent is not auto-banning any peers. It is the hash fails that are 
making the log so big.

Completely restarting deluge and rechecking seems to reset everything back into 
a normal download state.

The log does not contain the peer info so I am not sure if it is one peer or 
many.

Original comment by dgree...@gmail.com on 27 Feb 2012 at 4:51

GoogleCodeExporter commented 8 years ago
Another observation: since I was debug logging in deluge I noticed the logs 
were very full of blocked peers... I removed the deluge blocklist plugin and, 
lo and behold, it has now been about two to three days without any noticeable 
over-downloading or hung torrents.

Is it possible that loading a large predefined block list interferes with the 
bad peer auto ban functions?

Still considering re-compiling libtorrent with debugging, but it is going to 
have to wait until I have a large block of free time.

Original comment by dgree...@gmail.com on 1 Mar 2012 at 8:14

GoogleCodeExporter commented 8 years ago
That would only make sense if you were actually getting bad data. What's 
actually happening is that libtorrent thinks the data is bad when it isn't. If 
you get a torrent stuck at some percentage, say 50%, and then restart and 
recheck, the torrent will increase in percentage done, because there was lots 
of data it had thrown out while it was failing every single hash check. The bad 
peer detection should also be working: you should notice the download 
getting slower and slower as it throws away more and more peers, since they're 
all sending "bad" data.

Original comment by longinu...@gmail.com on 1 Mar 2012 at 11:13

GoogleCodeExporter commented 8 years ago
Is there any correlation with this problem and whether or not encryption is 
turned on?

i.e. maybe someone could try disabling encryption entirely and also requiring it 
entirely, and see whether that makes the problem go away or become more likely.

Original comment by ar...@rasterbar.com on 2 Mar 2012 at 6:12

GoogleCodeExporter commented 8 years ago
For reference, I have been running with full stream encryption required.

Original comment by dgree...@gmail.com on 2 Mar 2012 at 6:26

GoogleCodeExporter commented 8 years ago
I have had the issue reoccur with the block list removed, though perhaps this 
indicates it happens less frequently now?

Original comment by dgree...@gmail.com on 5 Mar 2012 at 9:09

GoogleCodeExporter commented 8 years ago
I have the same problem, but my build has TORRENT_DISABLE_ENCRYPTION defined, on 
0.15.9. It is being used on NTFS with largest_contiguous and disable_os_cache.

Original comment by SeanYu...@gmail.com on 6 Mar 2012 at 4:43

GoogleCodeExporter commented 8 years ago
So that gives us one report with an encrypted stream on an ext3 filesystem on 
Linux, and a second unencrypted on NTFS on Windows?

Completely restarting deluge and then forcing a recheck correctly checks the 
data and restores large amounts of progress; however, forcing a recheck before 
restarting reverts progress to 0%.

The issue seems to revolve around hash checks, and for some reason something is 
different after an application restart?

Original comment by dgree...@gmail.com on 6 Mar 2012 at 3:41

GoogleCodeExporter commented 8 years ago
I think I might be seeing this bug from the perspective of another client.
I uploaded 2.39 GB to a libtorrent client (and 677 MB to another), but I have 
only downloaded 12 MB of the torrent in question, and my uploaded stat doesn't 
reflect the amount of data I uploaded. Totally weird.

Original comment by csim...@gmail.com on 6 Mar 2012 at 7:47

Attachments:

GoogleCodeExporter commented 8 years ago
cs: your screenshot indicates the torrent's availability is less than 1 and you 
are not connected to any seeds. This means you would be uploading a lot more 
than downloading, and would not be able to complete the download until a seed or 
a peer with your missing chunks appeared and was available. This looks like 
normal behavior for a torrent with so few peers.

Original comment by dgree...@gmail.com on 6 Mar 2012 at 7:55

GoogleCodeExporter commented 8 years ago
I'm already at 100% (from my perspective) since I only selected 12 MB of the 
torrent. So I got disconnected from all seeds and should have been disconnected 
from all peers who don't need my pieces anymore, which for some reason didn't 
happen with the libtorrent based clients. Instead I'm stuck uploading 
apparently nothing (based on the uploaded stat in the upper region of the image, 
which is at 151 MB) to them, forever.

Original comment by csim...@gmail.com on 6 Mar 2012 at 9:35

GoogleCodeExporter commented 8 years ago
Sorry, I can't give you a log, because it works perfectly on my computer, but a 
few users who have the problem never send me a log. I use libtorrent in my 
application, which is not a general torrent client.

Original comment by SeanYu...@gmail.com on 7 Mar 2012 at 7:16

GoogleCodeExporter commented 8 years ago
Tried to compile from source for debugging but gave up after a few hours... the 
autoheader step was generating errors and I just could not find a solution.

So I guess I can't provide any deeper feedback, other than that this is 
happening frequently again (it seems to come and go), and if it were not for 
some great features in deluge I probably would have gone back to transmission 
by now.

Original comment by dgree...@gmail.com on 15 Mar 2012 at 5:36

GoogleCodeExporter commented 8 years ago
somebody wrote a python script to deal with this issue in a brute force way.

https://gist.github.com/2269840

Original comment by szig...@gmail.com on 1 Apr 2012 at 12:00
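[Editor's note] The core of such a brute-force workaround is deciding when a torrent counts as "stuck". A rough, self-contained sketch of that heuristic — the function name and threshold are my own, not taken from the gist — based on the symptom reported throughout this thread: total download far exceeding the torrent's size without completion.

```python
def looks_stuck(total_size, downloaded, progress, ratio_threshold=2.0):
    """Heuristic: flag a torrent whose wasted download dwarfs its size.

    total_size  -- payload size in bytes
    downloaded  -- total bytes fetched so far
    progress    -- completion fraction, 0.0 to 1.0
    """
    if progress >= 1.0:
        return False  # finished torrents are never stuck
    # Downloading several times the torrent's size without finishing
    # matches the reports in this thread (e.g. 2.2 GB for a 348 MB torrent).
    return downloaded > ratio_threshold * total_size

# The 348 MB / 2.2 GB torrent from the original report would be flagged:
stuck = looks_stuck(348e6, 2.2e9, 0.005)
```

A real script would then pause the flagged torrents, force a recheck, and restart the client — the sequence reporters in this thread say clears the state.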

GoogleCodeExporter commented 8 years ago
Can confirm this with 0.15.10 and a recent deluge git clone. The torrent gets 
stuck at 100% (that's what deluge-gtk displays) and just keeps uploading until 
it magically finishes or I help it out. (Pausing and/or restarting _typically_ 
does the trick.)

I see this regularly for large torrents that have dozens of large files.

Original comment by fnord.ha...@gmail.com on 2 Apr 2012 at 10:11

GoogleCodeExporter commented 8 years ago
Also affecting me, deluge 1.3.5, libtorrent 0.15.9

Original comment by Snow.K...@gmail.com on 21 Apr 2012 at 12:30

GoogleCodeExporter commented 8 years ago
Just to add my experience of this issue under deluge (and libtorrent 0.15.6)
I pause the stuck torrent
force recheck
percentage goes to 0%
restart deluge host
force recheck again
percentage now will go up to 98%+
then i can resume the torrent and it finishes fine

The above procedure happens with every single stuck torrent
deluge 1.3.4 (libtorrent 0.15.6) seemed to have it happening much more than 
previous versions
now on deluge 1.3.5 (libtorrent 0.15.9), will see how it goes

Original comment by Craig.Ch...@gmail.com on 2 May 2012 at 8:40

GoogleCodeExporter commented 8 years ago
I have the same problem,
using deluge under Linux. 

I have it mostly automated, so I only check the torrents from time to time, or 
when I'm missing a file that should be finished already.
Every now and then I have stuck torrents, and only a restart of deluge + force 
recheck can fix it.

Some of them go even to 100% after recheck.
Some just advance a few percent.
Most go to 95+% and finish within a few minutes after restarting.

Original comment by phad...@gmail.com on 7 Jun 2012 at 8:09

GoogleCodeExporter commented 8 years ago
I ended up making a cron job that starts deluge every night, due to the 
frequency of it happening... 

Original comment by dgree...@gmail.com on 7 Jun 2012 at 6:48

GoogleCodeExporter commented 8 years ago
Hello! I am also having the same problem. When I re-check, downloads that are 
reported as completed are actually incomplete. I have to re-download, and so on, 
cyclically.

I think it's a failed hash check. That's because I installed the "Pieces" 
plugin in Deluge, which shows the downloaded parts in a chart, and when I 
re-check I notice that different parts are incomplete every time.

Could the reason be the encryption of my personal folder in Ubuntu? When you 
install Ubuntu it asks whether you want to encrypt your home folder. Could this 
be the problem?

Please someone help us!

This message was automatically translated into English.

Original comment by claudioj...@gmail.com on 8 Jun 2012 at 2:32

GoogleCodeExporter commented 8 years ago
I too believe this is caused by some bug triggering spurious hash check 
failures.

Ways to try to narrow down what's causing it (assuming it's a libtorrent bug) 
would be to disable features and find a correlation with one or more features 
that, when turned off, make the problem go away. For instance, try turning off 
the disk cache, tweaking the cache line size, or just disabling the read cache.

Another way to narrow this down could be to make a debug build of libtorrent, 
with asserts enabled. If an assert triggers, that might be an indicator of 
something going wrong that could be the root cause of the hash failures.

Original comment by arvid.no...@gmail.com on 8 Jun 2012 at 3:59
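[Editor's note] To make those experiments concrete, here is a pure-python sketch of toggling the cache-related knobs one at a time. The field names are assumptions based on the 0.15-era session_settings documentation and must be verified against your bindings before use.

```python
# Experiments to run one at a time, looking for a correlation with the bug.
# Field names are ASSUMED from 0.15-era session_settings (cache_size is in
# 16 KiB blocks); check your libtorrent bindings before relying on them.
EXPERIMENTS = [
    {"cache_size": 0},          # disable the disk cache entirely
    {"use_read_cache": False},  # keep the write cache, disable reads
    {"cache_size": 8},          # shrink the cache drastically
]

def apply_experiment(settings, overrides):
    """Copy one experiment's overrides onto a settings object or plain dict."""
    for field, value in overrides.items():
        if isinstance(settings, dict):
            settings[field] = value
        else:
            setattr(settings, field, value)
    return settings
```

With the real bindings this would be applied roughly as: s = lt.session_settings(); apply_experiment(s, EXPERIMENTS[0]); ses.set_settings(s) — one variant per test run, watching whether the stuck-at-99% behavior disappears.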

GoogleCodeExporter commented 8 years ago
Hi there, 

I too have been experiencing this problem for about 6 months. I didn't know 
about this thread until today, otherwise I would have posted. 

I'm running the most recent stable release.

Is there any progress on the issue yet?

My problems are pretty much the same as everyone else here:

1. Torrent starts downloading
2. Checking on torrent reveals that it's downloading and uploading constantly 
but the progress bar remains the same
3. Force recheck changes progress to 0
4. Restart the deluge daemon and force recheck reveals some percentage greater 
than 0 (usually 80+)
5. Torrent completes normally

Now normally I wouldn't care too much, but if not monitored this can get really 
out of hand. I had a torrent go a few days without me monitoring it and it 
uploaded over 10GB of data - I confirmed this via network bandwidth monitoring.

Any advice or help would be appreciated.

Thanks,

Mike 

Original comment by the.mike...@gmail.com on 26 Jun 2012 at 3:38

GoogleCodeExporter commented 8 years ago
I too am experiencing this issue, from 1.3.3 to 1.3.5, on Linux ext2 and ext3 
filesystems. Just FYI, there is another way to fix it besides restarting deluge: 
you can remove the torrent (without removing the data), then re-add the torrent 
file (if you keep a copy of it locally) or re-download the torrent file/magnet 
link, and it will recheck the torrent and either be complete or continue from 
where it got stuck.

Original comment by silverdu...@gmail.com on 7 Nov 2012 at 9:03

GoogleCodeExporter commented 8 years ago
This bug is still present, any new word?

Original comment by wes...@gmail.com on 7 Apr 2013 at 5:05

GoogleCodeExporter commented 8 years ago
Sorry, to add

deluge: 1.3.6
libtorrent: 0.15.10

Original comment by wes...@gmail.com on 7 Apr 2013 at 5:08

GoogleCodeExporter commented 8 years ago
The bug went away for me when I updated Python on Ubuntu. It was something to do 
with that.

Original comment by claudioj...@gmail.com on 7 Apr 2013 at 6:56

GoogleCodeExporter commented 8 years ago
@claudio: I am not sure whether that is relevant, but you need to clarify which 
versions of Python, Ubuntu, and libtorrent you were using.

The pattern seems to suggest an issue in 0.15; it would be good if those 
running Deluge 1.3.6 could test with 0.16 to see if the issue still presents.

Original comment by caluml...@gmail.com on 7 Apr 2013 at 8:55

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Same problem here on libtorrent-rasterbar 0.16.10, using Python 2.7.3.
It seems to only happen with large torrents (> 2GB), but that could just be 
because the large ones have a higher chance of getting stuck. 

Original comment by cariser...@gmail.com on 9 Sep 2013 at 11:46

GoogleCodeExporter commented 8 years ago
@cariseren: do you know if it's caused by a persistent hash-failure?
Do you see hash failure alerts?
I would imagine peers getting banned each time it happens, and that you would 
eventually have banned the whole swarm (but for a large swarm that might take a 
while).

Is the piece that's failing in an .mp3 file or other audio file, by any chance?
Sometimes this can happen because the only peers with the last few pieces run a 
media player that corrupted the audio file (by modifying its ID3 tag).

Original comment by arvid.no...@gmail.com on 10 Sep 2013 at 2:51

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
@arvid: I am not seeing any warnings (using the python bindings, I'm using 
ses.pop_alert(), which isn't giving me anything), but at the point of stalling, 
num_peers drops to 0. The thing here is that I am seeding this myself from 
another host, so this is not a case of some random seed on the Internet. So it 
looks like it's simply dropping the seed, or the seed is dropping this 
peer. Both sides are using libtorrent via the python bindings.

Perhaps there is a setting, or combination of settings, that would imply a 
trusted relationship with the seed and prevent it from failing out or being 
dropped.

Original comment by cariser...@gmail.com on 10 Sep 2013 at 6:18
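[Editor's note] On the pop_alert() observation above: in the bindings of this era, pop_alert() returns one alert per call and None once the queue is empty, and it only yields categories enabled in the session's alert mask — so "nothing" can simply mean the mask excludes the interesting categories. Below is a drain-loop sketch against a minimal stand-in; FakeSession is invented here purely for illustration of the same pop_alert() contract a real lt.session exposes.

```python
def drain_alerts(ses):
    """Pop every queued alert from a session.

    Relies on pop_alert() returning None once the queue is empty.
    """
    alerts = []
    while True:
        alert = ses.pop_alert()
        if alert is None:
            return alerts
        alerts.append(alert)

class FakeSession:
    """Minimal stand-in mimicking the pop_alert() contract, for illustration."""
    def __init__(self, queued):
        self._queued = list(queued)

    def pop_alert(self):
        return self._queued.pop(0) if self._queued else None

# Drain two queued alert messages in arrival order:
msgs = drain_alerts(FakeSession(["hash_failed_alert: piece 42",
                                 "tracker_error_alert: timed out"]))
```

With a real session, widening the alert mask (via set_alert_mask) before draining is the first thing to try when pop_alert() appears silent.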

GoogleCodeExporter commented 8 years ago
This sounds like a serious bug. Would you have any chance to build libtorrent 
with verbose logging (both on the seeder and the downloader) and send me the 
logs from a failed transfer? (If possible, limit the test run to just the one 
torrent that fails.)

Also, if you have logs of all the alerts you get, on the seed and the 
downloader, I'd be interested in those as well. If you don't want to attach 
them to this ticket, you can email them to me at arvid@rasterbar.com

Original comment by arvid.no...@gmail.com on 10 Sep 2013 at 8:39

GoogleCodeExporter commented 8 years ago
My sincere apologies.  I did not see any alerts on the peer, but I just noticed 
alerts on the seed, and it is in fact an IO Error on the seed.  I won't go into 
the details, but it's not libtorrent's fault.   Thank you for your time. 

Original comment by cariser...@gmail.com on 10 Sep 2013 at 9:44

GoogleCodeExporter commented 8 years ago
Turns out it wasn't quite so simple.  There is an alert, occurring on the 
seeding session.  The alert says "() file too short".  After the alert, the 
num_peers drops to 0 on both sessions and progress stops.  It does not appear 
to retry or rectify the situation at all.  Any idea why this happens?  

I'm not writing to the file at all, and it's not corrupt, as I just created the 
torrent file from it moments earlier.  The problem is intermittent.  

Original comment by cariser...@gmail.com on 10 Sep 2013 at 10:58

GoogleCodeExporter commented 8 years ago
What operating system are you running the seed on? (It sounds like it may be 
Windows.)

The 2 GiB limit you mention makes it sound like reads from the file at offsets 
> 2 GiB fail for some reason. This could happen if the file API being used 
doesn't support large files, and possibly if the filesystem doesn't support 
them (but then I would expect the file not to be allowed to grow that large to 
begin with).

What filesystem is the seed reading from?

Original comment by arvid.no...@gmail.com on 10 Sep 2013 at 11:21