Download gets stuck at 99.9% and never finishes (the last piece problem?)

GoogleCodeExporter commented 8 years ago

Hello. I'm the developer of the client data update system for an MMORPG.
I decided to use the BitTorrent protocol, and specifically libtorrent, as many
other (game) projects do, to share download traffic between clients.
Downloading works perfectly from my home computer, but there is a problem on
my work machine. Maybe it's because of our weird firewall policies and NAT, or
something else, but at work the download of a torrent with 51 files stops at
some piece of the last file and never finishes. I can reproduce it very
reliably: I get all files from the server, remove the last file, and then start
the download. The file check completes successfully and the download of the
last file starts fine, but at some point the download speed drops to zero and,
despite many subsequent attempts to connect (and then disconnect), again and
again, the download of that stupid single piece (near the end of the last file)
never completes. :(
I noticed that it doesn't depend on the piece size (I tried 1 MB and 256 KB):
there is always one single piece of the last file that cannot be downloaded.
But, I'll repeat, on my home machine (and on some other test computers at the
publisher and the game operator) downloading works great, and on my work
machine too: if I start a seeding client in the local network, the download
completes immediately.
The problem occurs only on my computer (and those of our other employees),
where the only available seed is a torrent client running on a remote dedicated
server (based on libtorrent 0.15.8, like the downloading client). I tried
disabling the firewall on that server, both software (Windows) and hardware, so
that all ports were open to the external Internet, but it didn't help. On my
work machine there are no open ports at all (I mean visible from the Internet),
but I expected the BitTorrent protocol to handle such cases correctly;
furthermore, downloading runs at maximum speed most of the time, except for the
final part.
(I think this may be a BitTorrent protocol problem, because the same situation
occurs not only with libtorrent but with uTorrent too! I ran uTorrent on the
dedicated server and on my work machine, and when I tried to resume the
download of that last file of the torrent, it also got stuck on that piece.)

I've attached all the logs I was able to collect; I hope they help.
I defined both TORRENT_LOGGING and TORRENT_VERBOSE_LOGGING. I also tried
defining TORRENT_CONNECTION_LOGGING, but that makes the torrent client app
hang. :)
I also logged all alerts (with the alert mask set to all_categories).

I have a Russian version of Windows at work, so I recommend using
translate.google.com if you need a translation of the system messages. :) Sorry
about that, but I don't know how to switch the language of system messages (and
Win 7 Pro doesn't have an option to switch the system language). Fortunately,
the server has an English version of Windows. :)

Original issue reported on code.google.com by alextret...@gmail.com on 24 Oct 2011 at 3:22

Attachments:

logs.zip

GoogleCodeExporter commented 8 years ago

For me, the last missing piece problem could always be fixed by restarting the
client.
I haven't seen this problem anymore since the last attempt to fix it.

However, I still have one difference in my code in peer_connection::snub_peer()
compared to trunk, because IMO the oldest requested block times out first.

snip:
...
#ifdef MOOPOLICE_CLIENT
  pending_block& qe = m_download_queue.front();
#else
  pending_block& qe = m_download_queue.back();
#endif
...
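
A minimal sketch of the ordering argument, assuming requests are appended to the back of the download queue in the order they are sent, so that `front()` is the oldest outstanding request. The `pending_block` struct and the `first_timed_out` helper below are simplified, hypothetical stand-ins for illustration, not libtorrent's real types:

```cpp
#include <cassert>
#include <deque>

// Hypothetical, simplified stand-in for a queued block request.
struct pending_block {
    int piece;
    int block;
    int requested_at;  // seconds on some arbitrary clock, illustration only
};

// Returns the request that should time out first, or nullptr if nothing has
// been outstanding for at least `timeout` seconds.
// Assumes requests are pushed to the back in the order they were sent, so the
// oldest one sits at the front of the queue.
const pending_block* first_timed_out(const std::deque<pending_block>& queue,
                                     int now, int timeout) {
    assert(timeout >= 0);
    if (queue.empty()) return nullptr;
    const pending_block& oldest = queue.front();
    return (now - oldest.requested_at >= timeout) ? &oldest : nullptr;
}
```

Under that assumption, with three requests sent at t=100, 105 and 110 and a 30 second timeout, the call at t=130 returns the first request, while the newest one at the back has only been waiting for 20 seconds.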

Original comment by webmas...@massaroddel.de on 24 Oct 2011 at 8:50

GoogleCodeExporter commented 8 years ago

It may be the case that this problem is in 0.15.x but fixed in trunk.

Original comment by arvid.no...@gmail.com on 25 Oct 2011 at 12:37

GoogleCodeExporter commented 8 years ago

I don't think this is related to the other "stuck at 99%"-issue. From what I 
can tell, the error that causes all connection failures is: ERROR_SEM_TIMEOUT 
(this happens both on the server and on the client).

Some googling suggests that it may be related to NATs or firewalls:

   http://social.msdn.microsoft.com/Forums/en-SG/wsk/thread/deffaa26-1987-490e-b9a8-a7905d4391f7

I'm not sure how reliable that source is though. On the server, there's another 
error on the socket first:

   An existing connection was forcibly closed by the remote host

which is then followed by the semaphore error. That error message suggests 
there was a RST packet sent to terminate the connection, which smells like a 
firewall thinking that the TCP flow is malicious, and kills it.

Is it always the same portion of the file (the same piece) that fails? Maybe 
there's some specific byte-pattern in that file that the firewall recognizes as 
a virus or worm?

So, the good news (for me) is that it does not seem to be a bittorrent or 
libtorrent problem (especially since no other client can download it either). 
However, you might want to run a virus check on the data you're distributing 
and/or obfuscate/compress the data.

Original comment by arvid.no...@gmail.com on 25 Oct 2011 at 2:30

GoogleCodeExporter commented 8 years ago

I don't think it is a firewall/NAT issue; I've faced it too. It shows up in
qBittorrent (which I use as my primary torrent client) and in my own software.
I've made a workaround for it: I periodically check all pieces that are being
downloaded and look at their speeds. If a piece has been idle for some timeout
(5 seconds in my software), I drop all connections to the peer that piece is
being downloaded from (yes, I would like to drop just the one connection for
this piece, but I haven't found out how to do it). This workaround seems ugly,
but it works pretty well, especially on large swarms.
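
The idle-piece check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `piece_progress` and `stall_detector` are hypothetical names, peers are identified by plain strings, and how the byte counts are obtained from the client (and how a peer is actually disconnected) is left out entirely.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <set>
#include <string>

using steady = std::chrono::steady_clock;

// Hypothetical bookkeeping for one piece that is currently downloading.
struct piece_progress {
    std::string peer;                  // peer the piece is requested from
    long long bytes_received;          // bytes seen so far
    steady::time_point last_progress;  // last time bytes_received grew
};

class stall_detector {
public:
    explicit stall_detector(std::chrono::seconds timeout) : m_timeout(timeout) {}

    // Record the current byte count for a piece. The timestamp is refreshed
    // only if new data arrived or the piece moved to a different peer.
    void update(int piece, const std::string& peer, long long bytes,
                steady::time_point now) {
        assert(bytes >= 0);
        auto it = m_pieces.find(piece);
        if (it == m_pieces.end()) {
            m_pieces[piece] = piece_progress{peer, bytes, now};
            return;
        }
        if (bytes > it->second.bytes_received || peer != it->second.peer) {
            it->second = piece_progress{peer, bytes, now};
        }
    }

    // A finished piece no longer needs to be watched.
    void finished(int piece) { m_pieces.erase(piece); }

    // Peers whose pieces have made no progress for at least the timeout.
    std::set<std::string> stalled_peers(steady::time_point now) const {
        std::set<std::string> result;
        for (const auto& entry : m_pieces) {
            if (now - entry.second.last_progress >= m_timeout)
                result.insert(entry.second.peer);
        }
        return result;
    }

private:
    std::chrono::seconds m_timeout;
    std::map<int, piece_progress> m_pieces;
};
```

A caller would construct `stall_detector` with a 5 second timeout, call `update` for every downloading piece on each tick, and disconnect whatever `stalled_peers` returns. Passing the clock value in as a parameter keeps the logic easy to exercise without real waiting.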

Original comment by rain87...@gmail.com on 25 Oct 2011 at 7:56

GoogleCodeExporter commented 8 years ago

could you provide verbose peer logs as well for me to look at? to make sure 
you're actually seeing the same issue, and not just the same symptom.

Original comment by ar...@bittorrent.com on 25 Oct 2011 at 10:36

GoogleCodeExporter commented 8 years ago

Thank you all for the (so fast) replies!

> The last missing piece problem could always be fixed for me by restarting the
> client.
Unfortunately, in my case this doesn't help: the download consistently fails to
finish, no matter how many times I restart it (from the beginning, i.e. from
checking files).

> However, I still have one difference in my code in
> peer_connection::snub_peer() compared to trunk, because IMO the oldest
> requested block times out first.
I tried this, but it didn't help. :(

> It may be the case that this problem is in 0.15.x but fixed in trunk.
Unfortunately, updating to trunk didn't fix the problem. :(

> Is it always the same portion of the file (the same piece) that fails?
Yes. But sometimes the problem occurs on other files too. (I just copy all
files from the origin, then delete a single one and start the download. Almost
all files download successfully, but when the deleted file is the last one, the
download doesn't finish. This may happen not only when deleting the last file
but with some others as well, though with the last file it can be reproduced
with 100% probability. Unfortunately, I can't experiment a lot, because our
Internet connection is not very fast and I can't download gigabytes of data on
each attempt.)

> Maybe there's some specific byte-pattern in that file that the firewall
> recognizes as a virus or worm?
Maybe, but I don't know what that pattern looks like, and I can't depend on it.
Even if I can fix downloading on my work machine, in my specific case with our
NAT and firewall settings, what will I say to our customers if they run into a
similar problem?

> However, you might want to run a virus check on the data you're distributing
> and/or obfuscate/compress the data.
This last file is an executable, but it's unlikely to contain viruses, because
it is made by our build system. :)
And our update system should be able to transfer any possible combination of
bits in the transferred data (even if they look like a virus :) ), so I don't
even consider obfuscating the data as a possible solution. :)

So, for now, as a workaround, I decided to use a url_seed (which also serves as
a backup update source). But this doesn't always work well either (again, only
on my work machine): even with 2 seeds, one libtorrent and one URL, the
download sometimes gets stuck too, but restarting the download app always helps
in such cases. The version from trunk doesn't seem to work any better: as with
0.15.8, when the download is run from the beginning it gets stuck at the end,
and I need to restart it because the download speed is too low or even falls to
zero (although even without restarting, the download continues after some time
and the speed rises from zero again).
So it looks like URL seeding guarantees that the download will finish at some
point, and this solves the problem at least for this specific curious case. :)

Original comment by alextret...@gmail.com on 25 Oct 2011 at 11:17

GoogleCodeExporter commented 8 years ago

> This last file is an executable, but it's unlikely to contain viruses, because
> it is made by our build system. :)
> And our update system should be able to transfer any possible combination of
> bits in the transferred data (even if they look like a virus :) ), so I don't
> even consider obfuscating the data as a possible solution. :)

I'm not suggesting that libtorrent or your software prevents certain bits from 
being transferred. I'm suggesting that your firewall may inspect flows and 
perhaps (with varying degree of success) block PE headers from being 
transferred in, from the untrusted outside. I'm not an expert on firewalls, but 
it doesn't seem entirely unlikely
that some firewalls have a feature like this, and that some sysadmins enable it 
(google doesn't let you attach executables to mails for instance).

So, assuming that "the internet" is a place of all sorts of crazy hardware
trying to stop you from transferring certain bits, and you don't consider
obfuscation a solution, I think there's probably not much you can do (short of
not transferring executables or not caring about the 0.1% of users that might
be affected by a firewall like that).

Maybe you can ask your sysadmin if there are any logs of the firewall killing 
your TCP flows, because that's what it looks like, and it might be nice to 
confirm.

Original comment by arvid.no...@gmail.com on 25 Oct 2011 at 8:30

GoogleCodeExporter commented 8 years ago

> Maybe you can ask your sysadmin if there are any logs of the firewall killing
> your TCP flows

We are using the Kerio firewall, and when I started looking for the IP address
of the remote server in its logs, I immediately found the reason the TCP
traffic was being blocked in the Kerio security log!
It was the Intrusion Prevention System, which drops packets under the following
rule:
IPS: Packet drop, severity: High, Rule ID: 1:2012086 ET SHELLCODE Possible Call 
with No Offset TCP Shellcode, proto:TCP, ip/port:188.127.234.226:6891 -> 
192.168.1.19:51673

So I disabled rule 2012086 in the IPS settings, and the problem disappeared!

Thanks a lot for the help!

Original comment by alextret...@gmail.com on 26 Oct 2011 at 7:24

GoogleCodeExporter commented 8 years ago

I found another workaround that keeps the Kerio IPS from dropping packets:

    pe_settings pes;
    // Forcing encryption had no effect in my case (with or without the
    // allowed_enc_level line below, it doesn't matter):
    // pes.in_enc_policy = pes.out_enc_policy = pe_settings::forced;
    pes.allowed_enc_level = pe_settings::rc4;
    ses.set_pe_settings(pes);

Original comment by alextret...@gmail.com on 27 Oct 2011 at 11:46

GoogleCodeExporter commented 8 years ago

If the encryption generates a matching pattern, then the block would still be
filtered out, right?

Maybe we need an algorithm (extension) to request/transfer a block with a
different obfuscation after it has failed to download several times.

Original comment by webmas...@massaroddel.de on 30 Oct 2011 at 8:52

GoogleCodeExporter commented 8 years ago

> If the encryption generates a matching pattern, then the block would still be
> filtered out, right?

I think the probability of that happening is very close to zero. I don't know
what "plaintext encryption" means in "allowed_enc_level = pe_settings::both"
(both = plaintext + rc4), which is the default, but data encrypted only with
RC4 is unlikely to match any filtering pattern. :)

> Maybe we need an algorithm (extension) to request/transfer a block with a
> different obfuscation after it has failed to download several times.

And I think that adding such an "obfuscation extension" to the BitTorrent
protocol only to bypass some stupid firewalls is not a very good idea. :)
Moreover, at this time there are no BitTorrent clients that support this kind
of obfuscation, and none are likely to appear in the foreseeable future.

Original comment by alextret...@gmail.com on 31 Oct 2011 at 7:01

GoogleCodeExporter commented 8 years ago

It looks like this is a description of the kind of rule set up by your firewall:

   http://www.networkforensics.com/2010/05/16/network-detection-of-x86-buffer-overflow-shellcode/

I can't imagine it being a good idea to try to work around these kinds of 
firewall rules at the bittorrent level. The encryption is the obfuscation in 
bittorrent, there doesn't seem to be a point in adding yet another one.

The distinction between RC4 encrypted stream and plaintext encryption is that 
the latter only adds the encryption handshake/padding at the beginning of the 
stream. It uses less CPU and works against bittorrent throttling that only 
looks at the first few packets of a TCP stream to classify it.
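
To illustrate why a fully RC4-encrypted stream is unlikely to trip a byte-pattern rule while a plaintext payload can, here is a small self-contained sketch. It is plain textbook RC4 with a fixed key, and it does not reproduce the BitTorrent encryption handshake or its key derivation. The pattern `E8 00 00 00 00` (an x86 call with zero offset) is an assumption chosen for illustration; it is not taken from the actual definition of rule 2012086.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Plain RC4: key scheduling followed by keystream XOR. Applying it twice with
// the same key restores the input.
bytes rc4(const bytes& key, const bytes& input) {
    assert(!key.empty());
    std::array<std::uint8_t, 256> s;
    for (int i = 0; i < 256; ++i) s[i] = static_cast<std::uint8_t>(i);

    // Key-scheduling algorithm (KSA).
    std::uint8_t j = 0;
    for (std::size_t i = 0; i < 256; ++i) {
        j = static_cast<std::uint8_t>(j + s[i] + key[i % key.size()]);
        std::swap(s[i], s[j]);
    }

    // Pseudo-random generation algorithm (PRGA), XORed onto the input.
    bytes output(input.size());
    std::uint8_t a = 0, b = 0;
    for (std::size_t n = 0; n < input.size(); ++n) {
        a = static_cast<std::uint8_t>(a + 1);
        b = static_cast<std::uint8_t>(b + s[a]);
        std::swap(s[a], s[b]);
        std::uint8_t k = s[static_cast<std::uint8_t>(s[a] + s[b])];
        output[n] = static_cast<std::uint8_t>(input[n] ^ k);
    }
    return output;
}

// True if `pattern` occurs anywhere in `data`, the way a simple content rule
// would match it.
bool contains(const bytes& data, const bytes& pattern) {
    return std::search(data.begin(), data.end(),
                       pattern.begin(), pattern.end()) != data.end();
}
```

With the key `Key` and the payload `90 E8 00 00 00 00 58`, `contains(payload, pattern)` is true, `contains(rc4(key, payload), pattern)` is false, and running `rc4` again with the same key restores the payload. A rule that matches on raw bytes sees only the ciphertext, which is why restricting `allowed_enc_level` to RC4 sidesteps it, whereas plaintext mode leaves the payload bytes unchanged after the handshake.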

If you can think of any action I should take in libtorrent regarding this 
issue, please re-open it.

Original comment by arvid.no...@gmail.com on 31 Oct 2011 at 7:51

Changed state: WontFix

GoogleCodeExporter commented 8 years ago

hi

Original comment by fallah.m...@gmail.com on 18 Jan 2014 at 9:41

sa3paleasm / libtorrent

Download gets stuck at 99.9% and never finishes (the last piece problem?) #261