jlouis / etorrent

Erlang Bittorrent Client
BSD 2-Clause "Simplified" License

Crash on torrent near (on?) download completion #107

Open drdaeman opened 13 years ago

drdaeman commented 13 years ago

Hello.

I've checked out and built etorrent@510b4d68ee7bda76df6a1d7de1b7bbb997537532 today on Erlang/OTP R14B03 (with HiPE) using kerl on a fresh Ubuntu 11.04 installation, then tried to download the ubuntu-11.04-desktop-amd64.iso torrent.

The download seemingly went fine, with both the WebUI and etorrent:l/1 reporting steady progress, but when the torrent was (almost?) complete, it just disappeared from the list, with the following exception reported on the console and in the SASL log (full log available here: http://paste2.org/p/1455524):

=CRASH REPORT==== 6-Jun-2011::22:40:54 ===
  crasher:
    initial call: etorrent_scarcity:init/1
    pid: <0.176.0>
    registered_name: []
    exception exit: {function_clause,[{gb_trees,get_1,[<0.3560.0>,nil]},{etorrent_monitorset,update,3},{etorrent_scarcity,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]}
      in function  gen_server:terminate/6
      in call from proc_lib:init_p_do_apply/3
    ancestors: [<0.175.0>,etorrent_torrent_pool,etorrent_sup,<0.81.0>]
    messages: []
    links: [<0.175.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 4181
    stack_size: 24
    reductions: 11767299
  neighbours:

At the same time, etorrent.log contained only:

2011-06-06 22:28:33 : {checking_torrent,1}
2011-06-06 22:28:36 : {started_torrent,1}
2011-06-06 22:40:54 : {checking_torrent,1}

The download was broken and/or incomplete, because the MD5 did not match, but progress was >=99.6% the last time I checked, so this happened at or very near completion. After I removed all files, a second attempt gave the same result (but a different MD5 for the downloaded .iso). An attempt to reproduce the bug a bit later on another machine failed: the download got stuck at 99.957% for an hour (probably due to the lack of endgame mode).

Unfortunately, I'm quite new to Erlang/OTP, and have almost no understanding of etorrent's internals. How could I help in debugging this? Thanks.

jlouis commented 13 years ago

Thanks for finding this one!

I'm a bit pressed at the moment due to the Erlang Factory in London later this week (have to write slides, I do :) but when it is over, I want to get deeper into this. The code problem you are seeing is due to the lack of endgame support; @klaar should have it done very soon, though.

The crash you are seeing is due to some fairly recent code which tries to keep track of the rarity of pieces. There seems to be a bug in its monitoring code that needs attention.

And then there is another thing we need to address, which is that it looks like the torrent is not restarted correctly. We should probably do that. Until now, the decision has been to ignore restarts of torrents, but it would be important to get nicked. I'll have to think a bit about how we wanna do that though: we could either let the supervisor restart it, or monitor it higher up in the supervision tree and then handle it from there.

ghost commented 13 years ago

This error occurs when a connection is not initialized with a bitfield or an equivalent message. It should be a rare case.
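
For anyone following the trace, here is a minimal sketch of that failure mode. It is not etorrent's actual code: the module name, `safe_update/3` and the `bitfield_placeholder` atom are made up for illustration, and the assumption is that the monitor set keeps one gb_trees entry per peer pid. `gb_trees:get/2` expects the key to be present, so looking up a pid that was never registered fails with `function_clause` inside `gb_trees:get_1`, which matches the crash report.

```erlang
-module(monitorset_sketch).
-export([demo/0, safe_update/3]).

%% Hypothetical helper, not etorrent's real fix: update the value
%% stored for Pid, or insert it when Pid was never registered.
safe_update(Pid, Value, Tree) ->
    case gb_trees:lookup(Pid, Tree) of
        {value, _Old} -> gb_trees:update(Pid, Value, Tree);
        none          -> gb_trees:insert(Pid, Value, Tree)
    end.

demo() ->
    Empty = gb_trees:empty(),
    %% gb_trees:get/2 assumes the key exists; on an empty tree the
    %% call fails with function_clause, as in the report above.
    Crash = (catch gb_trees:get(self(), Empty)),
    %% The tolerant variant inserts the missing pid instead of crashing.
    Tree = safe_update(self(), bitfield_placeholder, Empty),
    {Crash, gb_trees:get(self(), Tree)}.
```

Calling `monitorset_sketch:demo()` in a shell returns the caught exit term together with the value stored by the tolerant update. `gb_trees:enter/3` would do the insert-or-update in one call; the explicit `lookup` is only there to make the missing-key case visible.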

ghost commented 13 years ago

I just pushed a fix for this issue to https://github.com/klaar/etorrent/tree/add-piece-fix

Doing some test runs right now. You might want to clone that one and see if it works better.

With regard to endgame, it'll be merged in right after Erlang Factory, because Erlang Factory is more fun if you're not merging and re-testing code. :)

drdaeman commented 13 years ago

Thanks.

I've cloned the add-piece-fix branch and ran it on the same Ubuntu torrent, and it crashed right after the start. I've posted the log here: http://paste2.org/p/1456613 (replaced pieces data with "[trimmed]," for compactness). Then it seems that the Ubuntu tracker decided to temporarily block or throttle me. I'll try again later.

Have fun at Erlang Factory! :)

jlouis commented 12 years ago

Has this beast been fixed, or does it persist?