y500 / libtorrent

Automatically exported from code.google.com/p/libtorrent
0 stars 0 forks source link

Deadlock libtorrent with custom extension #322

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1.write a simple stream_plugin which extends torrent_plugin like that: (detail 
in attached file stream.cpp)
    void tick() {
        int cancel_piece = -1;
        while (cancel_piece_task->pop(&cancel_piece)) {
            LOG_DEBUG("cancel piece[%d]", cancel_piece);
            cancel_request(cancel_piece);
        }
    }

    void cancel_request(int piece_idx) {
        piece_picker& picker = m_torrent.picker();
        const std::vector<piece_picker::downloading_piece> &m_downloads =
                picker.get_download_queue();
        std::vector<piece_picker::downloading_piece>::const_iterator iter =
                std::find_if(m_downloads.begin(), m_downloads.end(),
                        piece_picker::has_index(piece_idx));
        if (iter == m_downloads.end() || iter->writing + iter->finished == 0)
            return;

        int num_blocks_in_piece = picker.blocks_in_piece(piece_idx);
        int counter = iter->requested;
        for (int i = 0; i < num_blocks_in_piece; ++i) {
            const piece_picker::block_info& info = iter->info[i];
            if (info.state == piece_picker::block_info::state_requested) {
                picker.abort_download(piece_block(piece_idx, i));
                --counter;
                if (counter == 0) break;
            }
        }
    }
2. call set_piece_deadline on piece
3. if piece timeout in 20s then cancel_request via push pieceIdx to 
stream_plugin::cancel_piece_task
4. call torrent::status to monitor download progress

What is the expected output? What do you see instead?
system sometime got deadlock

What version of the product are you using? On what operating system?
Windows 7, libtorrent 0.15.10

Original issue reported on code.google.com by tronglon...@gmail.com on 11 May 2012 at 4:15

Attachments:

GoogleCodeExporter commented 8 years ago
i got deadlock when test with attached torrent file

Original comment by tronglon...@gmail.com on 11 May 2012 at 4:19

Attachments:

GoogleCodeExporter commented 8 years ago
which threads are locked and which mutexes are they fighting over?

Original comment by arvid.no...@gmail.com on 11 May 2012 at 5:01

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
main thread which calls torrent::status wait for lock of session_impl::m_mutex
when deadlock occur my CPU is about 24% (normally 4%). Using process explorer, 
it show thread which calls request_time_critical_piece() always consume CPU 24% 
(maybe it was not locked because calls stack on that thread change every time 
and consume CPU)  

Original comment by tronglon...@gmail.com on 11 May 2012 at 7:02

GoogleCodeExporter commented 8 years ago
Here's detail screenshots

Original comment by tronglon...@gmail.com on 11 May 2012 at 9:36

Attachments:

GoogleCodeExporter commented 8 years ago
i found what problem is. The piece_block A (belong to peer_connection P) when i 
clear it's flag download via piece_picker::abort_download. Libtorrent will loop 
forever on method torrent::request_time_critical_piece

    // loop until every block has been requested from this piece (i->piece)
            do
            {
                // pick the peer with the lowest download_queue_time that has i->piece
                std::vector<peer_connection*>::iterator p = std::find_if(peers.begin(), peers.end()
                    , boost::bind(&peer_connection::has_piece, _1, i->piece));

                // obviously we'll have to skip it if we don't have a peer that has this piece
                if (p == peers.end()) break;
                peer_connection& c = **p;

                interesting_blocks.clear();
                backup1.clear();
                backup2.clear();
                // specifically request blocks with no affinity towards fast or slow
                // pieces. If we would, the picked block might end up in one of
                // the backup lists
                m_picker->add_blocks(i->piece, c.get_bitfield(), interesting_blocks
                    , backup1, backup2, 1, 0, c.peer_info_struct()
                    , ignore, piece_picker::none, 0);

                std::vector<pending_block> const& rq = c.request_queue();

                bool added_request = false;

                if (!interesting_blocks.empty() && std::find_if(rq.begin(), rq.end()
                    , has_block(interesting_blocks.front())) != rq.end())
                {
                    c.make_time_critical(interesting_blocks.front());
                    added_request = true;
                }
                else if (!interesting_blocks.empty())
                {
                    if (!c.add_request(interesting_blocks.front(), peer_connection::req_time_critical))
                    {
                        peers.erase(p);
                        continue;
                    }
                    added_request = true;
                }

                // TODO: if there's been long enough since we requested something
                // from this piece, request one of the backup blocks (the one with
                // the least number of requests to it) and update the last request
                // timestamp

                if (added_request)
                {
                    peers_with_requests.insert(peers_with_requests.begin(), &c);
                    if (i->first_requested == min_time()) i->first_requested = now;

                    if (!c.can_request_time_critical())
                    {
                        peers.erase(p);
                    }
                    else
                    {
                        // resort p, since it will have a higher download_queue_time now
                        while (p != peers.end()-1 && (*p)->download_queue_time() > (*(p+1))->download_queue_time())
                        {
                            std::iter_swap(p, p+1);
                            ++p;
                        }
                    }
                }

            } while (!interesting_blocks.empty());

When we select fast peer to download deadline piece, the peer_connection P was 
choice again. So P contain piece_block A, it will call 
peer_connection::make_time_critical (no call add_request which will mark 
piece_block A to requested state). peer_connection::make_time_critical do 
nothing with piece_block state, so piece_block A state is still none -> 
piece_picker::add_block will pick A again and again -> loop forever

Original comment by tronglon...@gmail.com on 11 May 2012 at 10:35

GoogleCodeExporter commented 8 years ago
simple workaround: call piece_picker::mark_as_downloading on method 
peer_connection::make_time_critical

Original comment by tronglon...@gmail.com on 11 May 2012 at 10:47

GoogleCodeExporter commented 8 years ago
thanks!

Original comment by arvid.no...@gmail.com on 12 May 2012 at 12:40

GoogleCodeExporter commented 8 years ago

Original comment by arvid.no...@gmail.com on 12 May 2012 at 12:40

GoogleCodeExporter commented 8 years ago
I'm not sure that's correct. Whenever make_time_critical() is called, the block 
is already in the request queue, and the block should already have been marked 
as downloading in the piece picker. It might be appropriate to add an assert in 
make_time_critical() that the block is in fact marked as downloading in the 
picker.

Original comment by arvid.no...@gmail.com on 12 May 2012 at 6:07

GoogleCodeExporter commented 8 years ago
Yes, in normal case then it's true. But in the case u write torrent_plugin when 
call picker::abort_download on deadline piece_block A then problem maybe occur. 
Example, we got following sequence actions:
1. set piece deadline for piece P
2. on method torrent::request_time_critical:
 _ peer_connection C add piece_block A to request_queue (A was marked as downloading in piece picker)
 _ peer_connection C send request download but due download_queue is full so A still in request_queue.
 _ finish.
3. wait for 1 second torrent::on_tick() was called again
 _ first custom torrent_plugin was called and it will call piece_picker::abort_download on piece_block A -> A is in request queue but not mark as downloading.
 _ on method request_time_critical_piece will loop forever on piece_block A

Original comment by tronglon...@gmail.com on 12 May 2012 at 2:11

GoogleCodeExporter commented 8 years ago
I don't think you should just call abort_download() on the piece picker. that 
invalidates the invariant between the picker and the peer connections. What 
effect are you looking for by calling abort_download()?

I imagine a more appropriate function to call is 
peer_connection::cancel_request().

Original comment by arvid.no...@gmail.com on 12 May 2012 at 9:18

GoogleCodeExporter commented 8 years ago
i need stream video file so i using set_piece_deadline. But set_piece_deadline 
only take care select fast peer in current peers list to download, it don't 
check timeout when deadline piece was downloaded slowly. I see 
time_critical_piece::last_requested, it can be used for timeout but it was not 
used (so sad). So i write custom plugin to cancel request on timeout piece. I 
tried use peer_connection::cancel_request() but piece was not cancel 
immediately. It's depend on implementation of peer_connection::write_cancel() 
and in case bt_peer_connection::write_cancel it depend on 
bt_peer_connection::m_supports_fast and other things.
I need the way which re-download deadline piece if it's timeout

Original comment by tronglon...@gmail.com on 13 May 2012 at 3:44

GoogleCodeExporter commented 8 years ago
I see. Maybe there should be a force-flag to cancel_request, that will actually 
release it from the piece picker. That would be a pretty simple patch. Once 
that's in there, maybe it wouldn't be that hard to actually complete the logic 
in libtorrent to re-request time critical pieces (as the TODO comment suggests).

Original comment by arvid.no...@gmail.com on 16 May 2012 at 6:42

GoogleCodeExporter commented 8 years ago
Thanks :)

Original comment by tronglon...@gmail.com on 17 May 2012 at 1:49

GoogleCodeExporter commented 8 years ago
Does this patch help you fix your plugin?

Index: include/libtorrent/peer_connection.hpp
===================================================================
--- include/libtorrent/peer_connection.hpp      (revision 6957)
+++ include/libtorrent/peer_connection.hpp      (working copy)
@@ -506,7 +506,9 @@
                // sends a cancel message if appropriate
                // refills the request queue, and possibly ignoring pieces requested
                // by peers in the ignore list (to avoid recursion)
-               void cancel_request(piece_block const& b);
+               // if force is true, the blocks is also freed from the piece
+               // picker, allowing another peer to request it immediately
+               void cancel_request(piece_block const& b, bool force = false);
                void send_block_requests();

                int bandwidth_throttle(int channel) const
Index: src/peer_connection.cpp
===================================================================
--- src/peer_connection.cpp     (revision 6957)
+++ src/peer_connection.cpp     (working copy)
@@ -2898,7 +2898,7 @@
                }
        }

-       void peer_connection::cancel_request(piece_block const& block)
+       void peer_connection::cancel_request(piece_block const& block, bool 
force)
        {
                INVARIANT_CHECK;

@@ -2946,6 +2946,8 @@
                TORRENT_ASSERT(block_size > 0);
                TORRENT_ASSERT(block_size <= t->block_size());

+               if (force) t->picker().abort_download(block, 
peer_info_struct());
+
                if (m_outstanding_bytes < block_size) return;

                peer_request r;

Original comment by arvid.no...@gmail.com on 6 Jun 2012 at 3:36

GoogleCodeExporter commented 8 years ago
What logic do you use to determine that a block request should be cancelled? 
It's not obvious what the right thing to do is in this case (at least not after 
spending a few minutes thinking about it).

Original comment by arvid.no...@gmail.com on 6 Jun 2012 at 4:02

GoogleCodeExporter commented 8 years ago
Thanks u for this patch. I think: cancelling block request depend on many 
elements. My main purpose is download piece completely, the in-complete piece 
is meaningless. So i cancel all requested-block in piece. First time 
torrent::m_average_piece_time and torrent::m_piece_time_deviation can be used 
for determine timeout. When a cancelled piece was requested to cancel again, 
the timeout should be (torrent::m_average_piece_time/requested-block). On the 
case streaming system via p2p which i'm building then it depend on current 
download rate of torrent and video's bit-rate.

Original comment by tronglon...@gmail.com on 7 Jun 2012 at 3:07

GoogleCodeExporter commented 8 years ago
Sorry after long time, now i apply ur patch which using 
peer_connection::cancel_request instead of piece_picker::abort_download(). But 
i got same problem due use of peer_connection::cancel_request with param force 
= true is not much different than piece_picker::abort_download(): piece_block A 
is till in request queue but was cleared mark as downloading => leads to case 
as above (https://code.google.com/p/libtorrent/issues/detail?id=322#c11)

Original comment by tronglon...@gmail.com on 12 Mar 2013 at 10:49