removing bad peer... doesn't

reardonia commented 2 weeks ago

What is the issue?

Testing on Tr4 mainline.

Loosely related to #6909: when peers are removed due to a connection error (the handshake occurred, but the connection was terminated when one side or the other decided both are seeds), an attempt is immediately made to reconnect. The peer is removed from the active swarm but kept on the candidate atom list. This means the peer is retried endlessly, once every reconnect_pulse().

[2024-06-10 20:25:43.394] dbg TORRENTNAME valid peer 130.61.##.###:45000 from=2 (peer-mgr.cc:1428)
[2024-06-10 20:25:43.394] dbg TORRENTNAME found non-seed 130.61.##.###:45000 (peer-mgr.cc:463)
[2024-06-10 20:25:44.395] trc TORRENTNAME Starting an OUTGOING TCP connection with 130.61.##.###:45000 (peer-mgr.cc:2743)
[2024-06-10 20:25:44.395] trc 130.61.##.###:45000 new_outgoing()  (peer-io.cc:137)
[2024-06-10 20:25:44.395] trc net.cc:303 New OUTGOING connection 25 (130.61.##.###:45000) (net.cc:303)
[2024-06-10 20:25:44.395] trc 130.61.##.###:45000 socket (tcp) is 25 (peer-socket.cc:43)
[2024-06-10 20:25:44.395] trc 130.61.##.###:45000 bandwidth is 0x7fb3d8059b18; its parent is 0x5562ca8ea158 (peer-io.cc:78)
[2024-06-10 20:25:44.395] trc handshake 130.61.##.###:45000 sending MSE handshake (Ya) (handshake.cc:43)
[2024-06-10 20:25:44.395] trc handshake 130.61.##.###:45000 len(PadA) is 219 (handshake.cc:46)
...
[2024-06-10 20:25:44.396] trc handshake 130.61.##.###:45000 handling can_read; state is [awaiting yb] (handshake.cc:600)
[2024-06-10 20:25:44.396] trc handshake 130.61.##.###:45000 in read_yb... need 96, have 250 (handshake.cc:68)
[2024-06-10 20:25:44.396] trc handshake 130.61.##.###:45000 in read_vc... need 8, read 147, have 7 (handshake.cc:157)
...
[2024-06-10 20:25:45.387] trc handshake 130.61.##.###:45000 handling can_read; state is [awaiting vc] (handshake.cc:600)
[2024-06-10 20:25:45.387] trc handshake 130.61.##.###:45000 found ENCRYPT(VC)! (handshake.cc:165)
[2024-06-10 20:25:45.387] trc handshake 130.61.##.###:45000 len(PadB) is 154 (handshake.cc:168)
[2024-06-10 20:25:45.387] trc handshake 130.61.##.###:45000 crypto select is 2 (handshake.cc:194)
[2024-06-10 20:25:45.387] trc handshake 130.61.##.###:45000 len(PadD) is 281 (handshake.cc:203)
[2024-06-10 20:25:45.387] trc handshake 130.61.##.###:45000 PadD: need 281, got 281 (handshake.cc:216)
[2024-06-10 20:25:45.387] trc handshake 130.61.##.###:45000 read_handshake: need 48, got 0 (handshake.cc:242)
...
[2024-06-10 20:25:45.387] trc handshake 130.61.##.###:45000 handling can_read; state is [awaiting handshake] (handshake.cc:600)
[2024-06-10 20:25:45.387] trc handshake 130.61.##.###:45000 read_handshake: need 48, got 248 (handshake.cc:242)
[2024-06-10 20:25:45.387] trc handshake 130.61.##.###:45000 read_peer_id: need 20, got 200 (handshake.cc:332)
[2024-06-10 20:25:45.387] trc handshake 130.61.##.###:45000 peer-id is 'qBittorrent 4.3.9' ... isIncoming is false (handshake.cc:342)
[2024-06-10 20:25:45.387] trc handshake 130.61.##.###:45000 180 more bytes remain after handshake (handshake.cc:352)
[2024-06-10 20:25:45.387] trc TORRENTNAME 130.61.##.###:45000 [qBittorrent 4.3.9]: sending an ltep handshake (peer-msgs.cc:1090)
[2024-06-10 20:25:45.387] trc TORRENTNAME 130.61.##.###:45000 [qBittorrent 4.3.9]: sending 'ltep' 0 [] (peer-msgs.cc:859)
[2024-06-10 20:25:45.387] trc TORRENTNAME 130.61.##.###:45000 [qBittorrent 4.3.9]: sending 'fext-have-all' (peer-msgs.cc:859)
...
[2024-06-10 20:25:45.388] trc 130.61.##.###:45000 try_read err: n_read:0 errno:107 (Transport endpoint is not connected) have:180 (peer-io.cc:468)
[2024-06-10 20:25:45.388] dbg TORRENTNAME setting 130.61.##.###:45000 OUTGOING do_purge flag because we got [(107) Transport endpoint is not connected] (peer-mgr.cc:816)
...
[2024-06-10 20:25:45.388] trc TORRENTNAME purging peer 130.61.##.###:45000 because its do_purge flag is set (peer-mgr.cc:2211)
[2024-06-10 20:25:45.388] trc TORRENTNAME removing bad peer 130.61.##.###:45000 (peer-mgr.cc:2305)
[2024-06-10 20:25:45.388] trc 130.61.##.###:45000 disabling ready-to-write polling (peer-io.cc:554)
[2024-06-10 20:25:45.388] trc 130.61.##.###:45000 disabling ready-to-read polling (peer-io.cc:542)
[2024-06-10 20:25:45.388] trc 130.61.##.###:45000 in tr_peerIo destructor (peer-io.cc:204)
[2024-06-10 20:25:45.388] trc TORRENTNAME Starting an OUTGOING TCP connection with 130.61.##.###:45000 (peer-mgr.cc:2743)
[2024-06-10 20:25:45.388] trc 130.61.##.###:45000 new_outgoing()  (peer-io.cc:137)
[2024-06-10 20:25:45.388] trc net.cc:303 New OUTGOING connection 20 (130.61.##.###:45000) (net.cc:303)
[2024-06-10 20:25:45.388] trc 130.61.##.###:45000 socket (tcp) is 20 (peer-socket.cc:43)
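
To make the loop in the log above easier to reason about, here is a minimal model of it. This is only a sketch: `SwarmSketch`, `purge()` and `count_attempts()` are hypothetical names, the address is a documentation-range placeholder, and none of this is Transmission's real data structure. It assumes only what is described above: a purged peer leaves the active set, stays in the candidate list, and nothing records the failure.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the reported behaviour, not Transmission's real types.
struct SwarmSketch
{
    std::vector<std::string> candidates; // every peer we know about
    std::set<std::string> active;        // peers we are currently connected to
    int attempts = 0;                    // outgoing connection attempts so far

    // One pulse: try every candidate that is not already active.
    void reconnect_pulse()
    {
        for (auto const& addr : candidates)
        {
            if (active.count(addr) == 0)
            {
                ++attempts;
                active.insert(addr);
            }
        }
    }

    // The connection drops (e.g. both ends are seeds). The peer leaves the
    // active set but stays in `candidates`, and no failure is recorded.
    void purge(std::string const& addr)
    {
        active.erase(addr);
    }
};

// Counts connection attempts made to a single seed over `n_pulses` pulses.
inline int count_attempts(int n_pulses)
{
    assert(n_pulses >= 0);
    SwarmSketch swarm;
    swarm.candidates.push_back("198.51.100.7:45000"); // placeholder address
    for (int i = 0; i < n_pulses; ++i)
    {
        swarm.reconnect_pulse();
        swarm.purge("198.51.100.7:45000");
    }
    return swarm.attempts;
}
```

In this model the number of attempts grows linearly with the number of pulses, which matches the "retried endlessly" pattern in the log.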

Which application of Transmission?

transmission-daemon

Which version of Transmission?

Tr4 mainline

tearfur commented 2 weeks ago

See what you think.

diff --git a/libtransmission/peer-mgr.h b/libtransmission/peer-mgr.h
index 7c0b4915c..a32c80be5 100644
--- a/libtransmission/peer-mgr.h
+++ b/libtransmission/peer-mgr.h
@@ -207,9 +207,16 @@ public:

         if (is_connected_)
         {
-            num_consecutive_fails_ = {};
             piece_data_at_ = {};
         }
+        else if (piece_data_at_ != time_t{})
+        {
+            num_consecutive_fails_ = {};
+        }
+        else
+        {
+            on_connection_failed();
+        }
     }

     [[nodiscard]] constexpr auto is_connected() const noexcept

tearfur commented 2 weeks ago

We need to keep these seeds in the candidate list, and somehow discourage ourselves from trying them. Otherwise, we will rediscover these peers from trackers and whatnot sooner or later, then find ourselves trying them again.

The above patch deprioritises the peer candidate further and further every time a connection is established but no piece data is transferred. Hopefully this will help bring the swarm to an equilibrium where most connection attempts amount to something.
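
As a rough illustration of that bookkeeping, here is a self-contained sketch that mirrors the three branches of the patch. `PeerInfoSketch` is a hypothetical stand-in for the class in peer-mgr.h, and `retry_delay_secs()` with its doubling schedule is purely an assumption made for illustration, not the backoff Transmission actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>

// Hypothetical stand-in for the per-peer state touched by the patch.
class PeerInfoSketch
{
public:
    void set_connected(bool is_connected)
    {
        is_connected_ = is_connected;

        if (is_connected_)
        {
            // New connection: forget piece data seen on an earlier connection.
            // The fail counter is deliberately NOT reset here any more.
            piece_data_at_ = {};
        }
        else if (piece_data_at_ != std::time_t{})
        {
            // Disconnected after transferring piece data: a useful peer.
            num_consecutive_fails_ = {};
        }
        else
        {
            // Disconnected without any piece data: count it as a failure.
            on_connection_failed();
        }
    }

    void set_latest_piece_data_time(std::time_t now)
    {
        assert(now != std::time_t{}); // zero means "no piece data" in this sketch
        piece_data_at_ = now;
    }

    void on_connection_failed() { ++num_consecutive_fails_; }

    std::size_t num_consecutive_fails() const { return num_consecutive_fails_; }

    // ASSUMED schedule for illustration only: 0s, 10s, 20s, 40s, ... capped at 640s.
    std::time_t retry_delay_secs() const
    {
        if (num_consecutive_fails_ == 0)
        {
            return 0;
        }
        std::size_t const capped = num_consecutive_fails_ < 7 ? num_consecutive_fails_ : 7;
        return static_cast<std::time_t>(10) << (capped - 1);
    }

private:
    bool is_connected_ = false;
    std::time_t piece_data_at_ = {};
    std::size_t num_consecutive_fails_ = 0;
};
```

With this model, three connect/disconnect cycles against another seed leave the counter at 3, while a single cycle that transfers piece data resets it to 0, so productive peers are not penalised.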
