Closed reardonia closed 3 weeks ago
I haven't looked into this in detail yet, just listing some possible regression commits off the top of my head. 6384abeb2be8347e8840c1db78dd27bdb973a7f5 7973d873ffadaa555beadf979e8a245de57e66e9
Hmm, just to confirm, can you point out the log snippets (<10 lines for each line) that makes you think that the connections are disconnecting after handshake?
I am guessing these:
[2024-06-01 00:04:53.303] trc %%NETWORK%%.147:56974 try_write err: wrote:0, errno:111 (Connection refused) (peer-io.cc:335)
[2024-06-01 00:04:53.303] trc handshake %%NETWORK%%.147:56974 handshake socket err: Connection refused (111) (handshake.cc:636)
[2024-06-01 00:04:53.303] trc %%TORRENTNAME%% marking peer %%NETWORK%%.147:56974 as unreachable... num_fails is 1 (peer-mgr.cc:1301)
for comparison, a trace of 51995ab succeeding:
[2024-06-01 17:04:56.998] trc [%%NETWORK%%.173]:59938 socket (tcp) is 23 (peer-socket.cc:36)
[2024-06-01 17:04:56.998] trc net.cc:302 New OUTGOING connection 23 ([%%NETWORK%%.173]:59938) (net.cc:302)
[2024-06-01 17:04:56.998] trc [%%NETWORK%%.173]:59938 enabling ready-to-read polling (peer-io.cc:505)
[2024-06-01 17:04:56.998] trc [%%NETWORK%%.173]:59938 enabling ready-to-write polling (peer-io.cc:517)
[2024-06-01 17:04:56.998] trc [%%NETWORK%%.173]:59938 libevent says this peer socket is ready for writing (peer-io.cc:356)
[2024-06-01 17:04:56.998] trc [%%NETWORK%%.173]:59938 libevent says this peer socket is ready for reading (peer-io.cc:480)
[2024-06-01 17:04:56.998] trc [%%NETWORK%%.173]:59938 enabling ready-to-read polling (peer-io.cc:505)
[2024-06-01 17:04:56.998] trc [%%NETWORK%%.173]:59938 handling canRead; state is [awaiting yb] (handshake.cc:682)
[2024-06-01 17:04:56.998] trc [%%NETWORK%%.173]:59938 got an encrypted handshake (handshake.cc:172)
[2024-06-01 17:04:56.998] trc [%%NETWORK%%.173]:59938 not enough bytes... returning read_more (handshake.cc:259)
[2024-06-01 17:04:56.998] trc [%%NETWORK%%.173]:59938 enabling ready-to-write polling (peer-io.cc:517)
[2024-06-01 17:04:56.998] trc [%%NETWORK%%.173]:59938 libevent says this peer socket is ready for writing (peer-io.cc:356)
[2024-06-01 17:04:57.995] trc [%%NETWORK%%.173]:59938 libevent says this peer socket is ready for reading (peer-io.cc:480)
[2024-06-01 17:04:57.995] trc [%%NETWORK%%.173]:59938 enabling ready-to-read polling (peer-io.cc:505)
[2024-06-01 17:04:57.995] trc [%%NETWORK%%.173]:59938 handling canRead; state is [awaiting vc] (handshake.cc:682)
[2024-06-01 17:04:57.995] trc [%%NETWORK%%.173]:59938 got it! (handshake.cc:265)
[2024-06-01 17:04:57.995] trc [%%NETWORK%%.173]:59938 crypto select is 2 (handshake.cc:291)
[2024-06-01 17:04:57.995] trc [%%NETWORK%%.173]:59938 pad_d_len is 0 (handshake.cc:301)
[2024-06-01 17:04:57.995] trc [%%NETWORK%%.173]:59938 pad d: need 0, got 204 (handshake.cc:318)
[2024-06-01 17:04:57.995] trc [%%NETWORK%%.173]:59938 payload: need 48, got 204 (handshake.cc:335)
[2024-06-01 17:04:57.995] trc [%%NETWORK%%.173]:59938 peer-id is 'Transmission 4.0.7 (Dev)' ... isIncoming is false (handshake.cc:432)
[2024-06-01 17:04:57.995] trc %%TORRENTNAME%% [%%NETWORK%%.173]:59938 [Transmission 4.0.7 (Dev)]: sending an ltep handshake (peer-msgs.cc:931)
[2024-06-01 17:04:57.995] trc %%TORRENTNAME%% [%%NETWORK%%.173]:59938 [Transmission 4.0.7 (Dev)]: sending 'ltep' 0 [] (peer-msgs.cc:813)
[2024-06-01 17:04:57.995] trc %%TORRENTNAME%% [%%NETWORK%%.173]:59938 [Transmission 4.0.7 (Dev)]: sending 'fext-have-none' (peer-msgs.cc:813)
[2024-06-01 17:04:57.995] trc [%%NETWORK%%.173]:59938 enabling ready-to-write polling (peer-io.cc:517)
[2024-06-01 17:04:57.995] trc [%%NETWORK%%.173]:59938 libevent says this peer socket is ready for writing (peer-io.cc:356)
[2024-06-01 17:04:57.995] trc [%%NETWORK%%.173]:59938 enabling ready-to-write polling (peer-io.cc:517)
[2024-06-01 17:04:57.995] trc [%%NETWORK%%.173]:59938 libevent says this peer socket is ready for writing (peer-io.cc:356)
[2024-06-01 17:04:59.000] trc [%%NETWORK%%.173]:59938 enabling ready-to-write polling (peer-io.cc:517)
[2024-06-01 17:04:59.000] trc [%%NETWORK%%.173]:59938 libevent says this peer socket is ready for writing (peer-io.cc:356)
[2024-06-01 17:04:59.000] trc [%%NETWORK%%.173]:59938 enabling ready-to-write polling (peer-io.cc:517)
[2024-06-01 17:04:59.000] trc [%%NETWORK%%.173]:59938 libevent says this peer socket is ready for writing (peer-io.cc:356)
[2024-06-01 17:04:59.992] trc [%%NETWORK%%.173]:59938 libevent says this peer socket is ready for reading (peer-io.cc:480)
[2024-06-01 17:04:59.992] trc [%%NETWORK%%.173]:59938 enabling ready-to-read polling (peer-io.cc:505)
[2024-06-01 17:04:59.992] trc %%TORRENTNAME%% [%%NETWORK%%.173]:59938 [Transmission 4.0.7 (Dev)]: read 126 payload bytes; 0 left to go (peer-msgs.cc:1760)
[2024-06-01 17:04:59.992] trc %%TORRENTNAME%% [%%NETWORK%%.173]:59938 [Transmission 4.0.7 (Dev)]: got peer msg 'ltep' (20) with payload len 126 (peer-msgs.cc:1395)
[2024-06-01 17:04:59.992] trc %%TORRENTNAME%% [%%NETWORK%%.173]:59938 [Transmission 4.0.7 (Dev)]: Got a BtPeerMsgs::Ltep (peer-msgs.cc:1624)
[2024-06-01 17:04:59.992] trc %%TORRENTNAME%% [%%NETWORK%%.173]:59938 [Transmission 4.0.7 (Dev)]: got ltep handshake (peer-msgs.cc:1242)
[2024-06-01 17:04:59.992] trc %%TORRENTNAME%% [%%NETWORK%%.173]:59938 [Transmission 4.0.7 (Dev)]: here is the base64-encoded handshake: ...
[2024-06-01 17:04:59.992] trc %%TORRENTNAME%% [%%NETWORK%%.173]:59938 [Transmission 4.0.7-dev]: peer's port is now 59938 (peer-msgs.cc:1110)
[2024-06-01 17:04:59.992] trc %%TORRENTNAME%% [%%NETWORK%%.173]:59938 [Transmission 4.0.7-dev]: got peer msg 'fext-have-all' (14) with payload len 0 (peer-msgs.cc:1395)
[2024-06-01 17:04:59.992] trc %%TORRENTNAME%% [%%NETWORK%%.173]:59938 [Transmission 4.0.7-dev]: Got a BtPeerMsgs::FextHaveAll (peer-msgs.cc:1569)
[2024-06-01 17:04:59.992] trc %%TORRENTNAME%% [%%NETWORK%%.173]:59938 [Transmission 4.0.7-dev]: got peer msg 'unchoke' (1) with payload len 0 (peer-msgs.cc:1395)
[2024-06-01 17:04:59.992] trc %%TORRENTNAME%% [%%NETWORK%%.173]:59938 [Transmission 4.0.7-dev]: got Unchoke (peer-msgs.cc:1431)
Note that I think your rewrite dropped the trace message when self-connects are dropped. It happens in read_peer_id(). I'm guessing that your logic for "connected_to_self" is wrong? Are you comparing the clientForId string rather than the actual over-the-wire peer-id ?
Personally I don't understand why we don't filter these as OUTGOING connection before they ever get generated. This has been Tr behavior forever, but seems non-sensical. ie if my listening port is the same as the destination port, and my global-address is the same, then don't generate.
Hmm, just to confirm, can you point out the log snippets (<10 lines for each line) that makes you think that the connections are disconnecting after handshake?
I am guessing these:
[2024-06-01 00:04:53.303] trc %%NETWORK%%.147:56974 try_write err: wrote:0, errno:111 (Connection refused) (peer-io.cc:335) [2024-06-01 00:04:53.303] trc handshake %%NETWORK%%.147:56974 handshake socket err: Connection refused (111) (handshake.cc:636) [2024-06-01 00:04:53.303] trc %%TORRENTNAME%% marking peer %%NETWORK%%.147:56974 as unreachable... num_fails is 1 (peer-mgr.cc:1301)
Yes, the OUTGOING MSE seems to get stuck at "read_vc". The INCOMING MSE seems fine.
Hmm yes of course the mistake was going to be so trivial, I love programming. ^^
@reardonia Please have a try with the patch below.
diff --git a/libtransmission/handshake.cc b/libtransmission/handshake.cc
index 63297198e..3a41cd9a3 100644
--- a/libtransmission/handshake.cc
+++ b/libtransmission/handshake.cc
@@ -536,10 +536,12 @@ ReadState tr_handshake::read_ia(tr_peerIo* peer_io)
* standard practice at this time is for it to be zero-length */
outbuf.add_uint16(0U);
+ // send it
+ peer_io->write(outbuf, false);
+
/* maybe de-encrypt our connection */
if (crypto_select_ == CryptoProvidePlaintext)
{
- peer_io->write(outbuf, false);
TR_ASSERT(std::empty(outbuf));
// All future communications will use ENCRYPT2()
Hmm yes of course the mistake was going to be so trivial, I love programming. ^^
@reardonia Please have a try with the patch below.
diff --git a/libtransmission/handshake.cc b/libtransmission/handshake.cc index 63297198e..3a41cd9a3 100644 --- a/libtransmission/handshake.cc +++ b/libtransmission/handshake.cc @@ -536,10 +536,12 @@ ReadState tr_handshake::read_ia(tr_peerIo* peer_io) * standard practice at this time is for it to be zero-length */ outbuf.add_uint16(0U); + // send it + peer_io->write(outbuf, false); + /* maybe de-encrypt our connection */ if (crypto_select_ == CryptoProvidePlaintext) { - peer_io->write(outbuf, false); TR_ASSERT(std::empty(outbuf)); // All future communications will use ENCRYPT2()
Yeah, that fixes it, but for the wrong reasons. Something changed in the rewrite of handshake that needs closer examination.
In read_ia you seemed to preserve all the logic from Tr3, including waiting to commit the write() based on falling back to disabling crypto. Oddly enough, when I was testing handshakes in Tr3 I added this comment:
// BUGBUG why would this ever happen?
/* maybe de-encrypt our connection */
if (crypto_select == CRYPTO_PROVIDE_PLAINTEXT)
Which means I thought this was buggy or questionable in Tr3 and should likely be dropped in Tr4. I don't understand logically why we would drop down to PLAINTEXT at this point in the negotiation. We should look at this more closely.
There are two main concerns (maybe four): 1) Tr3 was able to commit a write in ReadIA that is missing in your new read_ia. It looks like you were trying to fold all the "payload" stuff together instead of the bizarre way the Tr3 treating encrypted-payload and plaintext-payload as two separate paths. From end of read_ia in Tr3:
/* send it out */
tr_peerIoWriteBuf(handshake->io, outbuf, false);
evbuffer_free(outbuf);
2) Why was this a problem for Tr4 <-> Tr4 but not other clients? Doesn't make sense to me. The VC was never written by Tr4 mainline, so how does it get past this event?
3) If you want to preserve the Tr3 logic, then the write() should happen after you disable encrypt?
4) There is an off-by-one error somewhere in read_pad_a and read_vc, in the resync logic. Both of those functions are called twice, eg:
[2024-06-03 19:19:52.087] trc handshake XXXX:59938 handling can_read; state is [awaiting yb] (handshake.cc:608)
[2024-06-03 19:19:52.087] trc handshake XXXX:59938 in read_yb... need 96, have 587 (handshake.cc:68)
[2024-06-03 19:19:52.087] trc handshake XXXX:59938 in read_vc... need 8, read 484, have 7 (handshake.cc:166)
[2024-06-03 19:19:52.087] trc XXXX:59938 enabling ready-to-write polling (peer-io.cc:521)
[2024-06-03 19:19:52.087] trc XXXX:59938 libevent says this peer socket is ready for writing (peer-io.cc:359)
[2024-06-03 19:19:52.087] trc XXXX:59938 libevent says this peer socket is ready for reading (peer-io.cc:484)
[2024-06-03 19:19:52.087] trc XXXX:59938 enabling ready-to-read polling (peer-io.cc:509)
[2024-06-03 19:19:52.087] trc handshake XXXX:59938 handling can_read; state is [awaiting vc] (handshake.cc:608)
[2024-06-03 19:19:52.087] trc handshake XXXX:59938 found ENCRYPT(VC)! (handshake.cc:174)
Same happens with read_pad_a on the INCOMING connection. Maybe pre- vs post-increment in the resync loop?
Some more thoughts on this resync problem.
The MSE spec says:
The handshake is seperated into 5 blocking steps.
1 A->B: Diffie Hellman Ya, PadA
2 B->A: Diffie Hellman Yb, PadB
3 A->B: HASH('req1', S), HASH('req2', SKEY) xor HASH('req3', S), ENCRYPT(VC, crypto_provide, len(PadC), PadC, len(IA)), ENCRYPT(IA)
4 B->A: ENCRYPT(VC, crypto_select, len(padD), padD), ENCRYPT2(Payload Stream)
5 A->B: ENCRYPT2(Payload Stream)
This is logically true but programmatically false. Really it should be:
1 A->B: Diffie Hellman Ya
2 B->A: Diffie Hellman Yb, PadB, ENCRYPT(VC)
3 A->B: PadA, HASH('req1', S), HASH('req2', SKEY) xor HASH('req3', S), ENCRYPT(VC, crypto_provide, len(PadC), PadC, len(IA)), ENCRYPT(IA)
4 B->A: ENCRYPT(crypto_select, len(padD), padD), ENCRYPT2(Payload Stream)
5 A->B: ENCRYPT2(Payload Stream)
In step (2), client B cannot do anything with the PadA until it receives the first HASH in step (3). So there is no reason to send PadA with Ya. Client A should just wait to send PadA until it has obtained Yb and can form HASH('req1').
Slightly different for client B, which should send the resync info ENCRYPT(VC) with the PadB.
I recognize that you open up to simple traffic blocking & shaping if the opening packet is always 96 bytes, hence PadA is always sent with Ya. HOWEVER, in Tr4 code, at a minimum we should alter read_ya and read_yb to both exit with READ_LATER instead of READ_NOW, since there is nothing to do until we received the second packet with resync bytes. I've made this change locally and it now avoids the double-call into read_vc or read_pad_a:
[2024-06-03 21:26:08.558] trc handshake XXXX:57005 handling can_read; state is [awaiting ya] (handshake.cc:619)
[2024-06-03 21:26:08.558] trc handshake XXXX:57005 in read_ya... need 96, have 266 (handshake.cc:387)
[2024-06-03 21:26:08.558] trc handshake XXXX:57005 sending B->A: Diffie Hellman Yb, PadB (handshake.cc:412)
[2024-06-03 21:26:09.560] trc XXXX:57005 enabling ready-to-write polling (peer-io.cc:521)
[2024-06-03 21:26:09.560] trc XXXX:57005 libevent says this peer socket is ready for writing (peer-io.cc:359)
[2024-06-03 21:26:09.560] trc XXXX:57005 libevent says this peer socket is ready for reading (peer-io.cc:484)
[2024-06-03 21:26:09.560] trc XXXX:57005 enabling ready-to-read polling (peer-io.cc:509)
[2024-06-03 21:26:09.560] trc handshake XXXX:57005 handling can_read; state is [awaiting pad a] (handshake.cc:619)
[2024-06-03 21:26:09.560] trc handshake XXXX:57005 found HASH('req1', S)! (handshake.cc:445)
To respond to your main concerns:
read_ia()
, we no longer send anything after disabling encryption, so commiting a write once prior will be sufficient.I was gonna say there's nothing to worry about here, then I realised the things you wrote in the next comment superscedes your point here.
I agree that in the current code, returning READ_LATER
in read_ya()
and read_yb()
is probably safe and avoids some extra function calls, but it's too much fine-tuning for my comfort, and for such little gain.
I'd rather keep it consistent with the other functions, and not having to risk forgetting its reasonings in the future, make some changes that break it, then end up spending hours figuring out why read_ya()
and read_yb()
differs from the other functions, just to find out it is just a minor optimisation.
I don't understand logically why we would drop down to PLAINTEXT at this point in the negotiation. We should look at this more closely.
Well, it rarely happens in practice, and I don't know why any client would want to do this either. In any case, this is part of the MSE spec, and there isn't a good reason not to support it as far as I'm concerned, so we support it.
Personally I don't understand why we don't filter these as OUTGOING connection before they ever get generated.
Because you can't rule out the possibility that after port forwarding, another client is exposed to the public Internet using the port you are listening on.
What is the issue?
Have been testing various loopback scenarios and ran into this, though it repros when connecting across different machines.
4.0.x does not seem to have this problem. But, mainline @ a18fca59 has the problem, though not sure when the behavior began.
mainline will connect to a variety of other clients (UT, LT, etc) without problems. But it cannot connect to itself when encryption is Required. It seems to get thru most of the handshake but then drops the connection. Tr4<->Tr4 works fine when connection is unencrypted.
Note: the log shows "4.0.5" but this really is mainline @ a18fca59, the version number was forced to 4.0.5 during build.
Also Note: I had to comment out several tr_logAddTrace events due to segfaults. Seems like (io)->display_name() causes havoc in a lot of places, assume that (io) is being referenced after release? I'll file separate bug for this.
Which application of Transmission?
transmission-daemon
Which version of Transmission?
4.1