traviscross / mtr

Official repository for mtr, a network diagnostic tool
http://www.bitwizard.nl/mtr/
GNU General Public License v2.0

TCP (not TLS) RTT time report not measuring the RTT. #505

Open cveto opened 3 months ago

cveto commented 3 months ago

Using command: mtr --tcp --report -c 10 --port <port> <server>

I was expecting to get the time between the initial SYN packet and the moment the SYN+ACK returns.

But looking with Wireshark, I get a time that includes a few extra packets going back and forth.

Any way to get what I want? I'm trying to establish the difference between the RTT for TLS and the RTT for non-TLS, to see why the server is taking this extra time to process.

rewolff commented 3 months ago

I would expect that most servers, even when "saturated", will still return ping packets and probably also SYN+ACK packets close to immediately. Sure, there is interrupt latency and that sort of stuff, but I expect that to be negligible compared to common networking delays.

The TCP test case should measure how long it takes for the app to return from the "connect" call. That's what it is meant to measure. If that requires multiple packets back and forth, then so be it, and that's what should be measured.

For a non-TLS connection, I'd expect SEND: SYN, RCV: SYN+ACK, SEND: ACK ... so you would see a second packet going from your mtr-running machine to the other machine, but that time would not be counted.

I studied networking in detail when TLS didn't quite exist yet, and I never had the need to investigate it since. So I know it exists, but I don't know the details of how it works. To securely establish a transaction key, indeed more than one RTT is required, and I think that should count. Use ping or non-TLS connections to test the RTT.

To verify that it makes sense to measure the time the server takes to prepare the SYN+ACK packet, I'd like you to do a manual measurement of that metric. You have mtr, normal ping packets and wireshark as tools.

cveto commented 3 months ago

Appreciate the fast response. The title had a mistake: it should be TCP, not TLS, so please ignore anything about TLS.

Running the command mtr --tcp --report -c 1 --port <port> <server>, the statistics report "Last, Avg, Best, Wrst, StDev", and in this scenario they are of course all the same. The results make sense to me when using ICMP.

But consider what mtr is measuring when the --tcp option is used with the above command.

A TCP handshake needs 3 packets; mtr produces about 20. I do not understand what mtr is actually measuring. The result is three orders of magnitude larger than what I would expect from a "TCP ping".

Is this a bug or by design?

matt-kimball commented 3 months ago

I think you'll find in your Wireshark trace that each connection port has a differing TTL. mtr will attempt to connect from multiple source ports so that it can have a connection attempt expire at each hop along the trace toward the destination.

mtr doesn't look at the raw TCP header - it just makes a connect() system call and measures the time for it to complete: https://github.com/traviscross/mtr/blob/master/packet/construct_unix.c#L524-L525

In some ways it would be nicer if it operated at a lower level, inspecting the TCP flags itself, though that would make the code much more involved and surely a higher maintenance burden. Instead, the code was implemented in a simpler way: the kernel's TCP implementation is used, and the measurement is the time it takes for an asynchronous connect system call to complete.
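
As a rough illustration, here is a simplified sketch of that general approach (this is not mtr's actual code; the destination address and port are placeholders): start a non-blocking connect(), wait for the socket to become writable, and report the elapsed time. The interval ends when the kernel has completed the handshake, not when the SYN+ACK hits the wire.

/*
 * Simplified sketch of the approach described above, not mtr's actual
 * code.  The destination address and port are placeholders.
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>

int main(void)
{
    struct sockaddr_in dest;
    struct timeval start, end;
    struct pollfd pfd;
    int sock, err = 0;
    socklen_t errlen = sizeof(err);

    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(80);                             /* placeholder port */
    inet_pton(AF_INET, "142.251.36.36", &dest.sin_addr);   /* placeholder host */

    sock = socket(AF_INET, SOCK_STREAM, 0);
    fcntl(sock, F_SETFL, O_NONBLOCK);                      /* asynchronous connect */

    gettimeofday(&start, NULL);
    if (connect(sock, (struct sockaddr *)&dest, sizeof(dest)) < 0
        && errno != EINPROGRESS) {
        perror("connect");
        return 1;
    }

    /* Wait until the socket becomes writable, i.e. the kernel has
       finished the TCP handshake (or the attempt has failed). */
    pfd.fd = sock;
    pfd.events = POLLOUT;
    poll(&pfd, 1, 5000);
    gettimeofday(&end, NULL);

    getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &errlen);
    if (err == 0)
        printf("connect completed in %ld us\n",
               (long)((end.tv_sec - start.tv_sec) * 1000000L
                      + (end.tv_usec - start.tv_usec)));
    else
        fprintf(stderr, "connect failed: %s\n", strerror(err));

    close(sock);
    return 0;
}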

rewolff commented 3 months ago

Put it that way... Certainly sounds like a bug. :-(

One of the things that is "difficult" is knowing when to stop listening for replies. So it is very possible that mtr runs for 3 seconds (waiting for possible replies) even though it got a quick reply to the first probe. But due to the way it is coded, it continues to send out probes during that time.

Hmm. OK. I just tested... mtr -nr -c 1 8.8.8.8 sends out 12 packets and gets back 12 responses (from my location). It of course first probes whether it can reach 8.8.8.8 with a probe with TTL=1. That gets a response from my local router: TTL expired. This continues until TTL=11, where we no longer get TTL expired from a router, but a ping reply from 8.8.8.8. By that time the probe for "level 12" is already scheduled, and gets sent out. And again 8.8.8.8 replies to the ping.

When using TCP, mtr will issue a connect to the destination host, and this triggers the kernel to do "its thing". It is possible to set options like the TTL on the socket so that the kernel uses that. But the kernel will then not get a valid response (SYN+ACK) from the destination and will issue retries.
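
For illustration only (a made-up helper, not taken from mtr's source; the port offset and TTL value are arbitrary), setting a per-probe TTL and source port on a socket before the connect would look roughly like this:

/* Illustration only, not taken from mtr's source: give a probe socket
   its own TTL and source port before connect(), so the SYN expires at
   the intended hop.  The helper name and values are made up. */
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

int prepare_probe_socket(int sock, int ttl)
{
    struct sockaddr_in src;

    /* Have the kernel send the SYN (and its retries) with this TTL. */
    if (setsockopt(sock, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)) < 0)
        return -1;

    /* A distinct source port per hop lets the ICMP "time exceeded"
       replies be matched back to the right probe. */
    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_addr.s_addr = htonl(INADDR_ANY);
    src.sin_port = htons(33000 + ttl);

    return bind(sock, (struct sockaddr *)&src, sizeof(src));
}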

It is debatable whether the kernel should be doing that once it has seen the "host unreachable" / "TTL exceeded" messages. Shouldn't the whole connection attempt be dropped if that happens? On the other hand, with mtr specifically listening for these ICMP messages, maybe they do not get processed normally by the kernel.

Anyway..... I tested mtr with TCP probes to 8.8.8.8, and what I see is that the reported times are in line with the expected network delays:

(So far I was testing against 8.8.8.8, Google's public DNS server. But this test was done with 142.251.36.36 as the target: one of the addresses that www.google.com resolves to for me. In any case, there should be something running on port 80 on 142.251.36.36...)

  6.|-- 213.51.5.38                0.0%     1   13.3  13.3  13.3  13.3   0.0
  7.|-- 213.51.64.186              0.0%     1   32.0  32.0  32.0  32.0   0.0
  8.|-- 213.46.182.22              0.0%     1   14.2  14.2  14.2  14.2   0.0
  9.|-- 209.85.242.197             0.0%     1   16.8  16.8  16.8  16.8   0.0
 10.|-- 142.251.66.239             0.0%     1   12.6  12.6  12.6  12.6   0.0
 11.|-- 142.251.36.36              0.0%     1   12.0  12.0  12.0  12.0   0.0

TLDR: Cannot reproduce, works as intended here.

matt-kimball commented 3 months ago

If @cveto really only cares about the RTT to his intended destination to compare results with ICMP vs TCP, perhaps mtr-packet could be used directly, like so:

mkimball@Alienware:~$ nslookup github.com
Server:         172.31.32.1
Address:        172.31.32.1#53

Non-authoritative answer:
Name:   github.com
Address: 140.82.114.4

mkimball@Alienware:~$ echo '1 send-probe ip-4 140.82.114.4 protocol icmp' | mtr-packet
1 reply ip-4 140.82.114.4 round-trip-time 73282
mkimball@Alienware:~$ echo '1 send-probe ip-4 140.82.114.4 protocol tcp port 443' | mtr-packet
1 reply ip-4 140.82.114.4 round-trip-time 74496
mkimball@Alienware:~$

That won't send a bunch of extra packets - it won't check every hop, it only measures the round-trip time to the final destination. Note that mtr-packet won't do DNS lookups for you, so you've got to do that separately, as I've done in the example above. (The round-trip times are reported in microseconds, not milliseconds - so they look reasonable to me.)

rewolff commented 3 months ago

You should be able to get such behaviour from the command line by setting min-ttl and max-ttl to values that are very close (not sure if they need to differ by 0 or 1) and sufficiently large.

OK. Tested. Difference needs to be 0. So it'd be: mtr -c 1 -r -f 32 -m 32 <DEST>

cveto commented 3 months ago

I see. So I'd have to determine the number of hops first. That I can achieve; the network I am using is predictable. Right now I am working in the local network, so I don't have to solve that yet.

I will have to update mtr to 0.95 (running 0.92 at the moment) to get the -r -m functionality.

@matt-kimball , when you say "destination round trip time", I will have to check what time is reported back. Below is the Wireshark capture of the echo '1 send-probe ip-4 10.2.0.9 protocol tcp port 5513' | mtr-packet command. I'm hoping it will be a time closer to that between packet 1 and packet 2, and not between packet 1 and packet 6.


No.     Time           Source                Destination           Protocol Length Info
      1 0.000000       10.2.0.3              10.2.0.9              TCP      74     33000 → 5513 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM TSval=3479570715 TSecr=0 WS=128
      2 0.000324       10.2.0.9              10.2.0.3              TCP      66     5513 → 33000 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 SACK_PERM WS=128
      3 0.000344       10.2.0.3              10.2.0.9              TCP      54     33000 → 5513 [ACK] Seq=1 Ack=1 Win=29312 Len=0
      4 0.000455       10.2.0.3              10.2.0.9              TCP      54     33000 → 5513 [FIN, ACK] Seq=1 Ack=1 Win=29312 Len=0
      5 0.001080       10.2.0.9              10.2.0.3              TCP      60     5513 → 33000 [ACK] Seq=1 Ack=2 Win=29312 Len=0
      6 0.001185       10.2.0.9              10.2.0.3              TCP      60     5513 → 33000 [RST, ACK] Seq=1 Ack=2 Win=29312 Len=0

matt-kimball commented 3 months ago

I would expect the time measured to be close to the time between packet 1 and packet 3, since that is when the initial TCP three-way handshake is complete. And, indeed, when I measure locally, my mtr-packet does what I expect and reports a time quite close to that value - it is not measuring the time for the socket to be closed.

Are you seeing something different in your tests?

rewolff commented 3 months ago

So I'd have to first determine the number of hops

Nope. Just use some "sufficiently large" number, as long as it is the actual number of hops or larger. 32 was considered "high enough in all cases" until, a few years ago, it wasn't. Most OSes upgraded to 64; Google is not taking any chances and is using 128. (*) (That's what I learned from this thread. :-) )

I'd say the time between packet 1 and 2. That's when the kernel should signal to mtr-packet "your socket opened successfully", and also "prepare for sending out packet 3". Those two things could run in parallel, but probably don't. The difference is only 20 microseconds, so not much.

Anyway. Matt and I are expecting a measurement of about 0.32 milliseconds (the time from 1 to 2 or 3) and not 1.1 milliseconds (the time from 1 to 6).

(*) 128 is a bad choice. Sure, it pays to be safe, but suppose a routing loop crops up, say, 10 hops out. The intent was that a packet would then go back and forth over the loop about 20 times. This has worsened to about 50 times because everybody uses 64, but Google's packets end up going back and forth over the loop about 110 times!

cveto commented 3 months ago

Thank you both for your time.

The environment where I was testing this with mtr 0.92 was returning the number 201, which corresponded to the time between about 9 packets. Wireshark showed that the ACK packet actually took about 0.2 ms. Doing the test with mtr 0.95 right now in my own lab, the mtr result matches what Wireshark is telling me nicely, even without using the -r -f parameters. I have screenshots of my original test, and right now it doesn't make sense to me.

I am putting this to a test in the original environment with mtr 0.95 to see if I can replicate it. Will respond when done.

cveto commented 3 months ago

I cannot reproduce my previous observations with mtr. I will continue to use it, but I have bundled it with tshark in a script in order to compare it with the TLS timestamps.

Results using: mtr --tcp --report -c 100 --port 80 172.xxx.xxx.168

$ sudo bash script1.sh 100 172.xxx.xxx.168 80
----- MTR results -----
Results in milliseconds:
Start: 2024-04-09T08:28:21+0000
HOST: TEST_HOST                    Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 172.xxx.xxx.168             0.0%   100    0.6   0.6   0.3   0.8   0.1

----- tshark results -----
SYN packets sent (according to our instructions and mtr): 100
SYN_ACK packets received (according to tshark): 101

Results in seconds
avg 0.000417811  | max 0.000541046  | min 0.000212556

Results using mtr --tcp -c 100 -r -f 32 -m 32 --port 80 172.xxx.xxx.168:

$ sudo bash script2.sh 100 172.xxx.xxx.168 80
----- MTR results -----
Results in milliseconds:
Start: 2024-04-09T08:33:11+0000
HOST: TEST_HOST                    Loss%   Snt   Last   Avg  Best  Wrst StDev
 30.|-- 172.xxx.xxx.168            0.0%   100    0.6   0.6   0.4   0.9   0.1

----- tshark results -----
SYN packets sent (according to our instructions and mtr): 100
SYN_ACK packets received (according to tshark): 101

Results in seconds
avg 0.000404305  | max 0.000613604  | min 0.000210085

The results differ, but by about the same relative amount. This was run from a VM; I will run it from a physical server next. The mtr test performed with the second command completes much, much faster.

yvs2014 commented 3 months ago

The mtr test performed with the second command completes much, much faster.

It's a feature connected with the wait-time calculation between packet sends (like mtr -f 32 127.1). A possible local fix for a slightly older version: https://github.com/traviscross/mtr/commit/f89c17078deda0d3bd841cd515da4b79b0c2cada