n0-computer / sendme

A tool to send files and directories, based on iroh

ERROR iroh_quinn_udp::imp: got transmit error, halting segmentation offload #45

Open jerzydziewierz opened 3 months ago

jerzydziewierz commented 3 months ago

2024-08-10T06:19:59.120474Z ERROR iroh_quinn_udp::imp: got transmit error, halting segmentation offload

I provide a server that will generate this file for you:

sendme receive blobacyeboyfhlszybnlqliyhvlbgkdnt5teybpm27yrpj6xahetgpbluajdnb2hi4dthixs6zlvo4ys2mjoojswyylzfzuxe33ifzxgk5dxn5zgwlrpaiafp3khwt2wqagavac6fp55ama6gzgbecgcklcx4ddcnyhuh5lf53uak4hgp754aquyz5slaqwlakq

This is a 1 GB file that fails to download from the server.

The server logs:

2024-08-10T06:24:36.963867Z ERROR iroh_quinn_udp::imp: got transmit error, halting segmentation offload


qlm@qlmfb1:~/git/tmp$ sendme --version
sendme 0.13.0

qlm@qlmfb1:~/git/tmp$ uname --all
Linux qlmfb1 5.15.0-116-generic #126-Ubuntu SMP Mon Jul 1 10:14:24 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

qlm@qlmfb1:~/git/tmp$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy

looking at https://github.com/quinn-rs/quinn/blob/c0b1e281e3167d4f4c8496082cb1117c4a270ad8/quinn-udp/src/unix.rs#L288

and

https://www.ibm.com/docs/en/linux-on-systems?topic=offload-tcp-segmentation

I discover:


qlm@qlmfb1:~/git/tmp$ sudo ethtool -K enp0s3 tx on sg on tso on
Actual changes:
tx-scatter-gather: off [requested on]
tx-checksum-ipv4: off [requested on]
tx-checksum-ip-generic: off [requested on]
tx-checksum-ipv6: off [requested on]
tx-scatter-gather-fraglist: off [requested on]
tx-tcp-segmentation: off [requested on]
tx-tcp-ecn-segmentation: off [requested on]
tx-tcp-mangleid-segmentation: off [requested on]
tx-tcp6-segmentation: off [requested on]
tx-checksum-fcoe-crc: off [requested on]
tx-checksum-sctp: off [requested on]
Could not change any device features

and

qlm@qlmfb1:~/git/tmp$ sudo ethtool --offload enp0s3  rx on  tx on
Actual changes:
tx-checksum-ipv4: off [requested on]
tx-checksum-ip-generic: off [requested on]
tx-checksum-ipv6: off [requested on]
tx-checksum-fcoe-crc: off [requested on]
tx-checksum-sctp: off [requested on]
rx-checksum: off [requested on]
Could not change any device features

In other words, this platform does not support TCP segmentation offload. Is there a way to get sendme working in that case?

matheus23 commented 3 months ago

I suspect there's another underlying issue and the missing segmentation offload support is a red herring. As the code you linked shows, if there's any error with socket sending, quinn reports this error and turns off segmentation offload (the state.max_gso_segments.store(1, ...) call). I wonder if there's a way to get a better error message from quinn?
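
For readers following along, here is a minimal sketch of the fallback pattern described in the comment above. It is not quinn's actual code: `UdpState`, `send_with_fallback`, and the `raw_send` closure are hypothetical names made up for illustration, and the real quinn-udp implementation may only react to specific error kinds. Only the `max_gso_segments.store(1, ...)` step and the log message come from the thread.

```rust
use std::io;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the per-socket state a UDP layer might keep.
struct UdpState {
    /// How many segments may be batched into one send call (1 = no offload).
    max_gso_segments: AtomicUsize,
}

/// Hypothetical send wrapper. `raw_send` stands in for the real sendmsg call
/// and receives the current segment batch size.
fn send_with_fallback<F>(state: &UdpState, mut raw_send: F) -> io::Result<()>
where
    F: FnMut(usize) -> io::Result<()>,
{
    let segments = state.max_gso_segments.load(Ordering::Relaxed);
    match raw_send(segments) {
        Ok(()) => Ok(()),
        Err(e) => {
            // Assumption of this sketch: every send error disables batching,
            // without inspecting the cause. That is why the log line alone
            // says little about what actually went wrong.
            eprintln!("got transmit error, halting segmentation offload: {e}");
            state.max_gso_segments.store(1, Ordering::Relaxed);
            Err(e)
        }
    }
}
```

In this sketch the underlying `io::Error` is passed back to the caller, which is the piece of information the log message in this issue is missing.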

matheus23 commented 3 months ago

Can you try running this again with the matheus23/quinn11 branch?

jerzydziewierz commented 3 months ago

Yes, going to try today

{{ placeholder for a result update }}

jerzydziewierz commented 3 months ago

@matheus23 no luck.

This is with a debug build of the following commit (from git log):

commit d7daa05ad1ef3a5a51c89fb9c1915d8ffc77556b (HEAD -> matheus23/quinn11, origin/matheus23/quinn11)
Author: Philipp Krüger <---->
Date:   Mon Aug 12 10:13:44 2024 +0200

    chore(deps): Depend on quinn 0.11, instead of quinn 0.10

on the server, I get

qlm@qlmfb1:~/git/tmp/sendme/target/debug$ ./sendme send ~/git/tmp/random_file2
imported file /home/qlm/git/tmp/random_file2, 1.00 GiB, hash c18c2fdf6dcab3d4d1b07aa4cedc5ef05a05a1f1ea397a3f3a43fb3b8cda6743
to get this data, use
sendme receive blobabryt2qukwi7mqmqvfsjvhdbtwsddu6jw2hkyqboeio3rqmnaknziajdnb2hi4dthixs6zlvo4ys2mjoojswyylzfzuxe33ifzxgk5dxn5zgwlrpaiafp3khwsswaagavac6f66gama4ddbp35w4vm6u2gyhvjgo3rppawqfuhy6uol2h45eh6z3rtngoqy
⠁ [00:00:00] 139814540399008 transfer completed 1371 0 seconds
2024-08-13T21:57:25.948898Z ERROR iroh_quinn_udp::imp: got transmit error, halting segmentation offload

Note that at first it prints "got connection", but then that gets deleted from the console.

I am leaving the server open for you to inspect with:

sendme receive blobaazo7yxip2x57ratut5b7goyv7svglhobqozetdlet5npjc63rgt2ajdnb2hi4dthixs6zlvo4ys2mjoojswyylzfzuxe33ifzxgk5dxn5zgwlrpaiafp3khwtuh2agavac6f3gaama4ddbp35w4vm6u2gyhvjgo3rppawqfuhy6uol2h45eh6z3rtngoqy

additional debug info:

This is from a VirtualBox VM running Ubuntu 22 on an Ubuntu 22 host; the NIC is a USB-attached NIC (not a built-in one).

This is somewhat sad, as I love the NAT traversal capability when it works....

jerzydziewierz commented 3 months ago

By the way, on another machine (not the target machine that acts as the server) I get:

GUI |/home/mib07150/git/zfs/conda-envs/py310qlm|~/git/zfs/git/private/from-source/sendme matheus23/quinn11   ⧭ 
𝄞 cargo update
    Updating git repository `https://github.com/n0-computer/iroh.git`
    Updating git repository `https://github.com/n0-computer/quinn`
    Updating git repository `https://github.com/n0-computer/quic-rpc`
    Updating git repository `https://github.com/n0-computer/tokio-rustls-acme`
error: failed to load source for dependency `tokio-rustls-acme`

Caused by:
  Unable to update https://github.com/n0-computer/tokio-rustls-acme?branch=rustls-23

Caused by:
  failed to find branch `rustls-23`

Caused by:
  cannot locate remote-tracking branch 'origin/rustls-23'; class=Reference (4); code=NotFound (-3)

And indeed, there is no such branch in the tokio-rustls-acme repo.

But somehow this error does not occur on the target VM, which is strange.

matheus23 commented 3 months ago

Yeah that branch has been deleted since. You probably had it cached somehow still. We can just depend on the released version now.

Hm okay. Thanks for the reports, I'm diagnosing a fairly similar issue on Windows at the moment. Curious that this occurs on Linux for you.

If you don't mind, could you clone the iroh repository, then cd iroh-net/bench && cargo run -- quinn and tell me if the first benchmark there is successful on your server with your NIC?

jerzydziewierz commented 3 months ago

qlm@qlmfb1:~/git/STL/private/from-source/iroh/iroh-net/bench$ cargo run quinn
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.35s
     Running `/home/qlm/git/STL/private/from-source/iroh/target/debug/bulk quinn`

Client 0 stats:
Connect time: 18.316202ms
Overall download stats:

Transferred 1073741824 bytes on 1 streams in 11.91s (85.96 MiB/s)

Time to first byte (TTFB): 7.402ms

Total chunks: 37707

Average chunk time: 315.52ms

Average chunk size: 27.80KiB

Stream download metrics:

      │  Throughput   │ Duration 
──────┼───────────────┼──────────
 AVG  │   86.09 MiB/s │    11.89s
 P0   │   86.06 MiB/s │    11.89s
 P10  │   86.12 MiB/s │    11.90s
 P50  │   86.12 MiB/s │    11.90s
 P90  │   86.12 MiB/s │    11.90s
 P100 │   86.12 MiB/s │    11.90s
accepting stream failed: TimedOut
qlm@qlmfb1:~/git/STL/private/from-source/iroh/iroh-net/bench$ 

Is this what you need?

jerzydziewierz commented 3 months ago

A bit more for you:

qlm@qlmfb1:~/git/STL/private/from-source/iroh/target/debug$ ./bulk quinn

Client 0 stats:
Connect time: 19.837755ms
Overall download stats:

Transferred 1073741824 bytes on 1 streams in 12.14s (84.37 MiB/s)

Time to first byte (TTFB): 8.042ms

Total chunks: 37773

Average chunk time: 320.896ms

Average chunk size: 27.76KiB

Stream download metrics:

      │  Throughput   │ Duration 
──────┼───────────────┼──────────
 AVG  │   84.47 MiB/s │    12.12s
 P0   │   84.44 MiB/s │    12.12s
 P10  │   84.50 MiB/s │    12.13s
 P50  │   84.50 MiB/s │    12.13s
 P90  │   84.50 MiB/s │    12.13s
 P100 │   84.50 MiB/s │    12.13s
accepting stream failed: TimedOut
qlm@qlmfb1:~/git/STL/private/from-source/iroh/target/debug$ ./bulk iroh

Client 0 stats:
Connect time: 21.952676ms
Overall download stats:

Transferred 1073741824 bytes on 1 streams in 15.13s (67.67 MiB/s)

Time to first byte (TTFB): 7354.368ms

Total chunks: 46126

Average chunk time: 327.808ms

Average chunk size: 22.73KiB

Stream download metrics:

      │  Throughput   │ Duration 
──────┼───────────────┼──────────
 AVG  │   67.72 MiB/s │    15.12s
 P0   │   67.69 MiB/s │    15.11s
 P10  │   67.75 MiB/s │    15.12s
 P50  │   67.75 MiB/s │    15.12s
 P90  │   67.75 MiB/s │    15.12s
 P100 │   67.75 MiB/s │    15.12s
qlm@qlmfb1:~/git/STL/private/from-source/iroh/target/debug$ 

matheus23 commented 3 months ago

Yeah thank you, that helps!

With the original error:

2024-08-10T06:19:59.120474Z ERROR iroh_quinn_udp::imp: got transmit error, halting segmentation offload

Can you run this again with at least RUST_LOG=warn? There's a good chance there's a warning log with a message that contains "sendmsg error". If possible, I'd like to see the full line of that message.