threefoldtech / mycelium

End-2-end encrypted IPv6 overlay network
Apache License 2.0

Implement TUN offloads #141

Open LeeSmet opened 9 months ago

LeeSmet commented 9 months ago

Performance profiles show that the biggest amount of time is currently spent in 3 places:

One option which could improve the situation is #102, since larger packets naturally mean fewer syscalls (in the case of a TCP stream in the overlay). The problem there is that larger packets will need to be fragmented if the lower layer link has a smaller MTU (which is the reason why the MTU is currently set at 1400). While we currently only use stream based connections, keeping individual packets at MTU 1400 leaves the door open for UDP at some point (plain UDP that is).
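To put a rough number on the "larger packets mean fewer syscalls" point, here is a back-of-the-envelope sketch. It assumes one read or write syscall per packet on the TUN device and 60 bytes of headers per packet (40 for IPv6 plus 20 for TCP without options); `packets_needed` is a hypothetical helper for illustration, not something from the mycelium codebase.

```rust
/// Number of packets needed to carry `stream_bytes` of TCP payload when every
/// packet is at most `mtu` bytes and spends `header_bytes` on headers.
/// Assumes `mtu > header_bytes`.
fn packets_needed(stream_bytes: u64, mtu: u64, header_bytes: u64) -> u64 {
    let payload_per_packet = mtu - header_bytes;
    // Ceiling division: a partially filled last packet still costs one packet.
    (stream_bytes + payload_per_packet - 1) / payload_per_packet
}

fn main() {
    let stream_bytes = 1024 * 1024; // 1 MiB of application data
    let header_bytes = 40 + 20; // IPv6 header + TCP header without options

    // 1400 is the current overlay MTU; 65535 stands in for a large
    // offload-sized packet handed over in a single syscall.
    for mtu in [1400u64, 65535] {
        println!(
            "packet size {mtu}: {} packets",
            packets_needed(stream_bytes, mtu, header_bytes)
        );
    }
}
```

Under these assumptions, 1 MiB takes 783 packets at MTU 1400 and 17 at 65535 bytes, which is the kind of reduction in per-packet work this is about.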

The proper way to handle this instead would be to enable TSO (and USO and GRO, while we are at it). Unfortunately not a lot of info is readily available about this. In a first stage, we'll limit this to Linux. From what I did manage to find so far:

iwanbk commented 7 months ago

Performance profiles show that the biggest amount of time is currently spent in 3 places:

@LeeSmet

Curious, how did you do the profiling?

LeeSmet commented 7 months ago

In my global cargo config I have a section which specifies a profiling profile, which just adds debug symbols to the configured release profile of the project:

[profile.profiling]
inherits = "release"
debug = true

Then I build with cargo build --profile profiling. This binary is then run with samply (sudo -E samply record ./target/profiling/mycelium {args}). The resulting profile can then be inspected (uses the Firefox Profiler UI by default) to see where the application spends its time.

iwanbk commented 7 months ago

nice 👍

iwanbk commented 3 days ago

continuing discussion on https://github.com/threefoldtech/mycelium/issues/459#issuecomment-2497453463

The TUN offload issue has been parked as it's purely an optimization for the send/receive nodes. It can be done in the future once the system is completely stable.

Wdyt about implementing it now, @LeeSmet? I think we already have performance/scalability issues.

From a quick search, to enable GRO we only need to enable the flag using ethtool, CMIIW.
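For comparison, as far as I understand the Linux TUN driver, a userspace program asks for offloads on its TUN file descriptor by opening the device with `IFF_VNET_HDR` and then issuing the `TUNSETOFFLOAD` ioctl with a bitmask of `TUN_F_*` flags. Below is a minimal sketch of assembling that bitmask. The constant values are copied from `linux/if_tun.h` as I know them, the request number uses the generic `_IOW` encoding (x86 and ARM; some architectures encode it differently), and `offload_flags` and `tunsetoffload_request` are hypothetical helper names. The actual ioctl call is left out, since it needs an open TUN fd and elevated privileges.

```rust
// Values as defined in linux/if_tun.h (assumed here, verify against your kernel headers).
const IFF_VNET_HDR: u16 = 0x4000; // prepend a virtio_net_hdr to every packet
const TUN_F_CSUM: u32 = 0x01; // userspace can handle unchecksummed packets
const TUN_F_TSO4: u32 = 0x02; // TSO for IPv4
const TUN_F_TSO6: u32 = 0x04; // TSO for IPv6
const TUN_F_USO4: u32 = 0x20; // UDP segmentation offload for IPv4 (newer kernels only)
const TUN_F_USO6: u32 = 0x40; // UDP segmentation offload for IPv6 (newer kernels only)

/// ioctl request number for TUNSETOFFLOAD, i.e. _IOW('T', 208, unsigned int),
/// using the generic Linux encoding: dir << 30 | size << 16 | type << 8 | nr.
const fn tunsetoffload_request() -> u64 {
    const IOC_WRITE: u64 = 1;
    const ARG_SIZE: u64 = 4; // size of unsigned int
    (IOC_WRITE << 30) | (ARG_SIZE << 16) | ((b'T' as u64) << 8) | 208
}

/// Bitmask to pass to TUNSETOFFLOAD. The TSO flags are only accepted by the
/// kernel together with TUN_F_CSUM, so that flag is always included.
fn offload_flags(enable_uso: bool) -> u32 {
    let mut flags = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6;
    if enable_uso {
        flags |= TUN_F_USO4 | TUN_F_USO6;
    }
    flags
}

fn main() {
    println!("interface flag IFF_VNET_HDR: {:#x}", IFF_VNET_HDR);
    println!("TUNSETOFFLOAD request:       {:#x}", tunsetoffload_request());
    println!("offload flags without USO:   {:#x}", offload_flags(false));
    println!("offload flags with USO:      {:#x}", offload_flags(true));
}
```

With offloads negotiated this way, packets read from the device carry a `virtio_net_hdr` and can be much larger than the MTU, so the read/write path has to parse that header and segment packets itself where needed. That is more involved than toggling a feature flag with ethtool.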