Open LeeSmet opened 9 months ago
@LeeSmet
curious, how did you do the profiling?
In my global cargo config I have a section which specifies a profiling
profile, which just adds debug symbols to the configured release profile of the project:
[profile.profiling]
inherits = "release"
debug = true
Then I build with `cargo build --profile profiling`. This binary is then run with samply (`sudo -E samply record ./target/profiling/mycelium {args}`). The resulting profile can then be inspected (samply uses the Firefox Profiler UI by default) to see where the application spends its time.
nice 👍
continuing discussion on https://github.com/threefoldtech/mycelium/issues/459#issuecomment-2497453463
The TUN offload issue has been parked, as it's purely an optimization for the send/receive nodes. It can be done in the future once the system is completely stable.
Wdyt about implementing it now, @LeeSmet? I think we already have performance/scalability issues.
From a quick search, to enable GRO we only need to enable the flag using ethtool, CMIIW
Performance profiles show that the biggest amount of time is currently spent in 3 places:
One option which could improve the situation is #102, since larger packets naturally mean fewer syscalls (in the case of a TCP stream in the overlay). The problem there is that larger packets will need to be fragmented if the lower layer link has a smaller MTU (which is the reason why the MTU is currently set at 1400). While we currently only use stream based connections, keeping individual packets at MTU 1400 leaves the door open for UDP at some point (plain UDP, that is).
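To make the "larger packets mean fewer syscalls" point concrete, here is a rough back-of-the-envelope sketch. It assumes one read or write syscall per packet and treats the MTU as the usable payload size (headers are ignored). The 1 GiB transfer size and the 9000 and 65535 byte packet sizes are illustrative assumptions, not values from this issue, and `packets_needed` is a hypothetical helper.

```rust
/// Number of packets needed to move `total_bytes` when each packet carries at
/// most `payload_per_packet` bytes. Assuming one syscall per packet, this is
/// also the number of read (or write) syscalls on the TUN device.
fn packets_needed(total_bytes: u64, payload_per_packet: u64) -> u64 {
    assert!(payload_per_packet > 0, "payload size must be non-zero");
    // Ceiling division: a partially filled last packet still costs a syscall.
    (total_bytes + payload_per_packet - 1) / payload_per_packet
}

fn main() {
    let total: u64 = 1 << 30; // 1 GiB of stream data (illustrative)

    // 1400 is the current MTU; the other sizes are hypothetical comparisons.
    for size in [1400u64, 9000, 65535] {
        println!(
            "{:>5} byte packets: {} syscalls",
            size,
            packets_needed(total, size)
        );
    }
}
```

The count scales inversely with packet size, which is why offloads that let the kernel hand over one large buffer are attractive: they would give the same reduction without raising the MTU seen on the wire.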
The proper way to handle this instead would be to enable TSO (and USO and GRO, while we are at it). Unfortunately, not a lot of info is readily available about this. In a first stage, we'll limit this to Linux. From what I did manage to find so far: