noughtmare opened 5 years ago
You seem to be correct. The flamegraph and that memory graph seem to be outdated.
If you are interested in profiling numbers, here are some I found:
| Cost Centre | Module | %Time | %Alloc |
| --- | --- | ---: | ---: |
| `receive.go` | `Lib.Ixgbe` | 35.9 | 22.2 |
| `send.go` | `Lib.Ixgbe` | 16.4 | 4.7 |
| `rxGetMapping` | `Lib.Ixgbe.Queue` | 5.0 | 10.5 |
| `forward` | `Lib.Ixgbe` | 4.7 | 3.7 |
| `rxMap` | `Lib.Ixgbe.Queue` | 4.3 | 10.5 |
| `send.clean.cleanDescriptor` | `Lib.Ixgbe` | 4.2 | 5.8 |
| `allocateBuf` | `Lib.Memory` | 3.8 | 7.0 |
| `mkTxQueue.descriptor` | `Lib.Ixgbe.Queue` | 3.1 | 2.4 |
| `txGetMapping` | `Lib.Ixgbe.Queue` | 2.7 | 7.0 |
| `idToPtr` | `Lib.Memory` | 2.4 | 4.7 |
| `freeBuf` | `Lib.Memory` | 2.3 | 2.3 |
| `txMap` | `Lib.Ixgbe.Queue` | 2.3 | 10.4 |
| `receive.go.next` | `Lib.Ixgbe` | 1.4 | 2.3 |
| `rxqDescriptor` | `Lib.Ixgbe.Queue` | 1.3 | 0.0 |
| `txqDescriptor` | `Lib.Ixgbe.Queue` | 1.2 | 0.0 |
| `send.clean` | `Lib.Ixgbe` | 1.0 | 0.9 |
| `mkRxQueue.descriptor` | `Lib.Ixgbe.Queue` | 0.9 | 2.3 |
| `send.go.indexRef` | `Lib.Ixgbe` | 0.3 | 2.3 |
Really sorry about the formatting. I'm on mobile and copied this from a LaTeX table lol.
Maybe adding INLINE pragmas (e.g. `{-# INLINE rxGetMapping #-}`) for functions like `rxGetMapping`, `rxMap`, etc. can help with the performance.
You could also use unsafe indexing functions to speed up the send and receive functions. (Inspecting the generated Core, I noticed quite a few bounds checks.)
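As a sketch of the unsafe-indexing idea (illustrative only, using the boot library `array` rather than the driver's own buffer types): replacing a bounds-checked `(!)` with `unsafeAt` drops one check per element, which is safe when the loop bound is derived from the array's own size.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Data.Array (listArray, bounds)
import Data.Array.Base (unsafeAt)
import Data.Array.Unboxed (UArray)

-- Sum an unboxed buffer with unchecked element access.
-- (!) would bounds-check every read; unsafeAt skips the check,
-- which is safe here because i < n by construction.
sumBuf :: UArray Int Int -> Int
sumBuf a = go 0 0
  where
    n = let (lo, hi) = bounds a in hi - lo + 1
    go !acc i
      | i >= n    = acc
      | otherwise = go (acc + unsafeAt a i) (i + 1)

main :: IO ()
main = print (sumBuf (listArray (0, 9) [1 .. 10]))  -- prints 55
```

The same pattern applies to `vector`'s `unsafeIndex`/`unsafeRead`; the trade-off is that an out-of-bounds index becomes undefined behavior instead of an exception, so it should only be used where the bound is provably respected.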
I have implemented my suggestions (and some more things): https://github.com/ixy-languages/ixy.hs/pull/3.
The `receive` function should be faster now.
I cannot 100% guarantee that it is still correct.
I see `Lib.Ixgbe.PackBuf` in the flame graph, but I can't find it in the code.