nitely opened this issue 1 month ago
~current speed is 10MB/s~
~Replacing the `add` copy (search XXX x_x) with memCopy speeds it up to 150MB/s~ memCopy speeds it up to 300MB/s, but I ended up just improving the copy, so it gets to 200MB/s without unsafe code. Replacing setLen with setLenUninit seems to help ORC.
Profiling showed the SSL code as the next bottleneck. I tried removing all SSL-related code, and the next bottleneck was the `add` copy; changing it to use moveMem speeds it up to 500MB/s.
8eeb60d5bed95ebb8aa2f3dd672c8f64cdbc6ffa
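The `add`-to-moveMem change can be sketched like this (hypothetical `bulkAdd` helper, assuming a string receive buffer):

```nim
# Sketch: append `data` with one bulk copy instead of per-element
# `add`. setLenUninit (where available) would also skip the zeroing
# that setLen does, which seems to help ORC.
proc bulkAdd(buf: var string, data: openArray[char]) =
  let oldLen = buf.len
  buf.setLen(oldLen + data.len)
  if data.len > 0:
    # moveMem is safe even for overlapping regions; copyMem would
    # do for strictly non-overlapping buffers.
    moveMem(addr buf[oldLen], unsafeAddr data[0], data.len)
```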
So, I think the SSL wrapper needs to be improved to allocate less and go from there.
I tried yasync and got a 2x speed-up when running h2load with a high number of streams -- unrelated to single-stream data transfer speed.
changes are here https://github.com/nitely/nim-hyperx/compare/master...futurevar
Load test command: `./h2load -n100000 -c10 -m1000 -t2 https://127.0.0.1:4443`
I got asyncdispatch on ORC to reach 430MB/s by increasing the socket buffer from 8KB to 64KB ~and reusing the BIO buffer (but reusing the BIO buffer cannot be done safely)~.
On refc it reaches +500MB/s. On yasync + ORC it reaches +700MB/s.
https://github.com/nitely/nim-hyperx/compare/asyncsockssl?expand=1
https://github.com/nitely/nim-hyperx/compare/experiment?expand=1
I haven't checked how much it affects latency with many streams. But even a 16KB buffer gives a ~2x improvement.
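The stdlib's buffered-socket buffer size is a compile-time constant, so raising it means patching the stdlib (as the branches above do). A related knob that needs no patching is the kernel's per-socket buffers; a minimal POSIX-only sketch (`enlargeBuffers` is a hypothetical helper, not hyperx API):

```nim
import std/[asyncnet, nativesockets, posix]

# Sketch: raise the kernel send/receive buffers on a socket via
# setsockopt. This is distinct from asyncnet's internal user-space
# buffer, but it is the same kind of tuning.
proc enlargeBuffers(sock: AsyncSocket, size = 64 * 1024) =
  setSockOptInt(sock.getFd, SOL_SOCKET.int, SO_RCVBUF.int, size)
  setSockOptInt(sock.getFd, SOL_SOCKET.int, SO_SNDBUF.int, size)
```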
I removed the queue for writes, and the send lock. Checking the asyncdispatch code, it seems safe to make concurrent send calls, and at least POSIX allows it at the OS level. It's 2-3x faster on the h2load bench for every load I tried. This won't improve single-stream data transfer, though.
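Under that reading of asyncdispatch, the unlocked write path might look like this sketch (`sendFrames` is a hypothetical name; it assumes each whole frame goes out in a single `send` call, so interleaving can only happen at frame boundaries):

```nim
import std/[asyncdispatch, asyncnet]

# Sketch: fire off sends with no write queue and no send lock.
# asyncdispatch tracks each pending write, and POSIX permits
# concurrent send(2) calls on one fd.
proc sendFrames*(sock: AsyncSocket, frames: seq[string]) {.async.} =
  var futs: seq[Future[void]]
  for frame in frames:
    futs.add sock.send(frame)   # one send per whole frame, no lock
  for fut in futs:
    await fut                   # propagate any write error
```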
A queue may block user code less as long as it's not full. But thinking about it: if the user code running between sends is really fast, it will eventually fill the queue and block anyway; and if it's too slow, the queue does not matter much. The more streams, the more likely the queue gets full. There may be a bench where code runs for exactly the right amount of time for a queue to help, but it seems artificial.
I added flow control for streams back (#12), so data transfer is back down to 10MB/s. Increasing the window size to 256KB seems to bring it back to ~200MB/s. But that can't be done until connection-level flow control is implemented.
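For reference, bumping the per-stream window means advertising it in an HTTP/2 SETTINGS frame. A minimal hand-rolled sketch (hypothetical `initialWindowSettings` helper; frame layout per RFC 9113, where SETTINGS_INITIAL_WINDOW_SIZE has id 0x4 and the default window is 64KB-1):

```nim
# Sketch: serialize a SETTINGS frame carrying one setting,
# SETTINGS_INITIAL_WINDOW_SIZE, e.g. raised to 256KB.
proc initialWindowSettings(windowSize: uint32): string =
  # 9-byte frame header: 24-bit length=6, type=0x4 (SETTINGS),
  # flags=0, 32-bit stream id=0 (settings apply to the connection).
  result = "\x00\x00\x06\x04\x00\x00\x00\x00\x00"
  result.add "\x00\x04"  # setting id: SETTINGS_INITIAL_WINDOW_SIZE
  for shift in [24, 16, 8, 0]:  # 32-bit value, big endian
    result.add char((windowSize shr shift) and 0xff)
```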
Next: fix enough bottlenecks that asyncdispatch itself becomes the bottleneck.

To profile: start the server with the profiler enabled, then send data over a single stream.
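A possible way to run that (the `examples/server.nim` path is hypothetical; `--profiler:on --stacktrace:on` enables Nim's sampling profiler, which writes `profile_results.txt` on exit):

```shell
# Build the server with Nim's embedded stack-trace profiler.
nim c --profiler:on --stacktrace:on examples/server.nim
./examples/server &

# Drive a single stream (-c1 -m1): many requests, one connection,
# one concurrent stream, to expose single-stream transfer costs.
./h2load -n10000 -c1 -m1 https://127.0.0.1:4443
```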