twitchyliquid64 / subnet

Simple, auditable & elegant VPN, built with TLS mutual authentication and TUN.
MIT License
1.06k stars 79 forks source link

Small review of code and propositions #1

Closed kanocz closed 6 years ago

kanocz commented 7 years ago

I've looked at code to find source of "Fix throughput issues - 5% of normal connection speed. Latency is good though." and found next issues:

1) using gob encoder for transferring packed is big overhead - it's possible just to send uint16 for size and then just []byte of packet - in case of gob encode you have each time allocated new memory + processing (the same for decoding) (if you really need to pack it to some structure and so on on you can choose cheaper encoding, look at https://github.com/alecthomas/go_serialization_benchmarks) 2) using channels is very expensive in this case: you need to allocate memory for each packet + internal it's mutex lock/unlock - if you'll use pre-allocated buffer it's no additional memory operations + just write to stream - on client this may be just on thread, on server cheaper solution will be to use mutex - lock before write, write and unlock 3) in case if you really want to use channels you can use unbuffered ones - you'll still has linux kernel network buffers, but if your application is unable to process so much packets as you're receiving it's better to drop them on kernel level, so connection will slow-down but no additional packet lost created (maybe I'm not right, but from experience this way is better - creating one more buffer is additional cpu usage)

twitchyliquid64 commented 7 years ago

Thanks, sorry for the delay. I'll get onto this.

  1. Will implement. I'll also use a memory pool.
  2. Are channels really that expensive? my understanding is that on arm and amd64 they use atomic operations for locking so are very fast as long as your throughput is <10k/s'ish.
  3. Agreed - will turn down my buffers.
kanocz commented 7 years ago
  1. look at src/runtime/chan.go - internal structure hchan has member lock of type mutex :) and if you'll look at chansend function in the same file you'll see that it's actually used (it locks before sending or adding to queue)... so yes, channels are expensive for critical operations and can't be atomic by design
twitchyliquid64 commented 6 years ago

Thanks for your analysis. In the end, a buffer corruption bug caused by a race was the cause of the slowness. Incoming packets overwrote the buffer before it was transmitted, leading to dropped packets.

Serves me right for trying to be clever and minimising allocs.

Full performance is now achieved as expected in regards to link speed.