Biggest thing is trying to reduce how often the `[]` overload is called. It's called SO MUCH. The `Offset` and `Size` property getters also secretly get called ALL THE TIME.
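To show the kind of change this means (a hypothetical `ByteSpan` stand-in in Python, not the actual code): instead of going through the indexer and the `Offset`/`Size` getters on every byte, the hot loops read them once up front and index the underlying buffer directly.

```python
# Hypothetical stand-in for a ByteSpan-like type, just to show the access pattern.
class ByteSpan:
    def __init__(self, buffer, offset, size):
        self._buffer = buffer
        self._offset = offset
        self._size = size

    @property
    def offset(self):          # property getter invoked on every access
        return self._offset

    @property
    def size(self):
        return self._size

    def __getitem__(self, i):  # the "[] overload": an offset lookup plus index math per byte
        return self._buffer[self._offset + i]


def checksum_slow(span):
    # Hot loop: one __getitem__ call (and one hidden offset lookup) per byte.
    total = 0
    for i in range(span.size):
        total = (total + span[i]) & 0xFF
    return total


def checksum_fast(span):
    # Hoist offset/size once, then index the raw buffer directly.
    buf, off, size = span._buffer, span.offset, span.size
    total = 0
    for i in range(off, off + size):
        total = (total + buf[i]) & 0xFF
    return total
```

Both produce the same result; the second version just trades roughly two calls per byte for three calls total.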
Biggest refactor here, though, is `AesGcm.MultiplyGF128Elements`. This method can call `ByteSpan`'s get/set indexer ~450k times per packet, especially with that awful "SecureClear" implementation. I haven't load tested it yet, but this might actually shave milliseconds per packet sent/received.
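For context (this is the textbook algorithm, not the Hazel code): GHASH multiplication in GF(2^128) is naturally expressed over whole 128-bit values, so a version that works on integers instead of doing per-byte span reads is where those indexer calls disappear. A rough Python sketch of the standard shift-and-reduce multiply from NIST SP 800-38D:

```python
# Shift-and-reduce multiply in GF(2^128) as used by AES-GCM's GHASH
# (NIST SP 800-38D, Algorithm 1). Operands are 128-bit ints in GHASH
# bit order, i.e. bit 0 of the field element is the integer's MSB.
R = 0xE1 << 120  # reduction term for x^128 + x^7 + x^2 + x + 1

def gf128_mul(x, y):
    z = 0
    v = y
    for i in range(128):
        if (x >> (127 - i)) & 1:   # bit i of x, MSB-first
            z ^= v
        lsb = v & 1
        v >>= 1
        if lsb:                    # conditional reduction after the shift
            v ^= R
    return z
```

A C# port of this over two `ulong` halves touches the span only when loading and storing the operands, instead of on every bit.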
On the allocations side, it's really just a bunch of temp buffers everywhere. Nothing huge, maybe ~100 bytes per packet sorta stuff. The biggest one is probably `PrfSha256.ExpandSecret`, which gets called fairly often and allocates a bunch of temp buffers. But that's mostly during the handshake, so it's probably not a huge deal.
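The fix pattern is the usual one: let the caller hand in a scratch buffer instead of allocating fresh temporaries on each call. This sketch isn't the Hazel `ExpandSecret` code; it's the standard TLS 1.2 `P_SHA256` expansion (RFC 5246, section 5) in Python, just to show where a reusable output buffer slots in:

```python
import hashlib
import hmac

def p_sha256(secret, seed, length, out=None):
    """TLS 1.2 P_SHA256 (RFC 5246 section 5), writing into a
    caller-supplied buffer so repeated calls can reuse one allocation."""
    if out is None:
        out = bytearray(length)
    a = seed                                                       # A(0)
    pos = 0
    while pos < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()           # A(i)
        block = hmac.new(secret, a + seed, hashlib.sha256).digest()
        n = min(len(block), length - pos)
        out[pos:pos + n] = block[:n]                               # fill in place
        pos += n
    return bytes(out[:length])
```

In the C# version the same idea applies: a preallocated `byte[]` (or stackalloc'd span) for the HMAC inputs and output removes the per-call garbage.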
This probably still isn't everything. It's just the highest-value targets from profiling, plus stuff I noticed along the way.
Might need to edit this later, but early benchmarking suggests that this and the reduce-allocs branch together improve DTLS RTT by 50%.