Closed ilyalesokhin closed 8 years ago
The original idea was to decrypt asynchronously and hand the data to user space at the cost of an extra data copy, so that the reader does not have to wait for Crypto API decryption.
I guess the optimal solution depends on the use case.
Yes, exactly. In the worst case, if user space hits the worker, there will be an additional copy involved.
But I think we should consider removing the rx work on the following grounds:
- The CPU usage of the decryption is not attributed to the correct process.
- If the user application is waiting in tls_recvmsg, this is clearly a loss, as it adds more data shuffling and context switches. In normal TCP processing, the kernel doesn't linearize incoming SKBs asynchronously to speed up later tcp_recvmsg calls.
Sounds reasonable, I'm not a kernel dev though. The implementation would be simplified as well.
I think OP makes a great point. If we really only cared about throughput, then the async worker would be a good way to make use of idle cores, but this won't scale to a large number of connections.
Patch for this: #73. Dave will discuss this issue at netdev.
Let's pull this in for now
Can you please explain why the rx async work is needed? Why can't we read the data and decrypt it inside tls_recvmsg from process context?
I'm guessing the concern is that we will deadlock if there is not enough room for one full record in the TCP socket's receive buffer. But can't we just enforce SO_RCVBUF > 16 KB?
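
For illustration only, here is a minimal user-space sketch of the SO_RCVBUF idea. It is not the kernel-side check proposed above. The helper name `ensure_rcvbuf` and the constant `MIN_RCVBUF` are hypothetical. The bound assumes a 5-byte record header plus the TLS 1.2 ciphertext limit of 2^14 + 2048 bytes. Linux also charges bookkeeping overhead against the receive buffer, so a real limit would probably need to be larger than this.

```python
import socket

# Assumption: 5-byte record header + TLS 1.2 maximum ciphertext length (2^14 + 2048).
MIN_RCVBUF = 5 + 16 * 1024 + 2048


def ensure_rcvbuf(sock: socket.socket, minimum: int = MIN_RCVBUF) -> int:
    """Raise SO_RCVBUF to at least `minimum` bytes and return the effective value."""
    current = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    if current < minimum:
        # The kernel may adjust the requested value (Linux doubles it and
        # caps it at net.core.rmem_max), so the result is re-read below.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, minimum)
        current = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return current


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("effective SO_RCVBUF:", ensure_rcvbuf(s))
```

The helper reads the value back because the kernel does not necessarily apply exactly what was requested. An in-kernel check would face the same question of what "room for one record" means once per-SKB overhead is counted.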