ktls / af_ktls

Linux Kernel TLS/DTLS Module
GNU General Public License v2.0
157 stars 25 forks source link

Replaced kernel_sendmsg with kernel_sendpage #67

Closed lancerchao closed 8 years ago

lancerchao commented 8 years ago

Allocates pages on the fly and sends tls records in one call to kernel_sendpage Still needs to fix #65

Similar to https://github.com/ktls/af_ktls/commit/0fbc8d145d5f12490e1a51627b9cef7a3e6b8b6b but with 3 improvements:

  1. Pages are allocated on the fly
  2. Sends data in one call to kernel_sendpage, thereby not dealing with the situation where the first send succeeded but second failed, resulting in partial data in the tx queue. Also not vulnerable to the situation where tsk->header or tsk->tag crosses page boundaries
  3. Not vulnerable to the bug where quick sends overwrites tx queue, since different pages are used with each call to kernel_sendpage

A weakness with this implementation is that since a max record requires 4 pages, including the header and tag would mean that it would require 8 pages to send 4 pages' worth of payload (since alloc_pages always allocate order of 2 pages). If this gets merged, I would open an issue about that.

Also missing MSG_MORE and other flag support

djwatson commented 8 years ago

Code lgtm - I'm concerned about perf though - trying to allocate 8 pages is hard. Run some benchmarks and see how much effect this has vs. 3 separate sendpages for header / data / tail

fridex commented 8 years ago

Looks good to me as well. I think we should wait for benchmarks before merging (?).

lancerchao commented 8 years ago

I'm concerned about perf though - trying to allocate 8 pages is hard. Run some benchmarks and see how much effect this has vs. 3 separate sendpages for header / data / tail

True. Fridolin's original implementation did that, however I was concerned over two things: 1) In Fridolin's implementation did that - however I was concerned on what would happen for example if the header (which is part of tsk) happens to cross a page boundary. I don't think sk_alloc guarantees that sk is on a single (or compound) page. 2) I was also concerned with what would happen if sendpage(header) succeeded but sendpage(data) failed. When the user tries to send again its record would be appended to the previous header. We need to somehow pull the data back out if only part of a record was successfully sent.

fridex commented 8 years ago

1) In Fridolin's implementation did that - however I was concerned on what would happen for example if the header (which is part of tsk) happens to cross a page boundary. I don't think sk_alloc guarantees that sk is on a single (or compound) page.

We need to allocate pages for header/tag, since we are using kernel_sendpage, which queues pages. If we would use a buffer as in my original approach, the buffer gets overwritten in the sending queue if records were not sent. That's the main reason why I did not pushed my original patch. The fact that the buffer for header/tag crosses page boundary is yet another serious issue.

2) I was also concerned with what would happen if sendpage(header) succeeded but sendpage(data) failed. When the user tries to send again its record would be appended to the previous header. We need to somehow pull the data back out if only part of a record was successfully sent.

I'm not strict with this one. It could be an error, which user space needs to handle, but having it in one call is nice.

djwatson commented 8 years ago

2) I was also concerned with what would happen if sendpage(header) succeeded but sendpage(data) failed. When the user tries to send again its record would be appended to the previous header. We need to somehow pull the data back out if only part of a record was successfully sent.

That's not actually how kernel_sendpage ... or sendmsg or send or write ... works. It's always the case that half the data can be sent, the return value tells you how much data was sent (at least in TCP). At least with MSG_DONTWAIT. There was some discussion about trying to probe if there is enough buffer space in the TCP socket, but I'm unsure if that would work with how sk_write_space is called

I think the only reasonable way to handle this is to hook in to sk_write_space, and write more data when we get the callback.

Closing this, needs a rewrite.