I'm not sure whether that is an issue. If the data are sent in different TLS records, then the data would be received on the TLS record boundary. I believe that's very similar to the TCP handling.
From my POV, this is definitely not an issue. I'm not sure this enhancement is worth it, since it can be complicated to implement if we want to keep consistency with the bound socket.
Consider the following scenario with two TLS records:

TLS record data sizes: Record1 = 1000B, Record2 = 1000B
> I'm not sure whether that is an issue. If the data are sent in different TLS records, then the data would be received on the TLS record boundary. I believe that's very similar to the TCP handling.
I don't quite follow. Can you explain a little more?
A user supplies a buffer of size S, where 1000B < S < 2000B. In this case we shouldn't cache partial data in the AF_KTLS socket, since the second record was not fully read (just as no partial data are returned when the user supplies a buffer smaller than 1000B: ENOMEM is returned instead). We should simply propagate only Record1's 1000B. If we cached decrypted data, switching from AF_KTLS to OpenSSL/GnuTLS would be inconsistent.
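A minimal sketch of the resulting recv(2) sequence, assuming the two 1000B records above and a connected AF_KTLS socket `ktls_fd`:

```c
char buf[1500];  /* 1000B < S < 2000B */

/* First call: only Record1 is propagated; Record2 stays queued in the kernel. */
ssize_t n = recv(ktls_fd, buf, sizeof(buf), 0);  /* n == 1000 */

/* Second call: Record2 is delivered whole. */
n = recv(ktls_fd, buf, sizeof(buf), 0);          /* n == 1000 */
```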
Do you mean that we will not support partial reads? In such a case, how would a client know how big the supplied buffer needs to be, without knowing the size of the TLS record that it's about to receive?
My impression is that if the client opened the original socket with SOCK_STREAM, then it should expect the af_ktls socket operations bound to that socket to work exactly as a stream socket is expected to behave. In the case of a TCP socket, it should be able to read and write data of arbitrary length without worrying about record boundaries, record sizes, etc.
> Do you mean that we will not support partial reads? In such a case, how would a client know how big the supplied buffer needs to be, without knowing the size of the TLS record that it's about to receive?
Actually, you do know it: it will not exceed KTLS_MAX_PAYLOAD_SIZE.
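So, as a sketch (assuming `KTLS_MAX_PAYLOAD_SIZE` comes from the af_ktls header and `ktls_fd` is the bound socket), a caller can always size its buffer to the maximum record payload and never hit ENOMEM:

```c
/* A buffer of KTLS_MAX_PAYLOAD_SIZE always fits a full TLS record. */
char buf[KTLS_MAX_PAYLOAD_SIZE];
ssize_t n = recv(ktls_fd, buf, sizeof(buf), 0);
```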
> My impression is that if the client opened the original socket with SOCK_STREAM, then it should expect the af_ktls socket operations bound to that socket to work exactly as a stream socket is expected to behave. In the case of a TCP socket, it should be able to read and write data of arbitrary length without worrying about record boundaries, record sizes, etc.
TCP can be fragmented; even there, it is not guaranteed that if you request N bytes, N bytes are actually in the RX buffer and available.
I understand your approach. The issue is with user-space sync. I can imagine a getsockopt(2) that would propagate the cache size to user space, so user space can pick up the data. The original approach was to pop records from the receive queue only once we know they were fully read by user space.
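Roughly like this; note the option name and level below are hypothetical, made up for illustration, not an existing API:

```c
/* KTLS_GET_RECV_CACHED (hypothetical): how many decrypted-but-unread
 * bytes the AF_KTLS socket currently holds. */
int cached = 0;
socklen_t len = sizeof(cached);

if (getsockopt(ktls_fd, AF_KTLS /* hypothetical level */, KTLS_GET_RECV_CACHED,
               &cached, &len) == 0 && cached > 0) {
        /* user space knows it can pick up `cached` bytes without blocking */
}
```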
> If the data are sent in different TLS records, then the data would be received on the TLS record boundary. I believe that's very similar to the TCP handling.
I don't think this is true? read() will happily read across TCP message boundaries.
> TCP can be fragmented; even there, it is not guaranteed that if you request N bytes, N bytes are actually in the RX buffer and available.
Correct - a SOCK_STREAM read() where insufficient bytes are available will usually block waiting for them, unlike a SOCK_DGRAM socket.
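Which is why SOCK_STREAM consumers conventionally loop over short reads; a sketch of the usual helper:

```c
#include <unistd.h>

/* Read exactly n bytes from a stream socket, looping over short reads. */
ssize_t read_full(int fd, void *buf, size_t n)
{
        size_t done = 0;
        while (done < n) {
                ssize_t r = read(fd, (char *)buf + done, n - done);
                if (r < 0)
                        return -1;   /* error */
                if (r == 0)
                        break;       /* EOF */
                done += r;
        }
        return (ssize_t)done;
}
```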
Not supporting small partial reads means this will never be a drop-in replacement for existing code, and anyone using af_ktls would need to reimplement some sort of buffering mechanism on top of the socket - the application may be putting message boundaries on top of TCP, for example:
```c
#include <unistd.h>

int size;
read(fd, &size, 4);   // or SSL_read: 4-byte length prefix
char buf[size];
read(fd, buf, size);  // or SSL_read: the message body
```
Note that OpenSSL (and I presume GnuTLS) does this buffering for you, so this would even be a change with respect to existing TLS implementations:
> SSL_read() works based on the SSL/TLS records. The data are received in records (with a maximum record size of 16kB for SSLv3/TLSv1). Only when a record has been completely received, it can be processed (decryption and check of integrity). Therefore data that was not retrieved at the last call of SSL_read() can still be buffered inside the SSL layer and will be retrieved on the next call to SSL_read().
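For reference, OpenSSL exposes that internal buffer via SSL_pending(), so a caller can drain it before going back to poll(2) on the fd (a sketch; `ssl` is assumed to be an established SSL object):

```c
char buf[4096];

/* SSL_pending() reports bytes already decrypted and buffered inside the
 * SSL object; SSL_read() will not block on the fd for those bytes. */
while (SSL_pending(ssl) > 0) {
        int n = SSL_read(ssl, buf, sizeof(buf));
        if (n <= 0)
                break;
        /* handle n bytes of buf */
}
```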
Yea, it looks like gnutls_record_recv is roughly the same and buffers data - so yea I think we do need this complexity. For userspace sync we'd need the equivalent of:
> Function: `size_t gnutls_record_check_pending (gnutls_session_t session)`
> *session*: is a `gnutls_session_t` type.
>
> This function checks if there are unread data in the gnutls buffers. If the return value is non-zero the next call to gnutls_record_recv() is guaranteed not to block.
>
> Returns: the size of the data or zero.
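i.e. roughly this pattern on the GnuTLS side (a sketch; `session` is assumed to be an established gnutls_session_t):

```c
char buf[4096];

/* Drain GnuTLS's internal buffer before polling the fd again;
 * these reads are guaranteed not to block. */
while (gnutls_record_check_pending(session) > 0) {
        ssize_t n = gnutls_record_recv(session, buf, sizeof(buf));
        if (n <= 0)
                break;
        /* handle n bytes of buf */
}
```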
I think this was fixed with the buffer changes?
I'm not sure about this (but I believe so); Lance can give a proper status on it (btw, this is already correct for DTLS).
Yes, this was fixed.
If the server sends "abcde", then sends "12345", a client calling recv() with a read size of 10 would get "abcde" instead of "abcde12345".
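With the buffering in place, the expected behavior is plain stream semantics; a sketch, assuming both records have already arrived and been decrypted:

```c
char buf[10];
ssize_t n = recv(ktls_fd, buf, sizeof(buf), 0);
/* n == 10, buf holds "abcde12345": the read crosses the record boundary */
```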