Closed Anonymous-Someneese closed 4 years ago
Salt and target address can be combined in a single packet because we know both pieces of information immediately when a client connects. Actual content is a bit challenging because for a generic TCP connection the client might not send any data but expect the server to send information (e.g. hello banner) first, though popular protocols like HTTP do not have this problem. A workaround could be to wait a couple millisecond for the client to send actual payload to combine with salt and target address in the first packet.
Go still lacks proper support for TCP Fast Open. However if local side is on Linux kernel 4.11 and above, it could be hacked on right now. See https://github.com/golang/go/issues/4842 for detail.
As for the connection without initial data problem, shadowsocks-libev behaves the same so I guess it's good enough for most implementation? For the Fastopen, yes I am intended to implement using this method but we first need to send all data in one row in order to drop TTFB.
I just gave it a timeout. @riobard What do you think?
To be honest I'm hesitant to add artificial delay. TLS 1.3 and TFO spends all effort to cut handshake time and we just spend that saving right away…
I agree. But if it maintain the current behavior, TFO will have no effect. This change will save 1 RTT for normal connections if TFO is enabled and add 5ms for connections that don't send data on the initiator side. Maybe we can have an option to disable the behavior and allow setting the delay manually?
Wait, I thought we do not currently have TFO yet?
Yes. I mean even if you add TFO support, only the first packet will be sent in the handshake process. So TFO will have no effect if we send multiple packets for the request. I will add TFO support in another PR if this one is merged.
@riobard Any update?
@Anonymous-Someneese Sorry I've been busy recently and haven't got time to think about the issue. Maybe @lixin9311 could check the PR and see if it fits.
@Anonymous-Someneese Sorry I've been busy recently and haven't got time to think about the issue. Maybe @lixin9311 could check the PR and see if it fits.
I think it could be a more elegant way, if we pass target address to the shadow
function, and initialize the cipher with an iv/salt, temporarily saving them somewhere.
Then we send them along with the first data packet, initialize the writer
afterwards, without the timer and any other atomic ops.
I would be happy to send another PR addressing the same issue, but this patch currently doesn't look good to me.
Salt and target address can be combined in a single packet because we know both pieces of information immediately when a client connects. Actual content is a bit challenging because for a generic TCP connection the client might not send any data but expect the server to send information (e.g. hello banner) first, though popular protocols like HTTP do not have this problem. A workaround could be to wait a couple millisecond for the client to send actual payload to combine with salt and target address in the first packet.
@lixin9311 Thank you for your work. But what is your solution to this problem?
Salt and target address can be combined in a single packet because we know both pieces of information immediately when a client connects. Actual content is a bit challenging because for a generic TCP connection the client might not send any data but expect the server to send information (e.g. hello banner) first, though popular protocols like HTTP do not have this problem. A workaround could be to wait a couple millisecond for the client to send actual payload to combine with salt and target address in the first packet.
@lixin9311 Thank you for your work. But what is your solution to this problem?
see https://github.com/shadowsocks/go-shadowsocks2/pull/164/ thank you
Today I've got some time to think about this issue. There are three separate but related problems here:
Given the above understanding of the problem landscape I tend to think we just do the minimal work of fixing problem 1.
Please share your thoughts and we could discuss.
Yes, I agree. If we only need to send iv and address, we can manually merge those 2 packets, without any system calls.
I think a code change should be implemented instead of changing socket label since many application uses this project as library. We should at least concatenat iv and addr packets to eliminate the protocol feature. As for fastopen, I think giving an option would be better without one. The user should be responsible to decide if it's better to turn it on. That's also what other implementation have been doing.
I strongly recommend coalescing salt+address+initial data. It makes the size of the initial packet hard to predict, which is not the case for salt+address.
We were able to implement this in outline-ss-server: https://github.com/Jigsaw-Code/outline-ss-server/pull/73
The change was very tricky to get right without adding extra buffering, but we made it. Our ShadowsocksWriter
now has a LazyWrite
method that eventually gets flushed by a timed Flush
call, or a full Write
or ReadFrom
. See our DialTCP code.
In practice there's no delay most of the time, since clients will usually send data immediately. For server-speaks-first protocols, there's a minimal 10ms delay, but I don't think it matters. We are in the process of releasing this change to Outline Clients during this week and the next.
We have previously made a change to coalesce the salt and initial server data, which is a lot simpler: https://github.com/Jigsaw-Code/outline-ss-server/pull/69
That has been running on production servers for a few weeks now.
Coalescing salt+address+initial data also removes the possibility of some replay attacks that observe where the boundaries of the encryption blocks are, since it becomes one single block with unpredictable length.
@fortuna I think (haven't actually tried yet) that by turning on Nagle's algorithm for the first couple of packets we can get the same coalescing effect without explicit packing bytes?
I'm curious to hear whether that works. I also haven't tried it. It would certainly be an easy way to implement the packet coalescing. However, from what I read about the Nagle's algorithm, it only buffers if there's unconfirmed data in the pipe, which I interpret as immediately sending the first write as its own packet, which wouldn't accomplish what we need.
@fortuna My bad. After further thinking about it, what you said is correct, and Nagle's algo won't help in this case.
OK, I've found the correct mechanism to implement this without breaking code layering and API compatibility.
I was confused and thinking about TCP_NODELAY (Nagle's algo), but actually it's TCP_CORK that'll do wonders in the case. A naive solution is like this:
Caveats:
A proof of concept for client-side use https://github.com/shadowsocks/go-shadowsocks2/commit/50676a73f6643df35b3d15ea74850b142c46df94
The TCP_CORK solution seems convenient, but unfortunate that it's for Linux only. We opted to implement the in user space, which we have much more control and is cross-platform. Maybe you implement TCP_CORK in the interim before implementing in the user space?
Feel free to copy what we did by the way. We effectively have a LazyWrite call in the ShadowsocksWriter that appends to the internal buffer, and a Flush call that writes to the TCP connection. Regular Write will also flush. We were able to do it without requiring an extra buffer. You just need to make sure there's space for the AEAD tag after the lazy data, in case you need to flush before the regular ReadFrom finishes the first read.
BSD has TCP_NOPUSH, so the last commit works on both Linux and BSD. It is also trivial to implement corking (or traffic shaping in general) in user-space by wrapping net.TCPConn
with necessary timing/buffering policies for platforms like Windows which lack similar kernel mechanism without breaking code layering and API compatibility.
Turns out a generic user-space TCP_CORK is very simple, see this feature branch https://github.com/shadowsocks/go-shadowsocks2/tree/feature-generic-cork
Generic TCP corking has been merged in #186. Closing this now.
Sending salt, address and content seperately would generate three packets during the start of a connection. Since the size of salt is almost always 8, 16 or 32 and the address are pretty much always 20-100 bytes, it leaves a strong feature.
Additionally, with TCP fastopen turned on, only the first write would be carried by SYN packet. The other writes will have to wait until the handshake complete, making fastopen pointless.