Write salt/iv, addr together with content

Anonymous-Someneese commented 4 years ago

Sending salt, address and content seperately would generate three packets during the start of a connection. Since the size of salt is almost always 8, 16 or 32 and the address are pretty much always 20-100 bytes, it leaves a strong feature.

Additionally, with TCP fastopen turned on, only the first write would be carried by SYN packet. The other writes will have to wait until the handshake complete, making fastopen pointless.

riobard commented 4 years ago

Salt and target address can be combined in a single packet because we know both pieces of information immediately when a client connects. Actual content is a bit challenging because for a generic TCP connection the client might not send any data but expect the server to send information (e.g. hello banner) first, though popular protocols like HTTP do not have this problem. A workaround could be to wait a couple millisecond for the client to send actual payload to combine with salt and target address in the first packet.

Go still lacks proper support for TCP Fast Open. However if local side is on Linux kernel 4.11 and above, it could be hacked on right now. See https://github.com/golang/go/issues/4842 for detail.

Anonymous-Someneese commented 4 years ago

As for the connection without initial data problem, shadowsocks-libev behaves the same so I guess it's good enough for most implementation? For the Fastopen, yes I am intended to implement using this method but we first need to send all data in one row in order to drop TTFB.

Anonymous-Someneese commented 4 years ago

I just gave it a timeout. @riobard What do you think?

riobard commented 4 years ago

To be honest I'm hesitant to add artificial delay. TLS 1.3 and TFO spends all effort to cut handshake time and we just spend that saving right away…

Anonymous-Someneese commented 4 years ago

I agree. But if it maintain the current behavior, TFO will have no effect. This change will save 1 RTT for normal connections if TFO is enabled and add 5ms for connections that don't send data on the initiator side. Maybe we can have an option to disable the behavior and allow setting the delay manually?

riobard commented 4 years ago

Wait, I thought we do not currently have TFO yet?

Anonymous-Someneese commented 4 years ago

Yes. I mean even if you add TFO support, only the first packet will be sent in the handshake process. So TFO will have no effect if we send multiple packets for the request. I will add TFO support in another PR if this one is merged.

Anonymous-Someneese commented 4 years ago

@riobard Any update?

riobard commented 4 years ago

@Anonymous-Someneese Sorry I've been busy recently and haven't got time to think about the issue. Maybe @lixin9311 could check the PR and see if it fits.

lixin9311 commented 4 years ago

@Anonymous-Someneese Sorry I've been busy recently and haven't got time to think about the issue. Maybe @lixin9311 could check the PR and see if it fits.

I think it could be a more elegant way, if we pass target address to the shadow function, and initialize the cipher with an iv/salt, temporarily saving them somewhere. Then we send them along with the first data packet, initialize the writer afterwards, without the timer and any other atomic ops.

lixin9311 commented 4 years ago

I would be happy to send another PR addressing the same issue, but this patch currently doesn't look good to me.

Anonymous-Someneese commented 4 years ago

Salt and target address can be combined in a single packet because we know both pieces of information immediately when a client connects. Actual content is a bit challenging because for a generic TCP connection the client might not send any data but expect the server to send information (e.g. hello banner) first, though popular protocols like HTTP do not have this problem. A workaround could be to wait a couple millisecond for the client to send actual payload to combine with salt and target address in the first packet.

@lixin9311 Thank you for your work. But what is your solution to this problem?

lixin9311 commented 4 years ago

Salt and target address can be combined in a single packet because we know both pieces of information immediately when a client connects. Actual content is a bit challenging because for a generic TCP connection the client might not send any data but expect the server to send information (e.g. hello banner) first, though popular protocols like HTTP do not have this problem. A workaround could be to wait a couple millisecond for the client to send actual payload to combine with salt and target address in the first packet.

@lixin9311 Thank you for your work. But what is your solution to this problem?

see https://github.com/shadowsocks/go-shadowsocks2/pull/164/ thank you

riobard commented 4 years ago

Today I've got some time to think about this issue. There are three separate but related problems here:

Salt and address should definitely be combined and sent together. This can be easily done by turning off TCP_NODELAY (default is on by Go's net.TCPConn) at the beginning, and turning it back on after address is written. Nagel's algorithm should do the job of combing the two writes into a single small packet without changing our code structure.
Waiting for actual payload from client is problematic, because the client might be expecting the server to speak first.
And the above two problems exist because we want to enable TCP Fast Open to send all three pieces of data (salt, address, initial payload) in the first SYN packet, which Nagel's algo does not even apply (at least that's my understanding). TFO comes with its own host of problems, mostly because it does not work well in practice (see https://github.com/shadowsocks/go-shadowsocks2/issues/161#issuecomment-581011916), and the semantics of enabling it in a split-proxy scenario like Shadowsocks with 4 endpoints and 3 legs is ill-defined. The still-in-draft SOCKS6 spec covers it but only in the normal 3 endpoints and 2 legs scenario.

Given the above understanding of the problem landscape I tend to think we just do the minimal work of fixing problem 1.

Please share your thoughts and we could discuss.

lixin9311 commented 4 years ago

Yes, I agree. If we only need to send iv and address, we can manually merge those 2 packets, without any system calls.

Anonymous-Someneese commented 4 years ago

I think a code change should be implemented instead of changing socket label since many application uses this project as library. We should at least concatenat iv and addr packets to eliminate the protocol feature. As for fastopen, I think giving an option would be better without one. The user should be responsible to decide if it's better to turn it on. That's also what other implementation have been doing.

fortuna commented 4 years ago

I strongly recommend coalescing salt+address+initial data. It makes the size of the initial packet hard to predict, which is not the case for salt+address.

We were able to implement this in outline-ss-server: https://github.com/Jigsaw-Code/outline-ss-server/pull/73 The change was very tricky to get right without adding extra buffering, but we made it. Our ShadowsocksWriter now has a LazyWrite method that eventually gets flushed by a timed Flush call, or a full Write or ReadFrom. See our DialTCP code.

In practice there's no delay most of the time, since clients will usually send data immediately. For server-speaks-first protocols, there's a minimal 10ms delay, but I don't think it matters. We are in the process of releasing this change to Outline Clients during this week and the next.

We have previously made a change to coalesce the salt and initial server data, which is a lot simpler: https://github.com/Jigsaw-Code/outline-ss-server/pull/69

That has been running on production servers for a few weeks now.

fortuna commented 4 years ago

Coalescing salt+address+initial data also removes the possibility of some replay attacks that observe where the boundaries of the encryption blocks are, since it becomes one single block with unpredictable length.

riobard commented 4 years ago

@fortuna I think (haven't actually tried yet) that by turning on Nagle's algorithm for the first couple of packets we can get the same coalescing effect without explicit packing bytes?

fortuna commented 4 years ago

I'm curious to hear whether that works. I also haven't tried it. It would certainly be an easy way to implement the packet coalescing. However, from what I read about the Nagle's algorithm, it only buffers if there's unconfirmed data in the pipe, which I interpret as immediately sending the first write as its own packet, which wouldn't accomplish what we need.

riobard commented 4 years ago

@fortuna My bad. After further thinking about it, what you said is correct, and Nagle's algo won't help in this case.

riobard commented 4 years ago

OK, I've found the correct mechanism to implement this without breaking code layering and API compatibility.

I was confused and thinking about TCP_NODELAY (Nagle's algo), but actually it's TCP_CORK that'll do wonders in the case. A naive solution is like this:

Set TCP_CORK;
Send salt;
Send address;
(Optionally) send payload;
Unset TCP_CORK.

Caveats:

TCP_CORK is Linux-specific. On BSD there is a similar option called TCP_NOPUSH. Not sure about Windows.
TCP_CORK will wait up to 200ms to accumulate data before sending out on the wire. The 200ms timer seems to be hard-coded and unadjustable. Ideally we'd like to keep it to under 10ms in our use case to reduce latency penalty of server-speak-first protocols. A simple workaround is that right after setting TCP_CORK we start a 10ms-delayed function to unset TCP_CORK, which forces sending on the wire.

riobard commented 4 years ago

A proof of concept for client-side use https://github.com/shadowsocks/go-shadowsocks2/commit/50676a73f6643df35b3d15ea74850b142c46df94

fortuna commented 4 years ago

The TCP_CORK solution seems convenient, but unfortunate that it's for Linux only. We opted to implement the in user space, which we have much more control and is cross-platform. Maybe you implement TCP_CORK in the interim before implementing in the user space?

Feel free to copy what we did by the way. We effectively have a LazyWrite call in the ShadowsocksWriter that appends to the internal buffer, and a Flush call that writes to the TCP connection. Regular Write will also flush. We were able to do it without requiring an extra buffer. You just need to make sure there's space for the AEAD tag after the lazy data, in case you need to flush before the regular ReadFrom finishes the first read.

riobard commented 4 years ago

BSD has TCP_NOPUSH, so the last commit works on both Linux and BSD. It is also trivial to implement corking (or traffic shaping in general) in user-space by wrapping net.TCPConn with necessary timing/buffering policies for platforms like Windows which lack similar kernel mechanism without breaking code layering and API compatibility.

riobard commented 4 years ago

Turns out a generic user-space TCP_CORK is very simple, see this feature branch https://github.com/shadowsocks/go-shadowsocks2/tree/feature-generic-cork

riobard commented 4 years ago

Generic TCP corking has been merged in #186. Closing this now.

shadowsocks / go-shadowsocks2

Write salt/iv, addr together with content #160