net4people / bbs

Forum for discussing Internet censorship circumvention

IP spoofing can hide "almost" all traffic of proxy server on governments' radar #159

Open arandomgstring opened 1 year ago

arandomgstring commented 1 year ago

Honestly, I have no idea why something like this has not been implemented yet, even by premium VPNs. Please do enlighten me. I will strip the idea down to the bare bones. In practice we also have to deal with ingress and egress filtering, which IMO are fairly easy to dodge, so I ignore them here. IP spoofing itself, on the other hand, is a pain, and understandably people in open communities like this one won't find the time to implement it, but I would like to share my theory anyway.

tl;dr: The user sends packets to the real IP address of the proxy server. The proxy server answers to the user's IP address but rewrites its own source IP address in the packets, so that it looks like another website, not the proxy server, is sending packets to the user.

We know that the amount of data a user receives from a proxy server is normally much larger than the amount the user sends to it. Therefore, as long as we hide the origin of the incoming traffic (the proxy server), we essentially hide the whole traffic. This is why people started using domain fronting in the first place, although it (almost) doesn't work anymore for a variety of reasons.

Any device that wishes to send a packet to a non-local network normally writes its own IP address in the source address field of the IP header; otherwise replies cannot find their way back, and networks that validate source addresses will drop the packet. Due to NAT, the user's IP address is rewritten along the way, possibly several times, but we don't care about that here. We are interested in the public IP address of the proxy server itself, which is static and, remember, not behind a symmetric NAT or any other restrictive type of NAT. This is why the government can block it: if the address were dynamic, blocking it would be pointless, but then the server could not easily accept incoming connections either. So we have one static public IP address that we want to protect against censorship by hiding almost all of its traffic.

So what if the proxy server could use IP spoofing to change the source address of the packets it sends to the user, so that the government thinks the user is using a well-known whitelisted website instead of a proxy server?

The simplest procedure is as follows:

  1. The user connects to the proxy server and establishes an SSL connection to it, just like visiting a simple website. Note that from this point onward the application data (the HTTP headers and payload) is completely encrypted, but the TCP, IP and lower-layer headers are sent in the clear. After all, a router needs to know the source and destination of a packet, so we cannot encrypt them, and the TCP header doesn't contain any sensitive information that we wish to hide.

  2. Afterwards, the user opens a well-known whitelisted website and establishes an SSL connection to it (all of this should happen in the background, of course). We already know the IP address of that website, or we can simply get it with a dig or ping command.

  3. The user sends packets to the proxy server using the proxy server's real IP. The volume of these packets is negligible; they are mostly HTTP queries anyway. Optionally, the proxy server can answer with some random bytes, but for the real data (which is heavy and makes up 99% of the traffic) the proxy server rewrites the source IP address of its packets to the IP address of that well-known website.
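The rewrite in step 3 touches only a few bytes of the IPv4 header: the source address field and the header checksum. Below is a minimal sketch in Python, assuming a 20-byte header without options. It only builds and edits bytes in memory and sends nothing. The addresses are documentation placeholders, and `build_header` and `rewrite_source` are illustrative names, not part of any tool mentioned in this thread.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def rewrite_source(ip_header: bytes, new_src: str) -> bytes:
    """Return a copy of a 20-byte IPv4 header with a new source address."""
    header = bytearray(ip_header)
    header[12:16] = socket.inet_aton(new_src)  # bytes 12..15: source address
    header[10:12] = b"\x00\x00"                # zero the checksum field first
    struct.pack_into("!H", header, 10, internet_checksum(bytes(header)))
    return bytes(header)


def build_header(src: str, dst: str, payload_len: int) -> bytes:
    """Build a minimal IPv4 header (no options) carrying TCP."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + payload_len,   # version/IHL, TOS, total length
        0, 0,                        # identification, flags/fragment offset
        64, socket.IPPROTO_TCP, 0,   # TTL, protocol, checksum placeholder
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    return rewrite_source(header, src)  # fills in the checksum


PROXY_IP = "198.51.100.7"   # placeholder for the proxy's real address
COVER_IP = "203.0.113.5"    # placeholder for the whitelisted website
USER_IP = "192.0.2.10"      # placeholder for the user

real_header = build_header(PROXY_IP, USER_IP, payload_len=1400)
spoofed_header = rewrite_source(real_header, COVER_IP)
```

A header with a correct checksum sums to zero when the checksum is recomputed over all 20 bytes, so `internet_checksum(spoofed_header) == 0` is a quick sanity check. This covers only the IP layer; the TCP checksum needs separate treatment.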

Essentially we are kind of DDoSing ourselves, but in a good way :). When the government monitors the user's traffic, 99% of what it sees is encrypted traffic from a well-known whitelisted website. What a well-behaved user indeed! Of course, we cannot hide everything. The user still needs to send queries to the real IP address of the proxy server, and the proxy server sometimes needs to answer from its own IP so that everything looks normal. The user might download gigabytes of traffic from the proxy server, but what the government sees is a few kilobytes from the proxy server and gigabytes from the well-known whitelisted website. A server can be shared by many people, and even if the government gets suspicious, all it sees is a simple website that sends a few KB to its users.

Note that this is just a simplified toy model. I can't write a 24-page article here.

wkrp commented 1 year ago

tl;dr: The user sends packets to the real IP address of the proxy server. The proxy server answers to the user's IP address but rewrites its own source IP address in the packets, so that it looks like another website, not the proxy server, is sending packets to the user.

It is good that you are thinking in this direction. Let me list some related things that have been proposed/tried in the past, for inspiration or comparison.

Triangle Boy is somewhat similar to what you are suggesting. But the user sends its upstream traffic to a secondary proxy, and the main proxy spoofs source IP addresses in the downstream direction, to make them appear to be originating from the secondary proxy. In this case, the purpose of source address spoofing is to unify the upstream and downstream addresses, not separate them.

CensorSpoofer from 2012 is like what you are suggesting. The proxy server spoofs source IP addresses in the downstream direction.

CensorSpoofer de-couples the upstream and downstream channels, using a low-bandwidth indirect channel for delivering upstream messages (URLs) and a high-bandwidth direct channel for downloading web content. The upstream channel hides the request contents using steganographic encoding within Email or instant messages, whereas the downstream channel uses IP address spoofing so that the real address of the proxy is not revealed either to legitimate users or censors.

ReQrypt uses a similar triangular communication model, but the source address is changed in the upstream direction. Technically it is not spoofing: the user sends its original packets (with its own true source address) encapsulated (VPN-like) to the proxy server. The proxy server sends the packets (with the user's own IP address) on the network. Then the true server sends packets back directly to the user, bypassing the proxy in the downstream direction.

GoHop (GitHub) is worth mentioning. It does not use spoofing, but it splits its communication over multiple random ports, in order to prevent traditional flow analysis based on 5-tuples.

Conjure (summary here) does not use address spoofing, but it has the property you listed (hide the origin of the proxy server). It works by having intermediate routers intercept traffic to and from "phantom" IP addresses. There is source code for Conjure as a Tor pluggable transport if you want to test it.

I'll add that building a circumvention system around an independent session protocol (turbo tunnel) makes it relatively easy to do things like decoupling the upstream and downstream, splitting the upstream into multiple channels, or things like that.

free-the-internet commented 1 year ago

It's very interesting, but you can't make packets that originate from your server look like they come from well-known services unless you build such a service yourself. The other problem is ACKing the incoming packets if you use a protocol that needs ACKs. This could perhaps be handled by careful design, that is, by filtering packets based on the interface, port and currently used spoofed IP (?). Also, I think spoofing the source address and changing it to that of a well-known service is illegal.

arandomgstring commented 1 year ago

@wkrp Thank you a lot for providing such valuable resources.

CensorSpoofer from 2012 is like what you are suggesting.

This is it! I was not aware that it had been proposed at all, but I could imagine that at least some services had already thought about something like this, and yet no one has implemented it. Of course, I disagree with using VoIP and email protocols where we can simply use HTTPS.

@free-the-internet

It's very interesting, but you can't spoof "the source of your incoming packets originated from your server" to well known services; unless you build yourself such a service.

Why not? We already have an HTTPS server running on our VPS with a real website on top of it (this is what vless/trojan/vmess/... + tls + nginx/caddy/... + ws uses anyway); then we spoof the IP address of another HTTPS website and that's about it. IP spoofing is a pain only because when we rewrite the source IP address of a packet, we have to update the TCP header too, since its checksum covers IP header information. Maybe a few iptables rules would solve this without writing a single line of code.
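To make the checksum point concrete: the TCP checksum is computed over a pseudo-header that contains the source and destination IP addresses, so a segment checksummed for the proxy's real address no longer verifies once the source address is swapped. A minimal sketch, assuming IPv4 and a 20-byte TCP header without options; the addresses, ports and `make_segment` helper are illustrative placeholders, and nothing is sent on the network.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def make_segment(sport, dport, seq, ack, flags, payload, checksum=0):
    """Minimal TCP header (data offset 5, no options) followed by payload."""
    header = struct.pack(
        "!HHIIBBHHH",
        sport, dport, seq, ack,
        5 << 4, flags, 65535,  # data offset, flags, window
        checksum, 0,           # checksum, urgent pointer
    )
    return header + payload


def tcp_checksum(src: str, dst: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the TCP segment."""
    pseudo = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src), socket.inet_aton(dst),
        0, socket.IPPROTO_TCP, len(segment),
    )
    return internet_checksum(pseudo + segment)


PROXY_IP = "198.51.100.7"   # placeholder: proxy's real address
COVER_IP = "203.0.113.5"    # placeholder: spoofed website address
USER_IP = "192.0.2.10"      # placeholder: user's address

fields = dict(sport=443, dport=51000, seq=1000, ack=2000,
              flags=0x18, payload=b"hello")  # 0x18 = PSH|ACK

# Checksum computed for the proxy's real source address.
csum_proxy = tcp_checksum(PROXY_IP, USER_IP, make_segment(**fields))
segment = make_segment(**fields, checksum=csum_proxy)

# A receiver verifies by summing pseudo-header + segment; 0 means valid.
valid_from_proxy = tcp_checksum(PROXY_IP, USER_IP, segment) == 0
valid_from_cover = tcp_checksum(COVER_IP, USER_IP, segment) == 0

# After changing the source address the checksum must be recomputed.
csum_cover = tcp_checksum(COVER_IP, USER_IP, make_segment(**fields))
```

Here `valid_from_proxy` holds and `valid_from_cover` does not, which is why rewriting only the IP header is not enough. NAT implementations face the same requirement whenever they rewrite addresses.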

The other problem is ACKing the incoming packets if you use a protocol that needs ACK. This one can be done by carefully design, perhaps. That is filtering packets based on the interface, port and currently used spoofed IP (?).

I was waiting for this good question. Either we go with UDP, which I think is not a good idea at all because protocols such as QUIC are not very common, or we simply design our code this way:

The client doesn't actually send ACK packets to the proxy server at all (it can optionally send them to nowhere), and the proxy server doesn't wait for ACK packets either (this way we get UDP-like behavior in a TCP session). Of course the client can send those ACK packets to the real public IP address of the proxy server, but why should it?

Optionally, we can make the client application resistant to the RST packets that ISPs sometimes inject :) More importantly, we don't have to spoof a single IP address. We can use a mixture of many well-known websites to make the traffic even less conspicuous.

Also, I think spoofing to source and changing them to well-known ones is illegal.

Oh no, anyway. Remember that building VPNs, using VPNs, etc. are illegal too, in places where ordinary people actually need a VPN.

free-the-internet commented 1 year ago

Why not? We have already a https server running on our VPS with a real website on top of it (this is what vless/trojan/vmess/... + tls + nginx/caddy/... + ws uses anyway), then we spoof the IP address of another https website and that's about it. IP spoofing is a pain, just because when we rewrite the source IP address of the packets, we have to rewrite TCP header too because it holds IP header information in its checksum, however that is about it. Maybe a few IPtable rules solve this issue though without writing a single line of code.

Here I mean that it might be illegal to spoof the IP of a well-known service, Google's for example. But I'm just saying; I'm not sure whether it's legal or not, or even whether anybody cares. Yes, on Linux iptables can do it.

Oh no, anyway. Remember building VPNs, using VPNs, etc are illegal too, where VPN is actually needed for common people.

Here, I'm talking about the legality in a free country. Of course it is okay in Iran, China or Russia.

The client doesn't actually send ACK packets to proxy server at all (it can optionally send it to no-where) and proxy server doesn't wait for ACK packets as well (this way we get UDP like behavior in TCP session). Of course the client can send those ACK packets to real public IP address of proxy server, but why should it do so?

The whole stack needs to be replaced. Windows won't work. You can do it with a custom kernel on Linux.

alirezaac commented 1 year ago

Well, it is good that you think the same way as this tweet. Let me know if your idea is the same as ShadowTLS, which is the same thing Telegram is doing to build a hidden VPN for its Russian users. And yes, we need to attack and DDoS instead of defending, and the best, lowest-effort approaches are the firewall evasion methods used in DDoSing. ShadowTLS has its own weaknesses; it has no defense against some of the attacks in this good read. Think of it this way: since they keep pushing and searching, we need to think of things like logic bombs.

free-the-internet commented 1 year ago

The client doesn't actually send ACK packets to proxy server at all (it can optionally send it to no-where) and proxy server doesn't wait for ACK packets as well (this way we get UDP like behavior in TCP session). Of course the client can send those ACK packets to real public IP address of proxy server, but why should it do so?

I forgot to say that this could be a design flaw, as the censor will see TCP packets without ACKs [for a long time] but no RST packets or retransmissions.

cross-hello commented 1 year ago

Once censors are aware of this technique, the only way for them to distinguish proxy servers from normal web servers is through the client's traffic: list all TCP connections of the client, map every IP to its corresponding domain name, then filter out the domain names that get no ACKs, which is computationally expensive.

Besides, if the client also initiates a connection with the corresponding domain in the background from the beginning, some of the traffic will be ACKed and some will not. If proxy servers are made to show similar behavior as well, the scheme could resist for a while.


cross-hello commented 1 year ago

Also, I think spoofing to source and changing them to well-known ones is illegal.

The IP Spoofing Law and Legal Definition is:

IP (Internet Protocol) spoofing is one of the most common forms of on-line camouflage. It is a technique used to gain unauthorized access to computers, whereby the intruder sends messages to a computer with an IP address indicating that the message is coming from a trusted host. To engage in IP spoofing, a hacker must find an IP address that is allowed, then modify the header information in the data packets from his or her own computer to include this IP address. Newer routers and firewall arrangements offer protection against IP spoofing.

So if the access is authorized, it shouldn't be a problem, should it?

arandomgstring commented 1 year ago

@free-the-internet

Here I mean being illegal to spoof Google's IP as well known service for example. But I'm just saying, I'm not sure if it's legal or not, or even if somebody cares.

Oh, I didn't mean spoofing Google's IP or any foreign IP at all, though we can do that as well. Websites within the country itself are the best choice. For example, you spoof the IP address of aparat.com while you are watching youtube.com. How cool is that? And interestingly enough, it benefits aparat itself, because ISPs see that aparat usage has increased, so more money would be invested in the website. A win-win situation.

The whole stack needs to be replaced. Windows won't work. You can do custom Kernel in Linux.

Not necessarily. Yes, replacing the whole stack would be best in terms of performance, but who has the time to rewrite the whole thing from scratch? Instead, we can simply inject "fake" ACK packets into our sockets, so that the Windows (or Linux) kernel believes everything is fine. In theory even simple Python code could do this, though connecting that code to v2ray, naiveproxy, etc. is another story.

I forgot to say that this could be a design flaw as the censor will see TCP packets without ACK [for a long time] but no RST packets, or retransmissions.

Now now, the only person paying attention to these details is you, which is very good. I actually omitted these details because my post would have become very long otherwise. But as you can guess, at the beginning of the connection we do send ACK packets to the proxy server, since we perform an SSL handshake with it. Afterwards, we can send keep-alives (or URL queries) to it and that's it. The proxy server can also send back some low-volume fake data in response to those queries, and the client ACKs those fake responses. IP spoofing is used only for the real response, which is heavy.

@alirezaac

well it is good that you think the same way as this tweet,

That tweet is good, but what I am proposing here (or rather what was proposed 10 years ago by other people) is much lower level; fighting it is very expensive for the government, as it needs egress filtering, which will not be implemented in Iran, though maybe in China in the future. As for the other links, all of them work at the application layer, not the IP layer.

arandomgstring commented 1 year ago

@cross-hello

The only relation after censors recognized this to distinguish proxy servers and normal web server is via the traffic of client. List all tcp connections of clients, then map all ip to its responding domain name, then exclude domain names which don't have ACK, which is computation expensively.

Beside if initiate connection with corresponding domain in background from beginning also. it will be some with ACK, and some no. And make proxy servers show some similar behaviors. It could resist for a time.

Ummm, no! We actually do send ACK packets to the proxy server from time to time, as well as TCP keep-alive packets. For example, every time we request a website through the proxy server, the proxy server responds with a very small packet (just confirming our request), we send an ACK for it, and that's it; the rest of the traffic is spoofed. Now what can they do?

Compare these two situations:

  1. You request to download a file (or watch a video) through the proxy server. The proxy server fetches it and sends it back from its own IP address. Meanwhile, you send many ACK packets to it as well. So the total volume would be

Volume of user's request (low) + volume of proxy's response (very high, not only because it contains the file/video but also because TLS over TLS makes every packet bigger) + volume of user's ACKs to the proxy's responses (low, though there are many ACK packets).

  2. You request to download a file (or watch a video) through the proxy server. The proxy server fetches it and sends it back from another IP address; it also sends a very small packet from its own address in response to your request, which you acknowledge once, and that is it. You won't send any more ACK packets to the proxy, because on the surface you have received everything you wanted from it.

Volume of user's request (low) + volume of proxy's response (very low) + volume of user's reply to the proxy's response (a single ACK packet, very low), while the heavy traffic appears to come from another website.

We can go a step further and send an RST packet to the proxy server after each request, initiating a new connection for every new request. This way we solve almost all issues regarding ACK packets. The problem is that if a user resets TCP connections too often (for example, when playing a game he/she will send many requests), the traffic becomes distinguishable from other TCP connections. But with TCP keep-alive, everything would be fine. We can optionally set a timeout for it too.

free-the-internet commented 1 year ago

Now now, the only person paying attention to these details is you, which is very good. I actually omitted these details because my post would have become very long otherwise. But as you can guess, at the beginning of connection we actually do send ACK packets to proxy server to do a SSL handshake with it to begin with. Afterward, we can send keep alive (or URL queries) to it and that's it. Proxy server can also send some fake data with very low volume back, in response to those queries and that's it, the client sends ACK to those fake responses. IP spoofing will be used for the real response which is heavy.

Here, I meant that the censor will detect the many un-ACKed spoofed packets. In fact, if they are TCP, the ISP expects ACKs to go to the spoofed IP (e.g. if you used aparat's IP, there should be ACKs towards that IP). But can you send ACKs to them? Technically yes, but without consequences like getting reported or banned?

arandomgstring commented 1 year ago

@free-the-internet Correct. Though there is a workaround for this as well. If you remember, in my very first post here I said

  2. Afterwards, the user opens a well-known whitelisted website and establish SSL connection to it (all of these things should be done on background of course).

It means that we actually do send ACK packets to that spoofed website! These ACK packets are legitimate; we don't need to forge them, and they won't cause any trouble for that server. However, if the government looks very closely and compares the number of packets the user receives from the spoofed IP with the number of ACKs the user sends to that IP, it might become suspicious. But think about how hard and expensive that is! They have to list all TCP connections, track the packet rate, and keep, for every single IP, the number of ACK packets sent by the user as well as the number of packets received by that user, and they have to account for retransmissions, timeouts and lots of other stuff.
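To get a feel for the bookkeeping described above, here is a toy version of the per-IP comparison a censor would have to run for every user. The packet tuple layout, the thresholds `min_data` and `min_ack_ratio`, and the function name are assumptions made up for this sketch; a real system would also have to handle retransmissions, delayed ACKs and timeouts, which this ignores.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlowStats:
    data_packets_in: int = 0  # data packets the user received from this IP
    acks_out: int = 0         # ACKs the user sent towards this IP


def suspicious_remotes(packets, user_ip, min_data=100, min_ack_ratio=0.1):
    """Return remote IPs that sent much data but received few ACKs.

    packets: iterable of (src_ip, dst_ip, payload_len, is_ack) tuples.
    """
    stats = defaultdict(FlowStats)
    for src, dst, payload_len, is_ack in packets:
        if dst == user_ip and payload_len > 0:
            stats[src].data_packets_in += 1
        elif src == user_ip and is_ack:
            stats[dst].acks_out += 1

    floor = max(min_data, 1)  # also guards the division below
    return sorted(
        ip for ip, s in stats.items()
        if s.data_packets_in >= floor
        and s.acks_out / s.data_packets_in < min_ack_ratio
    )


USER = "192.0.2.10"       # placeholder addresses
COVER = "203.0.113.5"     # spoofed source: lots of data, almost no ACKs
NORMAL = "198.51.100.20"  # ordinary site: roughly one ACK per two segments

trace = (
    [(COVER, USER, 1400, False)] * 1000 + [(USER, COVER, 0, True)] * 5
    + [(NORMAL, USER, 1400, False)] * 1000 + [(USER, NORMAL, 0, True)] * 500
)
flagged = suspicious_remotes(trace, USER)
```

Even this naive version keeps state for every (user, remote IP) pair, which hints at the cost argument made above; whether that cost is prohibitive for a given censor is an open question.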

We can optionally forge and send ACK packets to the spoofed IP, but that's not a good idea: the spoofed server will not recognize these packets and might send an RST packet back to us.

The best way to solve all of these problems is to spoof a different IP address every so often, say every 10 seconds. That not only makes censorship harder, it also solves the ACK problem to some extent: in short connections, they can't even match the number of packets received by the user against the ACK packets the user sent to that server.
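One simple way to pick "a different IP every 10 seconds" is to derive the choice from the clock, so that the client and the proxy agree on it without extra signalling. This is only a sketch under assumptions: a fixed, pre-shared pool of cover addresses (placeholders below), roughly synchronized clocks, and plain round-robin, which is predictable and would likely need replacing with something keyed in practice.

```python
from typing import Sequence


def cover_address(now: float, pool: Sequence[str], period: float = 10.0) -> str:
    """Pick the cover address for the time slot that contains `now`."""
    if not pool:
        raise ValueError("pool of cover addresses is empty")
    slot = int(now // period)      # index of the current time slot
    return pool[slot % len(pool)]  # round-robin over the pool


# Placeholder addresses standing in for whitelisted websites.
POOL = ["203.0.113.5", "203.0.113.6", "203.0.113.7"]

first = cover_address(0.0, POOL)
still_first = cover_address(9.9, POOL)
second = cover_address(10.0, POOL)
wrapped = cover_address(35.0, POOL)  # slot 3 wraps back to the first entry
```

The client would use the same function to know which background connection the incoming packets are supposed to belong to.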

free-the-internet commented 1 year ago

@arandomgstring Did you follow up on the idea of making the ACKs of the inner HTTPS traffic unnecessary in the TLS-in-TLS scenario that we have with v2ray protocols? In simpler (and less accurate) words, we translate the TCP flow into UDP-like traffic and exchange UDP inside the initial TLS: Client: application_produces_tcp->tcp2udp-->TLS <-----INTERNET-----> Server: TLS->udp2tcp-->... This way we can get rid of the inner TLS (TCP) ACKs, and thus in practice we should have less overhead. Please note that tcp2udp can be done in any other suitable way, like forging ACKs directly on the client and terminating ACKs on the server.

In general, do you agree that this can increase performance? There are some schemes like xtls-rprx-origin-udp443 and xtls-rprx-direct-udp443, but I couldn't find out from the docs what they are for. I tested both with xtls settings, but xray gives an error when I use them. Maybe this is what I'm talking about here, and they already studied it and have now abandoned it because it did not improve performance as much as expected?

arandomgstring commented 1 year ago

@free-the-internet I presume such options were deprecated in favor of the rprx-vision that you were using. You see, vision's performance is much better than what you are describing, because translating udp2tcp or vice versa takes time. In vision, such translations are not needed, and you achieve "not sending needless ACKs" directly, because TLS in TLS doesn't happen to begin with.

ValdikSS commented 1 year ago

Note that this is just a simplified toy model. I can't write a 24 pages article here.

But Eric Wustrow et al. could.
https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-wustrow.pdf

It is not exactly your idea, but a very similar one on a larger scale (there's no single proxy server; instead the system is installed as a passive tap at an ISP). The paper describes all the challenges and drawbacks of using third-party websites as a termination point, in particular the issues with ACKs.

The overall concept is called refraction networking. It is implemented as a standalone application, https://github.com/refraction-networking/gotapdance, which also supports a later generation of the TapDance protocol called Conjure, which is quite similar but uses unallocated IP addresses instead of websites' addresses.

TapDance/Conjure can be seen live in Psiphon. It also includes a working solution for the ACK issue (or a prolongation method, so to say) on Linux/Android, using a classic BPF filter to block RST/RSTACK packets.

Honestly I have no idea why something like this has not been implemented by even premium VPNs yet.

Because, unfortunately, your idea presents even more challenges than TapDance, many more actually.

  1. To send data back to the user from a spoofed IP, you need to know the source port, after NAT/CGNAT, of the connection between the user and the whitelisted website. If there's a destination-dependent && non-port-preserving mapping somewhere (so-called symmetric NAT), you won't be able to learn the port (in a sane amount of time).
  2. To make the router/CGNAT accept your data, you need to learn the TCP SEQ/ACK numbers of the connection between the user and the whitelisted website, which requires a raw socket or a driver and root/administrator privileges, which instantly eliminates mobile devices (there's no support for on-socket eBPF on Android, it's blocked by seccomp for user applications, and classic BPF does not allow extracting data).
  3. A good deal of networks with direct peering implement source address checks, so there's a high chance that your spoofed packet just won't be delivered from one address or another.
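The first point can be illustrated with a toy model of NAT port mapping. It assumes, for the sake of the example, that the client uses the same local (IP, port) for its connection to the proxy and for its connection to the whitelisted website, so the proxy could in principle infer the external port from its own connection. `ToyNAT` and its sequential port allocation are hypothetical simplifications, not a model of any particular router.

```python
from itertools import count


class ToyNAT:
    """Allocates external ports for outgoing connections.

    symmetric=False: one mapping per internal endpoint, reused for every
    destination (destination-independent mapping).
    symmetric=True: a separate mapping per (internal endpoint, destination).
    """

    def __init__(self, symmetric, first_port=40000):
        self.symmetric = symmetric
        self._next_port = count(first_port)
        self._table = {}

    def external_port(self, internal, destination):
        key = (internal, destination) if self.symmetric else internal
        if key not in self._table:
            self._table[key] = next(self._next_port)
        return self._table[key]


CLIENT = ("10.0.0.2", 51000)   # internal endpoint (placeholder)
PROXY = ("198.51.100.7", 443)  # proxy server (placeholder)
SITE = ("203.0.113.5", 443)    # whitelisted website (placeholder)

cone = ToyNAT(symmetric=False)
port_seen_by_proxy = cone.external_port(CLIENT, PROXY)
port_towards_site = cone.external_port(CLIENT, SITE)
proxy_can_infer_port = port_seen_by_proxy == port_towards_site

sym = ToyNAT(symmetric=True)
sym_seen_by_proxy = sym.external_port(CLIENT, PROXY)
sym_towards_site = sym.external_port(CLIENT, SITE)
proxy_can_infer_port_symmetric = sym_seen_by_proxy == sym_towards_site
```

With the destination-independent mapping the port the proxy observes is also the port used towards the website; with the symmetric mapping the two differ, and the proxy would have to guess among tens of thousands of ports.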

I recommend visiting the refraction networking website; there are a lot of interesting papers.
Also there's refraction concept for QUIC: https://fc8.web.illinois.edu/posters/dr-quic.pdf


If you have the server with IP spoofing, you could try to "reverse TCB" for UDP traffic, it would probably work if the block is one-way from the country to elsewhere.