Using proxy and scraping services for hiding servers?

mmmray commented 3 months ago

This idea is a bit out there and I lack some networking understanding to determine whether it is doable at all.

There are a few companies out there that provide access to a proxy network of "residential IPs", basically comparable to botnets. You get a SOCKS5 proxy endpoint, and the TCP connections you establish through that proxy enter the public network through what I assume is somebody's malware-infected mobile phone or home computer. Basically, botnet as a service. Those services have relatively high, but not prohibitively high, per-gigabyte prices, and are mostly used to bypass detection when scraping websites for data.

These companies advertise a pool of "millions of organic IPs". Now I wonder, can this kind of service be used in this setup:

There is a middlebox in a censored country. It masquerades as a website and has two hidden endpoints: one for clients to connect to, and one for establishing reverse tunnels to outside the GFW.
An exit server in the friendly country attempts to establish connections to that middlebox, concealed as web traffic. Proxy networks are used to hide the source IP in order to make the (client IP, server IP) distribution seem organic, as if foreign visitors are browsing the website.
Clients connect to the middlebox, and their packets travel from the middlebox through the proxy network to the exit server.
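
To make the exit-server side a little more concrete, here is a minimal sketch of the bytes the exit server would send to a SOCKS5 proxy endpoint to ask it to open a TCP connection to the middlebox. The message layout follows RFC 1928. The hostname `middlebox.example`, the function names, and the choice of the "no authentication" method are assumptions for illustration only; a commercial proxy service would most likely require some form of authentication.

```python
import struct


def socks5_greeting() -> bytes:
    # VER=5, NMETHODS=1, METHODS=[0x00 = "no authentication required"].
    # Assumption: the proxy accepts unauthenticated clients.
    return b"\x05\x01\x00"


def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request addressed by domain name."""
    name = host.encode("ascii")
    if not 1 <= len(name) <= 255:
        raise ValueError("hostname must be 1..255 bytes")
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name),
    # then a one-byte length, the name, and the port in network byte order.
    header = b"\x05\x01\x00\x03"
    return header + bytes([len(name)]) + name + struct.pack("!H", port)


# Hypothetical middlebox hostname, used only for illustration.
request = socks5_connect_request("middlebox.example", 443)
```

After the proxy replies with success, the exit server would speak TLS/HTTP over the same socket, so the middlebox only ever sees the proxy network's exit IP.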

The idea of middleboxes is not new, for sure. What I wonder is: what are the challenges in getting the traffic from the middlebox across the firewall, generally speaking, and am I correct in assuming there's a challenge in making that traffic seem organic?

gaukas commented 3 months ago

That's an interesting idea. So effectively your model is equivalent to (from a censor's perspective):

Multiple residential IPs from outside of the censorship perimeter connecting to a server inside the perimeter.
Multiple residential IPs from within the perimeter connecting to the same server.

While the model may seem fairly common (Baidu.com?), there are still a few discrepancies censors may notice, mainly by analyzing the shape of the traffic:

TLS-over-TLS pattern: the most controversial traffic shaping problem in the circumvention community. See xue-usenix2024.
Flow direction: HTTP round trips are usually asymmetrical: one relatively small request triggers a relatively large response. Without very smart padding/fragmenting, the direction will look reversed on the leg crossing the censorship perimeter, i.e., a small response (HTTP server -> HTTP client) triggering a large request.
Connection TTL/Timing: HTTP connections are likely short-lived and immediately start communication once established.
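
The flow-direction point can be sketched as a simple byte-ratio check that an observer might run per connection. This is only an illustration, not a description of any real censor's classifier: the `Flow` type, the `looks_reversed` name, and the threshold of 1.0 are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    bytes_out: int  # bytes sent by the side that opened the TCP connection
    bytes_in: int   # bytes received by the side that opened the TCP connection


def looks_reversed(flow: Flow, threshold: float = 1.0) -> bool:
    """Return True when the connection initiator sends more than it receives.

    Ordinary web browsing has the initiator (the HTTP client) sending little
    and receiving a lot, so out/in is well below 1. The threshold is an
    assumed value for illustration.
    """
    if flow.bytes_in == 0:
        return flow.bytes_out > 0
    return flow.bytes_out / flow.bytes_in > threshold


# Ordinary browsing: small request, large response.
browsing = Flow(bytes_out=800, bytes_in=250_000)
# Reverse tunnel leg: the exit server opened the connection, but it is the
# one pushing the large web responses toward the middlebox.
tunnel_leg = Flow(bytes_out=250_000, bytes_in=800)

print(looks_reversed(browsing))    # False
print(looks_reversed(tunnel_leg))  # True
```

Padding or fragmenting would have to push the out/in ratio of the tunnel leg back toward what ordinary browsing looks like, which costs bandwidth in the direction that carries almost no real data.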

Please feel free to point out any mistakes I made. I also believe there are more common challenges that the circumvention community is currently facing.

As said, web browsing might not be an ideal traffic source for mimicry purposes. There could be better candidates like online gaming, video streaming/conferences, etc.

mmmray commented 3 months ago

You're right, there are quite a few features being used by some censors that this solution does not cover. I was mainly focused on Iran, where it seems to me the main issue with censorship today is per-IP bandwidth limits rather than things like TLS-in-TLS detection.

gaukas commented 3 months ago

To be fair, overall this is still not a bad idea, since the strongest advantages the circumvention community has against censors are variety and agility. To avoid falling into an endless cat-and-mouse cycle, I believe it is crucial to introduce more novel designs and approaches.

klzgrad commented 3 months ago

> Connection TTL/Timing: HTTP connections are likely short-lived

Many are short-lived, but many are also long-lived, e.g. HTTP/2 connections.

The problem is, if a proxy tunnel connection multiplexes several H/2 connections, the tunnel connection will be even longer-lived than each individual H/2 connection. And this cannot be shortened without breaking the payload connections.
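
A rough way to see this: if the payload connections overlap in time, the tunnel has to stay open from the first one's start to the last one's end. The sketch below assumes exactly that (overlapping intervals, no idle gap in which the tunnel could be closed); the function name and the example timings are made up for illustration.

```python
def tunnel_lifetime(payload_conns: list[tuple[float, float]]) -> float:
    """Lifetime of a tunnel carrying the given (start, end) payload connections.

    Assumption: the intervals overlap, so the tunnel must stay open from the
    earliest start to the latest end.
    """
    if not payload_conns:
        return 0.0
    start = min(s for s, _ in payload_conns)
    end = max(e for _, e in payload_conns)
    return end - start


# Three hypothetical H/2 payload connections, in seconds since tunnel setup.
conns = [(0.0, 300.0), (200.0, 600.0), (500.0, 900.0)]
longest_payload = max(e - s for s, e in conns)

print(longest_payload)         # 400.0
print(tunnel_lifetime(conns))  # 900.0
```

So even when no single payload connection lives longer than 400 seconds, the tunnel in this example lives for 900.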

Given payload H/2 connections, the only way to shorten the tunnel connection's lifetime is connection migration, which is only available in H/3. I forget where, but wkrp may have mentioned it somewhere here. Overall, this dimension is quite difficult to parrot in terms of engineering.

ValdikSS commented 3 months ago

https://www.akamai.com/blog/security/upnproxy-eternal-silence

mmmray commented 3 months ago

The proxy services I have in mind do not allow listening for inbound connections on a port, hence the need for a middlebox and the pretense that there is a website with organic traffic. With this UPnProxy vulnerability, it sounds like a middlebox might not be necessary at all, meaning that clients could connect directly to a bunch of IPs?

net4people / bbs

Using proxy and scraping services for hiding servers? #336