A privacy-preserving TCP proxy based on Signal's Expanding Signal GIF search article.
This README outlines the high-level ideas; see CONTRIBUTING.md for information about how to contribute to or build the project.
From Signal's article:[1]
In order to hide your search term from GIPHY, the Signal service acts as a privacy-preserving proxy.
When querying GIPHY:
- The Signal app opens a TCP connection to the Signal service.
- The Signal service opens a TCP connection to the GIPHY HTTPS API endpoint and relays bytes between the app and GIPHY.
- The Signal app negotiates TLS through the proxied TCP connection all the way to the GIPHY HTTPS API endpoint.
Since communication is done via TLS all the way to GIPHY, the Signal service never sees the plaintext contents of what is transmitted or received. Since the TCP connection is proxied through the Signal service, GIPHY doesn't know who issued the request.
The Signal service essentially acts as a VPN for GIPHY traffic: the Signal service knows who you are, but not what you're searching for or selecting. The GIPHY API service sees the search term, but not who you are.
This proxy is an implementation of exactly that.
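The three steps quoted above amount to a byte-for-byte TCP relay. As a rough illustration only (this project uses HAProxy, not Python), a minimal relay could look like the sketch below. The names `pipe` and `start_relay` are hypothetical and do not appear in this repository.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while chunk := await reader.read(65536):
            writer.write(chunk)
            await writer.drain()
    finally:
        writer.close()


async def start_relay(listen_host: str, listen_port: int,
                      upstream_host: str, upstream_port: int) -> asyncio.AbstractServer:
    """Listen locally and relay every accepted connection to the upstream."""

    async def handle(client_reader: asyncio.StreamReader,
                     client_writer: asyncio.StreamWriter) -> None:
        try:
            upstream_reader, upstream_writer = await asyncio.open_connection(
                upstream_host, upstream_port
            )
        except OSError:
            # Upstream unreachable: drop the client connection.
            client_writer.close()
            return
        # Relay both directions concurrently. The relay never parses the
        # payload, so a TLS session passing through stays opaque to it.
        await asyncio.gather(
            pipe(client_reader, upstream_writer),
            pipe(upstream_reader, client_writer),
            return_exceptions=True,
        )

    return await asyncio.start_server(handle, listen_host, listen_port)
```

To try it, one could `await start_relay("0.0.0.0", 8443, "httpbin.org", 443)` inside an asyncio program and keep the returned server running; the port 8443 is an arbitrary choice for the example.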
If you deployed an example proxy for httpbin.org to relay.privaterelay.technology, you could send requests to httpbin.org through that proxy to hide your IP address from the service.
httpbin.org has a /ip endpoint that will return the requester's IP address:
curl -sSL 'https://httpbin.org/ip' | jq '.origin'
# => $ADDRESS1
curl -sSL --connect-to httpbin.org:443:relay.privaterelay.technology:443 'https://httpbin.org/ip' | jq '.origin'
# => $ADDRESS2
# Note that $ADDRESS1 ≠ $ADDRESS2
In the example above, $ADDRESS1 is your external IP address, as expected, while $ADDRESS2 is the IP address of the proxy.
(See the cURL man page for: --connect-to <HOST1:PORT1:HOST2:PORT2>)
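What --connect-to does can also be sketched in Python: open the TCP connection to the relay, but negotiate TLS (SNI and certificate verification) for the real target. This is a minimal sketch, not part of this project; `build_request` and `fetch_via_relay` are hypothetical helper names, and the relay hostname is the example one used above.

```python
import socket
import ssl


def build_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request that asks the server to close afterwards."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_via_relay(target_host: str, path: str,
                    relay_host: str, relay_port: int = 443) -> bytes:
    """Send an HTTPS request for target_host over a TCP connection to the relay."""
    context = ssl.create_default_context()
    # The TCP connection goes to the relay...
    with socket.create_connection((relay_host, relay_port), timeout=10) as tcp:
        # ...but TLS is negotiated with, and verified against, the target host.
        with context.wrap_socket(tcp, server_hostname=target_host) as tls:
            tls.sendall(build_request(target_host, path))
            chunks = []
            while data := tls.recv(4096):
                chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch_via_relay("httpbin.org", "/ip", "relay.privaterelay.technology")` would return the raw HTTP response, assuming a relay is actually deployed at that name. Because the certificate is checked against httpbin.org, a relay that tried to terminate TLS itself would fail verification.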
"It's just HAProxy"
The proxy server runs HAProxy in TCP mode, and the TLS connection passes through. A useful diagram from the HAProxy docs:[1]
HAProxy does not and cannot decipher the traffic.
You can see the full HAProxy configuration used in proxy/haproxy.cfg.
Privacy, mostly, at the cost of an extra TCP connection.
From Signal's article, again:[1]
[The proxy service] knows who you are, but not what you're searching for or selecting. The GIPHY API service sees the search term, but not who you are.
Fork the repo and configure it!
There are two main components:
The first component is a Cloudflare Load Balancer in DNS-Only mode with a 30 second TTL.
(Note: this is not used for load balancing per se; it is more a way of routing users to the closest HAProxy instance.)
Operating in this mode does have a caveat:
[This] relies on DNS resolvers respecting the short TTL to re-query Cloudflare’s DNS for an updated list of healthy addresses.
The DNS-only load balancer does dynamic latency-based DNS resolution via Dynamic Steering:
Dynamic Steering uses health check data to identify the fastest pool for a given Cloudflare Region [...]
Dynamic Steering creates Round Trip Time (RTT) profiles based on an exponential weighted moving average (EWMA) of RTT to determine the fastest pool. If there is no current RTT data for your pool in a region or colocation center, Cloudflare directs traffic to the pools in failover order.
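To make the EWMA idea concrete, here is a small sketch of how an RTT profile could be updated and used to pick a pool. It is not Cloudflare's implementation: the smoothing factor `alpha`, the function names, and the pool names are assumptions for illustration, and the fallback is simplified to "use the first pool in failover order when no pool has RTT data".

```python
from typing import Dict, List, Optional


def update_ewma(previous: Optional[float], sample_ms: float, alpha: float = 0.3) -> float:
    """Blend a new RTT sample into the running average (alpha is an assumed weight)."""
    if previous is None:
        # First sample for this pool: nothing to blend with yet.
        return sample_ms
    return alpha * sample_ms + (1 - alpha) * previous


def pick_pool(failover_order: List[str], rtt_ewma: Dict[str, float]) -> str:
    """Pick the pool with the lowest EWMA RTT, or fall back to failover order."""
    measured = [pool for pool in failover_order if pool in rtt_ewma]
    if not measured:
        # No RTT data at all: use the first pool in failover order.
        return failover_order[0]
    return min(measured, key=lambda pool: rtt_ewma[pool])
```

For example, with hypothetical pools `["nyc", "ams"]` and EWMA values of 80 ms and 20 ms, `pick_pool` returns `"ams"`; with no RTT data it returns `"nyc"`.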
The second component is HAProxy. As described above, it runs in TCP mode, and the TLS connection passes through. DigitalOcean hosts the HAProxy servers.
"How much does this cost to host?"
(All amounts are USD.)
The hosting costs depend on the configured regions and bandwidth usage.
The individual monthly costs depend on these settings:
- Droplet size: s-1vcpu-1gb
- Health check interval: cloudflare_load_balancer_monitor.simple_tcp_monitor.interval
- Steering policy: cloudflare_load_balancer.private_relay_lb.steering_policy
The total monthly costs for the config in this repository:
| Provider | Item | Monthly cost (USD) |
|---|---|---|
| DigitalOcean | Droplets | $25 |
| | Bandwidth (~8 TB) | ~$30 |
| Cloudflare | Basic | $5 |
| | 5 origin servers | $15 |
| | 15s checks | $15 |
| | RTT from 8 regions | $15 |
| | Latency-based traffic steering | $10 |
| | DNS (~5.5M queries) | $5 |
| Total | | ~$120 |
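As a quick sanity check on the arithmetic, the line items can be summed per provider. The figures below are copied from the table above; the dictionary layout and function names are only illustrative.

```python
# Monthly line items in USD, copied from the table above (approximate values).
MONTHLY_COSTS_USD = {
    "DigitalOcean": {
        "Droplets": 25,
        "Bandwidth (~8 TB)": 30,
    },
    "Cloudflare": {
        "Basic": 5,
        "5 origin servers": 15,
        "15s checks": 15,
        "RTT from 8 regions": 15,
        "Latency-based traffic steering": 10,
        "DNS (~5.5M queries)": 5,
    },
}


def provider_totals(costs):
    """Sum the line items for each provider."""
    return {provider: sum(items.values()) for provider, items in costs.items()}


def grand_total(costs):
    """Sum all line items across providers."""
    return sum(provider_totals(costs).values())


print(provider_totals(MONTHLY_COSTS_USD))
print(grand_total(MONTHLY_COSTS_USD))
```

Changing the number of regions, the check interval, or the steering policy changes the Cloudflare line items, so the same sum can be reused to estimate a different configuration.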
Resources:
This repository is available under the ISC License. See LICENSE.md.