httptoolkit / httptoolkit-android

Automatic Android interception & debugging with HTTP Toolkit, for Android
https://httptoolkit.com
GNU Affero General Public License v3.0
476 stars 72 forks

Get Hostname/Domain from the TCP packet #17

Closed jyotman closed 9 months ago

jyotman commented 9 months ago

Hi, thanks for the amazing work on this tool 😄

I'm trying to test the VPN package individually as a separate Android app.

Can you please tell what is the best way to get the hostname or FQDN of the request being tunneled from the VPN?

For context - I'm trying to use it for a use case where I don't have any external VPN server. I just want to log all network traffic in the Android device itself once the local VPN is active.

This is what I tried in SessionHandler.handleTCPPacket - Log.d(TAG, "Hostname: " + InetAddress.getByName(PacketUtil.intToIPAddress(destinationIP)).getHostName());

But this code mostly returns the IP address. Only rarely am I able to see domain names, and those are usually pretty long and look like CDN hostnames. I know that HTTP Toolkit shows the complete domain, so I'm wondering how to go about that. Thank you!!

pimterry commented 9 months ago

> Can you please tell what is the best way to get the hostname or FQDN of the request being tunneled from the VPN?

That's a much more complicated thing than I think you're imagining, because TCP/IP itself doesn't directly include hostnames - it only uses IP addresses.

That means that literally doing what you're asking for is definitely impossible, but there are options.

First, it is possible to log IP addresses, as you've found, and that should be easy and reliable. You could then try a reverse DNS lookup on each IP (https://en.wikipedia.org/wiki/Reverse_DNS_lookup). This won't be super reliable, since many IPs have no reverse record or resolve only to a generic CDN or hosting name, but it'll give you a basic idea and probably isn't too hard (I've never tried).
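As a rough illustration of the reverse lookup idea, here is a minimal Java sketch. The class name `ReverseDns` and both helper methods are hypothetical, and it assumes the destination IP is stored as a big-endian `int` (the byte order used by the app's own `PacketUtil` is not confirmed here). `InetAddress.getCanonicalHostName()` returns the textual IP when no name can be found, so the sketch treats that case as "no hostname".

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper, not part of httptoolkit-android. */
public class ReverseDns {

    /** Converts an IPv4 address held in a big-endian int (assumed) to an InetAddress. No DNS lookup happens here. */
    static InetAddress toAddress(int ip) {
        byte[] bytes = {
            (byte) (ip >>> 24), (byte) (ip >>> 16), (byte) (ip >>> 8), (byte) ip
        };
        try {
            return InetAddress.getByAddress(bytes);
        } catch (UnknownHostException e) {
            // Only thrown for an illegal array length, which cannot happen with 4 bytes.
            throw new IllegalStateException(e);
        }
    }

    /** Returns the reverse DNS name, or null if the lookup only gave back the IP literal. */
    static String reverseLookup(InetAddress address) {
        String name = address.getCanonicalHostName(); // blocking network call
        return name.equals(address.getHostAddress()) ? null : name;
    }

    public static void main(String[] args) {
        InetAddress address = toAddress(0x08080808); // 8.8.8.8
        System.out.println(address.getHostAddress() + " -> " + reverseLookup(address));
    }
}
```

On Android the lookup is a blocking network call, so it would have to run off the main thread. The name that comes back is whatever the IP's owner published, which explains the long CDN-style names seen in the original attempt.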

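Beyond that you're in trouble. For typical HTTPS traffic, each single packet is an IP packet wrapping a TCP segment, which carries part of an encrypted TLS stream, which in turn carries the HTTP request.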

To reiterate: the hostname is never present in the TCP/IP packet headers. So that means if you want to get the domain names you need to unwrap all of this, do lots of parsing and potentially interception of the connection to understand the data within and extract it.
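To make that concrete, here is a minimal Java sketch of the addressing fields the IPv4 and TCP headers do carry. The class `HeaderFields` and its `parse` method are hypothetical (not taken from httptoolkit-android), and the sketch assumes a raw IPv4 packet carrying TCP, with no IPv6 handling.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: the addressing fields available in IPv4 + TCP headers. */
public class HeaderFields {
    final String sourceIp;
    final String destinationIp;
    final int sourcePort;
    final int destinationPort;

    private HeaderFields(String sourceIp, String destinationIp, int sourcePort, int destinationPort) {
        this.sourceIp = sourceIp;
        this.destinationIp = destinationIp;
        this.sourcePort = sourcePort;
        this.destinationPort = destinationPort;
    }

    /** Parses a raw IPv4 packet that carries a TCP segment. */
    static HeaderFields parse(byte[] packet) {
        ByteBuffer buf = ByteBuffer.wrap(packet); // big-endian (network order) by default
        int versionAndIhl = buf.get(0) & 0xFF;
        if ((versionAndIhl >> 4) != 4) {
            throw new IllegalArgumentException("Not an IPv4 packet");
        }
        if ((buf.get(9) & 0xFF) != 6) { // protocol number 6 = TCP
            throw new IllegalArgumentException("Not a TCP packet");
        }
        int ipHeaderLength = (versionAndIhl & 0x0F) * 4; // IHL counts 32-bit words
        return new HeaderFields(
                ipToString(buf.getInt(12)),                 // source address
                ipToString(buf.getInt(16)),                 // destination address
                buf.getShort(ipHeaderLength) & 0xFFFF,      // TCP source port
                buf.getShort(ipHeaderLength + 2) & 0xFFFF); // TCP destination port
    }

    private static String ipToString(int ip) {
        return ((ip >>> 24) & 0xFF) + "." + ((ip >>> 16) & 0xFF) + "."
                + ((ip >>> 8) & 0xFF) + "." + (ip & 0xFF);
    }

    @Override
    public String toString() {
        return sourceIp + ":" + sourcePort + " -> " + destinationIp + ":" + destinationPort;
    }
}
```

Addresses and ports are all the headers offer for identifying the destination. Any hostname lives further in, inside the TCP payload.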

HTTP Toolkit works by doing exactly this: it runs an HTTPS-intercepting proxy on your computer, which accepts connections, handles TLS & HTTP to fully understand the client's request (including the hostname they're talking to) and then it does something with the request (forwards it on, or uses some other rule to reply directly, or simulate errors, or whatever else). HTTP Toolkit's code to do so is open source (https://github.com/httptoolkit/mockttp/) but it's intended for Node.js running on a computer, so I can't help with running that within an Android app.

Your only other option would be to detect DNS lookups and responses (parsing UDP packets on port 53), store each requested name along with the IPs returned for it, and then match that mapping against the IPs you see in later packets. That should be fairly reliable, but it's not 100% guaranteed: many hostnames may share the same IP (e.g. anything behind a CDN like Cloudflare), and there's also DNS caching, so you might not see a request for every hostname the client is talking to.
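The DNS-tracking approach could be sketched roughly as follows in Java. This is an illustration under assumptions, not the project's code: the class `DnsSniffer` and its methods are hypothetical, the input is assumed to be the UDP payload of a packet with source port 53, and only IPv4 `A` records are handled. Each answer is recorded against the name from the question section, so addresses reached via a CNAME chain still map to the name the app asked for.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: remembers which hostname each IPv4 address was resolved from. */
public class DnsSniffer {
    private final Map<String, String> hostnameByIp = new HashMap<>();

    /** Feed the UDP payload of a packet whose source port is 53. */
    void onDnsResponse(byte[] payload) {
        try {
            ByteBuffer buf = ByteBuffer.wrap(payload); // big-endian by default
            int flags = buf.getShort(2) & 0xFFFF;
            if ((flags & 0x8000) == 0) return; // QR bit clear: a query, not a response
            int questions = buf.getShort(4) & 0xFFFF;
            int answers = buf.getShort(6) & 0xFFFF;
            buf.position(12); // the DNS header is always 12 bytes
            String queried = null;
            for (int i = 0; i < questions; i++) {
                queried = readName(buf);
                buf.position(buf.position() + 4); // skip QTYPE and QCLASS
            }
            for (int i = 0; i < answers; i++) {
                readName(buf);                       // record owner name, ignored here
                int type = buf.getShort() & 0xFFFF;
                buf.getShort();                      // class
                buf.getInt();                        // TTL
                int length = buf.getShort() & 0xFFFF;
                if (type == 1 && length == 4 && queried != null) { // A record
                    String ip = (buf.get() & 0xFF) + "." + (buf.get() & 0xFF) + "."
                            + (buf.get() & 0xFF) + "." + (buf.get() & 0xFF);
                    hostnameByIp.put(ip, queried);
                } else {
                    buf.position(buf.position() + length); // skip other record types
                }
            }
        } catch (RuntimeException e) {
            // Truncated or malformed message: ignore it in this sketch.
        }
    }

    /** Reads a possibly compressed name at the buffer's position and advances past it. */
    private static String readName(ByteBuffer buf) {
        StringBuilder name = new StringBuilder();
        int pos = buf.position();
        int resume = -1; // where parsing continues after the first compression pointer
        int jumps = 0;
        while (true) {
            int len = buf.get(pos) & 0xFF;
            if (len == 0) { pos++; break; }
            if ((len & 0xC0) == 0xC0) { // compression pointer to an earlier name
                if (resume < 0) resume = pos + 2;
                if (++jumps > 16) throw new IllegalArgumentException("Pointer loop");
                pos = ((len & 0x3F) << 8) | (buf.get(pos + 1) & 0xFF);
                continue;
            }
            if (name.length() > 0) name.append('.');
            for (int i = 1; i <= len; i++) name.append((char) (buf.get(pos + i) & 0xFF));
            pos += len + 1;
        }
        buf.position(resume >= 0 ? resume : pos);
        return name.toString();
    }

    /** Returns the last hostname seen resolving to this IP, or null if none was observed. */
    String lookup(String ip) {
        return hostnameByIp.get(ip);
    }
}
```

A TCP handler could then call `lookup` with the destination IP of each new connection. Because the map keeps only the most recent name per IP, shared CDN addresses can still be misattributed, which is the limitation described above.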

Does that make sense?

jyotman commented 9 months ago

Hi @pimterry thanks a lot for the detailed answer. Yes, that makes a lot of sense. I'm gonna try and read from the UDP port 53 😄 🙏