ntop / nDPI

Open Source Deep Packet Inspection Software Toolkit
http://www.ntop.org
GNU Lesser General Public License v3.0

Are there any plans for http2's deep packet inspection function? #2077

Open hanyoungho opened 1 year ago

hanyoungho commented 1 year ago

nDPI is capable of parsing http host, uri, user-agent, etc. for the http protocol and receiving that information, but as far as I know there is no parser for HTTP 2.0.

Do you have any plans for parsing HTTP 2.0? Of course, I know that HTTP 2.0 traffic is encrypted, but when decrypted traffic comes in, a simple parsing step seems to be needed, like the one Wireshark performs.

In particular, ChatGPT is a hot topic these days, and since most AI chatbots, including ChatGPT, are based on HTTP/2, it seems quite necessary.

utoni commented 1 year ago

I think, please correct me if I am wrong, HTTP2 is already dying while it was never really alive.

Although it makes sense to dissect HTTP2, most of the ChatGPT traffic that I am observing uses TLS. So even with HTTP2 dissection, you won't get much more information IMHO.

IvanNardi commented 1 year ago

AFAIK, no, there are no plans.

A patch to detect (un-encrypted) HTTP/2 has just been pushed.
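
For readers wondering what detecting un-encrypted HTTP/2 can look like: a minimal sketch, assuming the client uses "prior knowledge" and so opens the connection with the fixed 24-byte preface from RFC 9113 (section 3.4). This is not the code of the nDPI patch; the function name `looks_like_plaintext_h2` is made up for illustration.

```python
# Client connection preface defined by RFC 9113, section 3.4 (24 bytes).
H2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


def looks_like_plaintext_h2(payload: bytes) -> bool:
    """Return True if a client-to-server TCP payload starts with the preface.

    Hypothetical helper: it only inspects the first bytes of the flow and
    does not handle a preface split across several TCP segments.
    """
    return payload.startswith(H2_PREFACE)


if __name__ == "__main__":
    # An empty SETTINGS frame usually follows the preface.
    first_packet = H2_PREFACE + bytes.fromhex("000000040000000000")
    print(looks_like_plaintext_h2(first_packet))
    print(looks_like_plaintext_h2(b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n"))
```

A check like this only says "this flow is HTTP/2"; it extracts no host, URI or user-agent.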

While HTTP/2 is one of the most used protocols on the "internet" (basically all the HTTP traffic over TCP is HTTP/2, see https://radar.cloudflare.com/traffic), it is pretty much always encrypted, i.e. transported over TLS. You can see plaintext HTTP/2 only in 3 cases, AFAIK:
1) some (very, very) uncommon applications/apps use it without TLS
2) in a 5G core network
3) if you are the man-in-the-middle (example: proxies), i.e. you have access to the plaintext data

In my opinion, these are quite uncommon scenarios.

Furthermore, HTTP/2 is a binary protocol and extracting metadata likely requires some third-party library.
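
To illustrate the "binary protocol" point: every HTTP/2 frame starts with a fixed 9-byte header (RFC 9113, section 4.1), which is easy to decode by hand. The sketch below does only that; `FrameHeader` and `parse_frame_header` are illustrative names, not nDPI APIs. The actual metadata (host, path, user-agent) sits in HEADERS frames and is HPACK-compressed (RFC 7541), which is the part where a library helps.

```python
from typing import NamedTuple

# Frame type codes from RFC 9113, section 6.
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


class FrameHeader(NamedTuple):
    length: int      # payload length, 24 bits
    type_name: str   # human-readable frame type
    flags: int       # 8 bits of type-specific flags
    stream_id: int   # 31 bits, reserved top bit masked off


def parse_frame_header(buf: bytes) -> FrameHeader:
    """Decode the 9-byte HTTP/2 frame header at the start of buf."""
    if len(buf) < 9:
        raise ValueError("an HTTP/2 frame header needs 9 bytes")
    length = int.from_bytes(buf[0:3], "big")
    frame_type = buf[3]
    flags = buf[4]
    stream_id = int.from_bytes(buf[5:9], "big") & 0x7FFFFFFF
    name = FRAME_TYPES.get(frame_type, f"UNKNOWN(0x{frame_type:02x})")
    return FrameHeader(length, name, flags, stream_id)


if __name__ == "__main__":
    # HEADERS frame, 16-byte payload, END_STREAM|END_HEADERS, stream 1.
    print(parse_frame_header(bytes.fromhex("000010010500000001")))
```

The frame walk itself is cheap; keeping the per-connection HPACK dynamic table in sync is what makes a full dissector costly.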

So, I think that to support HTTP/2 metadata extraction in nDPI we need some very interested party willing to help with the task.

[same considerations apply to HTTP/3]

utoni commented 1 year ago

Why is there such a big gap between https://w3techs.com/technologies/details/ce-http2 and https://radar.cloudflare.com/traffic? Am I missing something? A quick (and very subjective) look at my currently open tabs is closer to the percentage from w3techs.com. Btw, the web version of ChatGPT uses HTTP/3 for most of its traffic. It seems that some CDN content gets delivered via HTTP/2.

IvanNardi commented 1 year ago

Why is there such a big gap between https://w3techs.com/technologies/details/ce-http2 and https://radar.cloudflare.com/traffic? Am I missing something?

From the first site: ~35%. From the second: 60.9% of the HTTP traffic -> 60.9% of ~50% [average value from the graph "Internet traffic trends" at the beginning of the page] -> ~30%. I think that we can consider these two values pretty much equal, even if these numbers are quite rough.
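
Spelled out, the estimate is a single multiplication. The inputs below are the rough figures quoted in this comment, not fresh measurements, and the variable names are only illustrative.

```python
# Figures as quoted in the comment (rough readings, not measurements).
http2_share_of_http = 0.609   # Cloudflare Radar: HTTP/2 share of HTTP traffic
trend_graph_average = 0.50    # ~50%, read off the "Internet traffic trends" graph
w3techs_share = 0.35          # ~35% reported by w3techs

cloudflare_estimate = http2_share_of_http * trend_graph_average
gap = abs(w3techs_share - cloudflare_estimate)

print(f"Cloudflare-based estimate: {cloudflare_estimate:.0%}")  # about 30%
print(f"Gap to w3techs: {gap * 100:.1f} percentage points")
```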

If I had to guess, if your network and browser support QUIC, most of the traffic goes via QUIC-HTTP/3 (surely if the server is behind Cloudflare/Akamai or Google/Meta). Not sure about other CDNs; Netflix for sure still uses TCP.

Bottom line: if the (HTTP) connection is not QUIC, it is likely HTTP/2.