ntop / nDPI

Open Source Deep Packet Inspection Software Toolkit
http://www.ntop.org
GNU Lesser General Public License v3.0

QUIC connection migration #1041

Open IvanNardi opened 4 years ago

IvanNardi commented 4 years ago

QUIC has an interesting feature called "Connection migration".

Quoting directly https://tools.ietf.org/html/draft-ietf-quic-transport-32, section 9:

The use of a connection ID allows connections to survive changes to endpoint addresses (IP address and port), such as those caused by an endpoint migrating to a new network.

Not all changes of peer address are intentional, or active, migrations. The peer could experience NAT rebinding: a change of address due to a middlebox, usually a NAT, allocating a new outgoing port or even a new outgoing IP address for a flow.

It might be worth investigating if/how QUIC connection migration impacts DPI capabilities.

While all of the most widely used QUIC implementations support at least some kind of migration, I am not sure if/how this feature is really used in production (exception: Facebook; they are surely not using connection migration right now).

Example: in the attached pcap quic_migration.zip (NAT rebinding; decryption keys are embedded) there is only one QUIC connection, but ndpiReader reports 3 flows:

aouinizied commented 3 years ago

@IvanNardi interesting feature. If we extract the conn_id on the QUIC dissector side, we can add a small LRU cache like the ones implemented for STUN and Ookla detection. The idea is to have a guess heuristic working as follows:

This is just an idea and we need to check that we are able to extract the conn_id correctly in all cases. Do you confirm that this is possible?

Some limitations of such an approach:

What do you think about it?

Zied

IvanNardi commented 3 years ago

I am very glad that this topic aroused some interest!

Migration is a complex feature in general and particularly from a DPI perspective; the reason is that it has been explicitly designed to be so...

9.5. Privacy Implications of Connection Migration

Using a stable connection ID on multiple network paths would allow a passive observer to correlate activity between those paths. An endpoint that moves between networks might not wish to have their activity correlated by any entity other than their peer, so different connection IDs are used when sending from different local addresses [...]

At any time, endpoints MAY change the Destination Connection ID they transmit with to a value that has not been used on another path.[...]

An endpoint MUST NOT reuse a connection ID when sending from more than one local address, for example when initiating connection migration [...]

Similarly, an endpoint MUST NOT reuse a connection ID when sending to more than one destination address.

Your idea to save the CIDs (and, perhaps, the -destination- addresses) in a global cache might work in the case of a passive migration, i.e. NAT rebinding, where only the client IP/port changes but the client itself keeps using the same CID. I am not familiar with nDPI internals, but using a global cache shared among all processing threads should be a standard pattern and would allow matching flows from different threads/RSS queues. I am not sure if this approach would work with active (i.e. "real") migrations, where client/server willingly switch to a new address/port/CID at any time during the lifetime of the connection.
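A minimal sketch of the CID cache idea for the passive (NAT rebinding) case, in Python for brevity. It is not nDPI code: `CidCache`, `correlate` and the tuple layout are hypothetical names chosen for illustration, and it assumes the Destination CID has already been extracted from each packet.

```python
from collections import OrderedDict


class CidCache:
    """Small LRU cache binding a Destination CID to a flow id (hypothetical)."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._entries = OrderedDict()

    def lookup_or_add(self, dcid, flow_id):
        """Return the flow id already bound to dcid, else bind flow_id to it."""
        if dcid in self._entries:
            self._entries.move_to_end(dcid)  # mark as most recently used
            return self._entries[dcid]
        self._entries[dcid] = flow_id
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return flow_id


def correlate(packets, cache):
    """Map each (five_tuple, dcid) packet to a logical flow id.

    A new 5-tuple that carries an already known CID is folded into the
    existing flow instead of being counted as a new one.
    """
    tuple_to_flow = {}
    next_id = 1
    result = []
    for five_tuple, dcid in packets:
        flow = tuple_to_flow.get(five_tuple)
        if flow is None:
            flow = cache.lookup_or_add(dcid, next_id)
            if flow == next_id:  # CID was unknown: a genuinely new flow
                next_id += 1
            tuple_to_flow[five_tuple] = flow
        result.append(flow)
    return result
```

This only covers the case where the CID stays the same across the address change. An active migration, where the endpoint also switches to a fresh CID, would not hit the cache.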

As I said, all the migration stuff, even if (almost) defined in the standards, is not really used yet by any big players (AFAIK), so it is difficult to foresee how it will actually be used in real networks. It would be unfortunate to spend time defining/implementing solutions for kinds of migration that will never be used...

aouinizied commented 3 years ago

@IvanNardi I understand that it's too early to spend time on implementing a strategy to counter that. At least it is traced in this issue and thanks for that.

Here are some observations regarding the caching strategy and some limitations we may face:

Here are some possible directions:

IvanNardi commented 3 years ago

@aouinizied, I don't understand fully the load balancing problem.

Let me describe what I think is the most common scenario: one external LB, only one process using nDPI, only one nDPI context/module and multiple threads. In this case:

AFAIK nDPI doesn't have any feature to allow sharing information among different nDPI contexts (in the same process, or in different processes). If you need multiple processes or multiple contexts, you have a bigger and widespread problem to solve than QUIC migration...

Why are you talking about CID extraction in the load balancer? What am I missing? Do you want a QUIC-aware LB, i.e. one that somehow balances per CID? You might get it (CIDs can definitely be used for routing purposes and they are already used in such a way...), but you are moving the problem from nDPI to the (external) LB: I am not sure that is the right move and I think it is not necessary, but I'd like to hear other opinions...

aouinizied commented 3 years ago

@IvanNardi I was indeed talking about CID extraction in the LB, for a QUIC-aware LB approach. I didn't present it as the move to make, but as how ntop has dealt with such specific protocols until now: the LB is aware of a specific protocol field (let's say the QUIC CID) and includes it in the flow hashing function. So indeed, it is not handled within nDPI space.

The scenario you described is the same one I'm thinking about. But things are not going to be simple:

IMHO, the global cache is a good move to make but will need dev effort. If we move in this direction, it would also be good to move the Ookla and STUN LRU caches into this global cache, in order to have a consistent logic.

IvanNardi commented 3 years ago

> you must keep in mind that the nDPI detection module is lockless and consequently not thread safe, as pointed out [here](https://github.com/ntop/nDPI/issues/34)

Aargh!! I completely missed that point... I was pretty sure that the detection module was thread safe and shared among multiple threads... I should have done my homework better, sorry. Having some shared structures might greatly improve DPI capabilities in general, but that is quite a complex topic on its own, as you noticed.

I agree that you can avoid sharing some data with some clever load balancing policies, but I don't think that this is always feasible.

AFAIK, you can't have a --generic-- and --stateless-- load balancer that balances QUIC connections using (also) CIDs. There are two main reasons for that: 1) each peer may change the CID it uses, at any time, even without changing IP address and/or port; 2) while the Destination CID field is always present in each QUIC packet, its length is not reported in short-header packets, which make up most of a connection (it is not a fixed-length field, like a UDP port or a GTP TEID): you must know it from tracking the entire session (and most of the messages reporting it are encrypted...); zero is a valid length. So you can't easily use the CID as a field in the hashing key. Note that this is a big problem regardless of the LB stuff.
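To make the length problem concrete, here is a small Python sketch based on the version-independent QUIC header layout: long-header packets (first bit set) carry an explicit one-byte Destination CID length after the 4-byte version, while short-header packets carry the Destination CID right after the first byte with no length at all. `extract_dcid` and its `short_header_dcid_len` parameter are hypothetical names for illustration, not an nDPI API.

```python
def extract_dcid(payload, short_header_dcid_len=None):
    """Return the Destination CID of a QUIC packet, or None if it cannot be known.

    short_header_dcid_len is the CID length learned earlier by tracking the
    connection; without it a short-header packet cannot be parsed.
    """
    if not payload:
        raise ValueError("empty payload")

    if payload[0] & 0x80:
        # Long header: flags (1) | version (4) | DCID length (1) | DCID | ...
        if len(payload) < 6:
            raise ValueError("truncated long header")
        dcid_len = payload[5]
        dcid = payload[6:6 + dcid_len]
        if len(dcid) != dcid_len:
            raise ValueError("truncated Destination CID")
        return dcid

    # Short header: flags (1) | DCID (length not encoded) | protected payload
    if short_header_dcid_len is None:
        return None  # a stateless observer has to give up here
    dcid = payload[1:1 + short_header_dcid_len]
    if len(dcid) != short_header_dcid_len:
        raise ValueError("truncated Destination CID")
    return dcid
```

A stateless device sees only the second case for most of the connection, which is why the CID cannot simply be dropped into a hashing key.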