ntop / nDPI

Open Source Deep Packet Inspection Software Toolkit
http://www.ntop.org
GNU Lesser General Public License v3.0
3.86k stars 902 forks source link

FPC LRU cache related changes #2497

Closed mmanoj closed 4 months ago

mmanoj commented 4 months ago

Please sign (check) the below before submitting the Pull Request:

Link to the related issue:

Describe changes: changes related to https://github.com/ntop/nDPI/issues/2322 item# 2 2.1.Create a new LRU cache where the key is the pair "Client_IP - Resolved_IP" and the value is a protocol id: 2.2 For each flow (at the very first packet) lookup into the cache using "SRC IP - DST IP" as key and, if a match is found, save the protocol id into ndpi_flow_struct structure as a "fpc result"

mmanoj commented 4 months ago

@mmanoj, some things to improve; this is only a preliminary review. The most obvious missing piece is the cache initialization! You must add it in ndpi_finalize_initialization Please, be sure that the code compile (at least on your machine), before pushing it

All changes done and compiled in local env with latest dev branch.

mmanoj commented 4 months ago

The general idea of the cache is:

* add an entry in the cache with DNS traffic. In the DNS code, if `ndpi_match_host_subprotocol()` return a protocol different than `NDPI_PROTOCOL_UNKNOWN`, we should do somethig like `ndpi_lru_add_to_cache(ndpi_str->fpc_dns_cache,  key, ret.app_protocol, ndpi_get_current_time(flow));`.
  The key is something like SRC_FLOW_IP + ip_required_in_the_dns_query

* perform a lookup in `ndpi_internal_detection_process_packet` just after `fpc_check_ip()` call. If there is an entry, call something like `fpc_update(ndpi_str, flow, NDPI_PROTOCOL_UNKNOWN, protocol_returned_from_the_cache, NDPI_FPC_CONFIDENCE_DNS)`

Any specific doubts?

can you explain bit more about this? SRC_FLOW_IP + ip_required_in_the_dns_query is this flow source IP and DNS response IP (like A or AAA address?)

IvanNardi commented 4 months ago

can you explain bit more about this? SRC_FLOW_IP + ip_required_in_the_dns_query is this flow source IP and DNS response IP (like A or AAA address?)

That was the idea, but I was wrong (because of ipv4 and ipv6 possible mismatch) We should use only one address as key.

Anyway, let try together to analyze what we should do step by step. In this simple trace there is a DNS flow and a TCP flow: fpc_dns.zip

1) We should take a look at every DNS A or AAAA replies. If the domain required provides a sub-classification, i.e. if ndpi_match_host_subprotocol() in DNS code returns a protocol different than NDPI_PROTOCOL_UNKNOWN, we should save that information into the cache. I think we can always add only one entry, even if there are multiple answers in the DNS reply *) Packet 2 is a DNS AAAA reply with Youtube as subclassification -> add an entry with key "2a00:1450:4002:416::200e" (i.e. the first answer) and protocol NDPI_PROTOCOL_YOUTUBE

2) At the very first pkt of every flow, we should lookup into the cache using the server address (i.e. flow->s_addr), in ndpi_internal_detection_process_packet just after fpc_check_ip() call. If a match is found, we should call fpc_update(ndpi_str, flow, NDPI_PROTOCOL_UNKNOWN, protocol_returned_from_the_cache, NDPI_FPC_CONFIDENCE_DNS) *) At packet 3 we should lookup using "2a00:1450:4002:416::200e" as key (the server address). The returned protocol is NDPI_PROTOCOL_YOUTUBE, so the FPC of this flow is Youtube with confidence "dns"

If everything is implemented correctly we should got something like:

./example/ndpiReader -t -i ~/fpc_dns.pcapng -v2

    1   TCP [2001:b07:a3d:c112:9326:58a1:68f7:5712]:40088 <-> [2a00:1450:4002:410::200e]:443 ... [FPC: 124/Youtube, Confidence: DNS]

Am I missing something?

mmanoj commented 4 months ago

can you explain bit more about this? SRC_FLOW_IP + ip_required_in_the_dns_query is this flow source IP and DNS response IP (like A or AAA address?)

That was the idea, but I was wrong (because of ipv4 and ipv6 possible mismatch) We should use only one address as key.

Anyway, let try together to analyze what we should do step by step. In this simple trace there is a DNS flow and a TCP flow: fpc_dns.zip

1. We should take a look at every DNS A or AAAA replies. If the domain required provides a sub-classification, i.e. if `ndpi_match_host_subprotocol()` in DNS code returns a protocol different than `NDPI_PROTOCOL_UNKNOWN`, we should save that information into the cache. I think we can always add only one entry, even if there are multiple answers in the DNS reply
   *) Packet 2 is a DNS AAAA reply with Youtube as subclassification -> add an entry with key "2a00:1450:4002:416::200e" (i.e. the first answer) and protocol `NDPI_PROTOCOL_YOUTUBE`

2. At the very first pkt of every flow, we should lookup into the cache using the server address (i.e. `flow->s_addr`), in `ndpi_internal_detection_process_packet` just after `fpc_check_ip()` call. If a match is found, we should call `fpc_update(ndpi_str, flow, NDPI_PROTOCOL_UNKNOWN, protocol_returned_from_the_cache, NDPI_FPC_CONFIDENCE_DNS)`
   *) At packet 3 we should lookup using "2a00:1450:4002:416::200e"  as key (the server address). The returned protocol is `NDPI_PROTOCOL_YOUTUBE`, so the FPC of this flow is Youtube with confidence "dns"

If everything is implemented correctly we should got something like:

./example/ndpiReader -t -i ~/fpc_dns.pcapng -v2

  1   TCP [2001:b07:a3d:c112:9326:58a1:68f7:5712]:40088 <-> [2a00:1450:4002:410::200e]:443 ... [FPC: 124/Youtube, Confidence: DNS]

Am I missing something?

@IvanNardi

Thanks for detailed clarification,let me go through the attached dns packet capture and described logic flow with ongoing FPC implementation and update you any clarifications required.Thank you very much for quick support and all valuable advice.Really appreciate this advice to narrow down the implementation.

mmanoj commented 4 months ago

can you explain bit more about this? SRC_FLOW_IP + ip_required_in_the_dns_query is this flow source IP and DNS response IP (like A or AAA address?)

That was the idea, but I was wrong (because of ipv4 and ipv6 possible mismatch) We should use only one address as key. Anyway, let try together to analyze what we should do step by step. In this simple trace there is a DNS flow and a TCP flow: fpc_dns.zip

1. We should take a look at every DNS A or AAAA replies. If the domain required provides a sub-classification, i.e. if `ndpi_match_host_subprotocol()` in DNS code returns a protocol different than `NDPI_PROTOCOL_UNKNOWN`, we should save that information into the cache. I think we can always add only one entry, even if there are multiple answers in the DNS reply
   *) Packet 2 is a DNS AAAA reply with Youtube as subclassification -> add an entry with key "2a00:1450:4002:416::200e" (i.e. the first answer) and protocol `NDPI_PROTOCOL_YOUTUBE`

2. At the very first pkt of every flow, we should lookup into the cache using the server address (i.e. `flow->s_addr`), in `ndpi_internal_detection_process_packet` just after `fpc_check_ip()` call. If a match is found, we should call `fpc_update(ndpi_str, flow, NDPI_PROTOCOL_UNKNOWN, protocol_returned_from_the_cache, NDPI_FPC_CONFIDENCE_DNS)`
   *) At packet 3 we should lookup using "2a00:1450:4002:416::200e"  as key (the server address). The returned protocol is `NDPI_PROTOCOL_YOUTUBE`, so the FPC of this flow is Youtube with confidence "dns"

If everything is implemented correctly we should got something like:

./example/ndpiReader -t -i ~/fpc_dns.pcapng -v2

    1   TCP [2001:b07:a3d:c112:9326:58a1:68f7:5712]:40088 <-> [2a00:1450:4002:410::200e]:443 ... [FPC: 124/Youtube, Confidence: DNS]

Am I missing something?

@IvanNardi

Thanks for detailed clarification,let me go through the attached dns packet capture and described logic flow with ongoing FPC implementation and update you any clarifications required.Thank you very much for quick support and all valuable advice.Really appreciate this advice to narrow down the implementation.

@IvanNardi

The logic seems fine, let me implement and perform the initial test and update you for any clarifications required.

mmanoj commented 4 months ago

2a00:1450:4002:416::200e @IvanNardi

I almost implement the logic from dns.c side and ndpi_main.c, still need to fix some the key generation logic with IP address as it's not working as expected.However hard coded key it's working fine.I will perform more test and share the code.

mmanoj commented 4 months ago

2a00:1450:4002:416::200e @IvanNardi

I almost implement the logic from dns.c side and ndpi_main.c, still need to fix some the key generation logic with IP address as it's not working as expected.However hard coded key it's working fine.I will perform more test and share the code.

@IvanNardi Key generation issue fixed

1   TCP 192.168.1.8:48516 <-> 157.240.235.60:443 [proto: 91.142/TLS.WhatsApp][IP: 142/WhatsApp][Encrypted][Confidence: DPI][**FPC: 242/WhatsAppFiles, Confidence: DNS**][DPI packets: 6][cat: Chat/9][179 pkts/21880 bytes <-> 178 pkts/155403 bytes][Goodput ratio: 46/92][12.09 sec][Hostname/SNI: web.whatsapp.com][(Advertised) ALPNs: h2;http/1.1][TLS Supported Versions: TLSv1.3;TLSv1.2][bytes ratio: -0.753 (Download)][IAT c2s/s2c min/avg/max/stddev: 0/0 43/11 2752/306 264/38][Pkt Len c2s/s2c min/avg/max/stddev: 66/66 122/873 1103/1446 143/629][TLSv1.3][JA3C: a195b9c006fcb23ab9a2343b0871e362][JA4: t13d1715h2_5b57614c22b0_7121afd63204][JA3S: fcb2d4d0991292272fcb1e464eedfd43][ECH: version 0xfe0d][Firefox][Cipher: TLS_AES_128_GCM_SHA256][Plen Bins: 1,26,10,2,0,3,6,0,0,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,44,0,0,0,0]
mmanoj commented 4 months ago

@IvanNardi

All changes updated with test pcap file also.Please review and advice if any changes required.

./ndpiReader -t -i fpc_dns.pcap -v2

1 TCP 192.168.1.8:39432 <-> 142.250.199.4:443 [proto: 91/TLS][IP: 126/Google][Encrypted][Confidence: DPI][FPC: 126/Google, Confidence: DNS][DPI packets: 3][cat: Web/5][405 pkts/47359 bytes <-> 742 pkts/962242 bytes][Goodput ratio: 44/95][4.54 sec][bytes ratio: -0.906 (Download)][IAT c2s/s2c min/avg/max/stddev: 0/0 12/5 1094/1181 74/52][Pkt Len c2s/s2c min/avg/max/stddev: 66/66 117/1297 1466/1466 208/436][Risk: Probing attempt ][Risk Score: 50][Risk Info: TLS/QUIC Probing][Plen Bins: 3,3,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,91,0,0,0,0]

mmanoj commented 4 months ago

I pushed some small changes and I'll update the commit message. But the code was almost good anyway... Thanks a lot for that and for your perseverance

@IvanNardi

Thanks for the support and I was review the fixes you posted and it's good improvements which I was missed.I'm happy to do some valuable contribution to community and thank again for your advice and guidelines.Really appreciated.

Hope this change will improve the detection accuracy overall as per the test results I can see good detection specially services based on CDN's.

Once This pull request merge I will work on adding more IP lists from some reliable sources specially malware and C&C servers.