Add extra entropy checks and more precise(?) analysis.

ntop / nDPI

Open Source Deep Packet Inspection Software Toolkit

http://www.ntop.org

GNU Lesser General Public License v3.0


Add extra entropy checks and more precise(?) analysis. #2383

Closed utoni closed 6 months ago

utoni commented 7 months ago

Please sign (check) the below before submitting the Pull Request:

[x] I have signed the ntop Contributor License Agreement at https://github.com/ntop/legal/blob/main/individual-contributor-licence-agreement.md
[x] I have read the contributing guide lines at https://github.com/ntop/nDPI/blob/dev/CONTRIBUTING.md
[x] I have updated the documentation (in doc/) to reflect the changes made (if applicable)

Describe changes:

This is more of an idea of how entropy-based categorization could give more detail about the transmitted data. It's loosely based on the Entropy Analysis paper, but it needs some verification (I'm not yet done reading the paper). Hopefully, someone may find this useful and may help me with it. :)

Something else to consider is whether the entropy calculation should be done per-packet instead of per-flow.

IvanNardi commented 7 months ago

@utoni, do you have a copy of the original paper?

utoni commented 7 months ago

Unfortunately not. :/

utoni commented 7 months ago

icmp echo request @IvanNardi :)

lucaderi commented 7 months ago

@utoni I am a bit sceptical about this PR. Entropy is a metric to measure chaos, and within specific boundaries you can find many different kinds of content. So ndpi_entropy2str(), for instance, can IMHO be used as a hint but not as ground truth. If you position it as a hint I am happy; if you want to do more than that, I am not convinced it's a good idea.

utoni commented 7 months ago

@lucaderi I agree, there is still a high chance of false positives, e.g. for video/audio/VoIP transfers, as they may have similar entropy to (compressed) executables. What do you mean by "hint"? Not setting any risk and doing what instead?

lucaderi commented 7 months ago

I mean that the data is not necessarily a "Compressed Executable"; that is only one possibility (or hint, if you wish). So a broader label (e.g. "Compressed Executable, or something else" or "Compressed Executable ?") can indicate that this is a hint and not a 100% certain fact. More or less like the DPI confidence that @IvanNardi introduced in DPI classification some time ago.
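
One way the "hint, not fact" labelling could look in code is sketched below. The `label_certainty` enum and `format_entropy_label()` are hypothetical names invented for this illustration; they are not nDPI APIs and do not reflect how ndpi_entropy2str() is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical certainty marker, loosely analogous to a confidence level. */
enum label_certainty { LABEL_IS_HINT, LABEL_IS_FACT };

/* Write `label` into `buf`, appending " ?" when it is only a hint.
 * Returns 0 on success, -1 if the output did not fit into `buf`. */
static int format_entropy_label(char *buf, size_t buf_len, const char *label,
                                enum label_certainty certainty) {
  int n;

  assert(buf != NULL && label != NULL);

  if (certainty == LABEL_IS_HINT)
    n = snprintf(buf, buf_len, "%s ?", label);
  else
    n = snprintf(buf, buf_len, "%s", label);

  return (n >= 0 && (size_t)n < buf_len) ? 0 : -1;
}
```

Calling `format_entropy_label(buf, sizeof(buf), "Compressed Executable", LABEL_IS_HINT)` produces the string "Compressed Executable ?", which keeps the uncertainty visible to whoever reads the label.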

utoni commented 7 months ago

Ok, got it.

IvanNardi commented 6 months ago

@utoni, are you going to push a new version with updated labels/strings?

utoni commented 6 months ago

Yea, ASAP :)