stamparm / maltrail

Malicious traffic detection system
MIT License
6.43k stars 1.07k forks

[Feature Request] Integrate IPinfo's free database for ASN+country enrichment, filters, and eliminating HTTP calls #19245

Closed abdullahdevrel closed 6 months ago

abdullahdevrel commented 6 months ago

Is your feature request related to a problem? Please describe.

This request is about adding ASN information to Maltrail. The filter mechanism currently does not support country- and ASN-based filters, which could be added by integrating IPinfo's IP to Country ASN database. The integration would also enable a handful of additional features.

Describe the solution you'd like

ASN enrichment

I am requesting the integration of IPinfo's free IP to Country ASN database into Maltrail. Currently, Maltrail maintains a list called "worst_asn.txt". Source-level ASN enrichment can be easily achieved using IPinfo's IP to Country ASN database, which provides both the country of the IP address and its ASN information in a single database.

🔗 Database schema: https://ipinfo.io/developers/ip-to-country-asn-database

The database is licensed under CC-BY-SA 4.0, which permits free use and allows the dataset to be distributed with the repository. Additionally, the database is updated daily, and there is no compromise on accuracy.

MMDB database usage example using the mmdbctl tool:

$ mmdbctl read 105.71.102.111 country_asn.mmdb
{
  "as_domain": "inwi.ma",
  "as_name": "Wana Corporate",
  "asn": "AS36884",
  "continent": "AF",
  "continent_name": "Africa",
  "country": "MA",
  "country_name": "Morocco"
}

The database can also be read from Python using an MMDB reader library.
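As a rough illustration of the Python route, here is a minimal sketch. It assumes the third-party `maxminddb` package and a local file named `country_asn.mmdb`, as in the mmdbctl example above. The helper names `open_reader` and `enrich` are hypothetical and are not part of Maltrail or IPinfo.

```python
from typing import Any, Mapping, Optional, Protocol


class MMDBReader(Protocol):
    """Anything exposing a maxminddb-style ``get(ip)`` method."""

    def get(self, ip: str) -> Optional[Mapping[str, Any]]: ...


def open_reader(path: str = "country_asn.mmdb") -> MMDBReader:
    """Open the MMDB file (assumed path) with the maxminddb package."""
    # Imported lazily so the rest of the sketch works without the package.
    import maxminddb  # third-party: pip install maxminddb

    return maxminddb.open_database(path)


def enrich(reader: MMDBReader, ip: str) -> dict:
    """Return country and ASN fields for one IP (None where the IP is unknown)."""
    record = reader.get(ip) or {}
    return {
        "ip": ip,
        "asn": record.get("asn"),
        "as_name": record.get("as_name"),
        "country": record.get("country"),
    }


# Usage, assuming the database file has been downloaded:
#   reader = open_reader()
#   print(enrich(reader, "105.71.102.111"))
```

The field names (`asn`, `as_name`, `country`) are taken from the mmdbctl output shown above. The lookup itself is a local file read, so no HTTP request is involved.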

Eliminate HTTP calls

Currently, location data is retrieved via HTTP calls to RIPE's API. IPinfo's location data is more accurate than RIPE's dataset because it is based on ping data instead of public internet records. Furthermore, a large number of HTTP API calls can increase the load time. This issue can be addressed by switching to the MMDB database.

The MMDB is designed for extremely efficient lookup operations. Getting IP metadata of 100k IPs takes less than a second.

$ head -n 3 ips.txt
220.119.32.154
204.192.172.100
112.8.228.165

$ wc -l ips.txt
100000 ips.txt

$ time mmdbctl read ips.txt country_asn.mmdb -f csv > /dev/null

real    0m0.777s
user    0m0.363s
sys     0m0.013s

Filter operation based on country and ASN data

Once the IP addresses are enriched with country and ASN information, a filter mechanism can be easily implemented. At the moment, country information for the IP addresses is obtained via HTTP calls to RIPEstat.
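To make the filter idea concrete, here is a small sketch. It assumes each event has already been enriched into a dict with `country` and `asn` keys. That event shape and the function name `filter_events` are hypothetical and do not reflect Maltrail's actual data model.

```python
from typing import Any, Iterable, Iterator, Mapping, Optional, Set


def filter_events(
    events: Iterable[Mapping[str, Any]],
    countries: Optional[Set[str]] = None,
    asns: Optional[Set[str]] = None,
) -> Iterator[Mapping[str, Any]]:
    """Yield events matching the given country and ASN sets.

    A filter left as None is not applied; when both are given,
    an event must match both.
    """
    for event in events:
        if countries is not None and event.get("country") not in countries:
            continue
        if asns is not None and event.get("asn") not in asns:
            continue
        yield event


# Sample data: the first entry comes from the mmdbctl example above;
# the other two use documentation-range IPs and ASNs and are made up.
events = [
    {"ip": "105.71.102.111", "country": "MA", "asn": "AS36884"},
    {"ip": "192.0.2.10", "country": "MA", "asn": "AS64500"},
    {"ip": "198.51.100.7", "country": "FR", "asn": "AS64501"},
]

moroccan = list(filter_events(events, countries={"MA"}))
single_asn = list(filter_events(events, asns={"AS36884"}))
```

With the sample data, `moroccan` holds the first two events and `single_asn` holds only the first one.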

Describe alternatives you've considered

I have not considered any alternatives. If you want to do ASN and country based analytics, the only option is to extract the IPs and do a separate enrichment outside of Maltrail. But that results in missing out on the information provided in Maltrail.

Additional context

I am the DevRel at IPinfo. Let me know what you think. If you need to explore the database, let me know.

stamparm commented 6 months ago

> Furthermore, a large number of HTTP API calls can slow down the load time. This issue can be easily addressed by switching to the MMDB database.

there is no large number of HTTP API Calls. only when user hovers over the IP, the call is being made.

AFAICS, your proposition would be a huge performance hit on the server side (as all IPs would require an IP->ASN call), while the basic premise of the Maltrail's client was that all the post-processing of sensor data should be offloaded to the client (to relieve the server machine which is usually running the sensor in parallel)

abdullahdevrel commented 6 months ago

Thank you for reviewing the request, @stamparm. I have a very basic understanding of Maltrail and have only been using it on a single machine that acts as both sensor and server. I tried to install it on another machine, but that one is a micro VM with little RAM, so the process keeps getting killed. My apologies if I did not conduct thorough research beforehand.

> there is no large number of HTTP API Calls. only when user hovers over the IP, the call is being made.

I believe at the initial load, all the country-level information for the flags is being provided from https://stat.ripe.net/data/geoloc/data.json

Then, subsequent calls are made to the https://stat.ripe.net API endpoint as well upon hover. The information returned is WHOIS data.

At least for the onload call that fetches the country information, we could obtain the information from the IPinfo database, as the database provides both ASN and country information.

> while the basic premise of the Maltrail's client was that all the post-processing of sensor data should be offloaded to the client (to relieve the server machine which is usually running the sensor in parallel)

If you implement the IPinfo database, you will be able to eliminate the need to get ASN and country-level information. The sensor is downloading the threat intel feeds on their side. Is it possible to download the MMDB file on the sensor end and provide the data from there instead?

My recommendation is to use the IP to Country ASN database, which extends country-level identification with ASN information in a single database. Maltrail would then get both country- and ASN-level information from the local IPinfo DB and would no longer need to call the RIPEstat endpoint on load.

Please let me know what you think.

stamparm commented 6 months ago

1) those IP to country calls are made for the current page of results, that's true. nonetheless, they are quite fast (i believe that you noticed) because RIPE is providing such a service for everybody

2) golden rule of engineering is: if it works, don't meddle. honestly, i don't see any improvement in downloading a 3rd party DB to the maltrail server. also, it would introduce pre-processing of log entries, which is not in line with the "fat client" story i gave in the previous comment

abdullahdevrel commented 6 months ago

@stamparm, I really appreciate your thoughts. I thought the data would provide a better user experience. I will close the ticket.

If in the future you want to explore IPinfo's free data for this project or any other, please reach out to me or reopen the ticket. I will be extremely happy to help and be a part of the project. Thank you very much for reviewing the proposal.