opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

[Feature Request] Integrating IPinfo's free IP to Country ASN database #7779

Open abdullahdevrel opened 3 months ago

abdullahdevrel commented 3 months ago

Is your feature request related to a problem? Please describe.

Using the IPinfo IP to Country ASN or IP to Country database will address several problems with the current IP geolocation implementation:

Describe the solution you like

I am requesting to add support for IPinfo's IP to Country database to the project. The database has the following features:

Database schema

| Field Name | Example | Data Type | Description |
|---|---|---|---|
| start_ip | 1.0.16.0 | TEXT | Starting IP address of an IP address range |
| end_ip | 1.0.31.255 | TEXT | Ending IP address of an IP address range |
| country | JP | TEXT | ISO 3166 country code of the location |
| country_name | Japan | TEXT | Name of the country |
| continent | AS | TEXT | Continent code of the country |
| continent_name | Asia | TEXT | Name of the continent |
| asn | AS2519 | TEXT | Autonomous System Number |
| as_name | ARTERIA Networks Corporation | TEXT | Name of the AS (Autonomous System) organization |
| as_domain | arteria-net.com | TEXT | Official domain or website of the AS organization |

Documentation: https://ipinfo.io/developers/ip-to-country-asn-database

Samples are available here: https://github.com/ipinfo/sample-database/tree/main/IP%20to%20Country%20ASN

The database can be downloaded simply by accessing the storage URI with an access token.

curl -L "https://ipinfo.io/data/free/country_asn.mmdb?token=<YOUR_TOKEN>" -o country_asn.mmdb

Describe alternatives you considered

I have not considered an alternative.

Additional context

The business version of OPNsense includes a paid version of the GeoIP country database. However, even though IPinfo's IP to Country database is free, it is the best country-level data available, because its methodology is based on latency and network data rather than on locations self-reported by ASNs/ISPs.

https://ipinfo.io/accuracy

Additionally, there is no range clustering and there are no delayed updates with IPinfo. IPinfo does not have an accuracy-compromised free country or city database. This database can be considered for the business variant of the software as well, and the license permits commercial use.

abdullahdevrel commented 3 months ago

A Reddit post by redspidr demonstrated the idea of introducing IPinfo's data into OPNsense. Their project converts IP addresses into links to the corresponding IPinfo pages, which provide detailed metadata on those IP addresses.

Inspired by it, I have asked the Reddit OPNsense community to review this ticket and to recommend bringing our data to OPNsense, starting with the integration of our free IP database. This issue is not about the project demonstrated by the Reddit community user redspidr, but about a native, under-the-hood integration using our database.

A couple of issues were raised by one of the Reddit community users regarding this request, so I am pasting my answers here.

Inclusion of your data rather than the existing provider in the business version?

The free IP database that we have is the best possible variant and is equal to, if not better than, the paid country database that the business version of OPNsense currently uses. Consequently, it is certainly better than the free IP database that the community version of the project uses.

This poses a question: which action benefits the OPNsense community most?

  1. Option 1 (My preference): Replacing both the free database in the community version of the project and the paid database in the business version with the single IPinfo free IP to Country ASN or IP to Country database. This will not only provide a unified experience with better data to community and business users, but will also reduce the business costs associated with licensing the paid database. Our database is licensed under CC-BY-SA 4.0, which permits commercial use and redistribution. We do not have complex EULA agreements, so you can easily use the database.
  2. Option 2: Replacing the existing database with our free database in the community version only. The community version will have better data than the business version if the project maintainers accept this proposal. The database is designed primarily to support open source projects.
  3. Option 3: Adding support for our database alongside the existing database. This will allow users to bring their own database from us or from the existing provider, giving them the option to choose.

Documentation on how to add your data to the free version in parallel to the existing docs?

Considering the previous topic, I am not sure which option the project maintainers would consider. If they want to replace their existing database provider, they can do that, or they can integrate our database in parallel with the existing database.

In terms of documentation, there are slight modifications involved.

  1. Both are MMDB databases, so parsing the database should not be an issue.
  2. The existing database uses country-level data, which we also provide here. However, there are differences in the database schema. We use a flat, tabular data structure, while the existing database uses a nested data structure.
  3. Update frequency. Our database is updated every day, so we would appreciate it if the database were refreshed frequently. The existing database is not updated daily.
  4. Packaging decision. Because our database has more lenient licensing terms, community version users do not have to download their own database or bring their own access token. The project can use its own project-specific access token.
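
To illustrate the schema difference in point 2, here is a minimal sketch in Python. The flat record uses the field names from the schema table in the request. The nested record is an assumed layout, shown only as an example of what a nested schema can look like; it is not taken from this thread. The helper name `country_code` is hypothetical.

```python
# Flat record: field names come from the schema table in the request.
flat_record = {
    "country": "JP",
    "country_name": "Japan",
    "continent": "AS",
    "continent_name": "Asia",
}

# Nested record: an ASSUMED layout, used only to illustrate a nested schema.
nested_record = {
    "country": {"iso_code": "JP", "names": {"en": "Japan"}},
    "continent": {"code": "AS", "names": {"en": "Asia"}},
}


def country_code(record):
    """Return the ISO country code from either record shape (hypothetical helper)."""
    country = record.get("country")
    if isinstance(country, dict):
        # Nested shape: the code sits one level down.
        return country.get("iso_code")
    # Flat shape: the value is already the code (or None if absent).
    return country


print(country_code(flat_record), country_code(nested_record))
```

A small adapter like this is one way a consumer could accept both shapes while support for a second provider is being evaluated.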

Here is a blog post: https://ipinfo.io/blog/migrating-from-maxmind-to-ipinfo


Please let me know if you have further questions. Thank you very much.

AdSchellevis commented 3 months ago

For us as a core team this isn't a priority, given our business edition does contain simple to use geo aliases out of the box including a documented file format to use for the community version (https://docs.opnsense.org/manual/aliases.html#geoip).

I do understand that your company will prefer your product above one of its competitors, but there's also some marketing involved in claims being made.

Personally I don't have a strong preference for a geoip vendor, but when it's a commercial discussion, our community GitHub might not be the best place.

If someone does want to do the work, and the amount of required guidance is limited, we will assess in the usual way.

abdullahdevrel commented 3 months ago

@AdSchellevis

Thank you for reviewing the request. I sincerely appreciate you taking the time to review the issue.

This was not a commercial request, nor am I trying to sell the OPNsense community a commercial service. I advocated bringing in highly accurate data designed with open-source projects in mind. The free database is licensed under CC-BY-SA 4.0.

> I do understand that your company will prefer your product over one of its competitors, but there's also some marketing involved in the claims being made.

I understand the skepticism involved. However, in terms of accuracy, we can provide verifiable information to back up our claim, even for a free database. If you are interested in verifying our accuracy claims, please let me know. I can walk you through a self-evaluation process that lets you and the community verify this information personally.

> Personally, I don't have a strong preference for a geoip vendor, but when it's a commercial discussion, our community GitHub might not be the best place.

No, this is not a commercial discussion at all. The proposal was the integration of a free database. There was absolutely no hint of a commercial service. My apologies if I indicated otherwise. I have tried my best to understand the issues and the motivation for selecting the GeoIP database, and I have seen a ticket where you mentioned that the software offers the paid IP database through the business version.

However, my proposal was to replace even the paid version of the IP database with a free IP database that we can demonstrate can provide better accuracy.

> If someone does want to do the work, and the amount of required guidance is limited, we will assess it in the usual way.

Thank you for considering the issue.


My apologies if I was unclear in saying this is not a commercial service. We built this free IP to Country database primarily to support open source projects. I understand OPNsense is a massive project and the changes required to adopt it may be significant. I can assure you that we can demonstrate the value of adopting the free database to the community and the project's customers.

doktornotor commented 3 months ago

I would love to see alternatives to MaxMind as well. Unfortunately, I definitely do not have the time to do the coding at the moment.

Now, there would be a super-easy and fast way to get this available in OPNsense @abdullahdevrel - "simply" provide the data in the CSV format documented and required for OPNsense. 😉

abdullahdevrel commented 3 months ago

Thanks @doktornotor. I really appreciate that you reviewed the request. This is a significant request, and I understand that it will require engineering commitment to support it. We will let our OPNsense users know that this request is being considered.

> Now, there would be a super-easy and fast way to get this available in OPNsense @abdullahdevrel - "simply" provide the data in the CSV format documented and required for OPNsense. 😉

I hope this shows what we meant when we said our database is simple to use. The current implementation requires 3 CSV files.

While we have all this information in a single file in our IP to Country database:

| start_ip | end_ip | country | country_name | continent | continent_name |
|---|---|---|---|---|---|
| 2620:0:1cff:dead:bef1:100:1:1aa | 2620:0:1cff:dead:bef1:100:1:1b0 | SG | Singapore | AS | Asia |
| 212.221.79.153 | 212.221.79.171 | DE | Germany | EU | Europe |

https://github.com/ipinfo/sample-database/tree/main/IP%20to%20Country

If anyone wants to use our data for now, they will have to make modifications to the database on their end.

doktornotor commented 3 months ago

First of all, I am NOT an OPNsense developer, merely a random code contributor.

> If anyone wants to use our data for now, they will have to make modifications to the database on their end.

Well yes, that is the problem. I was merely hinting at the fastest way to get your GeoIP data used in OPNsense, without any coding being required on the OPNsense side (paste in a URL pointing to a ZIP file with the required CSV files, done.)

Using a single CSV file might even be easier and faster to process, if someone does the coding; however, that's not a drop-in replacement. The IP ranges not being in CIDR format is one example of why the current code won't work and why a non-trivial amount of coding is required to support this single-file format.

abdullahdevrel commented 3 months ago

Got it. Thank you.

I am not sure why the project does not use the MMDB file format, which is designed for fast and efficient lookups.

> Not having the IP ranges in CIDR format being one of the examples why the current code won't work and non-trivial amount of coding is required to support this single-file format.

We have a tool for that called range2cidr (which is also part of our CLI), which can generate the CIDR/range column.

| cidr | country | country_name | continent | continent_name |
|---|---|---|---|---|
| 1.0.0.0/25 | AU | Australia | OC | Oceania |
| 1.0.0.128/26 | AU | Australia | OC | Oceania |

The issue is that generating the CIDRs is a bit slow:

time ipinfo range2cidr country.csv > country_cidr.csv

real    9m58.852s
user    0m22.040s
sys     0m44.654s
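
For readers who want to see what the range-to-CIDR step involves, here is a minimal sketch using Python's standard `ipaddress` module. It is not how range2cidr is implemented; the function name `range_to_cidrs` is hypothetical, and the sample ranges are taken from the tables quoted earlier in this thread.

```python
from __future__ import annotations

import ipaddress


def range_to_cidrs(start_ip: str, end_ip: str) -> list[str]:
    """Return the CIDR blocks that exactly cover start_ip..end_ip (inclusive)."""
    start = ipaddress.ip_address(start_ip)
    end = ipaddress.ip_address(end_ip)
    # summarize_address_range yields the networks covering the range;
    # it works for IPv4 and IPv6, but both ends must be the same version.
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


# A range that aligns to a single block:
print(range_to_cidrs("1.0.16.0", "1.0.31.255"))
# → ['1.0.16.0/20']

# A range that does not align and expands to several blocks:
print(range_to_cidrs("212.221.79.153", "212.221.79.171"))
# → ['212.221.79.153/32', '212.221.79.154/31', '212.221.79.156/30',
#    '212.221.79.160/29', '212.221.79.168/30']
```

The second example shows why the conversion is not a simple column rename: one start/end row can become several CIDR rows, so the output file is larger than the input and every row has to be processed.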

doktornotor commented 3 months ago

Yeah, a bit. 😉 Same issue with Python, PHP or whatever other code used for the purpose on similar projects. Now, assuming similar HW specs, multiply the wasted CPU time by the user base and one update per day.

As for why the MMDB format is not used: the thing is, you are not doing realtime lookups. You simply parse the data every once in a while and use that parsed data in firewall rules to reject/accept connections. There's no integration for database lookups present in the pf firewall code, plus the performance would not exactly rock either, I guess.
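
The "parse once, then use the parsed data" approach described above can be sketched as follows. This is not OPNsense's actual code. The comma-separated sample is an assumption modeled on the CIDR table quoted earlier in the thread, and the function names and the one-network-per-line output are hypothetical.

```python
import csv
import io
from collections import defaultdict

# ASSUMED input: a comma-separated file with the columns shown in the
# CIDR table quoted earlier in this thread.
SAMPLE = """cidr,country,country_name,continent,continent_name
1.0.0.0/25,AU,Australia,OC,Oceania
1.0.0.128/26,AU,Australia,OC,Oceania
"""


def networks_by_country(csv_text):
    """Parse the CSV once and group the CIDR blocks by country code."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["country"]].append(row["cidr"])
    return dict(groups)


def render_network_list(cidrs):
    """Render one network per line (hypothetical output format)."""
    return "\n".join(cidrs) + "\n"


groups = networks_by_country(SAMPLE)
print(render_network_list(groups["AU"]), end="")
```

After this one-time pass, the firewall only needs the per-country lists; no database lookup happens per connection, which is the point made above about MMDB bringing no benefit here.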

abdullahdevrel commented 3 months ago

I will try to think of a solution. The challenge is that we probably won't be able to produce the CIDR variant of the free IP database for download because we would then have to account for maintenance of another variant of the same product.

When MaxMind switched from their legacy GeoIP format to a more modern variant, it broke a lot of things. We aimed to remain stable from day 0 onward to avoid such situations. Introducing a CIDR variant of the database could increase the load on us, as we would have to maintain it virtually indefinitely.