lord-alfred / ipranges

🔨 List all IP ranges from: Google (Cloud & GoogleBot), Bing (Bingbot), Amazon (AWS), Microsoft, Oracle (Cloud), GitHub, Facebook (Meta), OpenAI (GPTBot) and other with daily updates.
https://t.me/Lord_Alfred
Creative Commons Zero v1.0 Universal

How to reduce the number of ranges to optimise the search for an IP #2

Open jmleglise opened 2 years ago

jmleglise commented 2 years ago

Hi, first, thank you very much for your list. Very useful! (This is not an issue but a comment for anyone who would like to optimise the search for an IP belonging to a range.)

Let's take these two ranges from your file /amazon/ipv4_merged.txt as an example: R1: 3.0.0.0/15 and R2: 3.2.0.0/24. These two ranges are contiguous and should be merged (but they cannot be expressed as a single block in CIDR notation).

"/15" means 2^(32-15) = 131072 addresses. I convert the IPs to decimal values:
R1: 3.0.0.0/15 starts at 3x256x256x256 + 0x256x256 + 0x256 + 0 = 50331648 and ends just before this number + 131072 = 50462720.
R2: 3.2.0.0/24 starts at 3x256x256x256 + 2x256x256 + 0x256 + 0 = 50462720.

beginning(R1) + length(R1) = beginning(R2)
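The arithmetic above can be checked with a few lines of Python. This is only a sketch using the standard-library `ipaddress` module (not something the repository itself is known to use); the variable names are made up for illustration.

```python
import ipaddress

# The two example ranges from /amazon/ipv4_merged.txt
r1 = ipaddress.ip_network("3.0.0.0/15")
r2 = ipaddress.ip_network("3.2.0.0/24")

r1_start = int(r1.network_address)  # 3 * 256**3 = 50331648
r1_length = r1.num_addresses        # 2 ** (32 - 15) = 131072
r2_start = int(r2.network_address)  # 3 * 256**3 + 2 * 256**2 = 50462720

# R1 and R2 are contiguous when R2 begins exactly where R1 stops
is_contiguous = r1_start + r1_length == r2_start
print(r1_start, r1_length, r2_start, is_contiguous)
# prints: 50331648 131072 50462720 True
```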

So the two ranges are contiguous and should be merged into a new range from beginning(R1) to end(R2). That is the reason I prefer to store IP ranges in decimal, with a start and an end.
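A minimal sketch of that start/end approach, assuming IPv4-only input and an inclusive end value. The helper names `merge_ranges` and `contains` are hypothetical, and the standard-library `ipaddress` and `bisect` modules are my choice here, not something taken from the repository's script.

```python
import bisect
import ipaddress


def merge_ranges(cidrs):
    """Turn IPv4 CIDR strings into sorted, merged (start, end) integer pairs.

    Both bounds are inclusive. Overlapping and adjacent ranges are fused.
    """
    spans = sorted(
        (int(net.network_address), int(net.broadcast_address))
        for net in (ipaddress.ip_network(c) for c in cidrs)
    )
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or directly follows the previous range: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


def contains(merged, ip):
    """Binary-search the merged list for the range that may hold `ip`."""
    value = int(ipaddress.ip_address(ip))
    # (value, inf) sorts after every pair whose start is <= value
    i = bisect.bisect_right(merged, (value, float("inf"))) - 1
    return i >= 0 and merged[i][0] <= value <= merged[i][1]


merged = merge_ranges(["3.2.0.0/24", "3.0.0.0/15"])
print(merged)                         # [(50331648, 50462975)]
print(contains(merged, "3.2.0.255"))  # True
print(contains(merged, "3.2.1.0"))    # False
```

With the list merged and sorted once, each lookup is a single binary search instead of a scan over every CIDR block.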

There are 3200 ranges in the full merged list, and 900 of them are contiguous.

lord-alfred commented 2 years ago

Thank you very much for your comment, it is an interesting observation. I'm not good at CIDR notation and addresses, so this problem is not obvious to me.

Let's be clear just in case: I use a small script with the netaddr library and its cidr_merge method to merge addresses. Perhaps I should use the spanning_cidr method, but wouldn't that break something else? I found a similar issue in the library: https://github.com/netaddr/netaddr/issues/27

Example:

In [1]: import netaddr

In [2]: netaddr.cidr_merge(['3.0.0.0/15', '3.2.0.0/24'])
Out[2]: [IPNetwork('3.0.0.0/15'), IPNetwork('3.2.0.0/24')]

In [3]: netaddr.spanning_cidr(['3.0.0.0/15', '3.2.0.0/24'])
Out[3]: IPNetwork('3.0.0.0/14')
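To see what switching to a spanning block would cost, here is a rough comparison of address counts. It swaps in the standard-library `ipaddress` module in place of netaddr (an assumption made only to keep the sketch dependency-free); `ipaddress.collapse_addresses` plays the role of the CIDR merge, and the /14 block is the one returned by spanning_cidr above.

```python
import ipaddress

ranges = [
    ipaddress.ip_network("3.0.0.0/15"),
    ipaddress.ip_network("3.2.0.0/24"),
]

# Collapsing leaves the two blocks separate: together they do not
# form exactly one valid CIDR block
collapsed = list(ipaddress.collapse_addresses(ranges))
covered = sum(net.num_addresses for net in collapsed)  # 131072 + 256

# The smallest single CIDR block that spans both ranges
spanning = ipaddress.ip_network("3.0.0.0/14")
extra = spanning.num_addresses - covered

print(collapsed)
print(covered, spanning.num_addresses, extra)
# prints: 131328 262144 130816
```

So the spanning /14 would include 130816 addresses that are in neither original range, which is probably the kind of breakage to worry about.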

Do you think you can improve the merging process in the script?