massimocandela / geofeed-finder

Utility to find geofeed files linked from RPSL.
BSD 3-Clause "New" or "Revised" License

geofeed-finder is taking too long to process 100K prefixes #24

Closed · sgteq closed 1 year ago

sgteq commented 1 year ago

Somebody published 64K prefixes in a single feed.

```
sgteq@desk:~/geofeed$ ./geofeed-finder-linux-x64-1.6.6 -t 2407:d340:7310:7ab2::/64 | wc -l
65555
```

While that's lame, it pushed the total number of prefixes geofeed-finder has to process to ~105K, so the total number of valid prefixes may well reach 100K in 1-2 years anyway.

I don't know how long it takes geofeed-finder to process 100K prefixes because my script killed it after one hour. The code that takes too long appears to be at line 225 of finder.js. Its complexity is O(n * n / 2), where n is the number of input prefixes. It needs to be replaced to handle the growth; not urgently, but eventually, even if a workaround is implemented for that silly feed above.

To reproduce the slowness, run with -i apnic.
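
One way to avoid the quadratic pairwise check is a sort-and-sweep: order both inetnums and geofeed entries by start address and walk them once, keeping a stack of the nested inetnums that cover the current position. The sketch below is illustrative only (IPv4-only, invented names such as matchMostSpecific; it is not the actual finder.js code):

```js
// Illustrative sketch, not finder.js code: match each geofeed entry to its
// most specific covering inetnum in roughly O(n log n) instead of O(n^2).
// IPv4-only for brevity; all names are invented for this example.

const ipv4ToInt = (ip) =>
    ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);

const parseCidr = (cidr) => {
    const [ip, len] = cidr.split("/");
    const start = ipv4ToInt(ip);
    return { cidr, start, end: start + 2 ** (32 - Number(len)) - 1 };
};

function matchMostSpecific(inetnumCidrs, entryCidrs) {
    // Sort by start address; for equal starts, parents (larger blocks) first.
    const byStart = (a, b) => a.start - b.start || b.end - a.end;
    const inetnums = inetnumCidrs.map(parseCidr).sort(byStart);
    const entries = entryCidrs.map(parseCidr).sort(byStart);

    const matches = new Map();
    const stack = []; // nested chain of inetnums covering the sweep position
    let i = 0;

    for (const e of entries) {
        // Bring in every inetnum that starts at or before this entry.
        while (i < inetnums.length && inetnums[i].start <= e.start) {
            while (stack.length && stack[stack.length - 1].end < inetnums[i].start) {
                stack.pop(); // this inetnum ended before the new one starts
            }
            stack.push(inetnums[i++]);
        }
        // Drop inetnums that ended before this entry starts.
        while (stack.length && stack[stack.length - 1].end < e.start) {
            stack.pop();
        }
        // The stack is a nested chain; the topmost inetnum that still
        // contains the whole entry is the most specific match.
        let match = null;
        for (let j = stack.length - 1; j >= 0; j--) {
            if (stack[j].end >= e.end) {
                match = stack[j].cidr;
                break;
            }
        }
        matches.set(e.cidr, match);
    }
    return matches;
}
```

Because CIDR prefixes are either nested or disjoint (never partially overlapping), the stack stays a nested chain: each inetnum is pushed and popped at most once, and the final scan only descends the hierarchy depth, so the sweep stays near O(n log n) overall.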

sgteq commented 1 year ago

Checking the RFC, I don't think less specific prefixes should be considered invalid. Quote:

Multiple entries that constitute nested prefixes are permitted. Consumers SHOULD consider the entry with the longest matching prefix (i.e., the "most specific") to be the best matching entry for a given IP address.

Just filter out duplicate prefixes:

Duplicate IP address or prefix entries MUST be considered an error
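
Taken together, those two rules suggest consumer behavior along these lines. The sketch below is only an illustration under stated assumptions (IPv4-only, invented names like bestEntry), not geofeed-finder's actual code:

```js
// Sketch of the two RFC 8805 consumer rules quoted above; IPv4-only,
// invented names, not geofeed-finder's actual implementation.

const ipv4ToInt = (ip) =>
    ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);

const parseCidr = (cidr) => {
    const [ip, len] = cidr.split("/");
    const bits = Number(len);
    const start = ipv4ToInt(ip);
    return { cidr, bits, start, end: start + 2 ** (32 - bits) - 1 };
};

// "Duplicate IP address or prefix entries MUST be considered an error"
function assertNoDuplicates(entryCidrs) {
    const seen = new Set();
    for (const cidr of entryCidrs) {
        if (seen.has(cidr)) {
            throw new Error(`duplicate geofeed entry: ${cidr}`);
        }
        seen.add(cidr);
    }
}

// "Consumers SHOULD consider the entry with the longest matching prefix"
function bestEntry(addressIp, entryCidrs) {
    assertNoDuplicates(entryCidrs);
    const addr = ipv4ToInt(addressIp);
    const matching = entryCidrs
        .map(parseCidr)
        .filter((p) => p.start <= addr && addr <= p.end)
        .sort((a, b) => b.bits - a.bits); // most specific first
    return matching.length ? matching[0].cidr : null;
}

// Nested entries are fine: the /24 wins for an address inside it.
// bestEntry("193.56.32.10", ["193.56.0.0/16", "193.56.32.0/24"])
//   === "193.56.32.0/24"
```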

massimocandela commented 1 year ago

I don't know how long it takes geofeed-finder to process 100K prefixes because my script killed it after one hour. The code that takes too long appears to be at line 225 of finder.js. Its complexity is O(n * n / 2), where n is the number of input prefixes. It needs to be replaced to handle the growth; not urgently, but eventually, even if a workaround is implemented for that silly feed above.

I saw that too; I have been working on this. This was born as a proof of concept and is now used in production in many places. I will release a new version with performance improvements ASAP.

massimocandela commented 1 year ago

Checking the RFC, I don't think less specific prefixes should be considered invalid.

They have never been considered invalid.

Just filter out duplicate prefixes:

Duplicates were already removed, and that is also done in the new release. However, you cannot just remove duplicates: the selection must be based on the most specific parent inetnum (independently of the prefix length in the geofeed file). Not doing that would make the ownership validation useless.

When fetching, the most specific inetnum: object with a geofeed reference MUST be used.

Example:

- inetnum: 193.56.0.0/16, geofeed: 193.56.0.0/17, result: 193.56.0.0/17 in the output
- inetnum: 193.56.0.0/18, geofeed: 193.56.32.0/24, result: 193.56.32.0/24 in the output
- inetnum: 193.56.0.0/16, geofeed: 193.56.32.126/32, result: 193.56.32.126/32 not in the output (193.56.0.0/18 is a more specific inetnum with a geofeed reference covering it)
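
A sketch of that acceptance rule against the three cases above (IPv4-only, invented names such as isAccepted; an illustration of the rule as stated, not the finder.js implementation):

```js
// Sketch of the ownership rule illustrated above; invented names, IPv4-only,
// not the actual finder.js implementation.

const ipv4ToInt = (ip) =>
    ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);

const parseCidr = (cidr) => {
    const [ip, len] = cidr.split("/");
    const start = ipv4ToInt(ip);
    return { cidr, start, end: start + 2 ** (32 - Number(len)) - 1 };
};

const covers = (parent, child) =>
    parent.start <= child.start && parent.end >= child.end;

// An entry fetched from an inetnum's geofeed is accepted only if no more
// specific inetnum with its own geofeed reference covers that entry.
function isAccepted(entryCidr, fetchedFromCidr, geofeedInetnumCidrs) {
    const entry = parseCidr(entryCidr);
    const from = parseCidr(fetchedFromCidr);
    if (!covers(from, entry)) {
        return false; // entry is outside the inetnum it was published under
    }
    return !geofeedInetnumCidrs
        .map(parseCidr)
        .some((inetnum) =>
            inetnum.cidr !== from.cidr &&
            covers(from, inetnum) &&  // more specific than the source...
            covers(inetnum, entry)); // ...and it owns the entry's space
}

// Third case above: 193.56.32.126/32 was published under 193.56.0.0/16's
// geofeed, but 193.56.0.0/18 also carries a geofeed reference and covers it:
// isAccepted("193.56.32.126/32", "193.56.0.0/16",
//            ["193.56.0.0/16", "193.56.0.0/18"]) === false
```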

Duplicate IP address or prefix entries MUST be considered an error

This is not part of the RFC. Duplicates may exist in the input due to the hierarchical structure of inetnums and the flexible granularity of geofeed files, and that's ok. Duplicates will not exist in the output.

The performance issue has been solved in v1.7.0. Please upgrade your geofeed-finder.