massimocandela / geofeed-finder

Utility to find geofeed files linked from RPSL.
BSD 3-Clause "New" or "Revised" License

Request timed out while fetching Whois data #32

Closed sid6mathur closed 9 months ago

sid6mathur commented 9 months ago

I am testing the latest release on an AWS US-region host, and it frequently fails with "Error: Request timed out" during the WHOIS download stage.

[ripe] Downloading whois data
[afrinic] Downloading whois data
[apnic] Downloading whois data
[afrinic] Parsing whois data: inetnum,inet6num
[ripe] Parsing whois data: inetnum,inet6num
Error: Request timed out
    at RedirectableRequest.<anonymous> (/snapshot/geofeed-finder/node_modules/nodejs-file-downloader/makeRequest.js:31:37)
    at RedirectableRequest.emit (node:events:537:28)
    at Timeout.<anonymous> (/snapshot/geofeed-finder/node_modules/follow-redirects/index.js:209:12)
    at listOnTimeout (node:internal/timers:564:17)
    at process.processTimers (node:internal/timers:507:7) {
  code: 'ERR_REQUEST_TIMEDOUT'
}
[arin] Downloading stat file
[lacnic-rir] Downloading whois data
[arin] Fetching sub allocations ipv4
[lacnic-rir] Parsing whois data: inetnum
[arin] It was not possible to download precompiled sub allocations ipv4, I will try to compile them from whois instead
[lacnic-rir] Using cached whois data
[lacnic-rir] Parsing whois data: inet6num
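Since the run continues past the failed download, a retry wrapper around the flaky fetch would make transient timeouts survivable. This is a hypothetical sketch (withRetries and makeFlakyDownload are illustrative names, not part of geofeed-finder), showing one way to re-attempt only on ERR_REQUEST_TIMEDOUT:

```javascript
// Hypothetical helper: retry a task on request timeouts, with linear backoff.
// geofeed-finder's actual download layer may handle this differently.
async function withRetries(task, attempts = 3, delayMs = 1000) {
  for (let i = 1; i <= attempts; i++) {
    try {
      return await task();
    } catch (error) {
      // Only retry on request timeouts; rethrow anything else or the last failure.
      if (error.code !== 'ERR_REQUEST_TIMEDOUT' || i === attempts) throw error;
      await new Promise(resolve => setTimeout(resolve, delayMs * i));
    }
  }
}

// Build a task that throws ERR_REQUEST_TIMEDOUT `failures` times, then succeeds.
function makeFlakyDownload(failures) {
  let calls = 0;
  return async () => {
    calls++;
    if (calls <= failures) {
      const err = new Error('Request timed out');
      err.code = 'ERR_REQUEST_TIMEDOUT';
      throw err;
    }
    return 'whois-data';
  };
}

(async () => {
  // Fails twice, succeeds on the third attempt.
  const result = await withRetries(makeFlakyDownload(2), 3, 10);
  console.log(result); // → whois-data
})();
```

A real implementation would also cap total retry time, since RIR rate limiting (see the maintainer's note below about hammering) makes aggressive retries counterproductive.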

Note: the compile command for the ARM64 binary in use is the following; please upstream it into build.sh: ./node_modules/.bin/pkg ./package.json --targets node18-linux-arm64 --output bin/geofeed-finder-linux-arm64 --loglevel=error

sid6mathur commented 9 months ago

With the latest v1.12.1, the error no longer appears in my first test. The entire run finished in 6h3m, so the build isn't timing out. Thanks for the parallelization!

[Container] 2024/01/10 01:54:03.675376 Running command ./bin/geofeed-finder-linux-arm64 -k -o auto-geofeed-latest.csv
[ripe] Downloading whois data
[afrinic] Downloading whois data
[apnic] Downloading whois data
[afrinic] Parsing whois data: inetnum,inet6num
[ripe] Parsing whois data: inetnum,inet6num
[apnic] Parsing whois data
[arin] Downloading stat file
[lacnic-rir] Downloading whois data
[arin] Fetching sub allocations ipv4
[arin] It was not possible to download precompiled sub allocations ipv4, I will try to compile them from whois instead
[lacnic-rir] Parsing whois data: inetnum
[lacnic-rir] Using cached whois data
[lacnic-rir] Parsing whois data: inet6num
[arin] Fetching sub allocations ipv6
[arin] It was not possible to download precompiled sub allocations ipv6, I will try to compile them from whois instead
[arin] Fetching NetRanges
[arin] Using cached whois data: inet6num
https://www.in-berlin.de/geofeed.csv [download]
https://geofeed.bahnhof.net/geofeed.csv [download]
sid6mathur commented 9 months ago

Thank you :)

sid6mathur commented 9 months ago

The second run with v1.12.1 yields a "Connect timeout" to what looks like APNIC's server at 203.119.102.40, port 80. I am running on an AWS build server in Ohio, so the RTT to Melbourne is likely high.

[Container] 2024/01/10 08:07:13.181409 Running command ./bin/geofeed-finder-linux-arm64 -k -o auto-geofeed-latest.csv
[ripe] Downloading whois data
[afrinic] Downloading whois data
[apnic] Downloading whois data
[afrinic] Parsing whois data: inetnum,inet6num
[ripe] Parsing whois data: inetnum,inet6num
Error: connect ETIMEDOUT 203.119.102.40:80
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1237:16) {
  errno: -110,
  code: 'ETIMEDOUT',
  syscall: 'connect',
  address: '203.119.102.40',
  port: 80
}
[arin] Downloading stat file
[lacnic-rir] Downloading whois data
[arin] Fetching sub allocations ipv4
[arin] It was not possible to download precompiled sub allocations ipv4, I will try to compile them from whois instead
[lacnic-rir] Parsing whois data: inetnum
[lacnic-rir] Using cached whois data
[lacnic-rir] Parsing whois data: inet6num
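Note that ETIMEDOUT here fires at the TCP connect stage, before any HTTP request is sent, which is a different failure mode from the ERR_REQUEST_TIMEDOUT seen earlier. A minimal sketch of how such a connect deadline can behave (withDeadline is a hypothetical helper; small millisecond delays stand in for a real 30-second timeout):

```javascript
// Hypothetical deadline guard: reject with ETIMEDOUT if the wrapped
// promise (e.g. a socket connect) does not settle within limitMs.
function withDeadline(promise, limitMs) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`connect ETIMEDOUT after ${limitMs} ms`);
      err.code = 'ETIMEDOUT';
      reject(err);
    }, limitMs);
  });
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// A "connect" that takes 50 ms against a 10 ms deadline: the deadline wins.
const slowConnect = new Promise(resolve => setTimeout(() => resolve('connected'), 50));
withDeadline(slowConnect, 10).catch(err => console.log(err.code)); // → ETIMEDOUT
```

In practice a high RTT alone rarely exhausts a 30-second connect timeout, which is why the maintainer points at rate limiting rather than distance below.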
massimocandela commented 9 months ago

The latest version was released to fix performance issues. It uses a resumable cache to avoid hammering the RIRs.

I do not think your problem is a matter of high RTT, because the timeout is currently 30 seconds. The various RIRs have implemented policies to prevent the hammering of their servers.

Do not wipe the cache at every run; instead, run the script from crontab once per day. A corrupted cache is auto-cleaned, the cache is resumable, and any error will be resolved on the next run. The first run is the slowest one (empty cache).