Closed: chowdud closed this issue 3 years ago
@jcran I might have misunderstood the task since it was named as "Scraper task" and thought it was relating to something like web scraping. Just to confirm, you'd rather have the results being returned from an API, using the mentioned repo?
Yep, always prefer the API when it's available. Easier to maintain and less likely to break.
@jcran I've looked through the repo and, based on my understanding, DNSdumpster doesn't actually offer an API. Their implementation works much the same way as the file in my PR, albeit in Python rather than Ruby. The only API endpoint I could see in their repo was for hackertarget.
Got it - okay, let's go ahead and move it forward then.
Got the following error:
2021-07-16T21:28:41.163Z pid=96314 tid=or6 WARN: NoMethodError: undefined method `search' for nil:NilClass
2021-07-16T21:28:41.164Z pid=96314 tid=or6 WARN: intrigue-core/lib/tasks/search_dnsdumpster.rb:38:in `run'
intrigue-core/lib/tasks/base.rb:161:in `perform'
I think for the code on lines 38-40, we need better checking to make sure the elements exist before looking "inside" them.
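A minimal sketch of that kind of guard (not the actual search_dnsdumpster.rb code; the variable and selector here are illustrative): Ruby's safe-navigation operator short-circuits to nil instead of raising NoMethodError when an intermediate lookup comes back empty.

```ruby
# `table` stands in for a page lookup (e.g. a Nokogiri node) that can
# return nil when the page layout changes or the scrape comes back empty.
table = nil

# Safe navigation: if `table` is nil, `&.` returns nil instead of
# raising `NoMethodError: undefined method `search' for nil:NilClass`.
rows = table&.search("td")

# Normalize the nil case to an empty collection so downstream code
# can iterate without further checks.
rows ||= []

puts rows.length # prints 0 when the table is missing
```

The same pattern applies at each step of a chained lookup: guard before descending rather than assuming every element is present.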
For the target you tested with (intrigue.io), it takes quite a while for a set of results to be returned (around 1m 50s), so the task was timing out at the default of 10s. Do we want to increase this timeout? I've tried 300s, but that seems excessive.
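If we do bump it, one option is to raise only the read timeout while keeping connection setup short. A hedged sketch with Ruby's stdlib Net::HTTP (the 120s value is illustrative, and the real task may expose its own timeout knob rather than configuring Net::HTTP directly):

```ruby
require "net/http"

# Configure a client for a slow endpoint: connecting should still fail
# fast, but reading the response gets generous headroom.
http = Net::HTTP.new("dnsdumpster.com", 443)
http.use_ssl = true
http.open_timeout = 10   # connection setup can stay at the short default
http.read_timeout = 120  # headroom for the ~1m50s observed response time

puts http.read_timeout # prints 120
```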
Generally we try to stay away from scraping if we can, any reason we can't use their api endpoint ala https://github.com/zeropwn/dnsdmpstr/blob/master/dnsdmpstr.py ?
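For reference, a hedged Ruby sketch of the API-based approach that dnsdmpstr.py takes: hackertarget's hostsearch endpoint returns plain-text "hostname,ip" lines. The endpoint path is taken from that repo and should be verified before relying on it; the parsing is split into a pure function so it can be exercised without the network.

```ruby
require "net/http"
require "uri"

# Pure parsing step: turn the "hostname,ip" CSV body into pairs,
# dropping blank lines.
def parse_hostsearch(body)
  body.lines
      .map { |line| line.strip.split(",", 2) }
      .reject { |pair| pair.first.to_s.empty? }
end

# Fetch subdomains for a domain via hackertarget's hostsearch endpoint
# (path as used in dnsdmpstr.py; confirm it is still current).
def hostsearch(domain)
  uri = URI("https://api.hackertarget.com/hostsearch/?q=#{domain}")
  parse_hostsearch(Net::HTTP.get(uri))
end

# The IP below is a made-up example, not a real lookup result.
puts parse_hostsearch("www.intrigue.io,1.2.3.4\n").inspect
```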