lc / gau

Fetch known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.
MIT License

Less results with the configuration file #96

Closed: JoshuaMart closed this 11 months ago

JoshuaMart commented 1 year ago

Hi, first of all, thank you for your tool.

I have a problem, though: I get far more results when the configuration file is absent, even though the file seems to be the only way to supply the URLScan API key.

Without the configuration file:

root@6b908f480cec:/App# gau www.jomar.fr
https://www.jomar.fr/
https://www.jomar.fr/posts/2022/basic_recon_to_rce_ii/
http://www.jomar.fr
https://www.jomar.fr/2017/05/21/blabla/
https://www.jomar.fr/2017/05/21/blablabla/
https://www.jomar.fr/about/
https://www.jomar.fr/contact/
https://www.jomar.fr/index.xml
https://www.jomar.fr/notes/
https://www.jomar.fr/notes/laravel_symfony/
https://www.jomar.fr/notes/laravel_symfony/basic/
https://www.jomar.fr/posts/
https://www.jomar.fr/posts/2020/01/en-binary-search-in-golang-on-large-files/
https://www.jomar.fr/projects/
[...]

With the configuration file:

root@6b908f480cec:/App# cat ~/.gau.toml
threads = 2
verbose = false
retries = 15
subdomains = false
parameters = false
providers = ["gau","commoncrawl","otx","urlscan"]
blacklist = []
json = false

[urlscan]
  apikey = "REDACTED"

[filters]
  from = ""
  to = ""
  matchstatuscodes = []
  matchmimetypes = []
  filterstatuscodes = []
  filtermimetypes = []
root@6b908f480cec:/App# gau www.jomar.fr
https://www.jomar.fr/
https://www.jomar.fr/
https://www.jomar.fr/posts/2022/basic_recon_to_rce_ii/
https://www.jomar.fr/
https://www.jomar.fr/posts/2022/basic_recon_to_rce_iii/
https://www.jomar.fr/robots.txt

Moreover, although the URLScan API key is set, the service does not seem to be queried: the quota on my account does not change.

Let me know if you need more information. Regards

NoPurposeInLife commented 1 year ago

+1 for this

TerickJojo commented 1 year ago

same

lc commented 12 months ago

Hey, I've been able to reproduce this. I'm not sure why this is happening but will look into it.

lc commented 11 months ago

Hey @JoshuaMart.

I just realized a hilariously stupid mistake.

The config for providers should be:

providers = ["wayback","commoncrawl","otx","urlscan"]

My example in the repo has "gau" instead of "wayback"...

This should fix the issue.
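
For anyone who copied the old example config, here is a minimal sketch of a check that flags an unrecognised entry in the providers list before running gau. It is not part of gau. The names `KNOWN_PROVIDERS` and `unknown_providers` are hypothetical, and the set of valid names is only what appears in the corrected line above, so treat it as an assumption rather than an authoritative list.

```python
# Provider names taken from the corrected config line in this thread.
# Assumption: other gau versions may accept a different set.
KNOWN_PROVIDERS = {"wayback", "commoncrawl", "otx", "urlscan"}


def unknown_providers(providers):
    """Return the entries that are not known provider names, in input order."""
    return [name for name in providers if name not in KNOWN_PROVIDERS]


if __name__ == "__main__":
    # The providers list from the config posted in the original report.
    reported = ["gau", "commoncrawl", "otx", "urlscan"]
    print(unknown_providers(reported))
```

Run against the list from the original report, this reports "gau" as the only unrecognised entry; with the corrected list it reports nothing.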