projectdiscovery / subfinder

Fast passive subdomain enumeration tool.
https://projectdiscovery.io
MIT License

[Issue] Scraping from github doesn't return all subdomains #1380

Closed choket closed 14 minutes ago

choket commented 6 days ago

Describe the bug

For domains that have many entries on GitHub, such as blizzard.com, subfinder does not return all subdomains.

Running subfinder -s github -d blizzard.com with 1 GitHub API key in the config file returns 42 subdomains, none of which contains omnitron.blizzard.com.

However, running subfinder -s github -d omnitron.blizzard.com correctly returns all subdomains found on GitHub for omnitron.blizzard.com.

I suspect this is because 65k+ files match the query "blizzard.com" and only 2 match "omnitron.blizzard.com".

I am aware that the GitHub search API imposes a rate limit of 10 requests/minute and returns 100 results per request. Searching through all 65k files that match "blizzard.com" would therefore take 650 requests, which with only 1 token would require 65 minutes. However, the command subfinder -s github -d blizzard.com finished in only 3 minutes.
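The estimate above can be checked with a few lines of Python. This is only a sketch using the figures quoted in this comment (65k matches, 100 results per request, 10 requests/minute); the function name and its parameters are made up for illustration, and the assumption that extra tokens scale throughput linearly is mine.

```python
import math

def estimate_scan_minutes(total_matches, results_per_request=100,
                          requests_per_minute=10, tokens=1):
    """Return (requests needed, minutes needed) to page through every match.

    Assumes each token contributes its own independent rate-limit budget.
    """
    requests_needed = math.ceil(total_matches / results_per_request)
    minutes = requests_needed / (requests_per_minute * tokens)
    return requests_needed, minutes

requests_needed, minutes = estimate_scan_minutes(65_000)
print(requests_needed, minutes)  # 650 65.0
```

A run that finishes in about 3 minutes at this rate could have issued roughly 30 requests, far fewer than the 650 needed to page through every match.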

Subfinder version

v2.6.6

Complete command you used to reproduce this

subfinder -s github -d blizzard.com
subfinder -s github -d omnitron.blizzard.com

dogancanbakir commented 5 days ago

Thank you for sharing your experience with us. After looking into this, I found that everything is working as expected. You receive results for subfinder -s github -d omnitron.blizzard.com because that query narrows the GitHub search to the specific subdomain omnitron.blizzard.com rather than blizzard.com. However, we don't know whether GitHub returns the omnitron.blizzard.com subdomain when we search for blizzard.com.
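To illustrate the point, here is a toy Python sketch. If a broad query hands back only part of its matching files, a host that appears solely in the files not returned cannot be extracted, while a narrower query can still surface it. The corpus, the hostnames other than omnitron.blizzard.com, and the RETURNED cut-off are all invented; nothing here reflects GitHub's actual ordering or limits, or subfinder's own code.

```python
import re

def extract_subdomains(texts, domain):
    """Return every hostname equal to or ending in `domain` found in the texts."""
    pattern = re.compile(r"(?:[a-z0-9-]+\.)*" + re.escape(domain), re.IGNORECASE)
    found = set()
    for text in texts:
        found.update(match.lower() for match in pattern.findall(text))
    return found

# Invented stand-in for the files matching the broad query "blizzard.com",
# in an arbitrary order (the real ordering is unknown).
matching_files = [
    "see alpha.blizzard.com for details",
    "endpoint = 'beta.blizzard.com'",
    "mail to admin@gamma.blizzard.com",
    "host: omnitron.blizzard.com",
]

RETURNED = 3  # hypothetical number of files actually returned for the broad query

# Broad query: only the first RETURNED files are ever seen.
broad = extract_subdomains(matching_files[:RETURNED], "blizzard.com")

# Narrow query: only files mentioning the specific subdomain match at all.
narrow = extract_subdomains(
    (f for f in matching_files if "omnitron.blizzard.com" in f),
    "omnitron.blizzard.com",
)

print(sorted(broad))
print(sorted(narrow))
```

In this toy setup the broad search misses omnitron.blizzard.com while the narrow one finds it, which matches the behaviour reported in the issue without showing that this is what actually happens on GitHub's side.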

dogancanbakir commented 14 minutes ago

Closing this. Feel free to reopen if you have any other questions.