utiso / dorkbot

Command-line tool to scan Google search results for vulnerabilities
http://dorkbot.io

CommonCrawl indexer #15

Closed OptiMysticall closed 4 years ago

OptiMysticall commented 4 years ago

Great work! This looks like it has real potential.

How can I run just the CommonCrawl indexer and search for something in a URL, much like doing inurl:index.html on Google? I see the commoncrawl indexer has a filter option, and I'm guessing that is what I'm looking for, but it asks for a domain as a required argument, so I'm a bit confused.

My ultimate goal is simply to get a list of sites that match given criteria. Creating a custom search on Google isn't as anonymous as I'd like.

Cheers and thank you!

jgor commented 4 years ago

All the CommonCrawl CDX API usage examples I can find use a domain in the url query parameter. I can't immediately find documentation to confirm this, but I think it's a constraint of the service. If you can find a way to search for part of the URL without including the domain at the beginning (as this query does: https://index.commoncrawl.org/CC-MAIN-2020-16-index?url=commoncrawl.org/*&limit=10&fl=urlkey), feel free to shoot it my way and I'll see if I can incorporate it into the project. Thanks!
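
For reference, here is a rough sketch of how a domain-scoped query like the one above could be narrowed to something closer to inurl:. It only builds the request URL. The `url`, `limit`, and `fl` parameters come from the example query; the `output=json` parameter and the `filter=~url:<regex>` form are assumptions based on the pywb-style CDX server API and should be checked against the index server's own documentation. The function name `build_cdx_query` is made up for illustration and is not part of dorkbot. Note that the domain is still required here; the filter only narrows results within it.

```python
from urllib.parse import urlencode

# Base of the CommonCrawl index server, taken from the example query above.
CDX_BASE = "https://index.commoncrawl.org"


def build_cdx_query(index, domain, url_regex=None, limit=10, fields="url"):
    """Build a CDX query URL for one crawl index and one domain.

    index     -- crawl identifier, e.g. "CC-MAIN-2020-16"
    domain    -- domain to search under; the service appears to require it
    url_regex -- optional regex applied to the url field (ASSUMED filter syntax)
    """
    params = [
        ("url", f"{domain}/*"),   # prefix match on everything under the domain
        ("output", "json"),       # ASSUMPTION: one JSON object per result line
        ("limit", str(limit)),
        ("fl", fields),           # fields to return, as in the example (fl=urlkey)
    ]
    if url_regex is not None:
        # ASSUMPTION: "~field:regex" is the regex form of the filter parameter.
        params.append(("filter", f"~url:{url_regex}"))
    return f"{CDX_BASE}/{index}-index?{urlencode(params)}"


if __name__ == "__main__":
    # Roughly "inurl:index.html", but restricted to a single domain.
    print(build_cdx_query("CC-MAIN-2020-16", "commoncrawl.org", r".*index\.html"))
```

The printed URL can be opened in a browser or fetched with any HTTP client. If the filter syntax turns out to differ, only the one `params.append` line needs to change.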