OptiMysticall closed this issue 4 years ago
Every CommonCrawl CDX API usage example I can find puts a domain in the url query parameter, like this: https://index.commoncrawl.org/CC-MAIN-2020-16-index?url=commoncrawl.org/*&limit=10&fl=urlkey. I can't immediately find documentation to confirm it, but I think this is a constraint of the service. If you can find a way to search for part of the URL without including the domain at the beginning, feel free to send it my way and I'll see if I can incorporate it into the project. Thanks!
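For reference, here is a minimal sketch of how the example query in this reply could be built in Python, with an optional URL filter added. The helper name `build_cdx_query` is hypothetical, and the `filter=~url:<regex>` form is an assumption based on the filter syntax of pywb-style CDX servers, so check it against the index server's own documentation before relying on it. The sketch only constructs the query string and makes no network request, and the domain still has to go in the `url` parameter.

```python
from urllib.parse import urlencode

# Collection endpoint copied from the example URL in this thread.
INDEX_BASE = "https://index.commoncrawl.org/CC-MAIN-2020-16-index"


def build_cdx_query(url_pattern, url_regex=None, limit=10, fields="urlkey"):
    """Build a CDX query URL (hypothetical helper; performs no request)."""
    params = [("url", url_pattern), ("limit", str(limit)), ("fl", fields)]
    if url_regex is not None:
        # Assumption: pywb-style filter syntax, where "~" requests a
        # regex match against the named field (here, the url field).
        params.append(("filter", "~url:" + url_regex))
    # Keep "/" and "*" unescaped so the result matches the thread's example.
    return INDEX_BASE + "?" + urlencode(params, safe="/*")


# Same query as the example in this thread.
print(build_cdx_query("commoncrawl.org/*"))

# Same domain, narrowed to captures whose URL contains index.html.
print(build_cdx_query("commoncrawl.org/*", url_regex=r".*index\.html"))
```

If the filter behaves as assumed, this gets close to an inurl: style search, but only within one domain at a time.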
Great work! This looks like it has a lot of potential.
How can I just run the CommonCrawl indexer and search for something in a URL, much like using inurl:index.html on Google? I see CommonCrawl has a filter option, and I'm guessing that is what I'm looking for, though it asks for a domain as a required argument? I'm a bit confused.
My ultimate goal is simply to grab a list of sites that match given criteria. Creating a custom search on Google isn't quite as anonymous as I'm looking for.
Cheers and thank you!