smoriarty21 / deepersCreepers

Scrape the deep web for live urls
GNU General Public License v3.0

Alternative approach to google / bing? #1

Open aliakhtar opened 7 years ago

aliakhtar commented 7 years ago

This project is pretty interesting. Your dev branch uses the Google and Bing search APIs, but both look like they're going to be deprecated soon.

What would be a better approach to use now that those two APIs may be going offline?

smoriarty21 commented 7 years ago

Could you give me a source for these going offline? I've done a little Google searching and I can't find a single place saying the Google and Bing APIs will be going offline. I also highly doubt that would ever happen, as so many apps rely on these APIs. If they are being deprecated, they could be swapped out for the new versions of the APIs. Either way, it is a good question and I would be interested in looking into a few other solutions. I just haven't had much time to put into this project lately, as I have been working on some other stuff.

aliakhtar commented 7 years ago

I googled py-google and py-bing-search; both have API deprecation notices on their front pages.

I'm looking into dnm scraping as well, and just considering other possibilities. One idea could be a dictionary search (no idea if word-based .onions are more common than random ones, though). Another is to scrape the discovered sites themselves for links to other .onions, perhaps using the hidden wiki as the seed.

smoriarty21 commented 7 years ago

That's what my plan was: hit sites like the hidden wiki, find URLs on them, and then hit those URLs to keep searching for more URLs. But yes, those libraries are being deprecated; that just means they need to be bumped up to the latest version. They will not be taken offline, and I'm sure the old libs will continue to work. There may be minor code changes needed to work with the new versions of the APIs, but I would have to look into that further to know for sure.
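
For illustration, the seed-and-follow crawl described above could be sketched roughly like this. It is not code from this repo. The names `extract_onion_hosts`, `crawl`, and `fetch` are hypothetical, and the sketch assumes the caller supplies a `fetch` function that returns a page's HTML (for example by going through a Tor SOCKS proxy, which is not shown here). It also assumes onion hostnames are either 16 or 56 base32 characters long.

```python
import re
from collections import deque
from typing import Callable, Iterable, Set

# Assumption: onion hostnames are 16 (v2) or 56 (v3) base32 characters.
# The lookbehind rejects matches that start in the middle of a longer label.
ONION_RE = re.compile(
    r"(?<![a-z2-7])((?:[a-z2-7]{56}|[a-z2-7]{16})\.onion)\b",
    re.IGNORECASE,
)


def extract_onion_hosts(html: str) -> Set[str]:
    """Return every distinct .onion hostname found in a page, lowercased."""
    return {match.lower() for match in ONION_RE.findall(html)}


def crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], str],
    max_pages: int = 100,
) -> Set[str]:
    """Breadth-first crawl: fetch each known host and queue any new hosts it links to.

    `fetch` is a hypothetical, caller-supplied function mapping a hostname
    to that site's HTML. Hosts that fail to load are skipped.
    """
    seeds = list(seeds)
    seen = set(seeds)
    queue = deque(seeds)
    fetched = 0
    while queue and fetched < max_pages:
        host = queue.popleft()
        fetched += 1
        try:
            html = fetch(host)
        except Exception:
            # Dead or unreachable site: keep it in `seen`, move on.
            continue
        for found in extract_onion_hosts(html):
            if found not in seen:
                seen.add(found)
                queue.append(found)
    return seen
```

Passing `fetch` in as a parameter keeps the crawl logic separate from the network layer, so it can be exercised with canned pages before pointing it at a real seed such as the hidden wiki.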

aliakhtar commented 7 years ago

It's not the libraries being deprecated; the Google / Bing APIs themselves are, at least from my understanding of py-google's and py-bing's notices.

smoriarty21 commented 7 years ago

They are just being replaced with new versions. For instance, Google is now forcing use of the AJAX API rather than the old SOAP API. Very little code would need to be changed to swap these out for the new libs.