tasos-py / Search-Engines-Scraper

Search google, bing, yahoo, and other search engines with python
MIT License
513 stars, 137 forks

Integrate Random Useragent and Enable Image Search #44

Open sean-bailey opened 2 years ago

sean-bailey commented 2 years ago

I was working on a StyleGAN-based image generator, which required gathering a large image dataset for whatever topic I chose. I realized how powerful it would be to pull from multiple search engines using search-engines-scraper, so I made some fully backwards-compatible modifications that add significant functionality and may be valuable to people in similar situations.

1) Added the ability to use each search engine's image search, where available, returning a list of direct URLs to images from that engine.

2) Added the ability to specify a random user agent when scraping the search engines, in an attempt to improve results.

3) All added functionality is backwards compatible: existing deployed workflows need no modification to keep working after this update.

4) Updated the README to describe the new functionality, along with notes on its known limitations.
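
To illustrate the ideas behind points 1 and 2, here is a minimal standalone sketch. It is not the code from this PR and does not use the search-engines-scraper API: the user-agent pool, the extension list, and the function names (`random_user_agent`, `build_request`, `filter_image_urls`) are all hypothetical, and only the Python standard library is assumed.

```python
import random
import urllib.request
from urllib.parse import urlparse

# Hypothetical pool of user-agent strings; a real pool would be larger
# and kept up to date.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/15.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:95.0) "
    "Gecko/20100101 Firefox/95.0",
]

# Assumed set of extensions that mark a URL as a direct link to an image.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def random_user_agent():
    """Pick one user-agent string at random from the pool."""
    return random.choice(USER_AGENTS)


def build_request(url, user_agent=None):
    """Build a request object; fall back to a random user agent if none is given."""
    ua = user_agent or random_user_agent()
    return urllib.request.Request(url, headers={"User-Agent": ua})


def filter_image_urls(urls):
    """Keep only URLs whose path ends in a known image extension."""
    return [
        u for u in urls
        if urlparse(u).path.lower().endswith(IMAGE_EXTENSIONS)
    ]
```

`build_request` only constructs the request; nothing is sent until it is passed to `urllib.request.urlopen`. Making the user agent optional mirrors the backwards-compatibility goal: callers that already pass their own value see no change in behavior.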

I have tested this on macOS, Ubuntu 20.04, and Windows 10, using Python 3.7, 3.8, and 3.9, with all functionality performing as expected.

tasos-py commented 2 years ago

Thanks for your contribution, much appreciated. This looks very interesting, and I'll review the code as soon as I have some spare time.

tasos-py commented 2 years ago

This is an ambitious project, but there is a lot of work to be done before it's complete. As you know, image search doesn't work in most engines; in some it works partially (e.g., it only returns first-page results), and in only a few does it work well enough. A year ago, your efforts would have inspired me to fix those issues, but unfortunately I don't have much free time lately. Still, if you choose to continue working on this, I'll help as much as I can.

sean-bailey commented 2 years ago

Thanks for the feedback! I think giving users disclaimers about the software's limitations (the pagination issues with image search, the limited set of engines that support it) is an excellent starting point to build on, and should be "good enough" for now. I definitely understand the time constraints, though -- this is something I've been doing in the background to support a hobby! If there isn't enough time to invest in this, I completely understand. I figured it might be a useful addition that could help others, so I opened a PR just in case. I can still run my software against my fork, no problem! I also want to add a big thank you for the original work you've provided, which this could grow from.