tasos-py / Search-Engines-Scraper

Search google, bing, yahoo, and other search engines with python
MIT License
538 stars, 145 forks

How does the filter argument work? #5

Closed minthemiddle closed 1 year ago

minthemiddle commented 4 years ago

How can I filter to exclude two hosts (wikipedia.org and facebook.com)?

According to the docs, filtering is done via the -f argument. The script describes it as '-f', filter results [url, title, text, host].

Since -o json writes the output as JSON and is described as '-o', help='output file [html, csv, json]', I expected something along the lines of -f host REGEX, but that does not work.

tasos-py commented 4 years ago

The -f argument is somewhat similar to Google's advanced search operators. The difference is that it doesn't accept a value of its own; the search query is the value. Also, the filter is inclusive, and it doesn't accept regular expressions. For example, if the search query is "query" and the filter is "url", only links that contain "query" in the URL will be collected, which is equivalent to Google's advanced search operator "allinurl: query". If you think this feature can be improved, you're very welcome to contribute, or I may do it myself when I have some free time.
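
To illustrate the behaviour described above, here is a minimal Python sketch. It is not the library's actual code. The result dictionaries, their keys ("url", "title", "text"), the case-insensitive matching, and the helper names `matches_filter` and `exclude_hosts` are all assumptions made for illustration. The first helper mimics the inclusive filter (keep a result only if the query appears in the chosen field). The second shows one way the original request, dropping wikipedia.org and facebook.com, could be handled by post-processing the collected results.

```python
from urllib.parse import urlsplit


def matches_filter(result, query, field):
    """Inclusive filter: True only if the query text occurs in the chosen field.

    Case-insensitive matching is an assumption of this sketch.
    """
    return query.lower() in result.get(field, "").lower()


def exclude_hosts(results, blocked_hosts):
    """Drop results whose host is a blocked domain or one of its subdomains."""
    kept = []
    for result in results:
        # .hostname is already lower-cased and excludes any port number
        host = urlsplit(result["url"]).hostname or ""
        blocked = any(
            host == domain or host.endswith("." + domain)
            for domain in blocked_hosts
        )
        if not blocked:
            kept.append(result)
    return kept


# Hypothetical results, shaped as plain dictionaries for this sketch only
results = [
    {"url": "https://en.wikipedia.org/wiki/Query", "title": "Query - Wikipedia", "text": ""},
    {"url": "https://www.facebook.com/query", "title": "Query | Facebook", "text": ""},
    {"url": "https://example.com/query-language", "title": "A query language", "text": ""},
    {"url": "https://example.org/about", "title": "About query tools", "text": ""},
]

# Equivalent of searching "query" with the "url" filter: keeps the first three
url_hits = [r for r in results if matches_filter(r, "query", "url")]

# Post-processing step that removes the two unwanted hosts
kept = exclude_hosts(results, ["wikipedia.org", "facebook.com"])
```

Matching on the parsed hostname rather than on the raw URL string avoids dropping unrelated pages that merely mention "facebook.com" somewhere in their path or query string.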