Closed JoshuaMart closed 5 years ago
Hi @JoshuaMart - thanks for taking the time to submit an issue. I got the same result with the yahoo.com domain and the same dork file when using pagodo. The same search does, however, return results when using the Google GUI. It looks like it may be something in the googlesearch logic (https://github.com/opsdisk/pagodo/blob/master/pagodo.py#L99).
Using ipython to debug, this returns URLs:
In [31]: for url in googlesearch.search(
    ...:     "hello world",
    ...:     start=0,
    ...:     stop=search_max,
    ...:     num=100,
    ...:     pause=5.0,
    ...:     extra_params={"filter": "0"},
    ...:     user_agent=user_agent,
    ...:     tbs="li:1",  # Verbatim mode. Doesn't return suggested results with other domains.
    ...: ):
    ...:     print(url)
but this doesn't:
In [31]: for url in googlesearch.search(
    ...:     "site:yahoo.com filetype:pdf",
    ...:     start=0,
    ...:     stop=search_max,
    ...:     num=100,
    ...:     pause=5.0,
    ...:     extra_params={"filter": "0"},
    ...:     user_agent=user_agent,
    ...:     tbs="li:1",  # Verbatim mode. Doesn't return suggested results with other domains.
    ...: ):
    ...:     print(url)
Google might have added some defenses against tools like pagodo... I'll have to dig into it deeper.
I hope you can find a solution. :)
Metagoofil may help with what you're trying to do: https://github.com/opsdisk/metagoofil
Just tried it for yahoo.com and PDFs and it returned results.
I only used a "filetype:pdf" dork by chance; it's not specifically what I'm looking to use. But thanks, I tested Metagoofil and it's really nice!
tl;dr - install the google library from GitHub until 2.0.2 is pushed to PyPI. Here's the ticket I submitted: https://github.com/MarioVilas/googlesearch/issues/68
git clone https://github.com/MarioVilas/googlesearch.git
cd googlesearch
python setup.py install
pip install google is not installing the latest version.
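To confirm which version actually ended up installed, something like the following may help. It is only a sketch: it assumes Python 3.8+ (for importlib.metadata) and that the distribution is registered under the name "google", as the pip command above suggests. The helper name installed_version is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "google" is assumed to be the distribution name used by `pip install google`.
    print(installed_version("google"))
```

If this prints 2.0.1 (or None), the fixed release is not the one being imported.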
Looks like there's a bug in version 2.0.1 of the google library that is fixed in 2.0.2. It appends an encoded "+" (%2B) to the search query because domain_query is an empty string:
if domains:
    domain_query = '+OR+'.join('site:' + domain for domain in domains)
else:
    domain_query = ''
# Prepare the search string.
query = quote_plus(query + '+' + domain_query)
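The effect of that snippet can be reproduced outside the library with urllib.parse.quote_plus alone. The sketch below wraps the logic above in a function and compares it with a guarded variant. Both function names are hypothetical, and the guarded variant is just one possible way to avoid the trailing "+"; it is not necessarily how 2.0.2 fixes it.

```python
from urllib.parse import quote_plus


def build_query_buggy(query, domains=None):
    """Mirror of the snippet above: always appends '+' before encoding."""
    if domains:
        domain_query = '+OR+'.join('site:' + domain for domain in domains)
    else:
        domain_query = ''
    return quote_plus(query + '+' + domain_query)


def build_query_guarded(query, domains=None):
    """Hypothetical variant: only append the domain part when there is one."""
    if domains:
        # Join with spaces so quote_plus turns them into '+' separators.
        query = query + ' ' + ' OR '.join('site:' + domain for domain in domains)
    return quote_plus(query)


if __name__ == "__main__":
    print(build_query_buggy('filetype:pdf'))    # filetype%3Apdf%2B
    print(build_query_guarded('filetype:pdf'))  # filetype%3Apdf
```

The stray %2B at the end of the first result is the literal "+" that Google then treats as part of the search terms.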
After modifying the library to print out the query and strip the extra encoded "+" (%2B), running this script returns the expected results:
import googlesearch

for url in googlesearch.search(
    'filetype:pdf',
    start=0,
    stop=10,
    num=10,
    pause=0,
    extra_params={"filter": "0"},
    user_agent='Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.7) Gecko/20100723 Fedora/3.6.7-1.fc13 Firefox/3.6.7',
    tbs="li:1",  # Verbatim mode. Doesn't return suggested results with other domains.
):
    print(url)
It works! Thanks :)
Hi, thank you for your tool. It is very interesting, but unfortunately I can't get it to work: the searches always return 0 results.
No matter which dork I use, I get 0 results :/
Thanks for the help