opsdisk / yagooglesearch

Yet another googlesearch - A Python library for executing intelligent, realistic-looking, and tunable Google searches.
BSD 3-Clause "New" or "Revised" License

limited scopes!! #28

Closed amad3us47 closed 7 months ago

amad3us47 commented 1 year ago

It only returns 400 search URLs. How can we get more?

https://support.google.com/websearch/thread/24227169/i-can-t-see-all-search-results-there-are-less-results-than-google-thinks?hl=en

opsdisk commented 1 year ago

@amad3us47 Can you provide more information? What were the values you used for the search? What was the search string query?

opsdisk commented 1 year ago

Is this what you're referring to?

[image]

amad3us47 commented 1 year ago

Yeah. Is there an alternative, such as a search engine without Google-like SEO filtering, that could be integrated with it? This is for a bug bounty script. The Wayback Machine could do that using Tom's waybackurls tool, but it's an archive of recorded indexes, so it's not that good.

opsdisk commented 1 year ago

This returned 420 URLs when I set max_search_result_urls_to_return=600, so not sure how accurate the 400 cap is.


import yagooglesearch

query = "computer"

# Request more URLs than the suspected 400 cap to see how many actually come back.
client = yagooglesearch.SearchClient(
    query,
    tbs="li:1",
    max_search_result_urls_to_return=600,
    http_429_cool_off_time_in_minutes=45,
    http_429_cool_off_factor=1.5,
    # proxy="socks5h://127.0.0.1:9050",
    verbosity=5,
    verbose_output=True,  # False (only URLs) or True (rank, title, description, and URL)
)
client.assign_random_user_agent()

urls = client.search()

print(len(urls))

amad3us47 commented 1 year ago

This returned 420 URLs when I set max_search_result_urls_to_return=600, so not sure how accurate the 400 cap is.

It still won't return the maximum number of results.

Google is limiting the search results (it's a feature). I will need to find a new search engine or some way around that.

opsdisk commented 1 year ago

What were your search criteria that you ran into that limit?

amad3us47 commented 1 year ago

What were your search criteria that you ran into that limit?

I was indexing sites for certain countries with a dork (site:.pk).
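One workaround sometimes tried against a per-query cap is to split a broad dork such as site:.pk into narrower ones and merge the results. The sketch below only illustrates that idea. It assumes, without confirmation from this thread, that each narrower query is capped separately. The helper names `narrow_site_dork` and `collect_unique_urls`, and the second-level labels used in the example, are hypothetical.

```python
def narrow_site_dork(tld, second_levels):
    """Build narrower site: dorks, e.g. '.pk' + 'gov' -> 'site:gov.pk'."""
    return [f"site:{label}{tld}" for label in second_levels]


def collect_unique_urls(queries, search_fn):
    """Run every query through search_fn and merge the URLs.

    search_fn is any callable that takes a query string and returns an
    iterable of URL strings. Duplicates are dropped and first-seen order
    is kept.
    """
    seen = set()
    merged = []
    for query in queries:
        for url in search_fn(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Illustrative labels only; pick whatever sub-scopes fit the target.
queries = narrow_site_dork(".pk", ["gov", "edu", "com"])
```

With yagooglesearch, `search_fn` could be something like `lambda q: yagooglesearch.SearchClient(q, max_search_result_urls_to_return=400, verbose_output=False).search()`, so that each call returns plain URLs. More queries also means more requests, so HTTP 429 cool-offs become more likely.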

tw-evan commented 1 year ago

I’ve also encountered this problem!

amad3us47 commented 1 year ago

@tw-evan Have you come across any solution for this?

opsdisk commented 1 year ago

Yeah, I got 399 when using the site:.pk query. It might be a limitation of going through the regular web interface (GUI) rather than the official search API (https://developers.google.com/custom-search/v1/overview). If that's the case, there's not much yagooglesearch can do. I'll just have to add a note in the docs not to expect more than 400. I'll keep this open for the time being.
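For comparison, here is a minimal sketch of how paging through the official Custom Search JSON API could look. It only builds the request URLs and sends nothing. `build_page_urls` is a hypothetical helper, and "KEY" and "CX" are placeholders for an API key and a Programmable Search Engine ID. As far as the API reference describes, `num` is limited to 10 per request and the API does not return more than 100 results per query, so it may not lift the cap either.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_page_urls(query, api_key, engine_id, total=30, page_size=10):
    """Return the request URLs needed to page through `total` results."""
    urls = []
    # `start` is the 1-based index of the first result on each page.
    for start in range(1, total + 1, page_size):
        params = {
            "key": api_key,
            "cx": engine_id,
            "q": query,
            "start": start,
            # The last page may need fewer than page_size results.
            "num": min(page_size, total - start + 1),
        }
        urls.append(f"{API_ENDPOINT}?{urlencode(params)}")
    return urls


page_urls = build_page_urls("site:.pk", "KEY", "CX", total=25)
```

Each URL can then be fetched with any HTTP client; the result links are in the `items` array of the JSON response.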

opsdisk commented 7 months ago

Thanks for bringing this to my attention, @amad3us47. I updated the README:

https://github.com/opsdisk/yagooglesearch?tab=readme-ov-file#max-400-results-returned