opsdisk / yagooglesearch

Yet another googlesearch - A Python library for executing intelligent, realistic-looking, and tunable Google searches.
BSD 3-Clause "New" or "Revised" License

Is there a way to turn off cool_off_time and make search request fail instead of retry? #7

Closed Cyber-Cowboy closed 2 years ago

Cyber-Cowboy commented 2 years ago

Thanks for the great library, but as far as I understand, it does not provide a way to handle a 429 response yourself (or at least I didn't find one); it just makes another request after a certain cool_off time. It would be great if there were a parameter, something like "retry=False", that could be passed to the client to make it raise an error when a 429 response is received.

opsdisk commented 2 years ago

Hi @Cyber-Cowboy - thanks for submitting an issue! So if an HTTP 429 was detected, instead of cooling off before trying again, you'd want it to check whether, for example, 429_retry=False, and if so, bail on the rest of the search and report that a 429 was detected? So basically, yagooglesearch will return to your calling script and say "HTTP 429 detected, I'm done, it's up to you to determine the next step"? If I'm not fully understanding the ask, please let me know and provide more details for your use case.

Cyber-Cowboy commented 2 years ago

Hi @opsdisk, yeah, pretty much. I need it because I can switch proxies, and if a 429 is returned I would prefer to make another request through a different proxy instead of waiting for n minutes.

opsdisk commented 2 years ago

Gotcha...might be a few days until I can get to it. Just FYI, here is how you can use more than 1 proxy to spread the search (https://github.com/opsdisk/yagooglesearch#multiple-proxies). If you have enough, you likely won't run into HTTP 429s (not guaranteed though :smile: )

opsdisk commented 2 years ago

@Cyber-Cowboy Check out https://github.com/opsdisk/yagooglesearch/pull/8 and take it for a spin.

When instantiating the yagooglesearch.SearchClient object, pass yagooglesearch_manages_http_429s=False. If a 429 is detected, the search returns to your calling script with the string "HTTP_429_DETECTED" included in the list of results. At that point, it's up to your script to adjust.
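
For the proxy-switching use case described earlier in the thread, the flow could look roughly like the sketch below. It is only an illustration under stated assumptions: `search_with_proxy_rotation`, `run_search`, and `yagooglesearch_attempt` are hypothetical names that are not part of yagooglesearch, and each attempt restarts the search from the first page rather than resuming where the previous proxy stopped.

```python
from typing import Callable, Iterable, List

# Marker string the library places in the results when a 429 is detected.
HTTP_429_MARKER = "HTTP_429_DETECTED"


def search_with_proxy_rotation(
    proxies: Iterable[str],
    run_search: Callable[[str], List[str]],
) -> List[str]:
    """Try each proxy in turn until a search finishes without an HTTP 429.

    run_search(proxy) is a hypothetical callable that performs one search
    attempt through the given proxy and returns its URL list, including
    the marker string if a 429 was detected.
    """
    collected: List[str] = []
    for proxy in proxies:
        urls = run_search(proxy)
        rate_limited = HTTP_429_MARKER in urls
        # Keep real URLs, drop the marker, and skip duplicates.
        for url in urls:
            if url != HTTP_429_MARKER and url not in collected:
                collected.append(url)
        if not rate_limited:
            break  # This proxy completed the search; no need to rotate.
    return collected


def yagooglesearch_attempt(query: str) -> Callable[[str], List[str]]:
    """Build a run_search callable backed by yagooglesearch (hypothetical wiring)."""

    def run(proxy: str) -> List[str]:
        # Imported lazily so the rotation helper itself has no dependency.
        import yagooglesearch

        client = yagooglesearch.SearchClient(
            query,
            proxy=proxy,
            yagooglesearch_manages_http_429s=False,
        )
        client.assign_random_user_agent()
        return client.search()

    return run
```

A call such as `search_with_proxy_rotation(my_proxies, yagooglesearch_attempt("site:twitter.com"))` would then move to the next proxy whenever the marker shows up, where `my_proxies` is whatever list of proxy URLs you maintain. Keeping the search attempt behind a callable is a design choice for this sketch only; it lets the rotation logic be exercised without touching the network.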

Cyber-Cowboy commented 2 years ago

@opsdisk thanks, it looks like exactly what I need!

opsdisk commented 2 years ago

Great! I didn't get a chance to test it yet. I'll check back in a few days to see if it satisfies your ask.

opsdisk commented 2 years ago

Had to push an update for it to work properly. My testing pastables:

import yagooglesearch

query = "site:twitter.com"

client = yagooglesearch.SearchClient(
    query,
    tbs="li:1",
    verbosity=4,
    num=10,
    max_search_result_urls_to_return=200,  # Trigger HTTP 429
    minimum_delay_between_paged_results_in_seconds=1,  # Trigger HTTP 429
    yagooglesearch_manages_http_429s=False,  # Hand HTTP 429 handling to the calling script
)
client.assign_random_user_agent()

urls = client.search()

if "HTTP_429_DETECTED" in urls:
    print("HTTP 429 detected...it's up to you to modify your search.")

    # Remove HTTP_429_DETECTED from list.
    urls.remove("HTTP_429_DETECTED")

    print("URLs found before HTTP 429 detected...")

    for url in urls:
        print(url)

opsdisk commented 2 years ago

@Cyber-Cowboy Merged https://github.com/opsdisk/yagooglesearch/pull/8 into master. Let me know if you run into any bugs or issues.