serpapi / public-roadmap

Public Roadmap for SerpApi, LLC (https://serpapi.com)

[google-search-results-python library] Search archive method not working #1399

Open hilmanski opened 8 months ago

hilmanski commented 8 months ago

A customer reported that get_search_archive has suddenly stopped working. It worked for them in the past.

I also tried to replicate the sample from our Readme and can confirm the issue.

Running search_archived = search.get_search_archive(search_id) produces this error:

{'error': 'You are using query `q` parameter on our Search Archive API. To make a new search, use: https://serpapi.com/search.json?q=coffee'}

The same case was mentioned by this user a month ago, and it is probably related to this change.

The error shows up even though we don't pass the q parameter to get_search_archive. Here is a basic sample to reproduce it:

from queue import Queue
from serpapi import GoogleSearch

# store searches
search_queue = Queue()

search = GoogleSearch({
    # "q": "nice coffee", 
    "async": True,
    "location": "France",
    "api_key": "API_KEY"
  })

keywords = ['amd', 'nvidia', 'intel']
for company in keywords:
    print("execute async search: q = " + company)
    search.params_dict["q"] = company
    result = search.get_dict()
    if "error" in result:
        print("oops error: ", result["error"])
        continue
    print("add search to the queue where id: ", result['search_metadata']['id'])
    # add search to the search_queue
    search_queue.put(result)

print("wait until all search statuses are cached or success")

# Retrieve each queued search from the archive
while not search_queue.empty():
    result = search_queue.get()
    search_id = result['search_metadata']['id']
    print(search_id + ": get search from archive")

    # retrieve search from the archive - blocker
    search_archived = search.get_search_archive(search_id)
    print(search_archived)

    break

The error occurs on the get_search_archive line. The issue is likely in the Python library (the user reported that it used to work fine, so it probably broke after recent changes), but I think it's essential to handle this issue as well.


hilmanski commented 8 months ago

As a temporary workaround, we can recommend that users access the Search Archive API directly with any HTTP GET library: https://serpapi.com/search-archive-api
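A minimal sketch of that workaround, using only the Python standard library. The URL pattern (https://serpapi.com/searches/<search_id>.json with an api_key query parameter) is my reading of the Search Archive API page linked above and should be checked against it; the helper names archive_url and fetch_archived_search are hypothetical and not part of any SerpApi library.

```python
import json
import urllib.parse
import urllib.request


def archive_url(search_id: str, api_key: str) -> str:
    """Build the Search Archive API URL for one search ID.

    Assumption: the endpoint has the form
    https://serpapi.com/searches/<search_id>.json?api_key=<key>
    """
    query = urllib.parse.urlencode({"api_key": api_key})
    safe_id = urllib.parse.quote(search_id, safe="")
    return f"https://serpapi.com/searches/{safe_id}.json?{query}"


def fetch_archived_search(search_id: str, api_key: str, timeout: float = 30.0) -> dict:
    """Fetch an archived search with a plain HTTP GET and parse the JSON body."""
    # Only api_key is sent, so no leftover search parameter such as q can leak in.
    with urllib.request.urlopen(archive_url(search_id, api_key), timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_archived_search(search_id, os.getenv("SERPAPI_API_KEY")) with the ID taken from result['search_metadata']['id'] should then return the archived result as a dict, without going through GoogleSearch at all.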

ilyazub commented 8 months ago

Here's a simpler reproducible example:

from serpapi import GoogleSearch
import os

params = {
    "api_key": os.getenv("SERPAPI_API_KEY"),
    "q": "coffee",  # search query
}

search = GoogleSearch(params)
results = search.get_dict()

search_id = results['search_metadata']['id']
search_archived = search.get_search_archive(search_id)
print(search_archived)

Output

{'error': 'You are using query `q` parameter on our Search Archive API. To make a new search, use: https://serpapi.com/search.json?q=coffee'}

Note: Similar code works with serpapi-python.

Freaky commented 8 months ago

The client is a very thin wrapper around a dict - it just sends whatever was set in the initial parameters during construction, so the q from the original search is sent to the archive endpoint as well. Correct usage would be to create a new instance:

from serpapi import GoogleSearch
import os

params = {
    "api_key": os.getenv("SERPAPI_API_KEY"),
}

search = GoogleSearch({ **params, "q": "coffee"})
results = search.get_dict()

search_id = results['search_metadata']['id']
search_archived = GoogleSearch(params).get_search_archive(search_id)
print(search_archived)

Though you could get away with reusing it by deleting the problematic parameter, this is more fragile:

from serpapi import GoogleSearch
import os

params = {
    "api_key": os.getenv("SERPAPI_API_KEY"),
    "q": "coffee",  # search query
}

search = GoogleSearch(params)
results = search.get_dict()

del search.params_dict['q'] # <===

search_id = results['search_metadata']['id']
search_archived = search.get_search_archive(search_id)
print(search_archived)

We could filter the parameter list in the client, but this might get out of sync with either what parameters we support on the archive endpoint, or what parameters we might consider an error in the future.
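For illustration, client-side filtering could look roughly like the sketch below. The allowlist ARCHIVE_ALLOWED_PARAMS and the helper archive_params are hypothetical and do not exist in the library, and limiting the allowlist to api_key is an assumption about what the archive endpoint accepts.

```python
# Hypothetical allowlist: parameters assumed to be valid on the archive endpoint.
# It has to be kept in sync with the API by hand, which is the drawback noted above.
ARCHIVE_ALLOWED_PARAMS = frozenset({"api_key"})


def archive_params(params: dict) -> dict:
    """Return a copy of params containing only keys allowed on the archive endpoint."""
    return {key: value for key, value in params.items() if key in ARCHIVE_ALLOWED_PARAMS}


search_params = {"api_key": "API_KEY", "q": "coffee", "location": "France"}
print(archive_params(search_params))
```

The original dict is left untouched, so the same parameters can still be reused for new searches. A denylist (dropping only q) would be shorter, but it would let through any other parameter the archive endpoint might reject later.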