Closed aliayar closed 4 months ago
@aliayar
The parsing seems correct to me. I had a similar doubt about a year ago, but inspecting the HTML shows the same kind of structure as the knowledge graph. For example, the kc:/... part within the data-attrid attribute can be found in many knowledge graph examples.
In its normal form you can see such boxes under the knowledge graph, but for some searches the knowledge graph expands itself.
You may find a similar example in the documentation as well:
Thank you for the explanation @kagermanov27.
Some users have a hard time finding the corresponding element in the JSON file, as its main key is "president_of_the_united_states_2017".
Nonetheless, the first answer box example from the examples above is still not scraped by SerpApi.
@aliayar On my part, the first example you provided is being parsed by us. Interesting 👀
Some users have a hard time finding the corresponding element in the JSON file.
I think this is why the Show JSON path option is available in the Playground:
I think the user meant that, with a list of search terms, it can be hard to know what the key is when you automate this process.
On my part, the first example you provided is being parsed by us.
Now, I see it, too.
I am closing this one.
Hi @dimitryzub @kagermanov27 and @aliayar.
I am the user that initially reported these issues. Thanks for fixing the issue with the Lockheed question so quickly!!
I still believe the issue with the other 2 questions has not been solved. Let me explain.
I want to write a function in Python:
def answer_using_google(question):
    ...
    ...
    return answer
When I have an 'answer' in 'answer_box', it's very easy to write this function.
But if the answer is in 'president_of_the_united_states_2017' one time and in 'founders' another time, there is no automatic way to parse this. I'm not asking you to put these answers into answer_box, as I understand that the things Google returns here are not actually answer boxes. I'm just asking for an entry in the returned JSON that tells me where the correct answer is. For the presidents example it might be "answer_in": "president_of_the_united_states_2017", and for the CNN founders question it might be "answer_in": "founders".
Does that explain it better?
Thanks so much
@aliayar Got it! 🙂
@ofirpress Thank you for your clarifications 🙂
But if the answer is in 'president_of_the_united_states_2017' one time and in 'founders' another time, there is no automatic way to parse this.
One way to do it, when there's no knowledge graph on the right (which provides more keys), is to iterate through the knowledge_graph keys and use each extracted key dynamically, as in knowledge_graph.dynamic_key:
for key in results["knowledge_graph"]:
    for result in results["knowledge_graph"][key]:
        print(result["name"])
Full example:
from serpapi import GoogleSearch

params = {
    "api_key": "...",
    "engine": "google",
    "q": "U.S. president in 2017",
    "gl": "us",
    "hl": "en",
    "location": "Austin, Texas, United States"
}

search = GoogleSearch(params)
results = search.get_dict()

for key in results["knowledge_graph"]:
    print(key)  # president_of_the_united_states_2017

    for result in results["knowledge_graph"][key]:
        print(result["name"])
Outputs:
president_of_the_united_states_2017
Donald Trump
Barack Obama
However, this approach will not work when there's more than one dict key.
What about using an if statement? Since we know the names of the keys, we can just check for their existence, for example:
if "president_of_the_united_states_2017" in results["knowledge_graph"]:
    for result in results["knowledge_graph"]["president_of_the_united_states_2017"]:
        print(result["name"])

if "founders" in results["knowledge_graph"]:
    for result in results["knowledge_graph"]["founders"]:
        print(result["name"])
Full example:
from serpapi import GoogleSearch

for query in ["U.S. president in 2017", "Founder of CNN"]:
    params = {
        "api_key": "...",
        "engine": "google",
        "q": query,
        "google_domain": "google.com",
        "gl": "us",
        "hl": "en",
        "location": "Austin, Texas, United States"
    }

    search = GoogleSearch(params)
    results = search.get_dict()

    # Check each known key and print the names it holds, if present.
    if "president_of_the_united_states_2017" in results["knowledge_graph"]:
        for result in results["knowledge_graph"]["president_of_the_united_states_2017"]:
            print(result["name"])

    if "founders" in results["knowledge_graph"]:
        for result in results["knowledge_graph"]["founders"]:
            print(result["name"])
Outputs:
Donald Trump
Barack Obama
Ted Turner
Reese Schonfeld
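If the key names are not known in advance, a rough sketch like the one below might help. It is not something SerpApi provides: find_name_lists and the "answer_in" field are hypothetical names, and the sample dict is hand-written to mirror the shape of the outputs above, not a real response. It assumes the answer lives either in answer_box["answer"] or in a knowledge_graph value that is a list of dicts with a "name" field.

```python
from typing import Any, Dict, List, Optional


def find_name_lists(knowledge_graph: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map every key whose value is a list of dicts with "name" to those names."""
    found: Dict[str, List[str]] = {}
    for key, value in knowledge_graph.items():
        if not isinstance(value, list):
            continue  # skip plain strings, nested dicts, etc.
        names = [
            item["name"]
            for item in value
            if isinstance(item, dict) and "name" in item
        ]
        if names:
            found[key] = names
    return found


def answer_using_google(results: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return where the answer was found and the answer itself, or None."""
    answer_box = results.get("answer_box", {})
    if "answer" in answer_box:
        return {"answer_in": "answer_box", "answer": answer_box["answer"]}

    # Fall back to the first list-of-names key in the knowledge graph.
    name_lists = find_name_lists(results.get("knowledge_graph", {}))
    for key, names in name_lists.items():
        return {"answer_in": key, "answer": names}
    return None


# Hand-written sample shaped like the output shown above (not a real response).
sample = {
    "knowledge_graph": {
        "title": "hypothetical title",
        "president_of_the_united_states_2017": [
            {"name": "Donald Trump"},
            {"name": "Barack Obama"},
        ],
    }
}
print(answer_using_google(sample))
```

Note that a real knowledge_graph may contain other list-valued keys with "name" entries, so taking the first match is only a heuristic and you may want to filter out keys you know are unrelated.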
Let me know if there is anything else I can help you with 🌞
SerpApi successfully scrapes these answer boxes. Closing this issue as resolved.
There is a new design for the Answer Box which is shown for certain questions and it is not being scraped by us.
The Playground | The Inspect
On the other hand, this one is scraped as a knowledge panel but it looks more like an answer box.
The Playground | The Inspect
The last example is also scraped as a Knowledge Panel, but it has a clear border separating it from the knowledge panel:
The Playground | The Inspect
Let me know if the second and the third examples should have their own issue. I can open a separate issue.