EcommerceTools is a Python data science toolkit for ecommerce, marketing science, and technical SEO analysis and modelling, created by Matt Clarke.
MIT License
response from _get_results(query) contains NoneType which leads to parsing Fail #35
Hi Matt,
While trying to scrape Google, I followed your blog post on 3-line Google scraping and got the following error:
AttributeError Traceback (most recent call last)
Cell In[2], line 1
----> 1 results = seo.get_serps("stupid")
2 print(results)
File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:144, in get_serps(query, output)
133 """Return the first 10 Google search results for a given query.
134
135 Args:
(...)
140 results (dict): Results of query.
141 """
143 response = _get_results(query)
--> 144 results = _parse_search_results(response)
146 if results:
147 if output == "dataframe":
File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:124, in _parse_search_results(response)
118 output = []
120 for result in results:
121 item = {
122 'title': result.find(css_identifier_title, first=True).text,
123 'link': result.find(css_identifier_link, first=True).attrs['href'],
--> 124 'text': result.find(css_identifier_text, first=True).text
...
125 }
127 output.append(item)
129 return output
AttributeError: 'NoneType' object has no attribute 'text'
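The last frame suggests that `result.find(css_identifier_text, first=True)` found no match for at least one result and returned `None`, so the chained `.text` fails. Below is a minimal sketch of that mechanism, with no network access. `find_first` and the selectors `h3` and `.snippet` are hypothetical stand-ins, not the real ecommercetools code or its `css_identifier_*` values.

```python
from types import SimpleNamespace


def find_first(matches, selector):
    """Hypothetical stand-in for result.find(selector, first=True)."""
    # Returns an element-like object, or None when the selector matches nothing.
    return matches.get(selector)


# One result block with a title but no snippet; the selectors are placeholders.
matches = {"h3": SimpleNamespace(text="Some title")}

title = find_first(matches, "h3").text  # works: an element was found

try:
    find_first(matches, ".snippet").text  # no match, so this is None.text
except AttributeError as error:
    message = str(error)

print(title)
print(message)
```

If this reading is right, a single result block without a snippet is enough to abort parsing of the whole page.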
Then I tried your other blog post, on scraping with Python, which does not rely on the ecommercetools package, and followed it to the T.
Here is the interesting part:
results = google_search("stupid")
results
yields normal output. Rerunning this Jupyter cell with the keyword
results = google_search("allergy")
results
yields
AttributeError Traceback (most recent call last)
Cell In[9], line 1
----> 1 results = google_search("allergy")
2 results
Cell In[8], line 3, in google_search(query)
1 def google_search(query):
2 response = get_results(query)
----> 3 return parse_results(response)
Cell In[7], line 17, in parse_results(response)
10 output = []
12 for result in results:
14 item = {
15 'title': result.find(css_identifier_title, first=True).text,
16 'link': result.find(css_identifier_link, first=True).attrs['href'],
---> 17 'text': result.find(css_identifier_text, first=True).text
18 }
20 output.append(item)
22 return output
AttributeError: 'NoneType' object has no attribute 'text'
So sometimes result.find(css_identifier_text, first=True) returns None instead of an element?
I have no idea under which circumstances this NoneType arises, but the behaviour is as follows:
seo.get_serps() from ecommercetools consistently throws the error, whereas the hand-written equivalent is keyword-sensitive: e.g. "allergy" throws the error, "keyword sensitive" does not.
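One possible workaround, sketched here under assumptions rather than as the library's actual fix, is to guard each lookup so that a missing element yields `None` for that field instead of raising. The helper names (`first_text`, `first_attr`, `parse_results_safely`), the `FakeResult` class, and the selectors `h3`, `a`, and `.snippet` are all hypothetical; the only assumption about the real result objects is that `find(selector, first=True)` returns either an element with `.text` and `.attrs` or `None`.

```python
from types import SimpleNamespace


def first_text(result, selector):
    """Return the text of the first match, or None when nothing matches."""
    element = result.find(selector, first=True)
    return element.text if element is not None else None


def first_attr(result, selector, name):
    """Return an attribute of the first match, or None when it is missing."""
    element = result.find(selector, first=True)
    return element.attrs.get(name) if element is not None else None


def parse_results_safely(results, title_css, link_css, text_css):
    """Build one dict per result, leaving missing fields as None."""
    output = []
    for result in results:
        output.append({
            'title': first_text(result, title_css),
            'link': first_attr(result, link_css, 'href'),
            'text': first_text(result, text_css),
        })
    return output


class FakeResult:
    """Hypothetical stand-in for a result element, used only for this demo."""

    def __init__(self, elements):
        self.elements = elements  # maps selector -> element-like object

    def find(self, selector, first=False):
        element = self.elements.get(selector)
        if first:
            return element
        return [element] if element is not None else []


# Placeholder selectors; the real css_identifier_* values are not shown here.
complete = FakeResult({
    'h3': SimpleNamespace(text='A title', attrs={}),
    'a': SimpleNamespace(text='', attrs={'href': 'https://example.com'}),
    '.snippet': SimpleNamespace(text='A snippet', attrs={}),
})
no_snippet = FakeResult({
    'h3': SimpleNamespace(text='Another title', attrs={}),
    'a': SimpleNamespace(text='', attrs={'href': 'https://example.org'}),
})

rows = parse_results_safely([complete, no_snippet], 'h3', 'a', '.snippet')
print(rows)
```

With this shape, a result that lacks a snippet still contributes its title and link, and rows with `None` fields can be filtered or inspected afterwards to see which keywords trigger the missing element.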