practical-data-science / ecommercetools

EcommerceTools is a Python data science toolkit for ecommerce, marketing science, and technical SEO analysis and modelling. It was created by Matt Clarke.
MIT License

seo.get_indexed_pages fails to run #21

Closed: hapesurya closed this issue 2 years ago

hapesurya commented 2 years ago

When I run this code:

from ecommercetools import seo

urls = ['https://www.bbc.co.uk']
df = seo.get_indexed_pages(urls)
print(df.head())

I get this response:

Traceback (most recent call last):
  File "....../get-index-value.py", line 7, in <module>
    df = seo.get_indexed_pages(urls)
  File "....../lib/python3.8/site-packages/ecommercetools/seo/google_search.py", line 88, in get_indexed_pages
    site_data = {'url': site, 'indexed_pages': _count_indexed_pages(site)}
  File "....../lib/python3.8/site-packages/ecommercetools/seo/google_search.py", line 73, in _count_indexed_pages
    return _parse_site_results(response)
  File "....../lib/python3.8/site-packages/ecommercetools/seo/google_search.py", line 58, in _parse_site_results
    indexed = int(string.split(' ')[1].replace(',', ''))
ValueError: invalid literal for int() with base 10: '43.500.000'

Need your help. Thanks.

flyandlure commented 2 years ago

Thanks for your ticket. I'm unable to replicate the bug. I used the code below (in a Jupyter notebook) and got back the dataframe containing the data correctly. You might want to try upgrading the version using the pip command below.

!pip3 install --upgrade ecommercetools

from ecommercetools import seo
urls = ['https://www.bbc.co.uk']
df = seo.get_indexed_pages(urls)
df.head()

hapesurya commented 2 years ago

Thanks for your response,

I use Python 3.8.10 from the terminal, and I run this module inside a virtual environment.

There is no problem with the other functions in your module (specifically the SEO functions).

I will try it from Jupyter.

Thanks

hapesurya commented 2 years ago

Dear @flyandlure

I have now tried Jupyter, and other functions such as Google autocomplete run well.

But seo.get_indexed_pages(urls) still fails in the same way.

I have already upgraded ecommercetools.

from ecommercetools import seo
urls = ['https://www.bbc.co.uk']
df = seo.get_indexed_pages(urls)
df.head()

and the response (from Jupyter) is:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_11693/3752796642.py in <module>
      1 from ecommercetools import seo
      2 urls = ['https://www.bbc.co.uk']
----> 3 df = seo.get_indexed_pages(urls)
      4 df.head()

~/Documents/Script Python/VENV/ecommercetools/my-venv/lib/python3.8/site-packages/ecommercetools/seo/google_search.py in get_indexed_pages(urls)
     86     data = []
     87     for site in urls:
---> 88         site_data = {'url': site, 'indexed_pages': _count_indexed_pages(site)}
     89         data.append(site_data)
     90     df = pd.DataFrame.from_records(data)

~/Documents/Script Python/VENV/ecommercetools/my-venv/lib/python3.8/site-packages/ecommercetools/seo/google_search.py in _count_indexed_pages(url)
     71
     72     response = _get_site_results(url)
---> 73     return _parse_site_results(response)
     74
     75

~/Documents/Script Python/VENV/ecommercetools/my-venv/lib/python3.8/site-packages/ecommercetools/seo/google_search.py in _parse_site_results(response)
     56
     57     string = response.html.find("#result-stats", first=True).text
---> 58     indexed = int(string.split(' ')[1].replace(',', ''))
     59     return indexed
     60

ValueError: invalid literal for int() with base 10: '35.700.000'

Need your help. Thanks.

practical-data-science commented 2 years ago

This is fixed in the latest release.
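
For readers who hit the same error on an older version: the thread does not say how the release fixed it, so the following is only a sketch of one possible workaround, not the library's actual patch. The tracebacks show that line 58 removes only `,` before calling `int()`, so a count formatted with `.` as the thousands separator (`'43.500.000'`) cannot be parsed. The helper name `parse_result_count` and the sample strings are hypothetical, and the sketch assumes the first number in the result-stats text is the page count.

```python
import re


def parse_result_count(stats_text):
    """Return the first number in a result-stats string as an int.

    Hypothetical helper for illustration; not part of ecommercetools.
    """
    # Find the first run of digits, allowing '.' or ',' as group separators.
    match = re.search(r"\d[\d.,]*", stats_text)
    if match is None:
        raise ValueError(f"no number found in {stats_text!r}")
    # Strip every non-digit so both '43.500.000' and '43,500,000' parse.
    return int(re.sub(r"\D", "", match.group()))


# Assumed sample strings; the real text depends on Google's locale.
print(parse_result_count("About 43.500.000 results (0.45 seconds)"))  # 43500000
print(parse_result_count("About 35,700,000 results"))  # 35700000
```

Stripping all non-digits treats every separator as a grouping character, which is reasonable for a whole-number page count but would be wrong for values with a decimal part.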