susanli2016 / NLP-with-Python

Scikit-Learn, NLTK, Spacy, Gensim, Textblob and more

Newbie #1

Open LeoTRESPEUCH opened 5 years ago

LeoTRESPEUCH commented 5 years ago

Hello Susanli, I tried to use your code but I received this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-2-fede94e53ef8> in <module>
    211 
    212     # get all reviews for 'url' and 'lang'
--> 213     items = scrape(url, lang)
    214 
    215     if not items:

<ipython-input-2-fede94e53ef8> in scrape(url, lang)
     48 
     49 
---> 50     items = parse(session, url + '?filterLang=' + lang)
     51 
     52     return items

<ipython-input-2-fede94e53ef8> in parse(session, url)
     63         return
     64 
---> 65     num_reviews = soup.find('span', class_='reviews_header_count').text # get text
     66     num_reviews = num_reviews[1:-1]
     67     num_reviews = num_reviews.replace(',', '')

AttributeError: 'NoneType' object has no attribute 'text'

I'm a university business professor but a newbie with Python. Could you help me use your solution to scrape TripAdvisor hotel reviews? Thanks in advance.

focaalvarez commented 5 years ago

Hello Susanli.

I am getting the same error message. I have already checked that the bs4 module is installed on my system.

Are we missing something here?

Thank you very much!

kyle10n commented 5 years ago

Replace line 65 with:

num_reviews = soup.find('span', class_='hotels-hotel-review-community-content-TabBar__tabCount--37DbH').text # get text

Note that the class name has changed. I believe TA (TripAdvisor) changed their website after she wrote the code.

It threw the 'NoneType' error because soup.find found no element with the old class and returned None, and the code then tried to read .text from that None.
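The point above can be shown with a small guard: check the result of soup.find before reading .text. This is only a minimal sketch. The helper name `get_review_count_text` and the sample HTML are made up for illustration, and the class names come from this thread, so they may not match the live site.

```python
from bs4 import BeautifulSoup


def get_review_count_text(html, class_name):
    """Return the text of the first <span> with class_name, or None if absent.

    Hypothetical helper, not part of the original notebook.
    """
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", class_=class_name)
    if span is None:
        # find() returns None when nothing matches; calling .text on it
        # is what raises "'NoneType' object has no attribute 'text'".
        return None
    return span.text


# Made-up markup for illustration only.
sample_html = '<div><span class="reviews_header_count">(1,234)</span></div>'

found = get_review_count_text(sample_html, "reviews_header_count")
missing = get_review_count_text(sample_html, "some-other-class")
```

With a guard like this, a changed class name yields None instead of a traceback, and the caller can print a clear message.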

keithweberrit commented 5 years ago

LOL, this is a moving target. Try replacing the suspect line as follows:

# num_reviews = soup.find('span', class_='reviews_header_count').text # get text    
numSpan = soup.select('span[class*="hotels-hotel-review-community-content-TabBar__tabCount--"]')
num_reviews = numSpan[0].text # get text

silviasanasi commented 5 years ago

I had the same issue and solved it by inspecting the HTML of my target page. The errors you are getting most likely mean the page's markup has changed, so the class names used in the code need to be updated to match.

shivaksh21 commented 5 years ago

LOL, this is a moving target. Try replacing the suspect line as follows:

# num_reviews = soup.find('span', class_='reviews_header_count').text # get text    
numSpan = soup.select('span[class*="hotels-hotel-review-community-content-TabBar__tabCount--"]')
num_reviews = numSpan[0].text # get text

The last line gives an error again: list index out of range.
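"List index out of range" here means soup.select returned an empty list, so numSpan[0] does not exist, most likely because the class name changed again. A minimal sketch of a more defensive version follows, assuming the class prefix quoted in this thread. The function name `parse_num_reviews` and the sample HTML are hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Class-name prefix taken from the comments above; the live site may differ.
PREFIX = "hotels-hotel-review-community-content-TabBar__tabCount--"


def parse_num_reviews(html):
    """Return the review count as an int, or None if the span is not found."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one returns the first match or None, so there is no list to index.
    span = soup.select_one(f'span[class*="{PREFIX}"]')
    if span is None:
        return None
    # Keep digits only, dropping commas, parentheses and whitespace.
    digits = re.sub(r"[^\d]", "", span.text)
    return int(digits) if digits else None


# Made-up markup for illustration only.
sample_html = (
    '<span class="hotels-hotel-review-community-content-TabBar__tabCount--37DbH">'
    "1,234</span>"
)

count = parse_num_reviews(sample_html)
no_count = parse_num_reviews("<p>no matching span here</p>")
```

If this returns None on the real page, the selector needs to be updated by inspecting the page source again.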

rkmishracs commented 4 years ago

I am getting this error while running the code in a Jupyter notebook.

TypeError                                 Traceback (most recent call last)
in ()
      7 for url in start_urls:
      8     # get all reviews for 'url' and 'lang'
----> 9     items = scrape(start_urls, lang)
     10     if not items:
     11         print('No reviews')

in scrape(url, lang)
      8     })
      9 
---> 10     items = parse(session, url+'?filterLang='+lang)
     11     return items

TypeError: can only concatenate list (not "str") to list
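The traceback shows scrape being called with start_urls (the whole list) rather than the loop variable url, so url+'?filterLang='+lang tries to add a string to a list. A minimal sketch of the difference follows. The helper `build_review_url` and the example.com URLs are placeholders for illustration, not the notebook's real code.

```python
def build_review_url(url, lang):
    """Append the language filter to a single URL string (hypothetical helper)."""
    return url + '?filterLang=' + lang


# Placeholder URLs for illustration only.
start_urls = ['https://example.com/hotel-a', 'https://example.com/hotel-b']
lang = 'en'

filtered_urls = []
for url in start_urls:
    # Pass the loop variable `url` (a str), not the list `start_urls`.
    filtered_urls.append(build_review_url(url, lang))

# Passing the list instead reproduces the error from the traceback.
try:
    build_review_url(start_urls, lang)
    list_call_failed = False
except TypeError:
    list_call_failed = True
```

In the notebook, the equivalent change would be calling scrape(url, lang) inside the loop instead of scrape(start_urls, lang).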