philipperemy / amazon-reviews-scraper

Yet another multi language scraper for Amazon targeting reviews.
Apache License 2.0

Scraping ends suddenly - no ban timeout, but strange program behaviour #4

Closed. Robstaer closed this issue 5 years ago

Robstaer commented 6 years ago

I ran into this issue after grabbing 444 reviews. The last lines of my productID.json file are the following:

},
    {
        "body": 

and the console prints this:

2018-11-23 11:32:13,035 - INFO - HELPFUL  =
2018-11-23 11:32:13,035 - INFO - ***********************************************

2018-11-23 11:32:15,863 - INFO - No more reviews to unstack.
2018-11-23 11:32:15,863 - INFO - 444 reviews found so far.
Traceback (most recent call last):
  File "amazon_comments_scraper.py", line 49, in <module>
    main()
  File "amazon_comments_scraper.py", line 44, in main
    run(search, input_product_ids_filename)
  File "amazon_comments_scraper.py", line 24, in run
    persist_comment_to_disk(reviews)
  File "C:\Users\Robert\Desktop\Scraper\core_utils.py", line 47, in persist_comment_to_disk
    json.dump(reviews, fp, sort_keys=True, indent=4, ensure_ascii=False)
  File "C:\Users\Robert\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 180, in dump
    fp.write(chunk)
  File "C:\Users\Robert\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f609' in position 306: character maps to <undefined>

Even though it says 444 reviews were found so far, only 34 have been written to the file, and the .json file ends abruptly in the middle of a record, as shown above!

Does anyone have an idea?

Thanks in advance!

suleimank commented 5 years ago

Any update on this? I am getting the same error.

damione1 commented 5 years ago

@suleimank @Robstaer Take a look at the pull request, I added a fix.

philipperemy commented 5 years ago

@suleimank @Damione1 thanks guys!

I commented on the PR: https://github.com/philipperemy/amazon-reviews-scraper/pull/5#pullrequestreview-182938551

philipperemy commented 5 years ago

@Damione1 @suleimank this should solve the problem.

https://github.com/philipperemy/amazon-reviews-scraper/commit/0e147f5133045882182b1af7ca92fa0fd5a089be
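
For readers hitting the same traceback: it shows `json.dump` writing through the `cp1252` codec, which is the default text encoding on many Windows setups and cannot represent the emoji `'\U0001f609'`. Because `json.dump` writes the output chunk by chunk, a failure partway through plausibly explains why the file stops in the middle of a record. Below is a minimal sketch of one way to avoid both effects. It is not necessarily what the linked commit does, and the names `persist_reviews` and `output_path` are hypothetical, not taken from the repository.

```python
import json
import os
import tempfile


def persist_reviews(reviews, output_path):
    """Write reviews as JSON in UTF-8, independent of the platform default.

    Hypothetical helper for illustration; not the repository's function.
    """
    # Serialize to a string first, so an error during serialization
    # cannot leave a half-written file behind.
    payload = json.dumps(reviews, sort_keys=True, indent=4, ensure_ascii=False)
    # An explicit encoding avoids falling back to cp1252 on Windows.
    with open(output_path, "w", encoding="utf-8") as fp:
        fp.write(payload)


if __name__ == "__main__":
    sample = [{"body": "Works great \U0001f609"}]
    path = os.path.join(tempfile.mkdtemp(), "reviews.json")
    persist_reviews(sample, path)
    # Read the file back with the same encoding to confirm the round trip.
    with open(path, encoding="utf-8") as fp:
        assert json.load(fp) == sample
```

An alternative would be to keep the default encoding and pass `ensure_ascii=True` (the `json` default), which escapes non-ASCII characters as `\uXXXX` sequences at the cost of less readable output.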