scrapehero-code / amazon-review-scraper

A basic Python 3 web scraper for extracting reviews from Amazon, built using Selectorlib and requests.
https://www.scrapehero.com/how-to-scrape-amazon-product-reviews/
54 stars 49 forks

Scraper only scrapes the first review page #5

Open jimmy10023 opened 3 years ago

jimmy10023 commented 3 years ago

Hi, first of all thank you for the code!

However, I am having a problem: when scraping multiple pages of reviews for the same product, only the first page gets scraped. The other pages appear to be scraped too and show up in the data, but the reviews extracted from them are the ones from the first page.

Does anyone know how to fix this?

Thank you!

hellochang commented 3 years ago

Hi! I have the same issue, have you figured out how to fix it yet?

karthikmagesan commented 2 years ago

Hi, I am also facing the same issue. Please let me know if you figured out how to fix it.

cyanobrian commented 2 years ago

I believe that when you place the URLs in a TXT file, the script reads the trailing newline (\n) character along with each URL, which could corrupt the URL. I found that if I strip off the last character of each URL being read, or place the URLs in a Python list instead, it works fine for me.
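A minimal sketch of that fix, assuming the URLs sit one per line in a text file. The function name `read_urls` and the file name `urls.txt` are made up for illustration and are not taken from the repository. Using `strip()` instead of cutting off the last character also copes with a final line that has no trailing newline.

```python
def read_urls(path):
    """Return the URLs listed in a text file, one per line.

    strip() removes the trailing newline (and any stray spaces) that would
    otherwise end up inside the URL; blank lines are skipped.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Hypothetical usage:
# urls = read_urls("urls.txt")
```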

jmccaffrey commented 2 years ago

I wanted to start with just an ASIN, get to the first page of reviews, and then keep going to the next page. You basically have to pull the URL from next_page and loop on that.
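A rough sketch of that loop, not the repository's actual code. It assumes you already have some function (called `scrape_page` here, a hypothetical name) that takes a URL and returns a dict with a `next_page` entry and a list of reviews. The `reviews` key name and the `max_pages` cap are assumptions too.

```python
from urllib.parse import urljoin


def scrape_all_pages(start_url, scrape_page, max_pages=50):
    """Follow next_page links from start_url and collect every review.

    scrape_page(url) is assumed to return a dict shaped like
    {"reviews": [...], "next_page": "<relative or absolute URL, or None>"}.
    """
    reviews = []
    seen = set()  # guards against a next_page link that loops back
    url = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        data = scrape_page(url) or {}
        reviews.extend(data.get("reviews") or [])
        next_page = data.get("next_page")
        # next_page may be a relative link, so resolve it against the current URL
        url = urljoin(url, next_page) if next_page else None
    return reviews
```

The `seen` set also makes the loop stop if a page keeps pointing back to a URL that was already visited, rather than requesting the same page forever.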

ms-shashank commented 1 month ago

The thing is, Amazon restricts scraping, so when you make too many requests too frequently, this happens: the scraper gets only the first page and keeps repeating it. I would suggest using requests.Session. This might work, and it worked for me.
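A small sketch of what that could look like, assuming the `requests` package is installed. The helper names, the delay value, and the User-Agent string are placeholders chosen for illustration. A session reuses cookies and connections across requests, and the pause spaces the requests out. Whether this is enough to avoid being blocked is not guaranteed.

```python
import time

import requests


def make_session(user_agent):
    """Create a session that sends the same headers and cookies on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_pages(session, urls, delay_seconds=2.0):
    """Fetch each URL with the shared session, pausing between requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # assumed delay; tune it for your case
        response = session.get(url, timeout=30)
        pages.append(response.text)
    return pages


# Hypothetical usage:
# session = make_session("Mozilla/5.0 (placeholder user agent)")
# html_pages = fetch_pages(session, ["<review page URL 1>", "<review page URL 2>"])
```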